Sunday, April 17, 2016

Question Answering datasets

To extend the list of conversational datasets, here is a collection of Question Answering (QA) datasets. A question-answer pair is a very short conversation, so it can also be used to train chatbots. If you want a chatbot to give information to customers, for example as automated customer support or an automated sales agent on your website, this type of dataset can be particularly useful.
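To make the idea concrete, here is a minimal sketch of how question-answer pairs could drive a simple retrieval-style support bot: it returns the stored answer whose question shares the most words with the user's question. The QA pairs, the function names, and the 0.2 threshold are all made up for illustration and do not come from any of the datasets listed here. A real chatbot would be trained on far more data with a stronger matching model.

```python
import re
from collections import Counter

# Hypothetical QA pairs, invented for illustration only.
QA_PAIRS = [
    ("What are your opening hours?", "We are open from 9am to 5pm, Monday to Friday."),
    ("How can I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("Do you ship internationally?", "Yes, we ship to most countries."),
]

def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(a, b):
    """Jaccard-style similarity between two token counters (0.0 to 1.0)."""
    shared = sum((a & b).values())
    total = sum((a | b).values())
    return shared / total if total else 0.0

def answer(question, pairs=QA_PAIRS, threshold=0.2):
    """Return the answer of the most similar stored question, or None
    if no stored question is similar enough (threshold is an assumption)."""
    query = tokenize(question)
    best_score, best_answer = 0.0, None
    for stored_question, stored_answer in pairs:
        score = overlap(query, tokenize(stored_question))
        if score > best_score:
            best_score, best_answer = score, stored_answer
    return best_answer if best_score >= threshold else None

if __name__ == "__main__":
    print(answer("How do I reset my password?"))
    print(answer("Tell me a joke"))
```

Returning None for unmatched questions lets the calling code fall back to a human agent or a default reply instead of giving an unrelated answer.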

The WikiQA corpus is a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.

TREC (Text REtrieval Conference) usually runs a QA task with associated datasets. Most of these datasets focus on factoid QA, but the 2015 task was a kind of live QA: the task was to answer questions from Yahoo Answers.

There is also a set of manually generated factoid question/answer pairs with difficulty ratings, built from Wikipedia articles. The dataset includes the articles, questions, and answers.

Yahoo also provides some manually curated QA datasets from Yahoo Answers.

You can also download the Stack Overflow questions and answers. It is a domain-specific but huge dataset.
