Data Sets and State-of-the-art (SOTA): Question Answering (QA) | searchivarius.org

AQUA-RAT (Algebra Question Answering with Rationales)   - algebraic word problems with rationales.
ARC, the AI2 Reasoning Challenge  
CMU Question-Answer Dataset   - Noah Smith et al.
Community QA data set  
CoQA   - A Conversational Question Answering Challenge.
FigureQA   - an annotated figure dataset for visual reasoning.
Great Auk knowledge master website  
HotpotQA   - A Dataset for Diverse, Explainable Multi-hop Question Answering
Jeopardy! games  
Jimmy Lin's collections   - Various software and data sets, including MIT Aranea, MIT 109 (reusable collection for TREC 2002), and Pourpre scoring script for automatically evaluating complex questions.
LC-QuAD   - a corpus for complex question answering over knowledge graphs (a data set of natural language queries with corresponding SPARQL queries).
MovieQA   - aims to evaluate automatic story comprehension from both video and text. The data set consists of almost 15,000 multiple-choice questions with answers, obtained from over 400 movies, and features high semantic diversity.
MS MARCO   - A Reading Comprehension Dataset for the Artificial Intelligence research community.
Open Question Answering Over Curated and Extracted Knowledge Bases from KDD 2014  
QuAC   - Question Answering in Context.
Question answering dataset featured in "Teaching Machines to Read and Comprehend"  
Question-Generation Corpus (from MS research)   - This corpus contains candidate fill-in-the-blank questions and answers generated from sentences taken from articles on Wikipedia's listing of vital articles and popular pages, along with ratings of the question quality from multiple judges, as well as unique judge IDs.
Quiz Ball data  
Quiz-Zone   - quality quiz questions.
RecipeQA   - a dataset for multimodal comprehension of cooking recipes. It consists of over 36K question-answer pairs automatically generated from approximately 20K unique recipes with step-by-step instructions and images.
SQuAD   - The Stanford Question Answering Dataset (about 100K QA pairs based on Wikipedia passages)
Text REtrieval Conference (TREC) QA Track  
TriviaQA   - A Large Scale Dataset for Reading Comprehension and Question Answering
Visual Genome   - an ongoing effort to connect structured image concepts to language
WikiSQL   - A large annotated semantic parsing corpus for developing natural language interfaces.