Jimmy Lin's collections - Various software and data sets, including MIT Aranea, MIT 109 (a reusable collection for TREC 2002), and the Pourpre scoring script for automatically evaluating answers to complex questions.
LC-QuAD - a corpus for complex question answering over knowledge graphs (a data set of natural language questions paired with corresponding SPARQL queries).
MovieQA - aims to evaluate automatic story comprehension from both video and text. The data set consists of almost 15,000 multiple-choice questions with answers, obtained from over 400 movies, and features high semantic diversity.
MS MARCO - A Reading Comprehension Dataset for the Artificial Intelligence research community.
Question-Generation Corpus (from Microsoft Research) - This corpus contains candidate fill-in-the-blank questions and answers generated from sentences in articles on Wikipedia's lists of vital articles and popular pages. It also includes question-quality ratings from multiple judges, along with unique judge IDs.
RecipeQA - a dataset for multimodal comprehension of cooking recipes. It consists of over 36K question-answer pairs automatically generated from approximately 20K unique recipes with step-by-step instructions and images.
SQuAD - The Stanford Question Answering Dataset (about 100K question-answer pairs based on Wikipedia passages).