Amazon QA data - Question and Answer data from Amazon, totaling around 1.4 million answered questions.
Cornell NLVR - A language grounding dataset containing 92,244 pairs of natural language statements grounded in synthetic images. The task is to determine whether a statement is true or false about an image.
MS MARCO - A Reading Comprehension Dataset for the Artificial Intelligence research community.
NewsQA - A machine reading comprehension dataset similar to SQuAD.
NLIWOD - Collection of tools, utilities, datasets and approaches towards realizing natural language interfaces for the Web of Data. The current focus is on Question Answering (QA) utilities.