Data Sets and State-of-the-art (SOTA): Text Mining and NLP, Catalogs/lists | searchivarius.org

A collection of corpora for named entity recognition (NER) tasks. These annotated datasets cover a variety of languages, domains, and entity types.
A Survey of Current Datasets for Vision and Language Research  Ferraro F, Mostafazadeh N, Huang TH, Vanderwende L, Devlin J, Galley M, Mitchell M.
Allen AI data sets   - includes, among other sets, the Aristo project example data sets
Awesome NLP (Keon Kim)  Keon Kim - A curated list of resources dedicated to Natural Language Processing (NLP)
Awesome NLP datasets  
Conversation AI data sets  
Datasets for Natural Language Processing  Karthik Narasimhan
The Extreme Classification Repository: Multi-label Datasets & Code
NLP-datasets (Nicolas Iderhoff)  Nicolas Iderhoff
PolyAI conversational datasets  
Senseval & SemEval data
The LDC Corpus Catalog (top ten datasets)   - The LDC's Catalog contains hundreds of corpora of language data, including TIPSTER and the Google n-gram collection.