A collection of annotated corpora for named entity recognition (NER) and related entity recognition tasks, covering a variety of languages, domains, and entity types.
A Survey of Current Datasets for Vision and Language Research - Ferraro F, Mostafazadeh N, Huang TH, Vanderwende L, Devlin J, Galley M, Mitchell M.
Allen AI sets - includes, among others, example data sets from the Aristo project
Awesome NLP (Keon Kim) - A curated list of resources dedicated to Natural Language Processing (NLP)
Awesome NLP datasets
Conversation AI data sets
Datasets for Natural Language Processing - Karthik Narasimhan
The Extreme Classification Repository: Multi-label Datasets & Code
NLP-datasets (Nicolas Iderhoff)
PolyAI conversational datasets
SensEval & SemEval data
The LDC Corpus Catalog (top ten datasets) - The LDC's Catalog contains hundreds of corpora of language data, including TIPSTER and the Google n-gram collection.