| A collection of corpora for named entity recognition (NER) tasks. These annotated datasets cover a variety of languages, domains, and entity types. |
| A Survey of Current Datasets for Vision and Language Research - Ferraro F, Mostafazadeh N, Huang TH, Vanderwende L, Devlin J, Galley M, Mitchell M. |
| Allen AI datasets - include, among others, example datasets from the Aristo project
|
| Awesome NLP (Keon Kim) - A curated list of resources dedicated to Natural Language Processing (NLP)
|
| Awesome NLP datasets |
| Conversation AI data sets |
| Datasets for Natural Language Processing - Karthik Narasimhan |
| The Extreme Classification Repository: Multi-label Datasets & Code |
| NLP-datasets (Nicolas Iderhoff) |
| PolyAI conversational datasets |
| SensEval & SemEval data |
| The LDC Corpus Catalog (top ten datasets) - The LDC's catalog contains hundreds of corpora of language data, including TIPSTER and the Google n-gram collection.
|