Directory

A Survey of Current Datasets for Vision and Language Research - Ferraro F, Mostafazadeh N, Huang TH, Vanderwende L, Devlin J, Galley M, Mitchell M.

Allen AI data sets - includes, among others, example data sets from the Aristo project.

Awesome NLP (Keon Kim) - A curated list of resources dedicated to Natural Language Processing (NLP).

The LDC Corpus Catalog (top ten datasets) - The Linguistic Data Consortium (LDC) Catalog contains hundreds of corpora of language data, including the TIPSTER and Google n-gram collections.