| 100+ Interesting Data Sets for Statistics - Robert Seaton |
| 1940 USA census |
| 6 Dataset Lists Curated by Data Scientists |
| A comprehensive list of data sets for machine learning |
| Allen Brain Observatory - standardized in vivo survey of physiological activity in the mouse visual cortex. |
| Data Depot - DataDepot is a set of tools for collaboratively uploading, sharing, and analyzing data. You can use DataDepot to track personal data, to explore public data, and to engage with scientific data. |
| Datasets for Data Mining, Analytics and Knowledge Discovery |
| LinkData.org - a data publishing community/hub website. |
| Linked Data @ VU |
| Mathematical Retrieval Project |
| Million Song Dataset |
| Nomao datasets - Data deduplication, learning to rank, online reviews, recommendations, text generation, voting networks. |
| Open speech corpora list - Josh Meyer |
| Pizza&Chili Corpus - Gonzalo Navarro, Paolo Ferragina |
| Publicly Available Large Data Sets for DB Research - Daniel Lemire |
| RedditSota - State-of-the-art results for all machine learning problems. |
| Research Pipeline Data sets |
| Teens and Online Privacy - 2012 survey with questions about teens' attitudes towards privacy and their information management practices. |
| Time series data: classification and clustering datasets - A very diverse set of clustering/classification data. |
| UCI machine learning repository - UC Irvine Machine Learning Repository |
| Yahoo Data Sets - Includes n-grams and anonymized query logs. |