|100+ Interesting Data Sets for Statistics (Robert Seaton)
|1940 USA census
|6 Dataset Lists Curated by Data Scientists
|A comprehensive list of data sets for machine learning
|Allen Brain Observatory - standardized in vivo survey of physiological activity in the mouse visual cortex.
|Data Depot - DataDepot is a set of tools for collaboratively uploading, sharing, and analyzing data. You can use DataDepot to track personal data, to explore public data, and to engage with scientific data.
|Datasets for Data Mining, Analytics and Knowledge Discovery
|LinkData.org - a data publishing community/hub website.
|Linked Data @ VU
|Mathematical Retrieval Project
|Million Song Dataset
|Nomao datasets - Data deduplication, learning to rank, online reviews, recommendations, text generation, voting networks.
|Open speech corpora list (Josh Meyer)
|Pizza&Chili Corpus (Gonzalo Navarro, Paolo Ferragina)
|Publicly Available Large Data Sets for DB Research (Daniel Lemire)
|RedditSota - State-of-the-art results for all machine learning problems.
|Research Pipeline Data sets
|Teens and Online Privacy - 2012 survey with questions about teens' attitudes towards privacy and their information management practices.
|Time series data: classification and clustering datasets - A very diverse set of clustering/classification data.
|UCI Machine Learning Repository - UC Irvine Machine Learning Repository
|Yahoo Data Sets - Includes n-grams and anonymized query logs.