English » Data Sets and Test Collections | searchivarius.org


Duplicate Detection & Record Linkage

Generic IR

Learning to Rank


Question Answering (QA)

Social Networks

Text Mining and NLP
Question Answering (QA), Sentiment Analysis and Opinion Mining, Metaphor detection

User behavior



100+ Interesting Data Sets for Statistics  Robert Seaton
1940 USA census  
6 Dataset Lists Curated by Data Scientists  
A comprehensive list of data sets for machine learning  
Allen Brain Observatory   - standardized in vivo survey of physiological activity in the mouse visual cortex.
Data Depot   - DataDepot is a set of tools for collaboratively uploading, sharing, and analyzing data. You can use DataDepot to track personal data, to explore public data, and to engage with scientific data.
Datasets for Data Mining, Analytics and Knowledge Discovery  
LinkData.org   - a data publishing community/hub website.
Linked Data @ VU  
Mathematical Retrieval Project  
Million Song Dataset  
Nomao datasets   - Data deduplication, learning to rank, online reviews, recommendations, text generation, voting networks.
Pizza&Chili Corpus  Gonzalo Navarro, Paolo Ferragina
Publicly Available Large Data Sets for DB Research  Daniel Lemire
Research Pipeline Data sets  
Teens and Online Privacy   - 2012 survey with questions about teens' attitudes towards privacy and their information management practices.
Time series data: classification and clustering datasets   - A very diverse set of clustering/classification data.
UCI machine learning repository   - UC Irvine Machine Learning Repository
Yahoo Data Sets   - Includes n-grams and anonymized query logs.