Data Sets and State-of-the-art (SOTA): Generic IR

A full list of early (pre-TREC) text collections   - Includes, among others, Cranfield and CACM.
CLEF tracks   - "Ten Years of CLEF Test Data": a data set for system benchmarking and research purposes on the DIRECT system.
ClueWeb09   - This data set is used in the Text Retrieval Conference (TREC). The full collection holds about 1 billion pages in 10 languages and is used as two subsets, Category A and Category B. A is the English portion, approximately 500 million pages. B is a subset of A with about 50 million pages.
ClueWeb12  Jamie Callan et al. - The successor to ClueWeb09.
NTCIR test data  
Pre-TREC CACM Collection  
UQV100: An IR Test Collection With Query Variability   - Relevance assessments for about 5K queries crowd-sourced from 100 TREC topics.
Web Research Collections - Web Track