English»Classic Information Retrieval»Evaluation | searchivarius.org

A Case Study in Web Search using TREC Algorithms  Amit Singhal, Marcin Kaszkiel - On the importance of anchor text.
A Comparison of Statistical Significance Tests for Information Retrieval Evaluation  Mark D. Smucker, James Allan, Ben Carterette - An evaluation of the permutation/randomization test.
A Statistical Analysis of TREC-3 Data  Jean Tague-Sutcliffe, James Blustein - A seminal paper where multiple comparison adjustments were used in IR experiments.
Agreement Among Statistical Significance Tests for Information Retrieval Evaluation at Varying Sample Sizes  Mark D. Smucker, James Allan, Ben Carterette
Bias and the limits of pooling for large collections  Buckley C. E., Dimmick D. L., Soboroff I. M., Voorhees E. M.
Comparing the Sensitivity of Information Retrieval Metrics  Filip Radlinski, Nick Craswell
Do TREC Web Collections Look Like the Web?  Ian Soboroff
Evaluating the performance of information retrieval systems using test collections  Paul Clough, Mark Sanderson - A survey of test collections and evaluation methods.
Expected Reciprocal Rank for Graded Relevance  Olivier Chapelle, Donald Metzler, Ya Zhang, Pierre Grinspan
Forming Test Collections with No System Pooling  
How Reliable are the Results of Large-Scale Information Retrieval Experiments?  Justin Zobel
Improvements That Don’t Add Up  Timothy G. Armstrong, Alistair Moffat, William Webber, Justin Zobel
Information Retrieval System Evaluation: Effort, Sensitivity, and Reliability  Mark Sanderson, Justin Zobel
Minimal Test Collections for Retrieval Evaluation  Ben Carterette, James Allan, Ramesh Sitaraman
Multiple Testing in Statistical Analysis of Systems-Based Information Retrieval Experiments  Benjamin A. Carterette
Novelty and diversity in information retrieval evaluation  Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, Ian MacKinnon
Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements  Gregory Marton, Alexey Radul
On Rank Correlation in Information Retrieval Evaluation  Massimo Melucci, University of Padua
On the Robustness of Relevance Measures with Incomplete Judgments  Tanuja Bompada, Chi-Chao Chang, John Chen, Ravi Kumar, Rajesh Shenoy
Power and Bias of Subset Pooling Strategies  Cormack G.V., Lynam T.R.
Predicting Query Performance  Steve Cronen-Townsend, Yun Zhou, W. Bruce Croft
Quantifying Test Collection Quality Based on the Consistency of Relevance Judgements  
Ranking Retrieval Systems without Relevance Judgments  Ian Soboroff, Charles Nicholas, Patrick Cahan
Selecting good expansion terms for pseudo-relevance feedback  Guihong Cao, Jian-Yun Nie, Jianfeng Gao, Stephen Robertson
Statistical inference in retrieval effectiveness evaluation  Jacques Savoy
Statistical Power in Retrieval Experimentation  William Webber, Alistair Moffat, Justin Zobel
Statistical Precision of Information Retrieval Evaluation  Gordon V. Cormack, Thomas R. Lynam
Test Collection Based Evaluation of Information Retrieval Systems  Mark Sanderson
TREC: Experiment and Evaluation in Information Retrieval  E. M. Voorhees, D. K. Harman
Using Statistical Testing in the Evaluation of Retrieval Experiments  David Hull
Validity and power of t-test for comparing MAP and GMAP  Cormack G.V., Lynam T.R.