A Case Study in Web Search using TREC Algorithms. Amit Singhal, Marcin Kaszkiel - On the importance of anchor text.
A Comparison of Statistical Significance Tests for Information Retrieval Evaluation. Mark D. Smucker, James Allan, Ben Carterette - An evaluation of the permutation/randomization test.
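The paired randomization test evaluated by Smucker et al. can be sketched in a few lines: under the null hypothesis the two systems are exchangeable on each topic, so each per-topic score difference may have its sign flipped at random, and the p-value is the fraction of sign-flipped samples whose mean difference is at least as extreme as the observed one. A minimal sketch (the function name and Monte Carlo trial count are illustrative, not taken from the paper):

```python
import random

def randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization test on per-topic score differences.

    scores_a, scores_b: per-topic effectiveness scores (e.g. AP) for two systems.
    Returns a Monte Carlo estimate of the two-sided p-value.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(trials):
        # Under H0 the system labels are exchangeable per topic,
        # so each difference keeps or flips its sign with probability 1/2.
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(permuted) / len(permuted)) >= observed:
            extreme += 1
    return extreme / trials
```

An exact version would enumerate all 2^n sign assignments; the Monte Carlo sampling above is the usual approximation for realistic topic counts.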
A Statistical Analysis of TREC-3 Data. Jean Tague-Sutcliffe, James Blustein - A seminal paper applying multiple-comparison adjustments to IR experiments.
Agreement Among Statistical Significance Tests for Information Retrieval Evaluation at Varying Sample Sizes. Mark D. Smucker, James Allan, Ben Carterette
Bias and the limits of pooling for large collections. Chris Buckley, Darrin Dimmick, Ian Soboroff, Ellen M. Voorhees
Click Models for Web Search. Aleksandr Chuklin, Ilya Markov, Maarten de Rijke
Comparing the Sensitivity of Information Retrieval Metrics. Filip Radlinski, Nick Craswell
Do TREC Web Collections Look Like the Web? Ian Soboroff
Evaluating the performance of information retrieval systems using test collections. Paul Clough, Mark Sanderson - A survey of test collections and evaluation methods.
Expected Reciprocal Rank for Graded Relevance. Olivier Chapelle, Donald Metzler, Ya Zhang, Pierre Grinspan
Forming Test Collections with No System Pooling
How Reliable are the Results of Large-Scale Information Retrieval Experiments? Justin Zobel
Improvements That Don’t Add Up. Timothy G. Armstrong, Alistair Moffat, William Webber, Justin Zobel
Information Retrieval System Evaluation: Effort, Sensitivity, and Reliability. Mark Sanderson, Justin Zobel
Minimal Test Collections for Retrieval Evaluation. Ben Carterette, James Allan, Ramesh Sitaraman
Multiple Testing in Statistical Analysis of Systems-Based Information Retrieval Experiments. Benjamin A. Carterette
Novelty and diversity in information retrieval evaluation. Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, Ian MacKinnon
Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements. Gregory Marton, Alexey Radul
On Rank Correlation in Information Retrieval Evaluation. Massimo Melucci
On the Robustness of Relevance Measures with Incomplete Judgments. Tanuja Bompada, Chi-Chao Chang, John Chen, Ravi Kumar, Rajesh Shenoy
Power and Bias of Subset Pooling Strategies. Gordon V. Cormack, Thomas R. Lynam
Predicting Query Performance. Steve Cronen-Townsend, Yun Zhou, W. Bruce Croft
Quantifying Test Collection Quality Based on the Consistency of Relevance Judgements
Ranking Retrieval Systems without Relevance Judgments. Ian Soboroff, Charles Nicholas, Patrick Cahan
Selecting good expansion terms for pseudo-relevance feedback. Guihong Cao, Jian-Yun Nie, Jianfeng Gao, Stephen Robertson
Statistical inference in retrieval effectiveness evaluation. Jacques Savoy
Statistical Power in Retrieval Experimentation. William Webber, Alistair Moffat, Justin Zobel
Statistical Precision of Information Retrieval Evaluation. Gordon V. Cormack, Thomas R. Lynam
Test Collection Based Evaluation of Information Retrieval Systems. Mark Sanderson
TREC: Experiment and Evaluation in Information Retrieval. Ellen M. Voorhees, Donna K. Harman
Using Statistical Testing in the Evaluation of Retrieval Experiments. David Hull
Validity and power of t-test for comparing MAP and GMAP. Gordon V. Cormack, Thomas R. Lynam