A Case Study in Web Search using TREC Algorithms Amit Singhal, Marcin Kaszkiel - On the importance of anchor text.
|
A Comparison of Statistical Significance Tests for Information Retrieval Evaluation Mark D. Smucker, James Allan, Ben Carterette - An evaluation of the permutation/randomization test.
|
A Statistical Analysis of TREC-3 Data Jean Tague-Sutcliffe, James Blustein - A seminal paper where multiple comparison adjustments were used in IR experiments.
|
Agreement Among Statistical Significance Tests for Information Retrieval Evaluation at Varying Sample Sizes Mark D. Smucker, James Allan, Ben Carterette |
Bias and the limits of pooling for large collections Buckley, C. E., Dimmick, D. L., Soboroff, I. M., Voorhees, E. M. |
Click Models for Web Search Aleksandr Chuklin, Ilya Markov, Maarten de Rijke |
Comparing the Sensitivity of Information Retrieval Metrics Filip Radlinski, Nick Craswell |
Do TREC Web Collections Look Like the Web? Ian Soboroff |
Evaluating the performance of information retrieval systems using test collections Paul Clough, Mark Sanderson - A survey of test collections and evaluation methods.
|
Expected Reciprocal Rank for Graded Relevance Olivier Chapelle, Donald Metzler, Ya Zhang, Pierre Grinspan |
Forming Test Collections with No System Pooling |
How Reliable are the Results of Large-Scale Information Retrieval Experiments? Justin Zobel |
Improvements That Don’t Add Up Timothy G. Armstrong, Alistair Moffat, William Webber, Justin Zobel |
Information Retrieval System Evaluation: Effort, Sensitivity, and Reliability Mark Sanderson, Justin Zobel |
Minimal Test Collections for Retrieval Evaluation Ben Carterette, James Allan, Ramesh Sitaraman |
Multiple Testing in Statistical Analysis of Systems-Based Information Retrieval Experiments Benjamin A. Carterette |
Novelty and diversity in information retrieval evaluation Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, Ian MacKinnon |
Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements Gregory Marton, Alexey Radul |
On Rank Correlation in Information Retrieval Evaluation Massimo Melucci, University of Padua |
On the Robustness of Relevance Measures with Incomplete Judgments Tanuja Bompada, Chi-Chao Chang, John Chen, Ravi Kumar, Rajesh Shenoy |
Power and Bias of Subset Pooling Strategies Cormack G.V., Lynam T.R. |
Predicting Query Performance Steve Cronen-Townsend, Yun Zhou, W. Bruce Croft |
Quantifying Test Collection Quality Based on the Consistency of Relevance Judgements |
Ranking Retrieval Systems without Relevance Judgments Ian Soboroff, Charles Nicholas, Patrick Cahan |
Selecting good expansion terms for pseudo-relevance feedback Guihong Cao, Jian-Yun Nie, Jianfeng Gao, Stephen Robertson |
Statistical inference in retrieval effectiveness evaluation Jacques Savoy |
Statistical Power in Retrieval Experimentation William Webber, Alistair Moffat, Justin Zobel |
Statistical Precision of Information Retrieval Evaluation Gordon V. Cormack, Thomas R. Lynam |
Test Collection Based Evaluation of Information Retrieval Systems Mark Sanderson |
TREC: Experiment and Evaluation in Information Retrieval E. M. Voorhees, D. K. Harman. |
Using Statistical Testing in the Evaluation of Retrieval Experiments David Hull |
Validity and power of t-test for comparing MAP and GMAP Cormack G.V., Lynam T.R. |