Traditional IR rivals neural models on the MS MARCO Document Ranking Leaderboard

Submitted by srchvrs on Tue, 12/15/2020 - 21:39

A few days ago I launched a traditional IR system into (lower layers of) the Transformer cloud. Although inferior to most BERT-based models, it outperformed several neural submissions (as well as all non-neural ones), including two submissions that used a large pretrained Transformer model for re-ranking.

My objectives were:

To provide a stronger traditional baseline;
To develop a first-stage retrieval system that is both efficient and effective without expensive index-time precomputation.

I have posted a short write-up on arXiv describing the submitted system. The write-up comes with two notebooks, which can be used to reproduce the results.

This work was made possible largely by our own flexible retrieval toolkit FlexNeuART (intended pronunciation flex-noo-art), which was recently presented at the EMNLP OSS Workshop. FlexNeuART was also instrumental in achieving top spots on the MS MARCO document ranking leaderboard in August and November 2020.
