A few days ago I launched a traditional IR system into (the lower layers of) the Transformer cloud. Although it is inferior to most BERT-based models, it outperformed several neural submissions (as well as all non-neural ones), including two submissions that used a large pretrained Transformer model for re-ranking.

My objectives were:

  • To provide a stronger traditional baseline;
  • To develop a first-stage retrieval system that is both
    efficient and effective without expensive index-time precomputation.

I have posted a short write-up on arXiv describing the submitted system. The write-up comes with two notebooks, which can be used to reproduce the results.

This work was made possible largely by our own flexible retrieval toolkit, FlexNeuART (intended pronunciation: flex-noo-art), which was recently presented at the EMNLP OSS Workshop. FlexNeuART was also instrumental in achieving top spots on the MS MARCO document ranking leaderboard in August and November 2020.