InsertRank: LLMs Can Reason over BM25 Scores to Improve Listwise Reranking
/ Authors
/ Abstract
Large Language Models (LLMs) have made significant strides across various information retrieval tasks, particularly as rerankers, owing to the strong generalization and knowledge-transfer capabilities they acquire from extensive pretraining. In parallel, the rise of LLM-based chat interfaces has raised user expectations, encouraging users to pose more complex queries that necessitate retrieval by "reasoning" over documents rather than through simple keyword matching or semantic similarity. While some recent efforts have exploited the reasoning abilities of LLMs for reranking such queries, considerable room for improvement remains. To that end, we introduce InsertRank, an LLM-based reranker that leverages lexical signals such as BM25 scores during listwise reranking to further improve retrieval performance. InsertRank demonstrates improved retrieval effectiveness on BRIGHT, a reasoning benchmark spanning 12 diverse domains, and R2MED, a specialized medical reasoning retrieval benchmark spanning 8 different tasks. Through an exhaustive evaluation and several ablation studies, we show that InsertRank consistently improves retrieval effectiveness across multiple families of LLMs, including GPT, Gemini, and DeepSeek models. With DeepSeek-R1, InsertRank achieves a score of 37.5 on the BRIGHT benchmark and 51.1 on the R2MED benchmark, surpassing previous methods. We further demonstrate the effectiveness of InsertRank on standard benchmarks such as TREC DL 19, DL 20, and TREC HARD, underscoring the robustness of the method. Finally, we show that our method also works with BERT-based retriever scores, illustrating how feedback from the first-stage retriever can help guide a listwise LLM reranker.
Venue: Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining
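The core idea in the abstract, supplying first-stage BM25 scores alongside candidate passages in a listwise reranking prompt, can be sketched as follows. This is a minimal illustration only; the prompt template, function names, and output-parsing convention are assumptions, not the paper's actual implementation.

```python
import re

def build_listwise_prompt(query, passages, bm25_scores):
    """Format candidates with their first-stage BM25 scores for an LLM reranker.

    Illustrative sketch: each passage is shown with its BM25 score so the
    model can reason over lexical evidence alongside the text itself.
    """
    lines = [
        f"Query: {query}",
        "Rank the passages below by relevance to the query, most relevant first.",
        "Each passage is annotated with its BM25 retrieval score as a lexical hint.",
        "",
    ]
    for i, (text, score) in enumerate(zip(passages, bm25_scores), start=1):
        lines.append(f"[{i}] (BM25: {score:.2f}) {text}")
    lines += ["", "Answer with the identifiers in ranked order, e.g. [2] > [1] > [3]."]
    return "\n".join(lines)

def parse_permutation(response, num_passages):
    """Parse a ranking like '[2] > [1] > [3]' into 0-based passage indices.

    Identifiers the model omits are appended in their original order, a common
    fallback for listwise rerankers (an assumption here, not the paper's rule).
    """
    seen, order = set(), []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        if 0 <= idx < num_passages and idx not in seen:
            seen.add(idx)
            order.append(idx)
    order += [i for i in range(num_passages) if i not in seen]
    return order
```

A usage sketch: `build_listwise_prompt` produces the text sent to the LLM, and `parse_permutation` maps the model's reply (e.g. `"[2] > [3] > [1]"`) back to indices `[1, 2, 0]` over the original candidate list.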