Conference paper

A Large-Scale Study of Reranker Relevance Feedback at Inference

Abstract

Neural IR systems often employ a retrieve-and-rerank framework: a bi-encoder retrieves a fixed number of candidates (e.g., k=100), which a cross-encoder then reranks. Recent studies have indicated that relevance feedback from the reranker at inference time can improve the recall of the retriever. The approach updates the retriever's query representations via a distillation process that aligns them with the reranker's predictions. While this is a powerful idea, past studies have focused on a narrow set of domains, such as English question answering and entity retrieval, leaving a gap in our understanding of how well the approach generalizes. In this paper, we study inference-time reranker relevance feedback extensively across multiple retrieval domains, languages, and modalities, and also investigate the performance and latency implications of the number of distillation updates and feedback candidates.
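The distillation step described above can be sketched as follows. This is a minimal illustration, not the paper's exact method: it assumes dot-product retrieval, uses randomly generated stand-ins for document embeddings and reranker scores, and performs a few gradient steps on the cross-entropy between the reranker's score distribution and the retriever's distribution over the candidates.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distill_query(q, cand_embs, teacher_scores, steps=3, lr=0.05, tau=1.0):
    """Update the query embedding q so that the retriever's score
    distribution over the candidates moves toward the reranker's
    ("teacher's") distribution, via gradient descent on cross-entropy.
    Hyperparameters (steps, lr, tau) are illustrative assumptions."""
    p = softmax(teacher_scores)                # teacher distribution
    for _ in range(steps):
        s = softmax(cand_embs @ q / tau)       # retriever ("student") dist
        grad = cand_embs.T @ (s - p) / tau     # gradient of CE w.r.t. q
        q = q - lr * grad
    return q

# Toy setup: 20 documents embedded in 8 dimensions.
rng = np.random.default_rng(0)
docs = rng.normal(size=(20, 8))
q = rng.normal(size=8)

# First-pass retrieval: top-5 candidates by dot product.
top = np.argsort(docs @ q)[::-1][:5]
cands = docs[top]

# Hypothetical cross-encoder scores for the candidates (random stand-ins).
teacher = rng.normal(size=5)

def cross_entropy(query):
    s = softmax(cands @ query)
    return -(softmax(teacher) * np.log(s)).sum()

before = cross_entropy(q)
q_new = distill_query(q, cands, teacher)
after = cross_entropy(q_new)
assert after < before  # refined query better matches the reranker's ranking
```

After the update, the refined query would be used for a second retrieval pass, which is where the recall gains reported in the literature arise: documents missed by the initial query can now surface.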