Authors: Apoorva Joshi, Zhenmei Shi, Akshay Goindani, Hong Liu

Research Leads: Zhenmei Shi, Akshay Goindani, Hong Liu

Large language models (LLMs) are increasingly used for a broad range of tasks, including reranking, but they may not be the best choice once cost, latency, and accuracy in production applications are taken into account.

In this blog post, we put our latest reranker model, rerank-2.5, to the test against some of the best-performing LLMs on the market to see whether LLMs are actually good rerankers. Our studies show the following:

Purpose-built rerankers, such as rerank-2.5, are up to 60x cheaper and 48x faster than state-of-the-art LLMs, and achieve up to 15% better reranking accuracy (NDCG@10).

First-stage retrieval matters: pairing strong first-stage retrieval methods with specialized rerankers yields the best reranking quality.