The Good Tech Companies - Vector Search: A Reranker Algorithm Showdown

Episode Date: November 26, 2024

This story was originally published on HackerNoon at: https://hackernoon.com/vector-search-a-reranker-algorithm-showdown. Rerankers are ML models that take a set of sear...ch results and reorder them to improve relevance. We tested 6 of them. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #vector-search, #software-development, #generative-ai, #reranker, #reranker-analysis, #hybrid-search, #rrf, #good-company, and more. This story was written by: @datastax. Learn more about this writer by checking @datastax's about page, and for more stories, please visit hackernoon.com. Rerankers are ML models that take a set of search results and reorder them to improve relevance. We tested 6 of them.

Transcript
Discussion (0)
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Vector Search. A Rear Anchor Algorithm Showdown, by Datastacks. Vector Search effectively delivers semantic similarity for retrieval augmented generation, booted does poorly with short keyword searches or out-of-domain search terms. Supplementing vector retrieval with keyword searches like BM25 and combining the results with a Rearanchor is becoming the standard way to get the best of both worlds. Rear-anchors are ML models that take a set of search results and reorder them to improve relevance. They examine the query paired with
Starting point is 00:00:35 each candidate result in detail, which is computationally expensive but produces more accurate results than simple retrieval methods alone. This can be done either as a second stage in top of a single search, pull 100 results out of vector search, then ask their anchor to identify the top 10, or, more often, to combine results from different kinds of search. In this case, vector search and keyword search. But how good are off-the-shelf rear-anchors? To find out, I tested six rear-anchers on the text from the VDoRe benchmark,
Starting point is 00:01:08 using Gemini Flash to extract text from the images. Details on the datasets can be found in section 3.1 of the kernel polypaper. Notably, tabfquad and shiftproject sources are in French. The rest are in English. We tested these re-rankers. Reciprocal rank fusion, RRF. A formula for combining results from multiple sources without knowing anything about the queries or documents. It depends purely on relative ordering within each source. RRF is used in Elastic and Lama Index, among other projects. Cohere ReRank V3 and Gina ReRanker V2, probably the most popular hosted models. BGE Reranker V2 Meter 3, the highest scoring open-source model, Apache-licensed. Voyage RERANK2 and RERANK2 Lite, freshly released, in September. By a solid company. The Rerankers were fed the top 20 results from both DPR and
Starting point is 00:02:01 BM25, and their ranked NDCG at 5 was evaluated. In the results, raw vector search with embeddings from the BGEM3 model, a slabled DPR, dense passage retrieval. BGEM3 was chosen to compute embeddings because that's what the kernel poly authors used as a baseline. Here's the data on relevance, NDCG at 5. And here's how fast they are at re-ranking searches in the archive dataset. Latenci is proportional to document length. This is graphing latency, so lower is better. The self-hosted BGE model was run on both an NVIDIA 3090 using the simplest possible code lifted straight from the Hugging Face model card. And finally, here's how
Starting point is 00:02:45 much it cost with each model to re-rank the almost 3,000 searches from all six datasets. Cohere prices per search, with additional fees for long documents, while the others price per token, analysis. All the models do roughly as well on the French datasets as they do on the English ones. Cohere is significantly more expensive and offers slightly, but consistently, worse relevance than the other ML re-rankers, but it's 3x faster than the next fastest services. Additionally, Coher's standard rate limits are the most generous. Voyage RERANK2 is the king of re-ranking relevance in all datasets, for an additional hit to latency. Notably, it's the
Starting point is 00:03:25 only model that does not do worse than DPR alone in the archive dataset, which seems to be particularly tricky. Voyage RERANK2 Lite and JINA RERANKER V2 are very, very similar. They're the same speed, hosted at the same price, and close to the same relevance. With a slight edge to Voyage. But Voyage's standard rate limit is double genus, and with Voyage you get a, real, Python client instead of having to make raw HTTP requests. BGE RearAnchor V2M3 is such a lightweight model, under 600M parameters, that even on an older consumer GPU it is usably fast. Conclusion. RRF adds little to no value to hybrid search scenarios.
Starting point is 00:04:07 On half of the datasets, it performed worse than either BM25 or DPR alone. In contrast, all ML-based DRER anchors tested delivered meaningful improvements over pure vector or keyword search, with Voyage RERANK2 setting the bar for relevance. Tradeoffs are still present. Superior accuracy from Voyage RERANK2, faster processing from Cohere, or solid middle ground performance from Gina or Voyage's light model. Even the open-source BGE rear anchor, while trailing commercial options, adds significant value for teams choosing to self-host. As foundation models continue advancing, we can expect even better performance. But today's ML rear-anchors are already mature enough to deploy with confidence across multilingual content. By Jonathan Ellis.
Starting point is 00:04:54 Datastacks Thank you for listening to this HackerNoon story. Read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.