The Good Tech Companies - Benchmarking 1B Vectors with Low Latency and High Throughput
Episode Date: January 21, 2026This story was originally published on HackerNoon at: https://hackernoon.com/benchmarking-1b-vectors-with-low-latency-and-high-throughput. ScyllaDB Vector Search reaches... 1B vectors with 2ms p99 latency and 250K QPS, unifying structured data and embeddings at scale. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #scylladb-vector-search, #scylladb-ann-search, #vector-search-p99-latency, #real-time-rag-database, #high-qps-vector-database, #unified-vector-and-metadata, #usearch-vector-engine, #good-company, and more. This story was written by: @scylladb. Learn more about this writer by checking @scylladb's about page, and for more stories, please visit hackernoon.com. ScyllaDB Vector Search is now GA and delivers real-time similarity search at massive scale. Benchmarks on the yandex-deep_1b dataset show p99 latency as low as 1.7ms and throughput up to 252K QPS across 1 billion vectors. By unifying structured data and embeddings in one system, ScyllaDB eliminates dual-write pipelines while supporting production-grade AI, RAG, and recommendation workloads.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Benchmarking 1B vectors with low latency and high throughput, by ScyllaDB.
As AI-driven applications move from experimentation into real-time production systems,
the expectations placed on vector similarity search continue to rise dramatically.
Teams now need to support billion-scale datasets, high concurrency, strict p99 latency budgets,
and a level of operational simplicity that reduces architectural overhead rather than adding to it.
ScyllaDB Vector Search was built with these constraints in mind. It offers a unified engine for storing
structured data alongside unstructured embeddings, and it achieves performance that pushes the
boundaries of what a managed database system can deliver at scale. The results of our recent high-scale
1-billion-vector benchmark show that ScyllaDB delivers both ultra-low latency and highly
predictable behavior under load.

Architecture at a glance. To achieve low single-digit-millisecond performance
across massive vector sets, ScyllaDB adopts an architecture that separates the storage and indexing
responsibilities while keeping the system unified from the user's perspective. The ScyllaDB nodes
store both the structured attributes and the vector embeddings in the same distributed table.
Meanwhile, a dedicated vector store service, implemented in Rust, powered by the USearch engine,
and optimized to support ScyllaDB's predictable single-digit-millisecond latencies, consumes updates from
ScyllaDB via CDC and builds approximate nearest neighbor (ANN) indexes in memory. Queries are issued to the
database using a familiar CQL expression; they are then internally routed to the vector
store, which performs the similarity search and returns the candidate rows. This design allows
each layer to scale independently, optimizing for its own workload characteristics and eliminating resource
interference.

Benchmarking 1 billion vectors. To evaluate real-world performance, ScyllaDB ran a rigorous
benchmark using the publicly available Yandex deep-1B dataset, which contains 1 billion vectors of 96 dimensions.
The setup consisted of six nodes: three ScyllaDB nodes running on i4i.16xlarge instances,
each equipped with 64 vCPUs, and three vector store nodes running on r7i.48xlarge instances,
each with 192 vCPUs. This hardware configuration reflects realistic
production deployments, where the database and vector indexing tiers are provisioned with different
resource profiles. The results focus on two usage scenarios with distinct accuracy and latency
goals, detailed in the following sections. A full architectural deep dive, including diagrams,
performance tradeoffs, and extended benchmark results for higher-dimension datasets, can be found in the
technical blog post "Building a Low-Latency Vector Search Engine for ScyllaDB." Those additional results
follow the same pattern seen in the 96-dimensional tests: exceptionally low latency, high throughput,
and stability across a wider range of concurrent load profiles.

Scenario number one:
ultra-low latency with moderate recall. The first scenario was designed for workloads such
as recommendation engines and real-time personalization systems, where the primary objective is extremely
low latency and recall can be moderately relaxed. We used index parameters M = 16,
ef_construction = 128, ef_search = 64, and Euclidean distance. At approximately 70%
recall and with 30 concurrent searches, the system maintained a p99 latency of only
1.7 milliseconds and a p50 of just 1.2 milliseconds while sustaining 25,000 queries per second.
When expanding the throughput window while still keeping p99 latency below 10 milliseconds,
the cluster reached 60,000 QPS for k = 100 with a p50 latency of 4.5 milliseconds, and 252,000
QPS for k = 10 with a p50 latency of 2.2 milliseconds. Importantly, owing to ScyllaDB's predictable
performance, this throughput scales linearly: adding more vector store nodes directly increases
the achievable QPS without compromising latency or recall.
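Recall@k, the accuracy measure quoted throughout these scenarios, is simply the overlap between the ANN results and the exact top-k neighbors. A minimal sketch of how it is computed, using synthetic 96-dimensional vectors (the deep-1B dimensionality) and the Euclidean metric from this scenario:

```python
import numpy as np

def recall_at_k(exact_ids, ann_ids):
    """Fraction of the true top-k neighbors that the ANN search returned."""
    return len(set(exact_ids) & set(ann_ids)) / len(exact_ids)

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 96)).astype(np.float32)   # toy corpus
query = rng.standard_normal(96).astype(np.float32)

# Exact top-10 under Euclidean distance: the ground truth an ANN run is scored against.
exact_top10 = np.argsort(np.linalg.norm(base - query, axis=1))[:10]

# An ANN index that returns 7 of the 10 true neighbors scores 0.7,
# in line with the ~70% recall target of this scenario.
approx = list(exact_top10[:7]) + [9990, 9991, 9992]  # 3 misses (fake ids)
print(recall_at_k(exact_top10, approx))  # 0.7
```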
Scenario number two: high recall with slightly higher latency.
The second scenario targets systems that require near-perfect recall,
including high-fidelity semantic search and retrieval-augmented generation pipelines.
Here, the index parameters were significantly increased to M = 64,
ef_construction = 512, and ef_search = 512.
This configuration raises compute requirements but dramatically improves recall.
With 50 concurrent searches and recall approaching 98%, ScyllaDB kept p99 latency below 12 milliseconds and p50 around 8 milliseconds while delivering 6,500 QPS.
When shifting the focus to maximum sustained throughput while keeping p99 latency under 20 milliseconds and p50 under 10 milliseconds, the system achieved 16,600 QPS.
Even under these settings, latency remained notably stable across values of k from 10 to 100,
demonstrating predictable behavior in environments where query limits vary dynamically.
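The concurrency, latency, and throughput figures in both scenarios hang together via Little's law: sustained QPS is roughly the number of in-flight searches divided by the mean latency. A quick sanity check against the published numbers, using p50 as a rough stand-in for the mean:

```python
# Little's law: sustained throughput ~= concurrency / mean latency (in seconds).
def implied_qps(concurrency, latency_ms):
    return concurrency / (latency_ms / 1000.0)

# Scenario one: 30 concurrent searches at ~1.2 ms p50 -> ~25,000 QPS, as reported.
print(round(implied_qps(30, 1.2)))   # 25000
# Scenario two: 50 concurrent searches at ~8 ms p50 -> ~6,250 QPS,
# consistent with the 6,500 QPS reported.
print(round(implied_qps(50, 8.0)))   # 6250
```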
Detailed results. The table below presents a summary of the results for some representative
concurrency levels.

Unified vector search without the complexity. A big advantage of integrating
vector search into ScyllaDB is that it delivers substantial performance and networking cost
advantages. The vector store resides close to the data, with just a single network hop
between metadata and embedding storage in the same availability zone. This locality,
combined with ScyllaDB's shard-per-core execution model, allows the system to provide real-time
latency and massive throughput even under heavy load. The result is that teams can accomplish more
with fewer resources compared to specialized vector search systems. In addition to being fast at scale,
ScyllaDB's vector search is also simpler to operate. Its key advantage is its ability to unify structured
and unstructured retrieval within a single dataset. This means you can store traditional attributes and
vector embeddings side by side and express queries that combine semantic search with conventional search.
For example, you can ask the database to "find the top five most similar documents,
but only those belonging to this specific customer and created within the past 30 days."
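A hybrid query like the one above can be sketched in CQL. This is a hypothetical shape only: the `documents` schema is invented for illustration, and the `ANN OF` ordering clause follows the Cassandra 5 convention; ScyllaDB's exact GA syntax may differ (and combining ANN results with filtering predicates is listed on the roadmap as native filtering).

```python
# Hypothetical CQL for the hybrid query described above. The schema and the
# `ANN OF` clause are illustrative assumptions, not confirmed GA syntax.
HYBRID_TOP_5 = """
SELECT doc_id, title
FROM documents
WHERE customer_id = ?
  AND created_at >= ?          -- e.g. now minus 30 days, bound client-side
ORDER BY embedding ANN OF ?    -- query embedding from your LLM/ML model
LIMIT 5;
"""

# In practice this statement would be prepared and executed through a CQL
# driver; it is shown as a string so the query shape is concrete.
print("ANN OF" in HYBRID_TOP_5)  # True
```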
This approach eliminates the common pain of maintaining separate systems for transactional data
and vector search, and it removes the operational fragility associated with syncing between
two sources of truth. This also means there is no ETL drift and no dual-write risk.
Instead of shipping embeddings to a separate vector database while keeping metadata in a transactional
store, ScyllaDB consolidates everything into a single system. The only pipeline you need is the
computational step that generates embeddings using your preferred LLM or ML model. Once written,
the data remains consistent without extra coordination, backfills, or complex streaming jobs.
Operationally, ScyllaDB simplifies the entire retrieval stack. Because it is built on ScyllaDB's proven
distributed architecture, the system is highly available, horizontally scalable, and resilient
across availability zones and regions. Instead of operating two or three different technologies,
each with its own monitoring, security configurations, and failure modes, you only manage one.
This consolidation drastically reduces operational complexity while simultaneously improving
performance.

Roadmap. The product is now in general availability. This includes cloud portal
provisioning, on-demand billing, a full range of instance types, and additional performance optimizations.
Self-service scaling is planned for Q1. By the end of Q1, we will introduce native filtering capabilities,
enabling vector search queries to combine ANN results with traditional predicates for more precise
hybrid retrieval. Looking further ahead, the roadmap includes support for scalar and binary
quantization to reduce memory usage, TTL functionality for lifecycle automation of vector data,
and hybrid search combining ANN with BM25 for unified lexical and semantic relevance.
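The scalar quantization mentioned on the roadmap trades a little precision for a large memory saving. A minimal sketch of the general technique (an assumed textbook recipe, not ScyllaDB's planned implementation): map each float32 component to an int8 code, shrinking vector storage by 4x.

```python
import numpy as np

# Scalar quantization sketch: float32 components -> int8 codes.
def quantize(v, lo, hi):
    scaled = (v - lo) / (hi - lo)                      # normalize to [0, 1]
    return np.clip(np.round(scaled * 255) - 128, -128, 127).astype(np.int8)

def dequantize(q, lo, hi):
    return ((q.astype(np.float32) + 128) / 255) * (hi - lo) + lo

rng = np.random.default_rng(1)
v = rng.standard_normal(96).astype(np.float32)          # one 96-dim embedding
q = quantize(v, v.min(), v.max())

print(v.nbytes, q.nbytes)                               # 384 96  (4x smaller)
# Reconstruction error is bounded by half a quantization step.
print(float(np.max(np.abs(dequantize(q, v.min(), v.max()) - v))) < 0.05)  # True
```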
Conclusion. ScyllaDB has demonstrated that it is capable of delivering industry-leading performance
for vector search at massive scale, handling a dataset of 1 billion vectors with p99 latency
as low as 1.7 milliseconds and throughput up to 252,000 QPS. These results validate ScyllaDB
Vector Search as a unified, high-performance solution that simplifies the operational complexity
of real-time AI applications by co-locating structured data and unstructured embeddings.
These benchmarks showcase the current state of ScyllaDB's scalability. With planned
enhancements on the roadmap, including scalar quantization and sharding, these performance
limits are set to increase in the next year. Nevertheless, even now, the feature is ready for
running latency-critical workloads such as fraud detection or recommendation systems.
Thank you for listening to this hackernoon story, read by artificial intelligence.
Visit hackernoon.com to read, write, learn and publish.
