The Good Tech Companies - Benchmarking 1B Vectors with Low Latency and High Throughput
Episode Date: January 21, 2026This story was originally published on HackerNoon at: https://hackernoon.com/benchmarking-1b-vectors-with-low-latency-and-high-throughput. ScyllaDB Vector Search reaches... 1B vectors with 2ms p99 latency and 250K QPS, unifying structured data and embeddings at scale. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #scylladb-vector-search, #scylladb-ann-search, #vector-search-p99-latency, #real-time-rag-database, #high-qps-vector-database, #unified-vector-and-metadata, #usearch-vector-engine, #good-company, and more. This story was written by: @scylladb. Learn more about this writer by checking @scylladb's about page, and for more stories, please visit hackernoon.com. ScyllaDB Vector Search is now GA and delivers real-time similarity search at massive scale. Benchmarks on the yandex-deep_1b dataset show p99 latency as low as 1.7ms and throughput up to 252K QPS across 1 billion vectors. By unifying structured data and embeddings in one system, ScyllaDB eliminates dual-write pipelines while supporting production-grade AI, RAG, and recommendation workloads.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Benchmarking 1B vectors with low latency and high throughput, by ScyllaDB.
As AI-driven applications move from experimentation into real-time production systems,
the expectations placed on vector similarity search continue to rise dramatically.
Teams now need to support billion-scale datasets, high concurrency, strict p99 latency budgets,
and a level of operational simplicity that reduces architectural overhead rather than adding to it.
ScyllaDB Vector Search was built with these constraints in mind. It offers a unified engine for storing
structured data alongside unstructured embeddings, and it achieves performance that pushes the
boundaries of what a managed database system can deliver at scale. The results of our recent high-scale
1-billion-vector benchmark show that ScyllaDB delivers both ultra-low latency and highly
predictable behavior under load.

Architecture at a glance. To achieve low single-digit-millisecond performance
across massive vector sets, ScyllaDB adopts an architecture that separates the storage and indexing
responsibilities while keeping the system unified from the user's perspective. The ScyllaDB nodes
store both the structured attributes and the vector embeddings in the same distributed table.
Meanwhile, a dedicated vector store service, implemented in Rust, powered by the USearch engine,
and optimized to support ScyllaDB's predictable single-digit-millisecond latencies, consumes updates from
ScyllaDB via CDC and builds approximate nearest neighbor (ANN) indexes in memory. Queries are issued to the
database using a familiar CQL expression; they are then internally routed to the vector
store, which performs the similarity search and returns the candidate rows. This design allows
each layer to scale independently, optimizing for its own workload characteristics and eliminating resource
interference.

Benchmarking 1 billion vectors. To evaluate real-world performance, ScyllaDB ran a rigorous
benchmark using the publicly available Yandex deep-1B dataset, which contains 1 billion vectors of 96 dimensions.
The setup consisted of six nodes: three ScyllaDB nodes running on i4i.16xlarge instances,
each equipped with 64 vCPUs, and three vector store nodes running on r7i.48xlarge instances,
each with 192 vCPUs. This hardware configuration reflects realistic
production deployments, where the database and vector indexing tiers are provisioned with different
resource profiles. The results focus on two usage scenarios with distinct accuracy and latency
goals, detailed in the following sections. A full architectural deep dive, including diagrams,
performance tradeoffs, and extended benchmark results for higher-dimension datasets, can be found in the
technical blog post "Building a Low-Latency Vector Search Engine for ScyllaDB." Those additional results
follow the same pattern seen in the 96-dimensional tests: exceptionally low latency, high throughput,
and stability across a wider range of concurrent load profiles.

Scenario number one:
ultra-low latency with moderate recall. The first scenario was designed for workloads such
as recommendation engines and real-time personalization systems, where the primary objective is extremely
low latency and recall can be moderately relaxed. We used index parameters M = 16,
ef_construction = 128, ef_search = 64, and Euclidean distance. At approximately 70%
recall and with 30 concurrent searches, the system maintained a p99 latency of only
1.7 milliseconds and a p50 of just 1.2 milliseconds while sustaining 25,000 queries per second.
When expanding the throughput window while still keeping p99 latency below 10 milliseconds,
the cluster reached 60,000 QPS for k = 100 with a p50 latency of 4.5 milliseconds, and 252,000
QPS for k = 10 with a p50 latency of 2.2 milliseconds. Importantly, owing to ScyllaDB's predictable
performance, this throughput scales linearly: adding more vector store nodes directly increases
the achievable QPS without compromising latency or recall.
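Recall@k, the accuracy measure quoted throughout these scenarios, is simply the overlap between the ANN results and the exact top-k neighbors. A minimal sketch of how it is computed, using synthetic 96-dimensional vectors (the deep-1B dimensionality) and the Euclidean metric from this scenario:

```python
import numpy as np

def recall_at_k(exact_ids, ann_ids):
    """Fraction of the true top-k neighbors that the ANN search returned."""
    return len(set(exact_ids) & set(ann_ids)) / len(exact_ids)

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 96)).astype(np.float32)   # toy corpus
query = rng.standard_normal(96).astype(np.float32)

# Exact top-10 under Euclidean distance: the ground truth an ANN run is scored against.
exact_top10 = np.argsort(np.linalg.norm(base - query, axis=1))[:10]

# An ANN index that returns 7 of the 10 true neighbors scores 0.7,
# in line with the ~70% recall target of this scenario.
approx = list(exact_top10[:7]) + [9990, 9991, 9992]  # 3 misses (fake ids)
print(recall_at_k(exact_top10, approx))  # 0.7
```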
Scenario number two: high recall with slightly higher latency.
The second scenario targets systems that require near-perfect recall,
including high-fidelity semantic search and retrieval-augmented generation pipelines.
Here, the index parameters were significantly increased to M = 64,
ef_construction = 512, and ef_search = 512.
This configuration raises compute requirements but dramatically improves recall.
With 50 concurrent searches and recall approaching 98%, ScyllaDB kept p99 latency below 12 milliseconds and p50 around 8 milliseconds while delivering 6,500 QPS.
When shifting the focus to maximum sustained throughput while keeping p99 latency under 20 milliseconds and p50 under 10 milliseconds, the system achieved 16,600 QPS.
Even under these settings, latency remained notably stable across values of k from 10 to 100,
demonstrating predictable behavior in environments where query limits vary dynamically.
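The concurrency, latency, and throughput figures in both scenarios hang together via Little's law: sustained QPS is roughly the number of in-flight searches divided by the mean latency. A quick sanity check against the published numbers, using p50 as a rough stand-in for the mean:

```python
# Little's law: sustained throughput ~= concurrency / mean latency (in seconds).
def implied_qps(concurrency, latency_ms):
    return concurrency / (latency_ms / 1000.0)

# Scenario one: 30 concurrent searches at ~1.2 ms p50 -> ~25,000 QPS, as reported.
print(round(implied_qps(30, 1.2)))   # 25000
# Scenario two: 50 concurrent searches at ~8 ms p50 -> ~6,250 QPS,
# consistent with the 6,500 QPS reported.
print(round(implied_qps(50, 8.0)))   # 6250
```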
Detailed results. The table below presents a summary of the results for some representative
concurrency levels.

Unified vector search without the complexity. A big advantage of integrating
vector search into ScyllaDB is that it delivers substantial performance and networking cost
advantages. The vector store resides close to the data, with just a single network hop
between metadata and embedding storage in the same availability zone. This locality,
combined with ScyllaDB's shard-per-core execution model, allows the system to provide real-time
latency and massive throughput even under heavy load. The result is that teams can accomplish more
with fewer resources compared to specialized vector search systems. In addition to being fast at scale,
ScyllaDB's vector search is also simpler to operate. Its key advantage is its ability to unify structured
and unstructured retrieval within a single dataset. This means you can store traditional attributes and
vector embeddings side by side and express queries that combine semantic search with conventional search.
For example, you can ask the database to "find the top five most similar documents,
but only those belonging to this specific customer and created within the past 30 days."
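A hybrid query like the one above can be sketched in CQL. This is a hypothetical shape only: the `documents` schema is invented for illustration, and the `ANN OF` ordering clause follows the Cassandra 5 convention; ScyllaDB's exact GA syntax may differ (and combining ANN results with filtering predicates is listed on the roadmap as native filtering).

```python
# Hypothetical CQL for the hybrid query described above. The schema and the
# `ANN OF` clause are illustrative assumptions, not confirmed GA syntax.
HYBRID_TOP_5 = """
SELECT doc_id, title
FROM documents
WHERE customer_id = ?
  AND created_at >= ?          -- e.g. now minus 30 days, bound client-side
ORDER BY embedding ANN OF ?    -- query embedding from your LLM/ML model
LIMIT 5;
"""

# In practice this statement would be prepared and executed through a CQL
# driver; it is shown as a string so the query shape is concrete.
print("ANN OF" in HYBRID_TOP_5)  # True
```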
This approach eliminates the common pain of maintaining separate systems for transactional data
and vector search, and it removes the operational fragility associated with syncing between
two sources of truth. This also means there is no ETL drift and no dual-write risk.
Instead of shipping embeddings to a separate vector database while keeping metadata in a transactional
store, ScyllaDB consolidates everything into a single system. The only pipeline you need is the
computational step that generates embeddings using your preferred LLM or ML model. Once written,
the data remains consistent without extra coordination, backfills, or complex streaming jobs.
Operationally, ScyllaDB simplifies the entire retrieval stack. Because it is built on ScyllaDB's proven
distributed architecture, the system is highly available, horizontally scalable, and resilient
across availability zones and regions. Instead of operating two or three different technologies,
each with its own monitoring, security configurations, and failure modes, you only manage one.
This consolidation drastically reduces operational complexity while simultaneously improving
performance.

Roadmap. The product is now in general availability. This includes cloud portal
provisioning, on-demand billing, a full range of instance types, and additional performance optimizations.
Self-service scaling is planned for Q1. By the end of Q1, we will introduce native filtering capabilities,
enabling vector search queries to combine ANN results with traditional predicates for more precise
hybrid retrieval. Looking further ahead, the roadmap includes support for scalar and binary
quantization to reduce memory usage, TTL functionality for lifecycle automation of vector data,
and hybrid search combining ANN with BM25 for unified lexical and semantic relevance.
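The scalar quantization mentioned on the roadmap trades a little precision for a large memory saving. A minimal sketch of the general technique (an assumed textbook recipe, not ScyllaDB's planned implementation): map each float32 component to an int8 code, shrinking vector storage by 4x.

```python
import numpy as np

# Scalar quantization sketch: float32 components -> int8 codes.
def quantize(v, lo, hi):
    scaled = (v - lo) / (hi - lo)                      # normalize to [0, 1]
    return np.clip(np.round(scaled * 255) - 128, -128, 127).astype(np.int8)

def dequantize(q, lo, hi):
    return ((q.astype(np.float32) + 128) / 255) * (hi - lo) + lo

rng = np.random.default_rng(1)
v = rng.standard_normal(96).astype(np.float32)          # one 96-dim embedding
q = quantize(v, v.min(), v.max())

print(v.nbytes, q.nbytes)                               # 384 96  (4x smaller)
# Reconstruction error is bounded by half a quantization step.
print(float(np.max(np.abs(dequantize(q, v.min(), v.max()) - v))) < 0.05)  # True
```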
Conclusion. ScyllaDB has demonstrated that it is capable of delivering industry-leading performance
for vector search at massive scale, handling a dataset of 1 billion vectors with p99 latency
as low as 1.7 milliseconds and throughput up to 252,000 QPS. These results validate ScyllaDB
Vector Search as a unified, high-performance solution that simplifies the operational complexity
of real-time AI applications by co-locating structured data and unstructured embeddings.
These benchmarks showcase the current state of ScyllaDB's scalability. With planned
enhancements on the roadmap, including scalar quantization and sharding, these performance
limits are set to increase in the next year. Nevertheless, even now, the feature is ready for
running latency-critical workloads such as fraud detection or recommendation systems.
Thank you for listening to this hackernoon story, read by artificial intelligence.
Visit hackernoon.com to read, write, learn and publish.
