The Good Tech Companies - ScyllaDB vs Apache Cassandra: A Decade of Evolution, Performance Gains, and New Capabilities

Starting point is 00:00:00 This audio is presented by HackerNoon, where anyone can learn anything about any technology. ScyllaDB versus Apache Cassandra: A Decade of Evolution, Performance Gains, and New Capabilities, by ScyllaDB. By Felipe Cardeneti Mendes. In 2008, Apache Cassandra set a new standard for database scalability. Born to support Facebook's inbox search, it has since been adopted by tech giants like Uber, Netflix, and Apple, where it's run by experts who also serve as Cassandra contributors, alongside DataStax, IBM. And as its adoption scaled, Cassandra remained true to its core mission of scaling on commodity hardware with high availability. But what about performance, simplicity, efficiency? In 2015, ScyllaDB was born to go beyond Cassandra's suboptimal resource utilization.

Starting point is 00:00:50 Fresh from creating KVM and hacking the Linux kernel, the founders believed that their low-level engineering approach could squeeze considerably more power from the underlying infrastructure. The timing was ideal. Just a year earlier, Netflix had published their numbers showing how to push Apache Cassandra to 1 million write RPS. This was an impressive feat, but one that required significant infrastructure investments and tuning efforts. The idea was quite simple, in theory at least: take Apache Cassandra's scalable architecture

Starting point is 00:01:19 and re-implement it close to the metal while keeping wire protocol compatibility. Not relying on Java meant less latency variability, plus no stop-the-world pauses, while a unique shard-per-core architecture maximized server throughput even under heavy system load. To prevent contention, everything was made asynchronous, and all these optimizations were paired with autonomous internal schedulers for minimal operational overhead. That was 10 years ago. While I can't speak to Cassandra's current direction, ScyllaDB has evolved quite significantly since then, shifting from just a faster Cassandra implementation to a database with its own identity and unique feature set. Spoiler: in this video, I walk you through some key differences between ScyllaDB and Apache Cassandra.

Starting point is 00:02:06 I discuss the differences in performance, elasticity, and capabilities such as workload prioritization. You can see how ScyllaDB maps data per CPU core, scales in parallel, and de-risks topology changes, allowing it to handle millions of ops with predictable low latencies and without constant tuning and babysitting. ScyllaDB's evolution. The first generation of ScyllaDB was all about raw performance. That's when we introduced the shard-per-core asynchronous architecture, row-based cache, and advanced schedulers that achieve predictable low latencies. ScyllaDB's second generation aimed for feature parity with Cassandra, but we actually went beyond that.

Starting point is 00:02:45 For example, we introduced our materialized views and production-ready global secondary indexes, something that Cassandra still flags as experimental. Likewise, ScyllaDB also introduced support for local secondary indexes in that same year. Those were just introduced in Cassandra 5 after at least three different indexing implementations. Moreover, our Paxos implementation for lightweight transactions eliminated much of the overhead and limitations in Cassandra's alternative implementation. The third generation marked our shift to the cloud, along with continued innovation. This is when ScyllaDB Alternator, our DynamoDB-compatible API, was introduced.

Starting point is 00:03:24 We added support for ZSTD compression in 2020, something Cassandra only adopted late in 2021. During this period, we dramatically improved repair speeds with row-level repair and introduced workload prioritization (more on this in the next section). The fourth generation of ScyllaDB emerged around the time AWS announced the i3en instance family, with high-density nodes holding up to 60 terabytes of data, something Cassandra still struggles to handle effectively. During this period, we introduced the incremental compaction strategy, ICS, allowing users to utilize up to 70% of their storage before scaling out. This later evolved into a hybrid compaction strategy, and we now support 90% storage utilization. We also introduced

Starting point is 00:04:10 change data capture, CDC, with a fundamentally different approach from Cassandra. And we significantly extended the CQL protocol with concepts such as shard awareness, bypass cache, per-query configurable timeouts, and much more. Finally, we arrive at the fifth generation of ScyllaDB, which is still unfolding. This phase represents our path towards strong consistency and elasticity with Raft and tablets. For more about the significance of this, read on. Capabilities that set ScyllaDB apart. Our engineers have introduced lots of interesting features over the past decade. Based on my interactions with former Cassandra users, I think these are the most interesting to discuss here. Tablets data distribution. Each ScyllaDB table is split into smaller fragments, tablets, to evenly

Starting point is 00:04:56 distribute data and load across the system. Tablets bring elasticity to ScyllaDB, allowing you to instantly double, triple, or even 10x your cluster size to accommodate unpredictable traffic surges. They also enable more efficient use of storage, reaching up to 90% utilization. Since teams can quickly scale out in response to traffic spikes, they can satisfy latency SLAs without needing to over-provision just in case. Raft-based strong consistency for metadata. Raft introduces strong consistency to ScyllaDB's metadata. Gone are the days when a schema change could push your cluster into disagreement or you'd lose access because you forgot to update the replication factor of your authentication keyspace.

Starting point is 00:05:39 These issues still plague Cassandra. Workload prioritization. Workload prioritization allows you to consolidate multiple workloads under a single cluster, each with its own SLA. Basically, it controls how different workloads compete for system resources. Teams use it to prioritize urgent application requests that require immediate response times versus others that can tolerate slight delays, e.g.,

Starting point is 00:06:03 large scans. Common use cases include balancing real-time versus batch processing, splitting writes from reads, and workload infrastructure consolidation. Repair-based operations. Repair-based operations ensure your cluster data stays in sync even during topology changes. This addresses a long-standing data consistency flaw in Apache Cassandra, where operations like replacing failed nodes can result in data loss. ScyllaDB also fully eliminates the problem of data resurrection thanks to repair-based tombstone garbage collection. Incremental compaction. Incremental compaction, ICS, has been the default compaction strategy in ScyllaDB for over five years. ICS greatly reduces the temporary space amplification, resulting in more disk space being

Starting point is 00:06:49 available for storing user data, and that eliminates the typical requirement of 50% free space in your drive. There is no comparable Cassandra feature. Cassandra just recently introduced Unified Compaction, which has yet to prove itself. Row-based cache. ScyllaDB's row-based cache is also unique. It is enabled by default and requires no manual tuning. With the bypass cache extension, you can prevent cache pollution by keeping important items from being invalidated. Additionally, SSTable index caching significantly reduces I/O access time when fetching data from disk.
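
As a rough illustration of the storage figures quoted in this episode (the typical 50% free space requirement, 70% utilization with ICS, and 90% today), here is a back-of-the-envelope sketch in Python. The function names, the labels, and the 300 TB dataset are hypothetical, chosen only for this example; replication, growth headroom, and compression are ignored.

```python
import math

# Utilization ceilings quoted in the episode, as whole percentages.
# The labels are informal names chosen for this sketch.
UTILIZATION_PCT = {
    "50% free space rule": 50,
    "ICS (70%)": 70,
    "current (90%)": 90,
}


def usable_tb(raw_tb: int, max_utilization_pct: int) -> float:
    """Data a node can hold before it hits its utilization ceiling."""
    return raw_tb * max_utilization_pct / 100


def nodes_needed(dataset_tb: int, raw_tb: int, max_utilization_pct: int) -> int:
    """Nodes required to hold dataset_tb, ignoring replication and growth."""
    return math.ceil(dataset_tb / usable_tb(raw_tb, max_utilization_pct))


if __name__ == "__main__":
    RAW_TB = 60       # high-density node size mentioned in the episode
    DATASET_TB = 300  # hypothetical dataset size, chosen for illustration
    for label, pct in UTILIZATION_PCT.items():
        print(f"{label}: {usable_tb(RAW_TB, pct):.0f} TB usable per node, "
              f"{nodes_needed(DATASET_TB, RAW_TB, pct)} nodes")
```

Under these assumed inputs, the same dataset fits on 10, 8, and 6 nodes respectively, which is the kind of saving the higher utilization ceiling is meant to deliver.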

Starting point is 00:07:24 Per-shard concurrency limits and rate limiters. ScyllaDB includes per-shard concurrency limits and per-partition rate limiters to protect against unexpected spikes. Whether dealing with a misbehaving client or a flood of requests to a specific key, ScyllaDB ensures resilience where Cassandra often falls short. DynamoDB compatibility. ScyllaDB also offers a DynamoDB-compatible layer, further distancing itself from its Apache Cassandra origins. This lets teams run their DynamoDB workloads on any cloud or on-prem, without code changes, and with 50% lower cost. This has helped quite a few teams consolidate multiple workloads on ScyllaDB. What's next? At the recent Monster Scale Summit, CEO and co-founder Dor Laor shared a peek at what's next for ScyllaDB. A few highlights. Ready now: see this blog

Starting point is 00:08:13 post and product page for details. The ability to safely run at 90% storage utilization, support for clusters with mixed instance type nodes, dynamic provisioning and flex credit, vector search. Short term: strongly consistent tables, fault injection service, transparent repairs, object and tiered storage, Raft for strongly consistent tables. Longer term: multi-key transactions, analytics and transformations with UDFs, automated large partition balancing, immutable infrastructure for greater stability and reliability, a replication mode for more flexible and efficient infrastructure changes. For details, watch the complete talk here. To close, ScyllaDB is faster than Cassandra. I'll share my latest benchmark results here soon.

Starting point is 00:08:59 But both ScyllaDB and Cassandra have evolved to the point that ScyllaDB is no longer just a faster Cassandra. We've evolved beyond Cassandra. If your project needs more predictable performance and/or could benefit from the elasticity, efficiency, and simplicity optimizations we've been focusing on for years now, you might also want to consider evolving beyond Cassandra. To learn more about ScyllaDB, visit https://www.scylladb.com. You can access free database books, masterclasses, and more at https://resources.scylladb.com. Thank you for listening to this HackerNoon story, read by artificial intelligence. Visit hackernoon.com to read, write, learn and publish.
