The Good Tech Companies - Real-Time Write Heavy Database Workloads: Considerations & Tips
Episode Date: December 16, 2025. This story was originally published on HackerNoon at: https://hackernoon.com/real-time-write-heavy-database-workloads-considerations-and-tips. Key architectural and tuning strategies for real-time write-heavy databases, covering storage engines, compaction, batching, and latency trade-offs. Check more stories related to cloud at: https://hackernoon.com/c/cloud. You can also check exclusive content about #real-time-databases, #nosql-database-tuning, #write-heavy-workloads, #lsm-tree-architecture, #scylladb-performance, #high-throughput-data, #low-latency-distribution, #good-company, and more. This story was written by: @scylladb. Learn more about this writer by checking @scylladb's about page, and for more stories, please visit hackernoon.com. Real-time, write-heavy database workloads present a unique set of performance challenges that differ significantly from read-heavy systems. These workloads are characterized by extremely high ingestion rates (often exceeding 50,000 operations per second), a greater volume of writes than reads, and strict latency requirements, frequently demanding single-digit millisecond P99 performance. Such conditions are common in modern systems like IoT platforms, online gaming engines, logging and monitoring pipelines, e-commerce platforms, ad tech bidding systems, and real-time financial exchanges.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Real-time write-heavy database workloads, considerations and tips, by ScyllaDB.
Let's look at the performance-related complexities that teams commonly face with write-heavy workloads
and discuss your options for tackling them. Write-heavy database workloads bring a distinctly different
set of challenges than read-heavy ones.
For example, scaling writes can be costly, especially if you pay per operation and writes are
5x more costly than reads.
Locking can add delays and reduce throughput.
I/O bottlenecks can lead to write amplification and complicate crash recovery.
Database back pressure can throttle the incoming load.
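To illustrate the back pressure point, here is a minimal, hypothetical sketch of a client easing off throttled writes with exponential backoff. The `write_row` callback and its `OverloadedError` are stand-ins, not any particular driver's API:

```python
import random
import time

class OverloadedError(Exception):
    """Stand-in for a database's 'server overloaded' error."""

def write_with_backoff(write_row, row, max_retries=5, base_delay=0.001):
    """Retry a throttled write with exponential backoff plus jitter,
    so clients ease off instead of amplifying the overload."""
    for attempt in range(max_retries):
        try:
            return write_row(row)
        except OverloadedError:
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise OverloadedError(f"gave up after {max_retries} attempts")

# Simulated database that throttles the first two attempts.
attempts = {"count": 0}
def flaky_write(row):
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise OverloadedError
    return "ok"

result = write_with_backoff(flaky_write, {"id": 1})
print(result)  # ok
```

Real drivers often expose similar retry and throttling policies out of the box; the point is that the client, not just the database, has to participate in handling back pressure.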
While cost matters quite a lot in many cases, it's not a topic we want to cover here.
Rather, let's focus on the performance-related complexities that teams commonly face and discuss your options for tackling them.
What do we mean by a real-time write-heavy workload?
First, let's clarify what we mean by a real-time write-heavy workload. We're talking about workloads that ingest a large amount of data
(e.g., over 50K ops), involve more writes than reads, and are bound by strict latency SLAs
(e.g., single-digit millisecond P99 latency). In the wild, they occur across everything from
online gaming to real-time stock exchanges. A few specific examples. Internet of Things,
IoT, workloads tend to involve small but frequent append-only writes of time series data.
Here, the ingestion rate is primarily determined by the number of endpoints collecting data.
Think of smart home sensors or industrial monitoring equipment constantly sending data streams
to be processed and stored. Logging and monitoring systems also deal with frequent data
ingestion, but they don't have a fixed ingestion rate. They may not necessarily be append-only,
and they may be prone to hotspots, such as when one endpoint misbehaves.
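The IoT ingestion rates mentioned above are easy to ballpark. As a back-of-the-envelope sketch (the numbers here are illustrative, not from the article):

```python
# Back-of-the-envelope ingestion rate for an IoT fleet:
# each endpoint contributes a small, steady stream of appends.
sensors = 10_000        # endpoints constantly collecting data
samples_per_sec = 5     # readings each sensor appends per second
ingest_ops_per_sec = sensors * samples_per_sec
print(ingest_ops_per_sec)  # 50000 -- squarely write-heavy territory
```

Even modest per-device rates multiply quickly: the fleet size, not any single endpoint, is what pushes these workloads past the 50K ops threshold.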
Online gaming platforms need to process real-time user interactions, including game state changes,
player actions, and messaging. The workload tends to be spiky, with sudden surges in activity.
They're extremely latency-sensitive since even small delays can impact the gaming experience.
E-commerce and retail workloads are typically update-heavy and often involve batch processing.
These systems must maintain accurate inventory levels, process customer reviews, track order status, and manage shopping cart operations, which usually require reading existing data before making updates.
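To make that read-before-write pattern concrete, here is a minimal, hypothetical sketch using a plain dict in place of a real carts table:

```python
# A dict stands in for a 'carts' table keyed by user_id.
carts = {}

def add_to_cart(user_id, sku, qty):
    """Read-modify-write: fetch the existing cart, then update it.
    This read-before-write step is what makes the workload
    update-heavy rather than purely append-only."""
    cart = carts.get(user_id, {})        # read existing data
    cart[sku] = cart.get(sku, 0) + qty   # modify in memory
    carts[user_id] = cart                # write back
    return cart

add_to_cart("u1", "shoes", 1)
add_to_cart("u1", "shoes", 2)
print(carts["u1"]["shoes"])  # 3
```

In a distributed database the read and the write are separate round trips, which is exactly why update-heavy workloads behave so differently from pure ingestion.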
Ad tech and real-time bidding systems require split-second decisions. These systems handle complex bid processing, including impression tracking and auction results, while simultaneously monitoring user interactions such as clicks and conversions.
They must also detect fraud in real time and manage sophisticated
audience segmentation for targeted advertising. Real-time stock exchange systems must support high-frequency
trading operations, constant stock price updates, and complex order matching processes, all while
maintaining absolute data consistency and minimal latency. Next, let's look at key architectural and
configuration considerations that impact write performance. Storage engine architecture. The choice
of storage engine architecture fundamentally impacts write performance in databases. Two primary
approaches exist: LSM trees and B-trees. Databases known to handle writes efficiently, such as
ScyllaDB, Apache Cassandra, HBase, and Google Bigtable, use log-structured merge trees, LSM.
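As a toy illustration of how an LSM tree absorbs writes (a sketch of the idea only, nothing like any of these databases' actual implementations), writes land in an in-memory memtable and full memtables are flushed to disk as sorted, immutable runs:

```python
class ToyLSM:
    """Toy LSM tree: puts go to a memtable; full memtables are
    flushed as sorted, immutable runs (stand-ins for SSTables)."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.sstables = []  # newest-last list of sorted runs

    def put(self, key, value):
        self.memtable[key] = value          # fast in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sequential write of a sorted run -- no random I/O.
        run = sorted(self.memtable.items())
        self.sstables.append(run)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None

db = ToyLSM()
for i in range(7):
    db.put(f"k{i}", i)
print(len(db.sstables), db.get("k0"))  # 2 0
```

Note how a read may have to consult several runs; that read-side cost is what compaction (discussed below) exists to keep in check.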
This architecture is ideal for handling large volumes of writes. Since writes are immediately
appended to memory, this allows for very fast initial storage. Once the memtable in memory fills
up, the recent writes are flushed to disk in sorted order. That reduces
the need for random I/O. For example, here's what the ScyllaDB write path looks like. With
B-tree structures, each write operation requires locating and modifying a node in the tree, and that
involves both sequential and random I/O. As the data set grows, the tree can require additional
nodes and rebalancing, leading to more disk I/O, which can impact performance. B-trees are generally
better suited for workloads involving joins and ad hoc queries. Payload size. Payload size also impacts
performance. With small payloads, throughput is good but CPU processing is the primary bottleneck.
As the payload size increases, you get lower overall throughput, and disk utilization also increases.
Ultimately, a small write usually fits in all the buffers and everything can be processed
quite quickly; that's why it's easy to get high throughput. For larger payloads,
you need to allocate larger buffers or multiple buffers. The larger the payloads, the more resources,
network and disk, are required to service those payloads. Compression. Disk utilization is something
to watch closely with a write-heavy workload. Although storage is continuously becoming cheaper,
it's still not free. Compression can help keep things in check, so choose your compression strategy
wisely. Faster compression speeds are important for write-heavy workloads, but also consider
your available CPU and memory resources. Be sure to look at the compression chunk size parameter.
Compression basically splits your data into smaller blocks, or chunks, and then compresses each block separately.
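The chunking scheme just described can be sketched with Python's zlib. The chunk sizes below are illustrative only; the real parameter name and defaults vary by database:

```python
import zlib

def compress_in_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into fixed-size chunks and compress each one
    separately, mimicking a storage engine's block compression."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def decompress_chunks(chunks: list[bytes]) -> bytes:
    return b"".join(zlib.decompress(c) for c in chunks)

data = b"sensor-reading,42.0,ok;" * 2000          # ~46 KB of payload
small = compress_in_chunks(data, 4 * 1024)        # smaller chunks favor writes
large = compress_in_chunks(data, 64 * 1024)       # larger chunks favor reads
assert decompress_chunks(small) == data           # lossless either way
print(len(small), len(large))  # 12 1
```

Smaller chunks mean a write only recompresses a little data, while larger chunks give reads fewer, better-compressed blocks to fetch, which is exactly the trade-off the tuning advice is about.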
When tuning this setting, realize that larger chunks are better for reads while smaller ones are better
for writes, and take your payload size into consideration. Compaction. For LSM-based databases,
the compaction strategy you select also influences write performance. Compaction involves merging
multiple SSTables into fewer, more organized files to optimize read performance, reclaim
disk space, reduce data fragmentation, and maintain overall system efficiency. When
selecting compaction strategies, you could aim for low read amplification, which makes reads as
efficient as possible. Or, you could aim for low write amplification by preventing
compaction from being too aggressive. Or, you could prioritize low space amplification and have
compaction purge data as efficiently as possible. For example, ScyllaDB offers several
compaction strategies, and Cassandra offers similar ones. Size-Tiered Compaction Strategy, STCS:
triggered when the system has enough (four, by default) similarly sized SSTables.
Leveled Compaction Strategy, LCS: the system uses small, fixed-size (160 megabytes by default)
SSTables distributed across different levels.
Incremental Compaction Strategy, ICS: shares the same read and write amplification factors as
STCS, but it fixes its 2x temporary space amplification issue by breaking huge SSTables into
SSTable runs, which are comprised of a sorted set of smaller (1 gigabyte by default) non-overlapping
SSTables. Time Window Compaction Strategy, TWCS: designed for time series data. For write-heavy workloads,
we warn users to avoid leveled compaction at all costs. That strategy is designed for read-heavy use
cases. Using it can result in a regrettable 40x write amplification. Batching. In databases like
ScyllaDB and Cassandra, batching can actually be a bit of a trap, especially for write-heavy workloads.
If you're used to relational databases, batching might seem like a good option for handling a high
volume of writes, but it can actually slow things down if it's not done carefully. Mainly,
that's because large or unstructured batches end up creating a lot of coordination and network
overhead between nodes, and that's really not what you want in a distributed database like
ScyllaDB. Here's how to think about batching when you're dealing with heavy writes. Batch by the
partition key. Group your writes by the partition key so the batch goes to a coordinator node that
also owns the data. That way, the coordinator doesn't have to reach out to other nodes for extra
data. Instead, it just handles its own, which cuts down on unnecessary network traffic. Keep batches
small and targeted. Breaking up large batches into smaller ones by partition keeps things efficient.
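The grouping step can be sketched in pure Python. This is only the batching logic; the function name and batch size are illustrative, and actually sending each group as a single unlogged batch is left as a comment because driver details vary:

```python
from collections import defaultdict

def batches_by_partition(rows, key_of, max_batch_size=20):
    """Group writes by partition key, then split each group into
    small batches so no batch spans partitions or grows huge."""
    by_partition = defaultdict(list)
    for row in rows:
        by_partition[key_of(row)].append(row)
    batches = []
    for group in by_partition.values():
        for i in range(0, len(group), max_batch_size):
            # Each slice would be sent as one UNLOGGED BATCH statement.
            batches.append(group[i:i + max_batch_size])
    return batches

rows = [{"sensor": f"s{i % 3}", "value": i} for i in range(100)]
batches = batches_by_partition(rows, key_of=lambda r: r["sensor"])
# Every batch touches exactly one partition:
assert all(len({r["sensor"] for r in b}) == 1 for b in batches)
print(len(batches))  # 6
```

Because every batch targets a single partition, the coordinator that owns that partition never has to fan the writes out to other replicas' owners.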
It avoids overloading the network and lets each node work on only the data it owns. You still get
the benefits of batching, but without the overhead that can bog things down. Stick to unlogged
batches. Assuming you follow the earlier points, it's best to use unlogged batches. Logged batches add
extra consistency checks, which can really slow down the write. So, if you're in a write-heavy
situation, structure your batches carefully to avoid the delays that big cross-node batches
can introduce. Wrapping up, we offered quite a few warnings, but don't worry. It was easy to compile
a list of lessons learned because so many teams are extremely successful working with real-time
write-heavy workloads. Now you know many of their secrets, without having to experience their
mistakes. If you want to learn more, here are some first-hand
perspectives from teams who tackled quite interesting write-heavy challenges. Zillow:
consuming records from multiple data producers, which resulted in out-of-order writes that could
result in incorrect updates. Tractian: preparing for 10x growth in high-frequency data writes from
IoT devices. Fanatics: heavy write operations like handling orders, shopping carts, and product
updates for this online sports retailer. Also, take a look at the following video, where we go into
even greater depth on these write-heavy challenges and also walk you through what these workloads
look like on ScyllaDB. Thank you for listening to this HackerNoon story, read by artificial intelligence.
Visit hackernoon.com to read, write, learn and publish.
