Software Huddle - Building Kafka without Disks with Richie Artoul and Ryan Worl from WarpStream Labs

Episode Date: January 16, 2024

In this episode, we spoke with the founders of WarpStream Labs, Richard Artoul and Ryan Worl. WarpStream is a fascinating rethink of Kafka -- how could you simplify and improve the Kafka design by slightly tweaking your constraints? The result is very compelling -- a Kafka-compatible API that bypasses local disk by writing everything directly to S3. For the tradeoff of a slightly higher end-to-end latency, you can get a Kafka cluster that's much cheaper and way easier to operate. Richie and Ryan have been working on high-scale data systems for years and were the engineers behind Husky, Datadog's custom-built database for logs and metrics. In this episode, they walk us through their experience building WarpStream. They touch on all the hard parts of building your own system (including why it's gotten easier!), as well as some of the difficult problems they had to solve for full compatibility with existing Kafka client libraries. They also touch on using FoundationDB, their thoughts on S3 Express One Zone, and whether AWS's cross-AZ network costs are a scam. Lots of interesting thoughts here from a really sharp team.

Transcript
Starting point is 00:00:00 It's just crazy what you can do today. Our team is very small. Our company does not have that many people. But if you lean on good primitives like AWS and their network load balancers and their object storage and some of their data stores, you can build mature, battle-tested infrastructure
Starting point is 00:00:20 that would have required hundreds of people to build just 10 years ago. What's your billing model? How will people use WarpStream? Is it a fully hosted service? Is it run-your-own? What's it look like there? WarpStream is designed with a data plane control plane split. The idea with the BYOC product is the data plane runs in your cloud account and the control plane runs in ours. There's essentially no networking fees between the two cloud accounts except for metadata,
Starting point is 00:00:47 which is super small, on the order of tens to hundreds of kilobytes per second. And also you get all these nice privacy and security benefits because the data literally doesn't leave your cloud account. We call that BYOC. We're still kind of going back and forth on the pricing for that.
Starting point is 00:01:04 I think most likely what it's going to end up being is just usage-based pricing on two dimensions. Would it make sense for a cloud provider to offer a FoundationDB primitive, like a managed version, or does it sort of not work because it's so low-level or something like that? I would love if someone did. That would be cool. It's definitely a big engineering challenge.
Starting point is 00:01:42 And I imagine anybody that gets close to that, they would probably want to start from scratch and just build a new system that had the same semantics instead of using FoundationDB directly. Hey folks, this is Alex DeBrie. This is one of my favorite episodes. I just spoke with the founders of WarpStream Labs, Richie Artoul and Ryan Worl. They're super smart guys, I love what they're doing. Basically what they're doing is taking Kafka and existing technology and saying, how could you make it better if you change
Starting point is 00:01:58 sort of one constraint or use a different piece of technology there? I think that's just a super fascinating space to play in. So we talked about their approach, what are the implications and takeaways of it, lots of practical stuff if you're using them or thinking about using them. But also just a lot of foundational conceptual stuff around build-first-buy when you're building a data storage system, or FoundationDB, or we even talked about cross-AZ network costs in AWS, and is that a record or not? So I love this episode. I thought it was a lot of fun. Check it out. As always, if you have any
Starting point is 00:02:28 comments on the show or anything, reach out to me, reach out to Sean Falconer and let us know what you'd like to see, who you'd like to see, anything like that. And with that, let's get to the show. Richie, Ryan, welcome to the show. Thanks for having us, man. Thanks. Yeah, absolutely. So you two are the co-founders of WarpStream Labs. Super excited to talk about this because I think it's really fascinating and some of the writing and stuff you've done around Kafka and what you're doing at WarpStream is very cool. For those of the audience that doesn't know you,
Starting point is 00:02:56 can you give maybe just a little bit of your background, maybe starting with Richie? Yeah, sure. I'm Richie. I'm one of the co-founders of WarpStream. Before I founded WarpStream with Ryan, we both worked together at Datadog on basically a columnar database for observability data called Husky.
Starting point is 00:03:16 And there's some good blog posts about that out there that you can go read. Yeah, my story is pretty similar to Richie's. For the last five years or so we've been working together, either at Datadog or on WarpStream. Yeah, we're really excited to talk to you. Awesome. Yeah, Husky's a really cool thing. I'll share some links in the show notes, like the blog post and also a really good video about Husky and how it works. So cool stuff there. But let's move on and talk about WarpStream, which I think is a really cool, interesting entrant into the Kafka space. So maybe just give me the quick rundown of what WarpStream is. Yeah, WarpStream is a Kafka protocol compatible data streaming system that stores data directly in object storage, and object storage is the only storage for data in
Starting point is 00:04:16 Workstream. The main difference between Workstream and a system like open source Apache Kafka is that the, because the primary storage location for the data is in S3, there's no like tiered storage or worrying about your brokers running out of disk or anything like that. You just get to use the infinitely scalable and extremely reliable storage that is provided by all the hyperscaler cloud providers today. Awesome. I love it. So I think you're in this category that I really like
Starting point is 00:04:52 right now where it's like, hey, there's an existing tech or protocol or something like that. And something has changed, which might be that some piece of technology has changed and really improved and changed how we build things. Or maybe a constraint or requirement is not true for a large segment of the customer base, and that can really change how we architect things and design things. And so as I look at WarpStream, it's sort of like, you know, Kafka comprises all these things, including end-to-end latency on events of maybe sub-100 milliseconds, sub-200 milliseconds, and it turns out that a lot of people don't need quite that low of latency. That's nice, it's cool and all that stuff, but if you
Starting point is 00:05:30 expand that range by an order of magnitude, most people are still fine. And that just opens up a lot of capabilities, including using object storage. Is that the right way to think about WarpStream? Yeah, I think that's a really good way to think about it. The numbers we usually tell people on the produce side is between 400 to 600 milliseconds at the P99. That's between your client deciding to send a batch of data
Starting point is 00:05:58 and having it acknowledged as fully durable. There's some knobs you can tune there if you're willing to spend more money. Then roughly a second and a half to two seconds, depending on how you tune it, P99, from when the producer decided to write some data to when the consumer received it. And it turns out for a huge majority of use cases, that's totally fine. That's still
Starting point is 00:06:23 very real time. And so that's totally fine. That's still very real time. That's part of the observation. The other thing I've noticed too is that we'll have a lot of people come talk to us and they're like, I have this one use case, it's super latency critical so I can't use you there. But I have this other use case, it's my telemetry data, my log data, whatever.
Starting point is 00:06:45 That's not so latency sensitive. And so I could maybe use you there. And then we talked to them a little bit more. And when they realized that the end-to-end latency is only two seconds, they're like, ah, that other latency sensitive use case. I like that it's low latency, but am I willing to pay 10 times more for it to be low latency? So even a lot of the low latency use cases,
Starting point is 00:07:13 we've found that people are willing to be more flexible once they see the benefits of not having to manage brokers and not paying for interzone networking and all that type of stuff. But yeah, that was definitely one of the core observations we had. And a lot of that came out of, you know, I've been working in the observability space for like eight years. And so we're kind of coming at the data in Kafka space from, I think, a different background and a different angle. And I think that's helped a little bit too. And now with the announcement of S3 Express One Zone, we also have the option to run on even lower latency storage.
Starting point is 00:07:49 So I think regardless of what the latency characteristics are at work stream today on S3 standard, we'll be making a new release at some point early next year that supports S3 Express One Zone that will make those latency characteristics look a lot more like a traditional open source cluster or one of the other providers. I think that'll get really interesting too because you'll get a knob basically that's like I care about latency and I'm willing to pay more money or I don't, but you won't have to change the architecture. And I think that's a
Starting point is 00:08:26 really neat thing, basically. To not have to have two completely different solutions to hit different parts of the price performance curve. It's just like a config flag, right? Write to this bucket or write to that bucket or something like that. Yeah, amazing. I think you all have
Starting point is 00:08:41 done, you wrote a blog post right after Express One Zone came out, and I think it was the most astute one. I think a lot of people sort of misunderstood what One Zone was for and who's going to be using it. I think it's more for, you know, infrastructure providers, data infrastructure providers like yourselves, using it for interesting things. And yeah, y'all noticing that it just makes it so much easier to have a higher latency but lower cost system, or a lower latency and higher cost system, very easily, without having to have a whole different storage subsystem,
Starting point is 00:09:13 basically. I think that was really interesting and astute. I want to talk a little bit more about just how WarpStream, like what's happening with WarpStream. Because when I first heard no local disks, it's all object storage, I'm like, are they just writing every single segment to S3? You're just going to have tons of files all over the place.
Starting point is 00:09:33 It's going to be super slow. So just without getting too deep into it, what's happening on one of these WarpStream broker agent type things when some request comes in? How do you do that without any local disks and just going straight to object storage? Yeah, so the basic idea is that your Kafka client has settings that allow it to batch records that are being sent for all the topic partitions that you're writing
Starting point is 00:10:04 to. So that's step one. Your producer client library already does a little bit of batching for you. When a batch of data is sent from the client to one of the agents, as we call them, which is like a stateless broker, basically, the agent also buffers records in memory from batches that were sent from multiple producer clients. And those records are batched together. And every 250 milliseconds by default, we write a new file to object storage. Once the file has been written to object storage, we send the metadata for that file back to our control plane. And the control plane does the ordering
Starting point is 00:10:46 to decide which records are assigned which offsets in the Kafka semantics of ordered data per partition. Offsets are monotonically increasing, all that stuff. So we fan in records and batches of data sent from multiple clients on the agent, and then the agent writes files to object storage, and we assign the offsets inside the metadata store. Once that happens, you're correct that there will be a lot of files in S3 if that was all we did. So we also perform background compaction, where the agent will download the small files, merge them into bigger files, and replace them in the control plane. And that happens in the background. It's not in the critical path of the requests, but it's performed by the agents.
Starting point is 00:11:42 On the consume side, the small files create their own problem. We wrote a blog post about our distributed file cache. But the basic idea there is that the agents form a consistent hash ring, more or less, within each availability zone. So your client can request to read data for a specific topic partition, and that request is routed round-robin to some agent. And then the agent will use the distributed file cache to read from all of those tiny little files.
Starting point is 00:12:16 Ideally, just, you know, it's read once from S3, pulled into the distributed cache, and then all of the reads for that topic partition and all of the other topic partitions in that same file because they're all merged together. There's no relationship between the files and which topic partitions they contain. The mapping is all handled by the metadata store. So yeah, that's basically the full end-to-end flow. We fan in records with the brokers,
Starting point is 00:12:43 we do background compaction, and then on the read side we have a distributed cache to make sure the gets are efficient. I think the architecture he's describing too, especially the combining of multiple topic partitions into one file, so each agent is basically making one file every 250 milliseconds. I think a lot of the disappointment around S3 Express One Zone was that a lot of people wanted it to be like EBS+++, you know? And that's not what it is. I think there's a lot of people kind of still stuck
Starting point is 00:13:18 in the category of lift and shifting existing systems into the cloud and then trying to bolt S3 at the end of a traditional data storage system. And if that's what you're doing, S3 Express won't help you at all. But if you've redesigned the entire storage engine from the ground up around object storage, then the S3 Express one-zone pricing actually makes a lot of sense. Because it's like, if you just try and flush to disk
Starting point is 00:13:52 every couple of milliseconds like you do with EBS with S3 Express one zone, it's still going to be extremely expensive and it won't work. But if you have a storage system like WarpStreams where everything has been designed around minimizing the number of API operations that you're performing, and you can just pay more money to make those API requests fast, you end up with something that's actually useful. Very cool.
Starting point is 00:14:17 And just so I understand it, we talked about loosening the latency requirements on that stuff. And as I see it, there are two main benefits of this. Number one is just cost, right? You're not paying any inter-AZ costs, which Kafka is just sort of notorious for. Just the communication across brokers for replication being prohibitively expensive for these large workloads. So cost number one. But then just also ease of operation, right? Because you don't have these more stateful brokers, they're stateless.
Starting point is 00:14:48 So around that, every single agent can handle every partition of every topic in your cluster. And then so is throughput just essentially a function of how many of these agents I'm running and I can just scale them up and down with my traffic pretty easily? Yeah, that's exactly how one of our customers runs it in production today. They deploy the agents in Kubernetes and use the built-in auto-scaling functionality to add and remove agents just based on CPU usage. How quickly can I spin up a new node and have it start actually taking traffic from producers or consumers? It's basically as fast as you can launch a Docker container. There's some nuance in the Kafka protocol, which is that there's a service discovery mechanism
Starting point is 00:15:41 in the Kafka protocol itself. So basically, your clients are constantly, not constantly, it depends on how you tune them, but querying Kafka through the protocol. In this case, it's WarpStream asking what brokers exist and what partitions are assigned to each of those brokers. And I think this is one of the interesting things about WarpStream too, is that in a traditional Kafka cluster, when you hit that metadata endpoint,
Starting point is 00:16:07 what it's going to tell you is, okay, there's a thousand partitions, and roughly, let's say you have six brokers. One-sixth of those partitions, the leader for that partition is this broker. When you query for metadata in a warp stream cluster, let's say you have six agents, it'll tell you there's six agents, but it'll pick one of them.
Starting point is 00:16:28 And it'll be like, that agent is the leader for all of these partitions. And so all the traffic from one client will be routed essentially to a single agent. And when we make that selection, the control plane does that, and we can do essentially round-robin load balancing. And so you end up almost with a kind of HTTP round-robin load balancing strategy, but within the Kafka protocol. Now those connections are sticky, right, because of the way the Kafka protocol works. And so the way that you can control how fast you can react to changes in the
Starting point is 00:16:59 kind of quote-unquote cluster topology is just tuning your client to how often it requests a new view of the cluster. So you can tune that as low as a couple seconds. Most clients default to five minutes because most clusters are fairly static. But you can tune that really low and then they'll react really quickly. That's amazing. I do want to talk about the Kafka protocol stuff because I think there's some interesting stuff there. But just on the scaling up and down type stuff, I'm not that familiar with how people are running super high-scale Kafka clusters right now, but are those scaling up and down frequently, or are those pretty static?
Starting point is 00:17:35 You're provisioning for peak, and you're paying for it, and you scale it up one time a year or something like that if you need to. The number of organizations with the Kafka maturity and tooling that they can even scale a Kafka cluster down is very small. Most companies have figured out how to scale up a Kafka cluster, do basic node replacements. I meet a huge number of companies who just don't know how to do upgrades. It's not their fault, too. It's just really hard. And then when you do an upgrade, usually they want to replace the broker
Starting point is 00:18:07 and there's a bunch of data being transferred. A lot of these companies, because it's so annoying to run a traditional open source Kafka cluster, they just have one because they're like, I can't deal with more than one. And so it just keeps getting bigger and bigger and bigger. And the bigger it gets, the more scary it gets to touch it. So I would say the number of people
Starting point is 00:18:29 who are actively downscaling their Kafka clusters is very, very, very small. And so the fact that we can do the automatic scaling with WarpStream is a really nice benefit. You can spend years with people, a team, building tooling, and still get to an okay place, but it takes a long time. That's amazing. Okay, so talking about the Kafka protocol one, because I thought that was a really interesting post you had, where you had to respond with this metadata request, but tune it for the different topology of just like how your thing works. So as I understand it, like in that response changing, there are two big
Starting point is 00:19:10 benefits you are looking for. Number one is reducing cross AZ traffic again, right? You want to have your clients connect to agents in the same AZ. So again, you're not paying that cross AZ, but then also directing all the rights from a producer to a single agent rather than being like, hey, you know all the partitions, here's their leaders. And now if I have 128 partitions, it's making requests to all those different brokers as part of that. That second one's really interesting. How much does that make the producer that much more efficient? Is that making a meaningful difference on the producer, or was that pretty negligible anyway that they had to make all those separate requests?
Starting point is 00:19:49 Yeah, the big benefit there is you actually get, well, it's tricky because of how the Kafka protocol works under the hood, because it's separate batches for each partition still, right? Yeah, so you don't get a compression benefit. The compression happens at the unit of a batch that is destined for an individual topic partition. But the reason why it's helpful for us is that it makes sure that your client is not exposed to the outlier latency of every node.
Starting point is 00:20:23 It's only exposed to the latency of one node at a time. So ideally, the upper percentiles of the latency will look a little bit better. It's a lot better for large clusters. Yeah, it depends on your partitioning strategy. Like if you're using a partitioning strategy that writes to all of the brokers all the time because you're using key-based routing, then it really is bad if you have to write to all the brokers. If you use one of the other partitioning strategies that just writes to a few at a time,
Starting point is 00:20:54 it's a bit better. You also get some additional benefits too, though. A common scaling limitation that people run into with really large Kafka clusters, and it doesn't even have to be a high throughput Kafka cluster, it's just a cluster that has a lot of clients, is that if you have a lot of clients that are writing and reading to a lot of partitions, you end up with this huge mesh of connections.
Starting point is 00:21:23 Because if you have 1,000 clients and 50 brokers, it's M times N, and you end up with this huge number of connections. There's a lot of time spent in the kernel for TCP keep-alives, and you have to maintain buffers for each connection on both sides, and all that kind of stuff. With the approach we take with WarpStream, or at least the default mode,
Starting point is 00:21:43 is you essentially end up with each client results in roughly one additional connection to the cluster. And so it scales a lot more like an Nginx proxy or something than it does like a traditional Kafka cluster. You can actually, we do support a mode if you really know what you're doing, where you can make the topology look like traditional Kafka,
Starting point is 00:22:08 where we assign partitions evenly to each of the agents. And that can be useful in scenarios where you have producers that tend to not write to all the partitions. For example, if you're building a sharded database and you're using WarpStream as essentially the communication layer between clients and consumers and stuff, what you can do is, if you do it that way,
Starting point is 00:22:34 then you can get the benefits of improved data locality by having all the data for a given partition being buffered in a single agent without being exposed to the outlier latency of all the agents. You can essentially flip a switch and change the topology and you can actually do it per client, which is the really weird thing. You can have one client that has a view of the cluster where one agent is the leader for all partitions and you can have another client that has a view of the cluster where the partitions
Starting point is 00:23:03 are evenly distributed and that doesn't cause any correctness problems. And then you can even take it another layer further, which we haven't built this yet, but we're planning on adding it soon, is this idea of server-side partitioning where you can have a producer that sees a topic as only having one partition and which partition records go into
Starting point is 00:23:24 is decided by the agent at commit time, basically. But the consumers can see all of the partitions. You get essentially perfectly optimal batching and compression on the write side, but the consumers still get all the benefits of concurrency
Starting point is 00:23:41 of having many partitions. And each client can make that decision for themselves if they want. Very cool. Okay, and yeah, as I was reading that protocol article, it makes sense why you want to fit with the Kafka protocol. It makes it super easy for migrations
Starting point is 00:23:56 for existing people. But it's a lot of background work. And I was wondering, would there be any benefit on the producer or the consumer side if you had a client that knew about your topology or just knew about WarpStream and how it worked? Other than getting rid of a bunch of server-side code, would that actually help? And it sounds like it would because
Starting point is 00:24:16 it could know just to, hey, send it all in one batch, compress it all together, don't split it up into different topics and partitions and things like that, even though they're all going to the same broker. You get some benefits of that. Have you thought about having your own client for that? Or would this server-side partitioning do that for you? I think the server-side partitioning feature that we're planning on building soon would mostly do that for you. Overall, we'll probably just add an HTTP interface because it's easy.
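Worth noting: the client-side knobs that have come up in this exchange, the metadata refresh interval from earlier and the partitioning strategy just discussed, are plain settings on a stock Kafka client, which is part of why a custom client buys so little. A hedged sketch using the franz-go library; the option names are from its kgo package, but verify them against the library's docs before relying on this:

```go
package main

import (
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	client, err := kgo.NewClient(
		// WarpStream agents speak the Kafka protocol, so a stock client
		// just points at them like any other broker (address is made up).
		kgo.SeedBrokers("agent.example.com:9092"),

		// Most clients default the metadata max age to 5 minutes; tuning
		// it down lets the client notice added or removed agents quickly.
		kgo.MetadataMaxAge(10*time.Second),

		// Keyed records still hash to their partition; unkeyed records
		// stick to one partition per batch, so each flush hits few
		// partitions at a time instead of all of them.
		kgo.RecordPartitioner(kgo.StickyKeyPartitioner(nil)),
	)
	if err != nil {
		panic(err)
	}
	defer client.Close()
}
```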
Starting point is 00:24:50 But yeah, right now we're going to stick with the Kafka protocol just because so many different things integrate with it. It's also just a lot of work to build. You can spend a lot of time on building a good protocol and getting good bindings in every language and convincing people to rewrite their applications. I think we've probably run into more issues with the actual implementations of the Kafka clients
Starting point is 00:25:17 than we do with the protocol itself, if that makes sense. I could see us spending a lot more time and energy investing in improving existing clients, but I think we can get pretty far with the existing protocol. And on that note, just for others: go read that Hacking the Kafka Protocol post. I thought that was interesting at the end, on some clients, how they treat hostnames as case-sensitive but DNS does not,
and just how you used that to your benefit to set it up. I thought that was super cool. That was not fun to debug. It's amazing how you found that and then figured out how to use it to your advantage. I thought that was just really cool. One thing I want to ask, too, is just like, how does the WarpStream setup change advice around either how many topics I should have or how many partitions I should have?
Starting point is 00:26:08 Should I have fewer partitions and basically just saturate a node for a partition? How should I think about the number of partitions for a topic? Does that change as compared to regular Kafka? I think the main difference probably with traditional Kafka is that there's effectively no write throughput limit on a given partition. You can make a WarpStream topic with a single partition and write a gigabyte per second into it.
Starting point is 00:26:41 The problem you'll have is then trying to parallelize consuming that partition on the other side. You're just never trying to paralyze consuming that partition on the other side. You're just never going to have a consumer that can consume a gigabyte a second unless it's doing literally nothing. And so I think the way we usually explain it to people is just the number of partitions you need if you're using consumer groups and not managing.
Starting point is 00:27:01 Because you can write a consumer that looks at all the offsets that are available in a single partition and chunks it up and divides it out to a bunch of workers. But if you're just using a standard consumer group where you get assigned a partition and you read it in order, partition selection is essentially a function of how fast are your consumers and how many consumers do you eventually want to be able to scale up to. The system is still more efficient if you have less partitions because you get better batching, there's less RPCs in the system, there's less syscalls and I.O. and that sort of thing. But really I think it's more a function of
Starting point is 00:27:39 how fast your consumers consume. We've done some discussion too about having, different people call this different things, but topics where you don't have to think about how many partitions there are. There's a couple of different approaches you can have to that. Kinesis can split partitions, for example.
Starting point is 00:28:10 So you can imagine doing that automatically for people because then you get data with the same key always goes to the same partition. Or if you have people who don't care about the number of partitions, they don't care necessarily about which data goes to which partition, they just care about parallelism. You can do something like you can automatically modify the number of partitions based on the observed throughput and increase it and decrease it. I think there's still a lot of room for innovation there, but we haven't gotten into it yet.
Starting point is 00:28:39 But the important thing is that the traditional concerns of, I need to make enough partitions that aren't too many for my brokers to handle, but also I have enough to parallelize my consumer application. Or if my retention is super long, I have to have a lot of partitions so that the unit of movement of data between brokers when a broker fails isn't too big. That overwhelms the network. Those concerns are not as important anymore it's mostly just about making sure that your consumer application can can keep up yeah yep what about um number of topics per per cluster now because i think like you were saying like a lot of people just end up with this giant kafka cluster with all these brokers because it just like maintenance wise that up easier. But now if it's easier to scale up and down my cluster of agents here, does it make sense to have
Starting point is 00:29:31 fewer topics per agent? If I have a new topic, just create a new cluster and scale that independently. We're pretty big fans of cellular architectures. I think that's a really good strategy for people. So we've been trying to think about how we can make this easier for people and encourage people to do it more. But if you're a large company, it's so much nicer to have,
Starting point is 00:30:03 let's say you're a company with a couple hundred or maybe more like a couple thousand people. Your overall Kafka cluster traffic is one to two gigabytes per second. I mean, that's not a crazy huge cluster, but it's really nice to be able to say, in my opinion, okay, there's some data science topics over here and there's some extremely business critical topics over here. And this is the cluster that no one cares about and we just dump our logs into it and we want it to be as cost effective as possible. Because what I see happen a lot is when these clusters get too big and there's only one of them and you have a mixture of very important workloads and low value workloads
Starting point is 00:30:40 in the same cluster, eventually the people responsible for the cluster start saying, no, you can't use it, you can't put this in there, give me exact numbers on your throughput, predict the future, and all these kinds of, you end up with a lot of, and not because they want to stop you from doing stuff, they're just like, if this falls over, the whole business is done. And so I definitely think that limiting your blast radius to specific size logical clusters
Starting point is 00:31:09 so you can be a lot more comfortable basically making changes, doing upgrades, experimenting. I think that's definitely something we'd like to help people with. I do think it is a lot easier once it's literally like, okay, they can all share a massive S3 bucket, but all the compute is fully isolated and they can be upgraded independently
Starting point is 00:31:27 and you can do rolling restarts and all that kind of stuff. You can even run them on different hardware. Even within a cluster, what you can do with WarpStream is you can split the roles. By default, all the nodes run the same role, but if you're a slightly more advanced user, what you can do is be like, okay, these agents handle writes,
Starting point is 00:31:49 and these agents handle reads, and these other agents handle background jobs. Then you can split that onto different hardware groups. You can upgrade them independently. You can scale them independently. You can even get isolation within a cluster for different activities. You could deploy two different sets of agents for reads and give the data science team for their crazy Spark job
Starting point is 00:32:13 that has to read the whole world every hour, you could give them their own dedicated deployment of reading agents and you could have production run on a different deployment of read agents so that you get the blast radius isolation there but still share the same underlying data center. We have one person who I think they're going to do this next month. They have this kind of crazy use case that I think is really cool, which is they want to have a single logical Kafka cluster
Starting point is 00:32:39 that spans multiple cloud accounts. And if you think about what you'd have to do to achieve that with open source Kafka, with peering VPCs and punching holes through the whatever, and they asked us how they could do it, and we're basically like, just figure out how to share the S3 bucket across all three cloud accounts,
Starting point is 00:33:00 and then everything else will just work. It should just work, yeah. Yeah, so I think that's also a really cool thing you can do. Wow, that's super cool. Okay, I want to do a few just closing questions in this area on the things that have changed or that prompted this rethink and what you think about it. So you all are big fans of object storage
Starting point is 00:33:19 and just how it's changing data systems. Is this mostly an analytics revolution, or is this going to slip into transactional workloads as well? I know we see Neon out there experimenting with this. Are we going to see object storage become a huge part of transactional workloads, or is it at least not quite as latency-sensitive stuff? The way that I like to think about object storage is it's just an infinitely large array of hard drives.
Starting point is 00:33:50 That's the way that you can think about it from a performance perspective. And large arrays of hard drives never had particularly great performance characteristics, especially if you ran workloads on them that needed lots of IOPS. So we've always had this kind of slow storage. Back when you were running things in data centers, this kind of slow storage has always existed. And it's always been at every part of the hierarchy, from analytics workloads to operational workloads,
Starting point is 00:34:22 it was always there. I think with the advent of SSDs, people have kind of forgotten how to make use of slow storage. But now that object storage is so important from a cost perspective, people are going to have to learn how to do it again. And it's going to have to become
Starting point is 00:34:40 a part of operational workloads. Obviously, you have the storage hierarchy. So you have memory, you have disk of a part of operational workloads. Obviously, you have the storage hierarchy. So you have memory, you have disk of a bunch of different kinds, and then now you have object storage that's even slower than an array of disks, but it's still in that same ballpark.
Starting point is 00:34:55 I think it's going to naturally end up just through people redesigning storage systems, it's just going to end up at the bottom of the storage hierarchy for everything, not just analytics workloads. Because even within operational workloads, there are hot and cold sets of data, and you need access to it at some point in theory. And you can't just throw it all into some analytics system that makes it unusable most of the time.
Starting point is 00:35:28 It still has to be available, either just because a user might keep clicking that back or the forward button until you get to page 50 on your application. It still needs to be there. They'll probably be okay if it took 50 milliseconds to read instead of 500 microseconds. Yep. Yep.
Starting point is 00:35:48 Very cool. Okay. Another question I have. So a big benefit of WarpStream is reducing or removing cross-AZ networking costs. I've been trying to get to the bottom of this. Do you think cross-AZ networking costs,
Starting point is 00:36:02 is that just like purely a racket? Does it reflect at least some sort of reality? What are your views on cross-AZ networking costs? Do you have any? I think the sticker prices that are charged are probably a bit of a racket. The discounts I've seen be given to organizations that commit. I've met a number of people who have 90% plus discounts on those fees. I have never met someone who has a 90% discount
Starting point is 00:36:38 on S3 or EC2. So the margin is just obscene there. I do think, though, to a certain degree, I would love to spend some time with someone who actually designs data centers. I found some books on Amazon that I haven't read. I suspect that there's probably some... It's not too hard for Amazon or GCP to add additional links between their availability zone buildings, even if they're 30 miles apart. But it costs something and it's preferable if they can encourage people to design their applications in a nice way.
Starting point is 00:37:14 Even though S3 is multi-zone, it's not triply replicating. So they're way better at ensuring the data is available in multiple zones than your triply replicated Kafka cluster is. And that traffic is probably a lot more predictable. So I bet some of the pricing is designed to encourage good usage of the available resources, but the sticker prices are
Starting point is 00:37:42 crazy high. I think it just reflects the fact that it's presented But the sticker prices are crazy high. I don't know. I think it just reflects the fact that it's presented to the user as a limitless resource, whereas it's actually not. There are physical limits to it. And if you're an enterprise at serious scale, you're going to pursue a discount on that regardless if it's an important part of your workload. So it's just something that you have to,
Starting point is 00:38:13 you just have to notice that it's there, basically. That's it. Once you've noticed that it's there, you just have to make a commit to your cloud provider. They'll probably give you something. I like that answer. Yeah, I think you're right. It's not purely Racket.
Starting point is 00:38:30 They need to signal and communicate that this is not a limitless resource and have people find cost-effective workarounds like you all have done around this stuff. So I think it's a useful signal in that way. It's expensive and things like that, but I think it's not a pure rent-seeking racket like most. The part where I think it becomes very problematic is if you're a small team in a small company that needs to manage large company-shaped workloads,
Starting point is 00:39:00 but you don't yet have any leverage basically to negotiate with them, and you're kind of stuck with and that doesn't sound like a common use case but if you're a startup doing literally anything in the data space it's really painful and you have no leverage. I want to switch gears a little bit
Starting point is 00:39:20 because I just think you both are interesting people and things I've learned from you or hot takes I've seen. So one thing I've seen, I guess the standard advice is don't build your own database at a company. Anything like that. Always sort of use an existing one. But Richie,
Starting point is 00:39:35 recently you were saying that's the standard advice, but it's a little bit stale for different reasons. I guess maybe you want to expand on when does it make sense to build your own database or what has changed to make building your own database feasible and reasonable? I just think there's so many good primitives
Starting point is 00:39:52 you can use today. WarpStream is a great example with object storage. So is Husky at Datadog. I just don't have to think about data durability, data availability, data replication. I can focus on data structures and application level semantics that make sense for whatever this custom database I'm doing does. It's just like the barrier is so much lower than it used to be.
Starting point is 00:40:25 The thing I usually encourage people is if you can find a way to not be programming a block device or a file system, there's still some use cases for doing that. Obviously someone has to write that software, but if you're at a company and you're at that point, I would maybe think about what you're doing. But if you're relying on primitives like RocksDB,
Starting point is 00:40:47 object storage, DynamoDB, other storage systems to handle the low-level nuts and bolts, and you're focusing on, I have this use case and I can build data structures or access patterns or indexes that give me an order of magnitude improvement over something more generic. I think that's where leverage is built, that's where technical competitive advantage is built.
Starting point is 00:41:12 It's just crazy what you can do today. Our team is very small, our company does not have that many people, but if you lean on good primitives like AWS and their network load balancers and their object storage and some of their data stores, you can build mature, battle-tested infrastructure that would have required hundreds of people to build just 10 years ago. I think that really the focus needs to be on
Starting point is 00:41:43 what is the differentiated value that you can provide by getting a couple of layers lower than just throwing all your data into a SQL database and running a query. and you have a big enough business to justify the overhead for development, I don't think there's really any reason to avoid it anymore because most of the serious concerns about writing your own database are around you're going to lose the data because you don't know what you're doing in terms of interacting with the file system. You're going to do something wrong from a distributed systems correctness perspective and corrupt your data that way, which that one's another real concern.
Starting point is 00:42:29 But when we're talking about building around database, like when we made Husky at Datadog, we leveraged FoundationDB and object storage, and we still provided a very interesting and rich query interface for the data that we store. The differentiated value was making sure that we could do it at a price point and at a certain scale that all of the other potential solutions didn't make sense. So we weren't concerned about losing the data because we were leveraging FoundationDB and object storage. From a distributed systems perspective, things are a lot easier when you're relying
Starting point is 00:43:11 on a database that has transactions under the hood. The definition of database can be flexible enough to help you. Not everything is, I'm going to rewrite MySQL. There's a lot of stuff on that. Husky is a great example, I think, too, because if you look at what Husky is, it's essentially a columnar store built on top of object storage.
Starting point is 00:43:36 And there's kind of a lot of those now. There weren't as many when we started building Husky, but you can imagine we could have just used ClickHouse and used tiered storage and whatever. But Datadog has a couple of constraints that most people building those systems don't have, which is that the data is completely schemaless. You can stick a UUID in your log key and just pump those out at a million per second at Datadog, and they'll bill you for it and it'll work. And making that work
Starting point is 00:44:03 on a traditional columnar store where they all have warnings that don't put more than 10,000 columns in or whatever, whereas Husky could support millions and millions of columns in a given data set. The fact that it can do type inference at query time, like if you go use the Datadog UI, you can emit fields as a string, an integer, and a float with the same field name and query it and do math and get sensible results. And trying to do that on existing column restores is also...
Starting point is 00:44:31 So there was a couple of other things. There's enough there to warrant the overhead of building yourself. Also just all the multi-tenancy stuff, right? Most of the existing systems don't help you with that. But to Ryan's point, picking your primitives is really important. If you pick those wrong, that can basically sink your entire project. And so also those build versus buy decisions, it's not easy. It's really hard to get those right.
Starting point is 00:45:00 And I don't know if it comes from anything but experience, but there's still a lot of stuff out there, I think, where there's plenty of room to improve. Can you elaborate on the primitives? I know you mentioned build versus buy recently as well. I guess, what are some of those elements that you had to choose build versus buy on? Obviously, object storage, sounds like FoundationDB, yeah. What did you, I mean, obviously object storage sounds like foundation DB for things.
Starting point is 00:45:27 Like what other things did you consider or elements of, of your system? Do you have for Husky or work streamer? So for, for, I guess for either, I'm like,
Starting point is 00:45:39 what are just like facets of this that you need to think about? And, and how did you like, what choices did you end up making? That's a good question. You want to talk about Workstream? Yeah, we can talk about Workstream a little bit. But I think before we get there, there were some other good examples in Husky about how we made the decision.
Starting point is 00:46:00 One of the options for metadata was always just put it in Postgres. Or MongoDB. Yeah, or Postgres or MongoDB. Datadog being a big company has a proliferation of storage systems internally for lots of different reasons. And there were other options besides FoundationDB. FoundationDB wasn't an option when we showed up. We brought it.
Starting point is 00:46:27 It was actually really controversial at the time. Yeah, it was very controversial. But the problem with building something like this on Postgres is what's the multi-tenancy and scaling story? If you were shipping a product that's like Husky, but the customer is going to run it entirely themselves. Postgres may have been a very reasonable choice there because they could run their whole workload
Starting point is 00:46:54 on one Postgres machine and then it's fine. But if you need to support thousands of tenants, then that's, you know, it's going to be a very different story in terms of wanting to provide them a good isolation experience. What's the overhead of managing, you know, it's going to be a very different story in terms of wanting to provide them a good isolation experience. What's the overhead of managing, you know, a bazillion little Postgres clusters? That's one of the build versus buy decisions.
Starting point is 00:47:13 Another one was on the file format side. Like we created our own file format instead of just using something off the shelf like Parquet. There were a lot of different reasons, but one of the very obvious ones is we had to build a full-text search index, and Parquet is not a full-text search index file format. And when you're building a full-text search index, you need to understand the IO access patterns very well. So designing a format will help you do that because it runs on object storage, we can't just assume that there's an SSD to query a bazillion times to fill the index.
Starting point is 00:47:54 There's just a lot of little ones along the way, but I think that the cloud providers, if you're willing to accept a dependency on cloud provider services, you have a ton more options there as well. Especially with S3, that one's pretty obvious. No one's going to choose something else. But if we were we had the choice to build versus buy on the database for WarpStream that we use internally in the control plane.
Starting point is 00:48:32 And we wanted to give ourselves the flexibility to, in theory, ship a version of WarpStream that ran entirely within the customer's cloud account, like the control plane and the agent data plane. We didn't choose FoundationDB specifically for that reason, where we don't want to force them to have to take on the FoundationDB dependency, because it's easy enough to run, but it's still probably a non-starter for somebody who's paying money. Well, it's just like, there's three companies that know how to do it. It's not hard, but no one can just... Yeah, it's not hard, but there's a little bit of a learning curve to it. So in the end, we chose a very limited interface of what we decided we needed. And it's something that's offered by every cloud provider, basically. So we chose a limited enough subset of what we needed out of a database on our control plane so that we can ship it on every major cloud provider without
Starting point is 00:49:27 having to make major changes to our application. So that was an example where we decided to buy a different product. But basically, an open source product is closer to, using an open source product is closer to building than it is to buying sometimes. So that's why we went with buying this thing. Yep. Well, what's, what are your two history with foundation DB? And I think it's one of those things that like certain people just rave about and love
Starting point is 00:49:57 and then just like, it's not as widely known because, you know, it's very specific use cases, but I think a lot of like systems people just love it. So there's no corporate sponsor. I think that's the main reason it's widely known to be honest with you yeah um i mean like had you had you used it before husky like what was your what was your sort of background or getting it how did you get into foundation so when i was open sourced uh i just got extremely interested in it from like a technical perspective there was no i didn't have any history with it prior to that. It's very useful for learning about distributed systems just because it's kind of weird
Starting point is 00:50:32 in the way that it's designed. It's extremely unique. There's a lot of... Actually, it's a good example of making one trade-off that other people are not willing to make or don't think is right, and then having massive simplification fall out of it. What's that trade-off that other people are not willing to make or don't think is right, and then having massive simplification fall out of it. What's that trade-off that they make? The trade-off with FoundationDB is that,
Starting point is 00:50:52 what's it called? The recoveries? Is that what it's called? That's the trade-off they make. Yeah, if any one of the nodes in what they call the transaction subsystem fails, which is like a subset of your cluster. If you're running a large cluster,
Starting point is 00:51:07 maybe it would be 10%, 20% of the nodes. If any one of those nodes fails, all of the writes to the cluster stop. You can't process any transactions until they replace that node by reconfiguring the cluster. And the reconfiguration happens on the order of one to three seconds for a not particularly large cluster. But during that time, the cluster is down. But because of that, so many other great things fell out of making that choice. And most applications can tolerate that, especially because the client bindings
Starting point is 00:51:45 have a built-in retry loop that if you hit that error code while your transaction's running, it will just restart it. Like what happened if you had a conflict when your transaction was committed. It's the same idea, basically. So most applications don't notice anything
Starting point is 00:52:01 other than a small latency increase for that second. It drives people crazy. You explain that to distributed systems people, and they're like, you can't do this. And it's like, but they did, and actually it works great, and it's way more reliable than most systems built the other way. Because what they can do is just, for a bunch of the roles in the cluster,
Starting point is 00:52:20 they're just like, it's this node, and this one's this one, and this one's this one, and if it fails, we'll just go through a recovery. All the error handling is just do a recovery. And so they just made that really fast, and they test that a lot. And you end up with this really, really efficient system. But that's not a trade-off that most systems would die. Someone would propose that, and they'd be like, no. And so you'd have to go through some other much more complicated architecture. But they were just like, no. And so you'd have to go through some other much more complicated architecture. But they were just like, no, we think this is the right trade-off.
Starting point is 00:52:50 I've operated Cassandra, Elasticsearch, MongoDB, etcd, Zookeeper. What are the other ones? MySQL Postgres. Anything you can name, I've been on call for it at some point. There is just nothing that even remotely comes close to FoundationDB. It just never lets you down. It just will always do the sensible thing. It tries really hard to never lose your data.
Starting point is 00:53:15 We had so many scenarios where we realized we'd been doing something wrong and it was just fine because FoundationDB was just really quickly fixing it under the hood. It's just hard to explain. There are very few systems that you ever use that are just that thoughtfully designed and work the way they're advertised, and FoundationDB is the only one I can think of. Would it make sense for a cloud provider to offer a FoundationDB primitive, like a managed version, or does it sort of not work because it's so low level or something like that?
Starting point is 00:53:49 I think that the main problem with somebody offering FoundationDB directly is the fact that the client bindings are a C library that speaks a, essentially, write C structs over the wire protocol. It's doable with a lot of work, I would say. But overall, just like the segmentation of the networks would be a little funny. You'd have to build your own proxy that would cause a bunch of liability problems.
Starting point is 00:54:20 But there's nothing stopping anybody from offering somebody that is like all of the interface of FoundationDB over a different protocol. Like you could build a protobuf interface that you could stick in front of a load balancer and maybe lose a little bit of efficiency, but you could still totally offer that. The other thing that I think is a little weird about it is FoundationDB has limits in the sense that it's not a completely horizontal, like it doesn't perfectly horizontally scale forever to unlimited size databases. And it's not very friendly to the type of, it would be very friendly to the type of automation that like you would imagine is behind DynamoDB where there's like an absolutely massive fleet of machines that run shards that are kind of self-contained.
Starting point is 00:55:06 It doesn't work that way. So you'd have to build a lot of really good automation around increasing and decreasing the size of individual clusters. And that's tricky. Getting the quality of service right on that is also, I'm sure, really tricky if you're going to try to do it in a multi-tenant way.
Starting point is 00:55:22 But I would love if someone did. That would be cool. But it's definitely a big engineering challenge. And I imagine anybody that gets close to that, they would probably want to start from scratch and just build a new system that had the same semantics instead of using FoundationDB directly.
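As a toy sketch of the "different protocol in front of FoundationDB" idea: a plain HTTP key-value endpoint over FoundationDB's real Go bindings. The endpoint shape, port, and API version choice here are ours, not any real product's:

```go
package main

import (
	"io"
	"net/http"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
	fdb.MustAPIVersion(710)     // select the client API version (FDB 7.1 here)
	db := fdb.MustOpenDefault() // connect via the local fdb.cluster file

	// GET /kv/<key> reads a key; PUT /kv/<key> writes the request body to it.
	// Each request runs as one FoundationDB transaction, so the proxy keeps
	// FDB's semantics while speaking something a load balancer understands.
	http.HandleFunc("/kv/", func(w http.ResponseWriter, r *http.Request) {
		key := fdb.Key(r.URL.Path[len("/kv/"):])
		switch r.Method {
		case http.MethodPut:
			val, _ := io.ReadAll(r.Body)
			_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
				tr.Set(key, val)
				return nil, nil
			})
			if err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
			}
		default:
			v, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
				return tr.Get(key).Get()
			})
			if err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			w.Write(v.([]byte))
		}
	})
	http.ListenAndServe(":8080", nil)
}
```

A real offering would also need to expose multi-operation interactive transactions over that protocol, which is where the design gets hard and where the "little bit of efficiency" goes.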
Starting point is 00:55:46 Are there any other primitives that you wish existed that you ended up having to build yourself for whatever reason? I think S3 Express One Zone might fill a little bit of this role. If we'd had a scalable, relatively low-latency key-value store for large chunks of data, we may have decided to change the way that Husky worked. The pricing of S3 Express One Zone may not have made this practical, but in Husky, the query nodes had local disk caches. And there may have been a way to build a cache that would allow you to query it from multiple nodes instead of just whatever was on the local disk in one of the query nodes, or tiering for the cache.
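A small sketch of that kind of tiered cache read path, with local disk in front of progressively slower object-storage tiers; the tier interface and chunk IDs are invented for illustration:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

var errMiss = errors.New("cache miss")

// tier is any layer that can try to serve a chunk by ID.
type tier interface {
	get(id string) ([]byte, error)
}

// diskTier serves chunks from a local directory, the fastest tier.
type diskTier struct{ dir string }

func (d diskTier) get(id string) ([]byte, error) {
	b, err := os.ReadFile(filepath.Join(d.dir, id))
	if err != nil {
		return nil, errMiss
	}
	return b, nil
}

// getTiered walks the tiers in order: local disk, then (hypothetically)
// S3 Express One Zone, then S3 Standard as the source of truth.
func getTiered(id string, tiers []tier) ([]byte, error) {
	for _, t := range tiers {
		if b, err := t.get(id); err == nil {
			return b, nil // a fuller version would backfill the faster tiers here
		}
	}
	return nil, errMiss
}

func main() {
	tiers := []tier{diskTier{dir: "/tmp/chunks"}} // the S3 tiers would follow
	_, err := getTiered("chunk-001", tiers)
	fmt.Println(err)
}
```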
Starting point is 00:56:48 I think S3 Express One Zone will definitely fill a big hole in people building caches for their systems. I have another one that's in that vein. Well, it's related to caching maybe, but different. It's more of a library thing than it is a cloud primitive or whatever. But I feel like external caches get really overused. In all the systems we've built, including in WarpStream and the control plane and the agents and stuff, we lean pretty heavily on just having caches in the application itself.
Starting point is 00:57:23 As soon as you have to introduce a Redis or a memcache or something, just this other piece of infrastructure and like, it's just remote memory. Like, why do you need to manage machines for that? And I think better primitives around essentially, you know, I want all cache loads
Starting point is 00:57:39 for this type of key to go over here. And I want to be able to essentially introduce data locality into an existing application. Actor frameworks are one way to express this. Elixir and Erlang's OTP framework is another way to express this,
Starting point is 00:57:57 basically making it easier for people to do more distributed systems things in their application in the same way. I feel like we always end up making stuff like that for ourselves because we're just like, I don't want to manage all these external dependencies and I want to get to pick one or two dependencies and then everything else has to happen in the application. That's how we usually build things. And that usually requires writing these
Starting point is 00:58:23 libraries to do stuff within the application. I feel like if more people had access to those things and there were better toolkits for that... but it's really tough to figure out the right programming model and the right interfaces and stuff like that.
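In the spirit of "I want all cache loads for this type of key to go over here," a minimal sketch of picking an owning node with rendezvous hashing; the node addresses and key format are invented:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownerFor picks which node should hold the cache entry for key, using
// rendezvous (highest-random-weight) hashing: every node scores the key
// and the highest score wins, so adding or removing a node only moves
// the keys that node owned.
func ownerFor(key string, nodes []string) string {
	var best string
	var bestScore uint64
	for _, n := range nodes {
		h := fnv.New64a()
		h.Write([]byte(n + "/" + key))
		if s := h.Sum64(); s >= bestScore {
			bestScore, best = s, n
		}
	}
	return best
}

func main() {
	nodes := []string{"10.0.1.5", "10.0.2.7", "10.0.3.9"} // hypothetical peers
	fmt.Println(ownerFor("topic-42/partition-7", nodes))  // route the load here
}
```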
Starting point is 00:58:45 it in you know a year or two would it be the storage cost isn't as much would it be it's actually in two zones right so you don't have to worry about a zone going down would it be um i can't remember i had a third one but like what would you change about s3 express ones that would make make it even more helpful to you? I think the biggest one would be to remove the bandwidth pricing that's kind of baked in after you write half a megabyte. That was my third one.
Starting point is 00:59:18 Like the 512, yeah, okay. Yep, so I would love for that to go away. I understand why. I understand why it's there. Why is it there? Basically, you could use it to do the same thing. The way that WordStream works is, ideally, you have as few puts as possible,
Starting point is 00:59:43 but you make all of those puts as large as possible, because you're not billed based on the volume. You're billed based on the number of puts. So we try to use multi-part uploads, just to avoid the interzone networking and also keep the put cost low. So you avoid interzone networking by using S3 Standard. You make as large of puts and gets as possible.
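To put rough numbers on that (assuming S3 Standard's roughly $0.005 per 1,000 PUTs): 100 MB/s written as 256 KB objects is ~400 PUTs a second, while the same stream batched into 16 MB objects is ~6 PUTs a second, a 64x difference in request cost for the same bytes. A toy batcher in that spirit follows; this is not WarpStream's code, and the sizes and flush callback are invented:

```go
package main

import (
	"bytes"
	"fmt"
)

// batcher accumulates many small records into one large object so a single
// PUT covers all of them; S3 request cost scales with PUT count, not bytes.
type batcher struct {
	buf      bytes.Buffer
	maxBytes int
	flush    func(obj []byte) error // stand-in for an S3 PUT or multipart upload
}

func (b *batcher) add(record []byte) error {
	b.buf.Write(record)
	if b.buf.Len() < b.maxBytes {
		return nil
	}
	obj := append([]byte(nil), b.buf.Bytes()...) // copy before reusing the buffer
	b.buf.Reset()
	return b.flush(obj)
}

func main() {
	puts := 0
	b := &batcher{maxBytes: 16 << 20, flush: func(obj []byte) error {
		puts++ // the actual S3 upload would happen here
		return nil
	}}
	for i := 0; i < 400; i++ {
		b.add(make([]byte, 256<<10)) // four hundred 256 KB records...
	}
	fmt.Println(puts) // ...become 6 large PUTs (plus a final partial flush)
}
```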
Starting point is 01:00:03 Puts are more expensive. And if you could use S3 Express One Zone in the same way that you can use S3 Standard, where you write as much data as you want and only pay for one put, that would be great for us in the sense that it would make everything way cheaper. It would just be faster and cheaper. There would be no trade-offs other than the very slightly increased storage cost. But what we would do is we would just write it into S3 Express One Zone and then use compaction to move it to S3 Standard.
Starting point is 01:00:39 So we would only pay the increased price for a relatively short period of time on the storage side. So I understand exactly why they did it. They don't want you to use S3 Express One Zone as a way to do cheaper cross-AZ bandwidth, and they don't want you to use it in the same way you use S3 Standard, where you can write drastically more per request. It's more expensive to write data to what are presumably SSDs. They have a limited lifetime, so you need to pay a little bit for the I/O that you do,
Starting point is 01:01:09 whereas hard drives, you can rewrite them as many times as you need. So I understand why they did it, but that would be the one thing that I would change about it. That's very good. The replication is annoying in the sense that we'd have to build something in order to add replication to S3 Express. But it's not the end of the world. That one is definitely solvable.
Starting point is 01:01:36 I just realized something, too, that I think is another interesting thing about the difference between manually replicating your data and having S3 replicate it for you, besides just that it's better because it uses erasure coding and they can have dedicated links and stuff, which is also that when you manually replicate your data, whether it's with single-zone S3 Express or just in your application, AWS has no control over which, essentially, data centers
Starting point is 01:02:02 or availability zones that data gets replicated to, whereas with S3 they actually get to choose because it's not exposed in the interface. So you can imagine if one of the availability zones is slightly degraded or under capacity, they can essentially traffic shape. Whereas if you're doing it yourself, they don't get that control either.
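A hedged sketch of the "build replication yourself" approach for S3 Express One Zone: write the same object to two one-zone buckets pinned to different AZs and only acknowledge once both puts succeed. The bucket names are placeholders; the calls are from the AWS SDK for Go v2:

```go
package main

import (
	"bytes"
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"golang.org/x/sync/errgroup"
)

// putReplicated writes the same bytes to two directory buckets in different
// AZs; the write only succeeds once both copies are durable.
func putReplicated(ctx context.Context, c *s3.Client, key string, data []byte) error {
	buckets := []string{
		"cache--use1-az4--x-s3", // hypothetical one-zone bucket, AZ 1
		"cache--use1-az6--x-s3", // hypothetical one-zone bucket, AZ 2
	}
	g, ctx := errgroup.WithContext(ctx)
	for _, b := range buckets {
		b := b
		g.Go(func() error {
			_, err := c.PutObject(ctx, &s3.PutObjectInput{
				Bucket: aws.String(b),
				Key:    aws.String(key),
				Body:   bytes.NewReader(data),
			})
			return err
		})
	}
	return g.Wait() // reads can then fall back to either bucket
}

func main() {
	cfg, err := config.LoadDefaultConfig(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)
	if err := putReplicated(context.Background(), client, "chunk-001", []byte("...")); err != nil {
		log.Fatal(err)
	}
}
```

Note what the transcript points out: with this scheme, the application, not AWS, decides which AZs hold the copies.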
Starting point is 01:02:18 So I think that might be another interesting kind of data center design question. Very cool. Cool. All right, I want to wrap up with just a few questions around business stuff and WarpStream as a company. So I guess first one, what's your billing model? How will people use WarpStream?
Starting point is 01:02:35 Is it fully hosted service? Is it sort of run-your-own? What's it look like there? Yeah, we have one product offering right now, and we're about to release a second one. So the product offering we have right now, we call it BYOC. We've been debating the name a little bit internally,
Starting point is 01:02:51 whether we should call it self-hosted or whatever. But the idea is it's WarpStream's design with a data plane control plane split. And so the idea with the BYOC product is the data plane runs in your cloud account and the control plane runs in ours. So there's essentially no networking fees between the two cloud accounts except for metadata,
Starting point is 01:03:08 which is super small, on the order of tens to hundreds of kilobytes per second. And also you get all these nice privacy and security benefits because the data literally doesn't leave your cloud account. We call that BYOC. We're still kind of going back and forth on the pricing for that. I think most likely what it's going to end up being is just usage-based pricing on two dimensions.
Starting point is 01:03:31 If you imagine this is the slope of the cost for self-hosted open source Kafka, WarpStream cloud costs plus our licensing fee will look like this. It'll just be a much less steep slope. We'll only price on two dimensions. It'll just be on write throughput and storage pricing most likely. So however much data you're retaining, plus however much data you're writing, we don't think we'll bill
Starting point is 01:03:56 for agents or cores or reads or anything for that product. But we'll see. And then for larger use cases, there's probably some options there for around fixed size pricing and stuff like that for people who want more predictability. We're going to launch our serverless product probably mid next quarter.
Starting point is 01:04:18 That's like the same exact technology as WarpStream, but essentially the data plane also runs in our cloud account. But it's run in such a way that it's fully virtualized, kind of scaled to zero. You can go from like zero megabytes per second to 100 almost instantaneously and back down. That one will unfortunately just have to have more dimensions to the pricing because it's just like there's a lot of different ways you can incur costs for us. And so that one we're still figuring out. There will be more dimensions to it.
Starting point is 01:04:49 Reads will obviously cost something because there's egress fees. Some of that can be mitigated with PrivateLink and stuff like that. But that one will have more dimensions. But under the hood, it's leveraging the same WarpStream technology. So what you'll get, though, is a lot of the other people selling hosted Kafka, it's not super obvious in the pricing, but eventually they make you pick, is all your data in one zone, or are you willing to pay extra to have it in multiple availability zones? And so with the WarpStream serverless product, you'll get essentially usage-based pricing,
Starting point is 01:05:22 scale to zero, everything will always be, not triply replicated, but stored in S3, so multiple availability zones on the storage side. And it'll be a lot cheaper than anything else out there. Yeah. Yeah, that's pretty fascinating. I think especially scale to zero and just being able to quickly scale up to whatever you need to, that'll be pretty interesting.
Starting point is 01:05:44 Another question I like to ask people, especially when they're building data products like this, is how do you work to convince people of its reliability? How do you have people put their trust in, like, hey, this is a new product that's holding my most important data? What does that process look like, and how do you get them on board? Yeah, so I think it starts with, we wrote a bunch of blog posts in the beginning that explained bits and pieces about how the technology works. And I hope that shows people that we know what we're doing from a technical perspective. Like, the information is dense and interesting.
Starting point is 01:06:23 To give slightly more of an explanation about what we do internally, we have a pretty serious set of correctness tests and integration tests for our system to ensure compatibility with a bunch of the client libraries. And we take durability seriously. Obviously, the data plane data is all in S3, but the control plane data is also stored in S3. And if you're in Amazon, some of the control plane state is stored in DynamoDB as well. And we take backups of that information. It depends on how much you write to the cluster, but we take backups of the control plane state many times an hour
Starting point is 01:07:20 if you're on a high-throughput cluster. We do a lot of fault tolerance and fuzzing and stuff like that, too. We have a bunch of integration tests that essentially run, I forget what the term is, but we run another implementation on the side that's tracking what the outcome of every produce and fetch should be. And then we inject faults all over the stack, up to pretty high ratios, and make sure that everything that was acknowledged on the write side comes out on the other side in the right order, uncorrupted. So we do a lot of aggressive testing like that as well. It's not fully deterministic in the same way that TigerBeetle is,
Starting point is 01:08:06 but we do a ton of fault injection. And we do a bunch of other crazy stuff. All our staging clusters run basically at 95% CPU all the time and have really obnoxious workloads with really poorly tuned clients to just stress test the system. Randomly delete topics every 10 minutes.
Starting point is 01:08:28 So a bunch of stuff like that. I think a lot of that trust also just comes with time. As we spend more time developing the system, as more people adopt it, as more people see the value. But it's hard. I think it's not as hard as selling a new database that stores data forever, but it's close to that hard, because if you just think about how critical Kafka is in a lot of people's workloads, I think it comes with trust and time.
Starting point is 01:08:52 because if you just think about how critical Kafka is in a lot of people's workloads, I think it comes with trust and time. We do have one user in production that's been with us for a while now, and their usage has been growing, and they're very happy with it. We'll probably do a case study with them relatively soon,
Starting point is 01:09:15 but we have people that are using it in production and doing proof of concepts right now, and they're finding many fewer bugs than they did six concepts right now. And they're finding many fewer bugs than they did six months ago now. We've ironed a lot of things out. Mostly the bugs are around like, you don't support this thing or you support it in the wrong way.
Starting point is 01:09:35 That doesn't look exactly like open source Kafka. They weren't really a practice issue. They were just, you know, the protocol is huge. So we're making a lot of progress there. For most simple usage of Kafka, you can just point your existing clients at it and it will work today. Very cool. I think that's just the process for a lot of data systems where you've got to get some clients where they have enough pain either from
Starting point is 01:10:03 the costs or the operational issues and they're just like hey we got like we got to make something happen and you build up enough of those and get the case studies and get the word out there like that this is working and it just builds momentum on itself so uh it's fun it's a fun time to see like the kafka space because you know confident like you know the the big gorilla in the room or whatever but then you have you all like squeezing them from the cost side and operational side and red Panda squeezing from like the latency performance side. Like it's, it's a,
Starting point is 01:10:30 it's a fun time to see what's happening in this space. Yeah. I think the streaming is going to look really different 10 years from now. We're pretty excited about it. Yep. Absolutely. Richie, Ryan,
Starting point is 01:10:41 thanks for coming on. I learned a ton. This is a really fascinating stuff and I appreciate you getting down in the weeds with me for all this stuff. If people want to find out more about you or WarpStream, where should I send them? You can go to our website at warpstream.com and then we have, I think the Twitter account is called WarpStream Labs.
Starting point is 01:11:00 And we have a LinkedIn now, too. Oh, and also, the place we're most active, if you want to just come talk to us directly, is our community Slack channel, and there's a link to that on the website. Perfect, we'll get those in the show notes. I'm going to link some of your blog posts as well,
Starting point is 01:11:15 because I found them really fascinating and useful, so I think other people will as well. But yeah, thanks again for coming on. Best of luck to you all at WarpStream going forward. Cool, thank you so much for having us. Thanks.
