The Data Stack Show - 156: Simple, Performant, Cost-effective Data Streaming with Alex Gallego of Redpanda Data

Episode Date: September 20, 2023

Highlights from this week’s conversation include:

- Alex’s background in the data space and the creation of Redpanda (4:23)
- The cost and complexity of streaming (11:07)
- The evolution of storage with Kafka (12:04)
- The distinction between streaming technologies (15:10)
- Simplicity as a Core Design Principle (27:03)
- Cost Efficiency in a Cloud Native Era (30:44)
- Removing complexity with Redpanda (34:21)
- Migrations and compatibility with Redpanda (40:35)
- The Future of Redpanda (43:44)
- The Story Behind Redpanda (46:45)
- Final thoughts and takeaways (50:25)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Welcome back to the Data Stack Show. Kostas, we get to talk about a topic, actually, no pun intended, that I don't know if we've dug deeply into on the show, and that's Kafka. And specifically, we're going to talk with Alex from Redpanda.
Starting point is 00:00:53 And I'll tell you what's interesting to me about Redpanda is that, as widely used as Kafka is, Confluent is really one of the only major successful commercializations of Kafka. But Redpanda is doing some really cool stuff. So I think we should get a 101 on Kafka, because we haven't covered it in depth on the show. And then hear about what makes Redpanda unique as sort of a way to run managed Kafka. So that's what's interesting to me.
Starting point is 00:01:23 Yeah, 100%. I mean, as you said, I don't think we had a specific episode in the past about Kafka. We had a few about stream processing, although that's a little bit different because Kafka is
Starting point is 00:01:38 not necessarily that much about processing, but more about resilient transport of data at any scale, which makes it like a very important component in many, like in pieces of like many like infrastructures out there, like from many different companies. And you're right. Like we haven't seen outside of Confluent and okay, like the big providers, like the cloud providers with systems like Kinesis,
Starting point is 00:02:07 to actually build something to go after this particular market. So having RedBand out there, I think it's extremely interesting. And there are a couple of things to chat about. Why do that? Why not do that? Why other people don why do that, right? Why not do that? Why other people don't do that, right? Why we don't see like more competition in this space and see what it means like to build something like Kafka after, like I think Kafka was like built like the beginning of 2010
Starting point is 00:02:39 or something like that. So what's 10 years of innovation in technology gives us as tools to go and build something similar. And yeah, see how it is different, what kind of different tooling it gives us compared to Kafka. And what it means to go after this market and build a company there.
Starting point is 00:03:01 I think this is going to be super, super interesting. It's a very hard technology to get right and a very important one to get right. You can't fail with this data. So let's go and see what Alex has to say. Let's do it. And unfortunately, I actually just had
Starting point is 00:03:18 something come up. So I'll let you kick the call off and then I'll join if I can. If not, I'll try to come back for the intro. You don't have to. Let me enjoy the conversation. It's fine. All right.
Starting point is 00:03:32 I'll see if I can make it back by the end. Okay. Let's do it. Hello, everyone, to another episode of the Data Stack Show. I'm Kostas. And as you probably already learned by now, when I do the introduction, it means that I'm going to be alone on the show, unfortunately. But we have an amazing guest today, so hopefully we're going to compensate for missing Eric with him.
Starting point is 00:03:58 So we have Alex Galego, he's co-founder and CEO of Red Panda. And we're going to talk about a few extremely exciting things today that have to do with Kafka, Kinesis, Red Panda, and technologies around that. So welcome, Alex. Very nice to have you here. Thanks for having me. Good to be here. So let's start with a quick personal introduction. Let's tell us a little bit about yourself, because you're not just like a CEO and a co-founder.
Starting point is 00:04:33 You also wrote a lot of the code behind Red Panda. And these systems tend to be quite complex. So it would be awesome to hear about you, your background, and your journey building Red Panda. Thanks for asking. So I guess by means of introduction, I've been working in streaming for almost about just about 14 years, which is mind-blowing that you could work on a single problem for so long and still find so much richness. I mean, I could probably build two or three systems following up if I was working on Red
Starting point is 00:05:04 Panda. So yeah, you know, anyways, I was, I went to school for, largely I was trying to focus on cryptography. I ended up dropping out of a bunch of programs, graduate programs, and I went to start building distributed systems because I just found them a little bit more fun than breaking things. This was early on in my career. I went to work for an attic in New York where I first started working with, you know, the first couple of versions of, I guess, Zookeeper in 2010, 2011, somewhere around there and Kestrel and then Kafka and so on. And so, yeah, so my journey started really early on. I ended up testing Storm. And then that kind of those ideas led to me writing the code for the first startup that
Starting point is 00:05:52 I sold to Akamai. It was called Concord. I also kind of authored the original first part of that engine. And then we went on to build another small company around it. And so Concord was cool. It was a compute platform that was different than, you know, frankly, most streaming platforms today. And so it was really like container-based, single-threaded, C++ execution engine.
Starting point is 00:06:13 It was really more like a quasi-envoy, you know, the C++ proxy with like language runtimes on top. And so it was pretty cool. We sold that company to Akamai in 2016. And so Red Panda came out of this deeply technical background where, you know, as an engineer, I just couldn't understand where performance was going. And if you were looking at a couple of computers, like I don't understand where the latency is coming from. And so the first ideas for Red Panda came about in 2017, where I took two edge computers
Starting point is 00:06:47 and I connected them back to back with a single SPF cable. No routers, no switches, just two computers and a cable. And I was just like, I just want to measure what's the gap between hardware and state of the art software. And then I went to when I wrote something in C++, you know, it's really mostly the idea, some of the core ideas right there. It's like, well, what is the gap actually in both in throughput and latency? I gave a talk about in 2017. In my mind, I was like, you know, there's like a couple of companies working on this.
Starting point is 00:07:16 They'll figure this out. They're really bright. And they didn't work on those ideas. And so, yeah, I spent the next few years just trying to understand why, you know, basically as an engineer, there's no magic, right? If the job is to save data on disk, then the job is to save data on disk. There's no two ways about it, right? So there's like that essential complexity. But when you look deep down, you learn that if you take a new approach, sort of design for the modern hardware, you could get this category called performance improvement. And so with that came this whole design possibilities.
Starting point is 00:07:45 I was like, hey, if I were to start from scratch, what would I do differently? You know, what choices? Like, how would I think about architecting the next generation of streaming? What would it mean for the engineer? And so eventually that gave the birth of Red Panda, you know, the company and product,
Starting point is 00:08:03 which is kind of fun because it was first an Easter egg. And we'll talk about the naming later in the show. That's Red Panda. That's how it started. And those were the roots of the technology. All right. So I have like a couple of like historical questions, to be honest, because I have someone here who has been like through the evolution of streaming systems, but also being part of this evolution of streaming systems. So I remember like back then in, I don't know, like let's say the beginning of like 2010, maybe a little bit like earlier than that, there was that kind
Starting point is 00:08:31 of like explosion of like systems that came out from places like Twitter with like Storm, we had Samza, we had Kafka obviously that became like, like dominated for a while. My question is, all these systems that they appeared back then, most of them disappeared, right? They didn't make it, let's say, at the end, outside of probably, yeah, like, okay, Confluent made it to the public markets,
Starting point is 00:09:03 but we didn't see more happening there. Even with Samza, right? Not Samza, sorry, Link. Now we see again the market getting interested again into that. What is the reason, in your opinion, that in the streaming space this happened? Out of all these products, we didn't end up with more successions. The Achilles heel of streaming has always been cost and complexity. And for those of us in the room that had to trace, gosh, in 2011, I was writing Scala
Starting point is 00:09:39 deploying that. Scala and Java deploying that on a closure runtime, which was storm. And when like the Nimbus worker decided to stop, you're like, you get a stack trace that is the equivalent of like zero exit that beef, right? You're just like, I have no idea what this means. And you end up, you know, debugging the transitive closures. And like, if you ended up even, you know, important in some of the more sophisticated JVM libraries back in the day, like algebra, which, you know, thankful to Twitter for publishing some cool stuff. It was just gnarly. Really, there weren't just many. And here's the thing about compute in particular. I think a storage, we should separate compute and storage differently. And so when I
Starting point is 00:10:19 think about compute, you know, I think about my previous company, Concord. I think about Apache Storm. I think about Flink. I think about new approaches today, like Bitewax and Materialize and, you know, and so on. Rising Wave, there's a huge host, decodable as a host. So compute, it's its own layer. And then storage is its own layer. And I think what was meaningful back then is that most of us didn't have the scale of Twitter, but we were growing super fast. And so we took what they were doing because it was in the open and you can just get cloned
Starting point is 00:10:57 and it was both a blessing and a curse. A curse later when you had to debug it, but a blessing because you could get started quickly, right? That was a blueprint. And then you could do like large-scale things what most people didn't realize is the cost of operationalizing that was inordinately expensive it's kind of like the promise of the hadoop world that you know materialized for like four companies in the world no i'm kidding but you know it was really hard to to actually extract value out of these things it
Starting point is 00:11:22 just became computationally expensive and like manpower expensive and three people in the company knew. And once they left, you're like, I have no idea how this part of the system works. And so cost and complexity have always been the Achilles heel of streaming. And so two things have happened. One, I think managed services like Red Fund the Cloud and Confluent Cloud and MSK, et cetera, has made onboarding some of the technologies easier, not necessarily simpler. I'm going to talk about, you know, easy is different than simple, right?
Starting point is 00:11:52 Complexity is a different metric for me. And then, you know, on the storage side, just to give a glimpse of that, when I, you know, when you first started and you started messaging, when you look at actually the history and the evolution of the storage with Tipco and Solas, it originated with you showing up to the data center and those of us that had to wire data centers in like, whatever, Secaucus, New Jersey or something like that. And it was like miserably freaking cold when you showed up to the data center and you had to wire these things. It was awful. And, you know, kind of fun because you spend like six days in the freezing cold but anyways you would get these computers and then you would rack them physically yeah and then people would charge you money right like vendors would charge you money for the number of tcp connections and that just sort of didn't scale well with the way modern you know i guess now web 2.0 applications like like a twitter and so i think the pain points that kafka solved and i'll talk about the other evolution of
Starting point is 00:12:54 the big ideas over the last decade is that they took off the shelf cheap computers with the spinning disk and then made the software a little bit more intelligent so that, you know, it could just work with like scaling at the time. And then that's when like most people started really adopting Amazon, you know, still a janky experience early on. Like now the clouds are so sophisticated for those of us that had to debug like networking issues back in Amazon. And they, anyways, you know, then you would just scale by adding cheap computers and it made building
Starting point is 00:13:28 and shipping products super easy. And so to me, that was the key idea early on in streaming is that you had a blueprint that you could copy and potentially you could work yourself into a success
Starting point is 00:13:41 if you've managed to hire really talented engineers. And that was really promising, right? I mean, if you were, we were a young ad tech company in New York competing with Google and we won, for the record, it was fun. We won like, I think New York Times, Forbes, Reuters, MSNBC, et cetera, for a while on mobile traffic. So we were like, hey, we're winning.
Starting point is 00:14:00 And to us, we were like, you know, whatever, we onboarded this complexity, but we were making money. And so that I think that was the keystone idea back then is like, systems and for me that was a huge source of inspiration into building out two companies in the streaming space and you know probably the next five or ten i mean who knows uh but i still find it super exciting today 100 okay i have like one more question about like the streaming systems so we tend like to talk about like streaming platforms and it's like an umbrella term for all of them, but there are some differences between them. In my mind, I can't really compare Kafka to Flink. There are some fundamental differences. I instinctively think of Flink whenever I want to do like some heavy streaming processing with like a complex state that I want to have like guarantees around that.
Starting point is 00:15:10 Like pretty much like what I would do with like a SQL query on a data warehouse, but I want to do it like on a stream of data, right? Well, when I think about like Kafka, I think more of topics and data and guarantees around this data and making sure that the data is not going to get lost and being able to accommodate throughput and latency
Starting point is 00:15:33 requirements. But I never think of out of the box, I'll take Kafka and start doing some crazy stateful processing on top of it. Does this make sense as a distinction between the streaming technologies out there or not?
Starting point is 00:15:55 Yeah, I agree. I was trying to allude to that on my previous answer when I think about Flink as compute and Kafka as a storage. And so if you think on a storage front, and the reason for this is that streaming overall is really the ideas that you take a little bit of a storage, a little bit of compute, and then you sort of chain it together.
Starting point is 00:16:16 And at the end, you have something useful like Uber or DoorDash or fraud detection for a bank or oil and gas pipeline, you know, anomaly detection or IoT, right? But it is the combination, the chaining of combining compute and storage. So in most streaming systems, you need both, unless it's something like, actually in all, let me put it there. Even for the simplest things, the reason why I think compute is a little bit more challenging
Starting point is 00:16:44 to, for, you know for vendors, et cetera, is that with compute, you could do anything, right? Like you set up a cron Python script and maybe the supervision is you get page when the Python script crashes. But whatever, right?
Starting point is 00:16:58 Maybe you accept that risk as an engineer, as a business, because you have two customers and they're paying you $3 a month and you're like, well, whatever. I'm not going to pay for the additional complexity. And over time, I think people just tend to graduate to more sophisticated compute platforms
Starting point is 00:17:12 like a Flink, you know, based platform. And so now on the storage side, that's kind of the core thing, you know, and so you can't really trade off. Like if, as an engineer, if I send data to a storage engine, you expect to retrieve back the data, like full stop. At the highest level, this is really how engineers think about it. If you store my data, then I'm going to send you data and then I'm going to query it back.
Starting point is 00:17:36 And so on the storage platform, which is where Red Panda sits today, we borrowed the modeling mechanics of the Kafka API, which for those listening in, you can think of the Kafka API as a topic, as an unordered collection of items. And a topic is broken down into totally ordered partitions. And so it's an unordered of collections of totally ordered sub collections. You can think about it. It's like a map of lists. If you're thinking in data structures,
Starting point is 00:18:10 you know, an algorithm, it's like, and you either consume, you typically consume from the head or tail, depending on your mental model, and then you can truncate it. Right. And so that's generally the Kafka model. And that proves to be just enough to be really useful for, you know, data engineers or system engineers trying to build higher levels. But you need both at all times. You need some form of compute, even if it's an in-house Python script, you know, supervised by, by Cron, or I guess now, you know, the cool kids are doing Amazon Lambda or whatever. Like that's, you need that layer somehow, right?
Starting point is 00:18:45 Because you need to do something with the data. And you also need the storage now to give you semantics around transactionality or around, you know, safety guarantees, not losing your data or about throughput or latency, right? Like, and those, you can't really just build it incrementally, right? It is like most people today that you could, most people don't go and build a database and then build a business, right? It is like most people today that you could, most people don't go and build a database and then build a business, right? You sort of buy a Postgres or you buy a Red Panda or you buy a Snowflake or you buy, just people buy into storage engines more. And so that's where Red Panda is. And hopefully this makes sense for everyone listening in.
Starting point is 00:19:21 Yeah, it makes a lot of sense. So going back to Red Panda, Red Panda is closer to the storage or the processing or it's equally, let's say, both. Great question. If you had asked me that question yesterday or a couple of days, I would have answered that differently. Let me tell you what,
Starting point is 00:19:38 by the time people listen to this, we would have announced our CRC funding. So prior to this conversation, you know, we're strictly on the storage side. And largely, I still think this is where the largest value that we provide to customers, right? If you think of Red Panda,
Starting point is 00:19:57 you can think of it like a drop-in replacement for Kafka. But, you know, a car is a car, but if you step into it, we're more like stepping into an electric vehicle, right? Like with ludicrous speed mode. So just giving an analogy for the people in the room, but we're just announcing this idea of keeping the simple things simple with WebAssembly. So largely we're still a storage engine, but we're starting to expose some of compute things.
Starting point is 00:20:25 And the reason is if you're a data engineer, you know that the majority of your time is spent doing non-sexy things. You take a JSON object and you make it Avro, you take Avro and you make it Protobuf, or you take, you know, an endpoint and then you enrich it with the IP address. Is this fraudulent? That's just where the bulk of the data pipelines are. And it's just kind the bulk of the data pipelines are. And it's just kind of what it is. And so with WebAssembly, you can now do that at the storage engine level. And so it's not designed to compete with the flinks or the stateful sort of higher level order databases that are super sophisticated, multi-way mergers, like you were mentioning.
Starting point is 00:21:03 It's really designed to be complementary from a mental model of the engineer building a data pipeline. And so if they have this one-way functions, like convert JSON to protobuf, or enrich a JSON object with an IP address, or take an object and give it a chat GPT score, it doesn't matter. Those kinds of simple things, one shot at transforms. Our web assembly engine is really good at that. And so we just announced investing in that, you know, a ton of money, which is going to be fun to see how that matures. And so largely to answer your question specifically, yes, we're mostly a storage engine and we
Starting point is 00:21:37 just started to expose a little bit of the compute. And then we'll talk about, I think Apache Iceberg for the data engineers is like a way of trying to continue to simplify their architecture. Okay, that's super cool. First of all, congrats on the round. I mean, it's kind of amazing to be able to raise a growth round right now where everyone says that the checkbooks are closed for this round. So I think this says a lot about the growth of the company
Starting point is 00:22:04 and what you are doing there and the impact that the company has. And also congrats on building these new capabilities on the storage engine. And one quick question. You mentioned WebAssembly. Why WebAssembly? What's the reason of exposing WebAssembly as the way to interface with writing these functions? Yeah, WebAssembly, I know to some of the engineers listening to this, they feel like WebAssembly is like self-driving cars.
Starting point is 00:22:36 It's always coming and you're like, well, when is it actually going to come? Is it a decade or is it two years? And in part, I feel a little bit guilty of being people have been pushing WebAssembly for a while since 2020, right? We're like one of the first storage engines. And then we inspired other companies to go on building WebAssembly and so on. I know because the engineers worked on those features. DM me.
Starting point is 00:22:57 It's like, hey, how did you do it? And like, you know, this is cool. Let's chat. So why WebAssembly? First of all, multi-tenancy isolation etc like when you start to expose some of the internal mechanics to programmers there is a person that will write a for loop that is an infinite for loop and it'll just take down your cluster it's just a matter of time that's what engineers we were much better i think at breaking systems than building. No, I'm just kidding.
Starting point is 00:23:26 We're also good at building, but the point is if you expose an API to a programmer, they'll just find a way to break the system. It's just what programmers do. Because for fun, you're like, oh, well, what happens if I do this? It's just how you discover products. You have no idea. And so you test it and then you take down you know an entire system it's like how many of us you know blocked the entire code base when we were all using
Starting point is 00:23:49 perforce 15 years ago and then you you go away for the weekend and there's a hot patch and you get called and so so anyway so so going back to in theory allows people to write in their favorite language uh so we don't have to say you have to write in Rust or Go or JavaScript or C++, which is how our storage engine is written. You could write in your favorite programming language. As long as it transpiles to this intermediary representation,
Starting point is 00:24:15 we can execute it safely. And so as an engineer, you now get exposed. I guess Red Panda becomes more like Transformer, Optimus Combiner worse, where you have a little robot and then you add different pieces and now you have a bigger robot that is finding megatrons or whatever. That's the idea behind WebAssembly. Can we teach the storage engine new capabilities, domain-specific capabilities? An example of domain-specific capabilities is GDPR compliance. Let me strip your social security number right before you write it on disk or right after you read it from disk. Or maybe let me teach Red Panda data placement guarantees.
Starting point is 00:24:56 And so if you have a global cluster, can you, the programmer that has infinite business context or larger definitely than our team can you write a data placement so data doesn't leave germany or later data doesn't leave paris or data doesn't leave new york it doesn't matter right so those kinds of exposing the business constraints that's why wasm and so one it allows people to write in their favorite language and two it allows us to sort of give the developers this like Transformers-like capability where you just add domain context onto it. In practice, really, we're going to launch first with Go.
Starting point is 00:25:35 We tried to launch with JavaScript in the past and that had some, you know, adoption, but not the kind of adoption that I was hoping for. I think Go strikes a good balance between ease of use from a developer perspective and reasonable good performance when compiled to WebAssembly. And so those are the practical limitations that we've been working on. And so we've tested almost every available WebAssembly engine in the world by now. So anyways, that's why WebAssembly, I think we have a lot of excitement
Starting point is 00:26:06 pent up for that. That's super exciting. I'd love to play around with it, to be honest. All right, so we already mentioned some of the capabilities that Red Panda brings on the table compared to the older generation of streaming platforms like Kafka. But okay, let's say WebAssembly is the new shiny toy. You started doing things differently from the inception of Red Panda, exactly because you saw the limitations that existed in these products. Tell us a little bit more about that. If you had to summarize, let's say, in a foundational context, what are the three, four, let's say, very different things that Red Panda does compared to the other systems
Starting point is 00:26:57 to deliver the same service, the same value, but in a much better way? Going back with the electric car, I think electric cars deliver a different value. They basically made the hypercars obsolete. Like the zero to 60 no longer makes any sense as a selling point because electric cars are so fast. But I'll tell you the ideas that we focused on that are different from the focus of all the platforms. It's not that they couldn't technically do it.
Starting point is 00:27:21 It's just a different set of decisions early on that have had this huge ramification in terms of what the final product looks like, right? And so let's talk about them. There are three core pillars on the product. All of them, the overall umbrella, the way I think about building companies and so on is if we make the engineer hands-on keyboard behind a terminal, the hero of our story will be a massive financial success. And so my job has always been to obsess maniacally over like, is the engineer actually successful? I know that I can get a CIO to sign a check,
Starting point is 00:27:58 a large check if we make their, you know, product engineers actually super successful with the platform. So that's always been my obsession as a founder, as an engineer, and also because it's sort of how I grew up, technically speaking. And so there are three core tenets. One was simplicity. And the analogy I like to use is used all the time, but it's the Apple experience. You sort of expect your AirPods to work with your iPhone to connect with your tablet and passwords to be shared across all of the Wi-Fi devices. And so that's just a natural expectation. And so to me, the last mile in systems as a systems
Starting point is 00:28:35 engineer, which is what I've been my whole life, was really the human experience. And so the first core design principle is, can we make the best developer experience? Can we make this super easy? Compare that or contrast it with existing other competitors where to just print a hello world, you need something like a data broker like Kafka, Zookeeper, Quorum Service, Schema Registry, and HTTP Proxies. You need four separate systems just to print hello world. And I was like, that's insane.
Starting point is 00:29:10 I've worked on systems that are much easier to use and probably have the same capabilities. And so for me, it was like, if we could deliver the user experience in a single file so that the mental model for the operators to put in a one, two, three computers and you're done, that's the deployment model. That'd be a huge win. And probably the reason why people have adopted us the most, right? Like it is like, that's just one example. And we have like a huge portfolio of that kind of example, which is if I don't want
Starting point is 00:29:40 to use it, then we just simply not build it. Like I will block product releases unless I want to use it. In fact, I time our product releases. It's like my time to wow for the console experience is to be 60 seconds and the Kubernetes is 130 seconds. And like I could go through an entire portfolio and the job is to wow the engineer within seconds of them touching the product. And so that's the first one simplicity two is performance and you know the analogy to electric cars is the zero to 60 right but for us is being able to take
Starting point is 00:30:12 a working example is we took we just took a company from 400 physical computers of the same type to 40 of the same computers it's just because just because we could do more with less, full stop. That was the only change is they turn off basically 10x more computers. And so that's really what performance is. And performance is really the sum of all your bad decisions. As a performance engineer, you just think of latency as the sum of all your bad decisions. And so there isn't one trick. There's a book of tricks. One is pre-allocating the memory, using a three-per-code architecture, using different interfaces, thinking about memory allocation and pooling
Starting point is 00:30:54 and ownership semantics, blah, blah, blah. We could talk about that for a really long time. But the second one was performance, and the impact is about 10x fewer computers. And then the last one was cost. In the context of a cloud-native era, can we leverage S3 or Google Cloud Storage or Azure Blob Storage to be the true disaggregation
Starting point is 00:31:14 of compute and storage? And so if you could deliver something that's easy to use, that's fast, and relatively economical, then why wouldn't you build your application in a streaming mode, right? I've just never come across anyone that's like, oh, I want my reports to be at midnight, it's too fast. That never
Starting point is 00:31:30 happens. It's really mostly a historical context of this technology being difficult to use and expensive. Hopefully that gives you a sense. Oh, 100%. That's awesome. I have a feeling we can have an episode for just each one of the things you mentioned
Starting point is 00:31:46 there. But I want to go back to simplicity and focus a little bit more on that. And the reason is because that's also something that I have experienced with systems like this. Back in 2014, when we were building Blendo, we used Kafka. To be honest, for our use case, the performance and the cost at that point were not that important.
Starting point is 00:32:10 But we cared deeply about some specific guarantees that were coming with a system like this, and some capabilities that they were delivering to us. So we decided to go with it. And to be fair, anything that has to do with the guarantees themselves, like, they were delivered, right?
Starting point is 00:32:27 Like, it was great that we managed to have that, especially for such a small team as we had. But obviously, the whole experience was far from being simple, right? And in my mind, when we're talking about developer experience and simplicity, it's a very multifaceted kind of definition. You have simplicity in terms of what it means for a new developer who is onboarding the team to go and build an environment where they can work and replicate that work. Then there's the simplicity of operating this thing. You have your SREs there; just because it's fault tolerant, it doesn't mean that it's on autopilot, right? Someone has to be there to babysit this thing.
Starting point is 00:33:16 And then that's the part that I want to talk more about with you: the architectural simplicity, right? All these different components that you need to have in place just to see on your logs that this thing is running, right? Like you mentioned, you need the schema registry, you need the brokers, you need the ZooKeeper. Yep. So let's focus a little bit on that.
Starting point is 00:33:50 Especially since ZooKeeper is not exactly, let's say... I'm sure many people have nightmares from operating ZooKeeper. But regardless, what it does is great; it's not an easy system to build, in any case, right? And it's an important piece. But how do you remove all that complexity with Red Panda? Let's say I download the binary. How do I get the things that each one of these components gives to me when I use Kafka?
Starting point is 00:34:21 Gosh, this is such a huge topic. I'm just summarizing my words here. Here's the thing about complexity: you can't eliminate complexity, you can only shift it around. And I can either make it your problem, or I can make it our problem. By and large, from a company philosophy, from a company standpoint, us, the storage team, you know, and I largely think of Red Panda as being a really sophisticated storage engine. We are the experts in the trade-offs and understand a lot of the nuances around, you know, like
Starting point is 00:35:01 whatever, whether it's lock contention or CPU contention or memory contention, all of these details that manifest in different ways, you know, at a high level, which is why you always over-provision, right? So here's the thing about complexity. Because you can't eliminate it, you have to make a choice: is it either your problem or my problem? And by and large, we've said it is Red Panda's problem, and it is our job to make it easy. And so, you know, a big part of why we adopted the Kafka protocol, for context, is we knew we could make a system fast.
Starting point is 00:35:34 That was, you know, sort of the company's DNA. We'd been writing C++ for, what, 15 years before we started the company, I guess now 20 or so. And so we could make it fast. We could do all of these things, but the API, right? There's this huge ecosystem of existing applications. And if I show up to one of our customers, let's pick Akamai, right, and I'm like, hey, we have this cool new technology, how about you throw away the billion dollar revenue product? They'd just walk me out the door. That doesn't make any sense. And so being compatible was part of that simplicity story.
Starting point is 00:36:06 But to answer your question directly, when we first started working on this and I authored a lot of the original code, I actually tried other approaches, right? So first I took the FlatBuffers compiler and I extended it with a couple of types.
Starting point is 00:36:23 It's a very Apache Arrow-like format. It was our own thing, catered to super low latency: you could basically assume a little-endian CPU, do a pointer cast, have a bit-array layout, and get microsecond-level latencies with a bunch of these things. And nobody wanted to use it. And it's like, okay, this is great and this is really fast, but people didn't understand what space, what latency spectrum you fit in. You know, it was sort of a much lower level thing. It was the same thing with the replication protocol.
Starting point is 00:36:54 I first started with chain replication, and then, you know, you have to figure out, okay, who watches the chain? Like, who watches the watchmen kind of thing. And so you end up designing a system that looks a lot like a consensus protocol, like a Paxos. And so, you know, then we looked at Raft as the protocol implementation. I was like, okay, we could reason about these things. And you sort of start to look at all of these ideas, but fundamentally it was taking a product stance and saying, it is our problem and it is not your problem. So that when you go on installing Red Panda,
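For readers who haven't seen Raft: the property that makes it easier to reason about than an ad hoc chain-plus-watcher design is majority voting with at most one vote per node per term. Here's a toy sketch of just that rule; it is not Redpanda's implementation, which adds log up-to-dateness checks, election timeouts, and much more.

```python
class Node:
    """A voter in a Raft-style election (grossly simplified)."""

    def __init__(self):
        self.voted_for = {}  # term -> candidate id this node voted for

    def request_vote(self, term, candidate):
        # Grant at most one vote per term; repeated requests from the
        # same candidate in the same term are re-granted (idempotent).
        return self.voted_for.setdefault(term, candidate) == candidate


nodes = [Node() for _ in range(5)]

# Candidate A asks everyone for a vote in term 1 and wins a strict majority.
votes_a = sum(n.request_vote(term=1, candidate="A") for n in nodes)
assert votes_a > len(nodes) // 2

# Candidate B cannot also win term 1: every node already spent its vote.
votes_b = sum(n.request_vote(term=1, candidate="B") for n in nodes)
assert votes_b == 0
```

Because any two majorities of the same cluster intersect in at least one node, two leaders can never be elected in the same term, which is the invariant that makes the protocol tractable to reason about.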
Starting point is 00:37:24 you don't have a thousand steps. Like, a big lesson I learned from my time at Akamai was they had very small teams running massively large deployments, right? Over, like, whatever, half a million computers around the world. And so how is that possible?
Starting point is 00:37:41 Well, believe me, no one reads a thousand steps. Like you write code to run your code, to deploy code. That's just kind of how it works. And it's more mainstream now than it was, you know, maybe whatever, seven years ago, you know, or five years when I started the company. And so that was the core. And so we onboarded the complexity. We onboarded a bunch of the things.
Starting point is 00:38:01 We onboarded our own consensus protocol; we based it off of Raft. You know, we decided to onboard the leadership election and the bootstrapping methodology and the cluster. We onboarded our own Kubernetes operator. So we tend to onboard the complexity ourselves, so that we don't give you the complexity
Starting point is 00:38:18 as a thousand steps, you know, that you have to follow. And if you miss one, then you just have data corruption. Like, that idea doesn't make any sense to me. Yeah, 100%. 100%, that makes total sense. All right. So we talked about the simplicity. I have a question about
Starting point is 00:38:37 the technology and the experience around the technology. And the reason that I want to ask that is because one of the things that I find fascinating with these kinds of systems is the diversity of people that have to interact with them. Being like a middleware in a way, right? Where you have your applications that write data on it, and then you have downstream applications that might be owned by completely different teams.
Starting point is 00:39:06 You have data engineers that have to read the data out of there just to store the data somewhere else, right? But that creates a very interesting ecosystem inside the company that has to interact with this technology. And that complicates things a lot, because a systems engineer is a different kind of beast compared to an application engineer or a data engineer or an SRE. Everyone speaks a slightly different language, right?
Starting point is 00:39:38 And they have slightly different needs. And my question is, and I would like to ask you that with a concrete example, actually: let's say I'm a company that has invested in having Kafka inside my system. Obviously, all these different people are working with Kafka; one way or another, they've had to figure out how to do that. How is it for the organization to be like, okay, now we're going to take Kafka out and put Red Panda there?
Starting point is 00:40:10 And I hear you about the compatibility, like the API compatibility, and I get that a hundred percent; it's literally the only way that you can do that with such complicated infrastructure-type products. But give us a little bit more color on how this translates for each one of these different personas that we have inside engineering. Yeah, so first let me answer the last part. Let me answer in reverse, because it's easier. The migrations are relatively straightforward, and it really depends on how people use Kafka.
Starting point is 00:40:46 And so typically, let's take the example of a financial firm. And I say that because we have a ton of financial firms. And so the way they'll do it is they'll run the two systems side by side from 8 a.m. to 5 p.m., or 4 p.m., whenever the market closes. Next day, Red Panda has the last day's worth of data and they just continue running on Red Panda, right? If you have stateful migrations, we support MirrorMaker 2 and all of these tools; literally, Red Panda looks exactly like the Kafka protocol.
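For context on the MirrorMaker 2 path mentioned here, a minimal mirroring config might look like the sketch below. The cluster aliases and bootstrap addresses are placeholders, not anything from the episode:

```properties
# connect-mirror-maker.properties (sketch): mirror all topics from a Kafka
# cluster to a Redpanda cluster. Aliases and hostnames are made up.
clusters = kafka, redpanda
kafka.bootstrap.servers = old-kafka:9092
redpanda.bootstrap.servers = redpanda:9092

# One-way replication from the Kafka alias to the Redpanda alias.
kafka->redpanda.enabled = true
kafka->redpanda.topics = .*
```

Because Redpanda speaks the Kafka wire protocol, it can sit on either side of a replication flow like this without MirrorMaker knowing the difference.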
Starting point is 00:41:16 No one could tell the difference. And to date, as a company, we haven't had a single customer that's touched their applications to interact with Red Panda. So that means years and years of code, they just simply pointed at Red Panda and go. The way, you know, I used to go on calls early on with the product and I said, hey, you can change the container from this, from whatever you're using, and then just plug in Red Panda and see if it works. And so in fact, our test container module, for those of you that use
Starting point is 00:41:50 test containers, you just change, you know, from a Kafka test container to a Red Panda test container, and your entire JVM application just continues to work. It's just faster. And so that compatibility was super strong and something I take really seriously. If we onboard a customer and they see an issue, it's like, okay, it's an issue with the product; it's not an issue with you, it's an issue with us. And we work really hard to make sure that we fix it right away. And so that's the migration. Now, in terms of sharing, you talk about a really challenging thing, which is the governance of this streaming data: who has access, how do you interact with the 52 personas? And if you're a bank, you have the ML and AI engineers, and you also have your production engineers that are dealing with compliance
Starting point is 00:42:33 and regulators across 52 countries. And then there's GDPR and data locality compliance. And so it's just such a gnarly and rich problem. So let me give you just the Hollywood highlights, so that you can build on primitives rather than specific answers, and where I can, I'll just give you examples of what we did, by and large. We adopted the Kafka access control lists, right? So on top of the default ACLs, you can sync an out-of-band policy mechanism. So let's say Okta or whatever,
Starting point is 00:43:05 Active Directory or whatever it is. And we also integrate with Kerberos, right? And so you can have a centralized place of identity for both users and applications. And it's the system-to-system communication that is really complicated. It's no longer, you know, Kostas is going to make a query on this and maybe tail the logs, and you're using Kafka to see, like, you know, if the price is dropping. It's when you start to connect multiple systems, and each system has potentially a different security boundary and so on. And so the way most people do it today is you have some sort of centralized system, and that'll sync eventually. The lowest-level primitive is an ACL, and the ACL protects people from reading, writing, querying metadata, and so on.
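As a rough mental model of the ACL primitive described here, the sketch below shows a default-deny lookup over (principal, operation, resource) rules. It's purely illustrative with made-up principals and topics, not Kafka's or Redpanda's actual authorizer.

```python
# Each rule is a (principal, operation, resource_type, resource_name) that is allowed.
ACLS = {
    ("User:analytics", "READ", "topic", "orders"),
    ("User:checkout", "WRITE", "topic", "orders"),
    ("User:ops", "DESCRIBE", "cluster", "*"),
}


def is_allowed(principal, operation, resource_type, resource_name):
    """Default-deny: an action is permitted only if an ACL explicitly allows it."""
    return (
        (principal, operation, resource_type, resource_name) in ACLS
        or (principal, operation, resource_type, "*") in ACLS  # wildcard resource
    )


assert is_allowed("User:analytics", "READ", "topic", "orders")
assert not is_allowed("User:analytics", "WRITE", "topic", "orders")  # denied by default
assert is_allowed("User:ops", "DESCRIBE", "cluster", "anything")     # wildcard match
```

The "centralized system that syncs eventually" in the conversation is then just something (Okta, Active Directory, and so on) that writes rules like these into every cluster out of band.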
Starting point is 00:43:46 And so your applications are there. Now, from an API perspective, if you use any of the Kafka APIs, that continues to work. And let me say one quick thing about the future, which is fundamentally different from every other streaming engine. So it builds on the richness of the previous answer, which is: that's not enough. And the reason is it doesn't meet the developer where they are. And it is my job as a company builder... it's like, well, you know, not everyone has
Starting point is 00:44:11 gone through the pain points or sophistication of truly understanding how to get value out of streaming data. So let me meet you where you are, which is you're using Red Panda to take data from your API endpoints into some form of database. I was like, let me do that really well. And the way we're going to do it really well is we're going to integrate with Apache Iceberg
Starting point is 00:44:31 as an end storage format on S3, so that today you can bring Snowflake and tomorrow Databricks, and the next day, whatever it is, whether it's, you know, ClickHouse or DuckDB or whatever. There's, you know, a really large set of choices that the developer has on querying the data.
Starting point is 00:44:52 And so the way we meet those developers where they are today is in the tiered storage. This is something that we just announced today. It literally hasn't even been on any other podcast. So if you're listening to this, you're the first person that's ever heard it. For me, the future of our tiered storage format is going to be Apache Iceberg, so that you can go from a stream to SQL, but not our SQL, your favorite SQL. And your favorite SQL today could be Snowflake, and tomorrow could be Databricks,
Starting point is 00:45:18 and the next day could be whatever. And so hopefully that gives you an answer: when you're interacting in a rich ecosystem, you just have so many stakeholders. It could be ML engineers, could be AI, probably the same department, but it could be your CIO looking at dashboards, real-time dashboards, and so on. Does that make sense? Absolutely. And it's great to hear about the integration with Iceberg. It's a big pain for anyone who has ever had to be a data engineer in an environment with streaming data at scale ending up on a
Starting point is 00:45:55 data lake; they know how hard this is. So being able to have good integrations with these table formats, and not having to worry every time you're on call that the pipeline will break and you'll have to go back there and redo everything for the Monday reports, I think that's going to be a huge value for the people who are managing the data infrastructure. All right, we're getting closer to the end, and I have two small questions. First of all, I can't close this episode without asking about the name, right? You mentioned something at the beginning about Red Panda, but give us the story behind it. How did you end up with such an extremely cute animal? Is it like a family thing?
Starting point is 00:46:46 Okay, so when I started the project, I was living in Miami. You know, I had moved from New York, I lived there for a really long time, and I was in Miami and I just built it, right? I didn't envision Red Panda becoming what it is today, but I wanted the product to exist. And when you're an engineer, it's very freeing: you open up your laptop and you just code it, right? Like you don't need to ask anyone for permission. You just write the code. So I did.
Starting point is 00:47:12 And then I sent it to a bunch of friends. And this was at the time when the Uber-for-X or the app-for-X was super popular, and, you know, all of your family members were emailing you, like, hey, can you help me build the company? I have this idea. And you're like, yeah, not really. And so I think we were all tired of getting emails from friends about names.
Starting point is 00:47:33 And so I sent a survey to a bunch of friends. I embedded Red Panda as an Easter egg, and I had a bunch of nerd names in between. Obviously Vectorized, which became the first name of the company, and, you know, a bunch I can't even remember. Anyways, I added Red Panda because I thought no one's going to pick this thing, you know, whatever. So I sent it anyway. Most people responded, and 80% of them chose Red Panda. So that became the project name. And of course, in my head, I didn't listen; I named the company differently.
Starting point is 00:48:07 I started the company as Vectorized. But, you know, Red Panda took on a life of its own. And my partner at the time, she helped me chase a bunch of design firms around the world, from Europe and South America and the US, and like four or five firms worked on this really cute 8-bit-inspired mascot that looked like Mario Bros. That's how I envisioned it. And so that mascot took off. People loved it. And at some point, we just had to rebrand the company Red Panda. No one knew what Vectorized was, but everyone knew what Red Panda was. And the mascot
Starting point is 00:48:41 was just so cute, it was impossible to not like it. So we just had to name it, and here we are. It just took over the company. All right. That's awesome. That's an amazing story about the power of symbols in general, and language, as part of building a brand. All right. So we're here at the buzzer, as Eric usually
Starting point is 00:49:06 says. So one last thing. I know that you are making some very bold claims around performance, especially compared to Kafka. But you're also one of these people that doesn't just
Starting point is 00:49:21 make the claims; you're willing to be tested on them, right? And to prove them. Can you tell us a little bit more? Because I've seen on LinkedIn some messages that have been circulating about how this can be done. Yeah, so first of all, for those listening in, if you're using Confluent, email me and I'll cut your bill in half or I'll give you money. That's kind of the bottom line of it, and bottom line up front is usually easier. The TLDR is that our main competitor launched an attack on some personal blog post. And I was like, I know how much money it costs to run this. It cost you $90,000. It is impossible for you to run this. I was like, at least you should have the courage to put it on your main website, so that
Starting point is 00:50:08 we could talk about it in public. And so, you know, up until that point, we would never say that. And so I was like, okay, well, if you're going to spend ninety thousand dollars, I want to tell all of your customers that if you come to me, I'll cut your bill in half or I'll give you money. And I stand behind that claim. For anyone that comes to me, we can post the link of the campaign at the bottom of the podcast notes if people want to check it out. But yeah, super excited to be compatible with all your Kafka workloads. And thanks for having me. It's been a fun show. That's awesome. Thank you so much, Alex. And we're really looking forward to hosting you again on the show in the future.
Starting point is 00:50:46 All right, Costas, thanks for having me. Okay, Costas, I didn't make it back in time to hear the entire recording, but looking at the internal chat, it seems like at a minimum, they had some huge news about a fundraise, which is super exciting. But tell me what you learned. Yeah, first of all, Eric, we have to say that even the best couples need some distance. which is super exciting. But tell me what you learned. Yeah, first of all, Eric, we have to say that even the best couples need some distance sometimes.
Starting point is 00:51:12 Well, then we can still get a relationship there. So I think, yeah, I mean. Distance makes the heart grow fonder. Did you miss me? I mean, are you jealous if I say that I enjoyed talking with Alex without you? I don't know. I did miss you, yeah. Obviously, you always give a very unique dimension to the conversation,
Starting point is 00:51:35 the conversations that we have. That's why we were like the two of us there. But it was fun to talk with Alex, for sure. He's a very deeply technical person, obsessed with performance, which I think is also a reason why it made him such a good fit to go after this problem
Starting point is 00:51:54 because it is a very critical system, and you need to have very strong guarantees when it comes to both performance and resilience. Kafka is also such an important system. And so it was fascinating to chat with him and experience his passion about what he's building. And it pays off, right? They announced their next round of funding.
Starting point is 00:52:22 They raised like a hundred million, okay, and not in the best market out there for fundraising at this stage, which means that they are doing something right. And it seems also that there were good reasons why we didn't have more competition in this space in the past 10 years. But now it seems like the time for that has come. So I would recommend to our audience to tune in and listen to Alex talking about these technologies,
Starting point is 00:52:56 how they were built, why they were built the way they were built, why we need a new paradigm today, what Red Panda can offer compared to a system like Kafka, and some very cool new technologies like Wasm, WebAssembly, that they are using, and how they are incorporating this new infrastructure paradigm to really create a unique new experience and a unique new product that is much better, let's say, at addressing the needs of today compared to other systems. So I would suggest to everyone to tune in. He's a bright person, very smart, obviously, with very deep knowledge that he's sharing on this episode. And yeah, it will be fun for everyone to listen to it. Awesome, well, I am
Starting point is 00:53:48 so glad you learned about Kafka and Red Panda I am so disappointed I missed it but I'll be back on the next one. Yeah, you will Alright, well subscribe if you haven't, tell a friend and we have great episodes coming up
Starting point is 00:54:04 so stay tuned. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, ericdodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers.
Starting point is 00:54:29 Learn how to build a CDP on your data warehouse at rudderstack.com.
