The Data Stack Show - 156: Simple, Performant, Cost-effective Data Streaming with Alex Gallego of Redpanda Data
Episode Date: September 20, 2023

Highlights from this week's conversation include:
Alex's background in the data space and the creation of Redpanda (4:23)
The cost and complexity of streaming (11:07)
The evolution of storage with Kafka (12:04)
The distinction between streaming technologies (15:10)
Simplicity as a Core Design Principle (27:03)
Cost Efficiency in a Cloud Native Era (30:44)
Removing complexity with Redpanda (34:21)
Migrations and compatibility with Redpanda (40:35)
The Future of Redpanda (43:44)
The Story Behind Redpanda (46:45)
Final thoughts and takeaways (50:25)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by Rudderstack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the Data Stack Show. Kostas,
we get to talk about a topic actually, no pun intended, that I don't know if we've dug deeply
into on the show, and that's Kafka. And specifically, we're going to talk with Alex from Red Panda.
And I'll tell you what's interesting to me about Red Panda is that as widely used as Kafka is, Confluent is really one of the only major successful commercializations of
Kafka.
But Red Panda is doing some really cool stuff.
So I think we should get a 101 on Kafka
because we haven't covered a ton in depth on the show.
And then hear about what makes Red Panda unique
as sort of a way to run managed Kafka.
So that's what's interesting to me.
Yeah, 100%.
I mean, as you said,
I don't think we had
a specific episode in the past
about Kafka.
We had a few
about stream processing, although that's
a little bit different because Kafka is
not necessarily that
much about processing, but more about
resilient
transport of data at any scale,
which makes it a very important component in the infrastructure of many different companies.
And you're right.
We haven't seen anyone, outside of Confluent and, okay, the big providers, the cloud providers with systems like Kinesis,
actually build something to go after this particular market.
So having Red Panda out there, I think, is extremely interesting.
And there are a couple of things to chat about.
Why do that?
Why not do that?
Why don't other people do that, right?
Why don't we see more competition in this space? And what does it mean to build
something like Kafka now? I think Kafka was built at the beginning of 2010
or something like that.
So what does 10 years of innovation in technology
give us as tools to go and build something similar?
And yeah, see how it is different,
what kind of different tooling it gives us
compared to Kafka.
And what it means to go after this market
and build a company there.
I think this is going to be super, super interesting.
It's a very hard
technology to get right and a very important
one to get right. You can't
fail with this data.
So let's go and see
what Alex has to say. Let's do it.
And unfortunately, I actually just had
something come up. So I'll
let you kick the call off and then I'll join
if I can. If not, I'll try to come back
for the intro.
You don't have to.
Let me enjoy the conversation.
It's fine.
All right.
I'll see if I can make it back by the end.
Okay.
Let's do it.
Hello, everyone, to another episode of the Data Stack Show.
I'm Kostas.
And as you probably already learned by now, when I do the introduction, it means that I'm going to be alone on the show, unfortunately.
But we have an amazing guest today, so hopefully we're going to compensate for missing Eric
with him.
So we have Alex Gallego, he's co-founder and CEO of Red Panda.
And we're going to talk about a few extremely exciting things today that have to do with Kafka, Kinesis, Red Panda, and technologies around that.
So welcome, Alex.
Very nice to have you here.
Thanks for having me.
Good to be here.
So let's start with a quick personal introduction.
Tell us a little bit about yourself, because you're not just a CEO and a co-founder.
You also wrote a lot of the code behind Red Panda.
And these systems tend to be quite complex.
So it would be awesome to hear about you, your background, and your journey building Red Panda.
Thanks for asking.
So I guess by means of introduction, I've been working in streaming for just about 14 years, which is mind-blowing, that you could work on a single problem for so long and still find so much richness.
I mean, I could probably build two or three follow-up systems after Red Panda. So yeah, anyways, I went to school largely trying to focus on cryptography. I ended up dropping out of a bunch of graduate programs, and I went to start building distributed systems because I just found them a little bit more fun than breaking things. This was early on in my career. I went to work for an ad tech in New York where I first started working with, you know, the first couple of versions of, I guess, Zookeeper in 2010, 2011, somewhere around there, and Kestrel and then Kafka and so on. And so,
yeah, so my journey started really early on. I ended up testing Storm.
And then that kind of those ideas led to me writing the code for the first startup that
I sold to Akamai.
It was called Concord.
I also kind of authored the original first part of that engine.
And then we went on to build another small company around it.
And so Concord was cool.
It was a compute platform that was different than, you know, frankly, most streaming platforms
today.
And so it was really like container-based, single-threaded, C++ execution engine.
It was really more like a quasi-envoy, you know, the C++ proxy with like language runtimes
on top.
And so it was pretty cool.
We sold that company to Akamai in 2016.
And so Red Panda came out of this deeply technical background where, you know, as an engineer,
I just couldn't understand where performance was going. And if you were looking at a couple
of computers, like I don't understand where the latency is coming from. And so the first ideas for
Red Panda came about in 2017, where I took two edge computers
and I connected them back to back with a single SFP cable.
No routers, no switches, just two computers and a cable.
And I was just like, I just want to measure what's the gap between hardware and state
of the art software.
And then I wrote something in C++, you know, really mostly the idea, some of the core ideas right there.
It's like, well, what is the gap actually, in both throughput and latency?
I gave a talk about it in 2017.
In my mind, I was like, you know, there's like a couple of companies working on this.
They'll figure this out.
They're really bright.
And they didn't work on those ideas.
And so, yeah, I spent the next few years just trying to understand why, you know, basically as an engineer, there's no magic, right? If the job is to save data on disk,
then the job is to save data on disk. There's no two ways about it, right? So there's like
that essential complexity. But when you look deep down, you learn that if you take a new approach,
sort of designed for the modern hardware, you could get this categorical performance improvement.
And so with that came this whole set of design possibilities.
I was like, hey, if I were to start from scratch,
what would I do differently?
You know, what choices?
Like, how would I think about architecting
the next generation of streaming?
What would it mean for the engineer?
And so eventually that gave birth to Red Panda,
you know, the company and product,
which is kind of fun because it was first an Easter egg.
And we'll talk about the naming later in the show.
That's Red Panda.
That's how it started.
And those were the roots of the technology.
All right.
So I have a couple of historical questions, to be honest, because I have someone here who has been through the evolution of streaming systems, but has also been part of that evolution. So I remember back then, in, I don't know, let's say the beginning of 2010, maybe a little bit earlier than that, there was that kind of explosion of systems that came out from places like Twitter, with Storm; we had Samza, we had Kafka, obviously, which became dominant for a while.
My question is, all these systems that they appeared back then,
most of them disappeared, right?
They didn't make it, let's say, at the end,
outside of probably, yeah, like, okay,
Confluent made it to the public markets,
but we didn't see more happening there.
Even with Samza, right?
Not Samza, sorry, Flink.
Now we see again the market getting interested again into that.
What is the reason, in your opinion, that in the streaming space this happened?
Out of all these products, we didn't end up with more successes.
The Achilles heel of streaming has always been cost and complexity.
And for those of us in the room that had to trace it, gosh, in 2011, I was writing Scala and Java, deploying that on a Clojure runtime, which was Storm. And when the Nimbus worker decided to stop, you get a stack trace that is the equivalent of 0xDEADBEEF, right? You're just like, I have no idea what this means.
And you end up, you know, debugging the transitive closures. And if you ended up, you know, importing some of the more sophisticated JVM libraries back in the day, like Algebird, which, you know, thankful to Twitter for publishing some cool stuff, it was just gnarly. Really, there just weren't many. And here's the thing about compute in particular. I think we should separate compute and storage. And so when I think about compute, you know, I think about my previous company, Concord. I think about Apache Storm.
I think about Flink.
I think about new approaches today, like Bytewax and Materialize and, you know, and so on.
RisingWave, Decodable, there's a whole host of them.
So compute, it's its own layer.
And then storage is its own layer. And I think what was meaningful back then is that most of us didn't have the scale of
Twitter, but we were growing super fast.
And so we took what they were doing because it was in the open and you can just get cloned
and it was both a blessing and a curse.
A curse later when you had to debug it, but a blessing because you could get started quickly,
right?
That was a blueprint.
And then you could do large-scale things. What most people didn't realize is the cost of operationalizing that was inordinately expensive. It's kind of like the promise of the Hadoop world that, you know, materialized for like four companies in the world. No, I'm kidding, but you know, it was really hard to actually extract value out of these things. It just became computationally expensive and manpower expensive, and three people in the company knew how it worked.
And once they left, you're like, I have no idea how this part of the system works.
And so cost and complexity have always been the Achilles heel of streaming.
And so two things have happened.
One, I think managed services like Red Panda Cloud and Confluent Cloud and MSK, et cetera, have made onboarding some of these technologies easier, not necessarily simpler.
I'm going to talk about, you know, easy is different than simple, right?
Complexity is a different metric for me.
And then, you know, on the storage side, just to give a glimpse of that: when you first started in messaging, when you look at the actual history and evolution of storage, with Tibco and Solace, it originated with you showing up to the data center, and those of us that had to wire data centers in, like, whatever, Secaucus, New Jersey or something like that.
And it was like miserably freaking cold when you showed up to the data center and you had to wire these things.
It was awful.
And, you know, kind of fun, because you spend like six days in the freezing cold. But anyways, you would get these computers and then you would rack them physically, yeah, and then people would charge you money, right? Like, vendors would charge you money for the number of TCP connections, and that just sort of didn't scale well with the way modern, you know, I guess now Web 2.0 applications like a Twitter worked. And so I think the pain point that Kafka solved, and I'll talk about the other evolution of the big ideas over the last decade, is that they took off-the-shelf cheap computers with spinning disks and then made the software a little bit more intelligent so that, you know, it could just work with the scaling at the time. And then that's when most people started really adopting Amazon, you know, still a janky experience early on. Like, now the clouds are so sophisticated, for those of us that had to debug networking issues back in Amazon. And,
anyways, you know, then you would just scale
by adding cheap computers
and it made building
and shipping products super easy.
And so to me,
that was the key idea
early on in streaming
is that you had a blueprint
that you could copy
and potentially you could work yourself
into a success
if you've managed to hire
really talented engineers.
And that was really promising, right?
I mean, if you were, we were a young ad tech company in New York competing with Google
and we won, for the record, it was fun.
We won like, I think New York Times, Forbes, Reuters, MSNBC, et cetera, for a while on
mobile traffic.
So we were like, hey, we're winning.
And to us, we were like, you know, whatever, we onboarded this complexity, but we were making money. And so I think that was the keystone idea back then with these systems, and for me that was a huge source of inspiration into building out two companies in the streaming space, and, you know, probably the next five or ten, I mean, who knows. But I still find it super exciting today.
100%. Okay, I have one more question about the streaming systems. We tend to talk about streaming platforms, and it's an umbrella term for all of them, but there
are some differences between them. In my mind, I can't really compare Kafka to Flink. There are
some fundamental differences. I instinctively think of Flink whenever I want to do like some heavy streaming
processing with like a complex state that I want to have like guarantees around that.
Like pretty much like what I would do with like a SQL query on a data warehouse, but
I want to do it like on a stream of data, right?
Well, when I think about like Kafka, I think more of topics and data and
guarantees around this data and making
sure that the data is not going to get lost
and being able to
accommodate
throughput and latency
requirements.
But I never think of
out of the box, I'll take
Kafka and start doing
some crazy stateful
processing on top of it.
Does this make sense as a distinction between the streaming
technologies out there or not?
Yeah, I agree.
I was trying to allude to that in my previous answer, when I think about Flink as compute and Kafka as storage.
And so if you think on a storage front,
and the reason for this is that streaming overall
is really the idea that you take a little bit of storage,
a little bit of compute,
and then you sort of chain it together.
And at the end, you have something useful like Uber
or DoorDash or fraud detection for a bank
or oil and gas pipeline, you know, anomaly detection
or IoT, right?
But it is the combination, the chaining of combining compute and storage.
So in most streaming systems, you need both. Actually, in all of them, let me put it that way.
Even for the simplest things, the reason why I think compute is a little bit more challenging
for, you know, vendors, et cetera,
is that with compute,
you could do anything, right?
Like you set up a cron Python script
and maybe the supervision
is you get page
when the Python script crashes.
But whatever, right?
Maybe you accept that risk
as an engineer, as a business,
because you have two customers
and they're paying you $3 a month
and you're like, well, whatever.
I'm not going to pay for the additional
complexity.
And over time, I think people just tend to graduate to more sophisticated compute platforms
like a Flink, you know, based platform.
And so now on the storage side, that's kind of the core thing, you know, and so you can't
really trade off.
Like if, as an engineer, if I send data to a storage engine,
you expect to retrieve back the data, like full stop.
At the highest level, this is really how engineers think about it.
If you store my data, then I'm going to send you data
and then I'm going to query it back.
And so on the storage platform, which is where Red Panda sits today,
we borrowed the modeling mechanics of the Kafka API,
which for those listening in,
you can think of the Kafka API as a topic,
as an unordered collection of items.
And a topic is broken down into totally ordered partitions.
And so it's an unordered collection of totally ordered sub-collections.
You can think about it like a map of lists, if you're thinking in data structures and algorithms. And you typically consume from the head or tail, depending on your mental model, and then you can truncate it, right? And so that's generally
the Kafka model. And that proves to be just enough to be really useful for, you know, data engineers or system
engineers trying to build higher levels.
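To make the "map of lists" mental model concrete, here is a toy, in-memory Go sketch of the model Alex describes: a topic as an unordered set of partitions, each partition an append-only, totally ordered log you read from an offset and can truncate. It is purely illustrative; real Kafka and Red Panda keep logical offsets stable across retention, replicate partitions, and so on.

```go
package main

import "fmt"

// Topic is an unordered collection (a map) of totally ordered partitions (slices).
type Topic struct {
	partitions map[int][]string // partition id -> append-only log of records
}

func NewTopic(numPartitions int) *Topic {
	t := &Topic{partitions: make(map[int][]string)}
	for p := 0; p < numPartitions; p++ {
		t.partitions[p] = nil
	}
	return t
}

// Produce appends a record to the tail of one partition; ordering is only
// guaranteed within that partition, not across the whole topic.
func (t *Topic) Produce(partition int, record string) {
	t.partitions[partition] = append(t.partitions[partition], record)
}

// Consume reads every record in a partition starting from a given offset.
func (t *Topic) Consume(partition, offset int) []string {
	return t.partitions[partition][offset:]
}

// Truncate drops everything before the given offset, like retention trimming.
// (In this toy model offsets shift after truncation; real systems keep them stable.)
func (t *Topic) Truncate(partition, offset int) {
	t.partitions[partition] = t.partitions[partition][offset:]
}

func main() {
	orders := NewTopic(2)
	orders.Produce(0, "order-1")
	orders.Produce(0, "order-2")
	orders.Produce(1, "order-3")
	fmt.Println(orders.Consume(0, 0)) // [order-1 order-2]
	orders.Truncate(0, 1)
	fmt.Println(orders.Consume(0, 0)) // [order-2]
}
```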
But you need both at all times.
You need some form of compute, even if it's an in-house Python script, you know, supervised by cron, or I guess now, you know, the cool kids are doing Amazon Lambda or whatever.
Like that's, you need that layer somehow, right?
Because you need to do something with the data.
And you also need the storage now to give you semantics around transactionality or around,
you know, safety guarantees, not losing your data or about throughput or latency, right?
Like, and those, you can't really just build it incrementally, right?
It is like, most people today don't go and build a database and then build a business, right? You sort of buy a Postgres or you buy a Red Panda or you buy a Snowflake. People just buy into storage engines more. And so that's
where Red Panda is. And hopefully this makes sense for everyone listening in.
Yeah, it makes a lot of sense. So going back to Red Panda, Red Panda is closer to the storage
or the processing
or it's equally, let's say, both.
Great question.
If you had asked me that question
yesterday or a couple of days ago,
I would have answered that differently.
Let me tell you what,
by the time people listen to this,
we will have announced
our Series C funding.
So prior to this conversation,
you know, we're strictly on the storage side.
And largely, I still think this is where
the largest value that we provide to customers, right?
If you think of Red Panda,
you can think of it like a drop-in replacement for Kafka.
But, you know, a car is a car,
but if you step into it,
we're more like stepping into an electric vehicle, right?
Like with ludicrous speed mode.
So just giving an analogy for the people in the room, but we're just announcing this idea
of keeping the simple things simple with WebAssembly.
So largely we're still a storage engine, but we're starting to expose some of compute things.
And the reason is if you're a data engineer, you know that the majority of your time is spent doing non-sexy things.
You take a JSON object and you make it Avro, you take Avro and you make it Protobuf, or you take, you know, an endpoint and then you enrich it with the IP address.
Is this fraudulent?
That's just where the bulk of the data pipelines are. And it's just kind of what it is. And so with WebAssembly, you can now do that at the storage engine level.
And so it's not designed to compete with the flinks or the stateful sort of higher level
order databases that are super sophisticated, multi-way mergers, like you were mentioning.
It's really designed to be
complementary from a mental model of the engineer building a data pipeline. And so if they have this
one-way functions, like convert JSON to protobuf, or enrich a JSON object with an IP address, or
take an object and give it a chat GPT score, it doesn't matter. Those kinds of simple things,
one-shot transforms.
Our web assembly engine is really good at that.
And so we just announced investing in that, you know, a ton of money, which is going to be fun to see how that matures.
And so largely to answer your question specifically, yes, we're mostly a storage engine and we
just started to expose a little bit of the compute.
And then we'll talk about, I think Apache Iceberg for the data engineers is like a way
of trying to continue to simplify their architecture.
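To make the "one-shot transform" idea concrete, here is a small, purely illustrative Go sketch. The Record type and the EnrichEvent function are hypothetical stand-ins, not the actual Red Panda WebAssembly SDK; the point is only that these transforms are tiny, stateless, single-record functions (decode, enrich, re-encode), which is what makes them a fit for running inside the storage engine rather than in a full stream processor.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is a hypothetical stand-in for what a storage-level transform would
// receive: a key and a value, both as raw bytes.
type Record struct {
	Key   []byte
	Value []byte
}

type event struct {
	UserID     string `json:"user_id"`
	IP         string `json:"ip"`
	Region     string `json:"region,omitempty"`
	Fraudulent bool   `json:"fraudulent,omitempty"`
}

// EnrichEvent is a one-shot, stateless transform: decode the JSON value,
// enrich it with a region derived from the IP address and a fraud flag,
// and re-encode it. No cross-record state, no joins.
func EnrichEvent(in Record) (Record, error) {
	var e event
	if err := json.Unmarshal(in.Value, &e); err != nil {
		return Record{}, err
	}
	e.Region = lookupRegion(e.IP)        // placeholder for a geo-IP lookup
	e.Fraudulent = e.Region == "unknown" // placeholder for a real fraud rule
	out, err := json.Marshal(e)
	if err != nil {
		return Record{}, err
	}
	return Record{Key: in.Key, Value: out}, nil
}

func lookupRegion(ip string) string {
	if ip == "203.0.113.7" { // toy table instead of a real geo-IP database
		return "eu-central"
	}
	return "unknown"
}

func main() {
	in := Record{Value: []byte(`{"user_id":"u1","ip":"203.0.113.7"}`)}
	out, err := EnrichEvent(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out.Value))
}
```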
Okay, that's super cool.
First of all, congrats on the round.
I mean, it's kind of amazing to be able to raise a growth round right now
where everyone says that the checkbooks are closed for this round.
So I think this says a lot about the growth of the company
and what you are doing there
and the impact that the company has.
And also congrats on building these new capabilities on the storage engine.
And one quick question.
You mentioned WebAssembly.
Why WebAssembly?
What's the reason for exposing WebAssembly as the interface for writing these functions?
Yeah, WebAssembly, I know to some of the engineers listening to this, they feel like WebAssembly is like self-driving cars.
It's always coming and you're like, well, when is it actually going to come?
Is it a decade or is it two years?
And in part, I feel a little bit guilty of that, because we've been pushing WebAssembly
for a while, since 2020, right?
We were one of the first storage engines.
And then we inspired other companies to go on building WebAssembly and so on.
I know because the engineers who worked on those features DM me.
It's like, hey, how did you do it?
And like, you know, this is cool.
Let's chat.
So why WebAssembly?
First of all, multi-tenancy, isolation, et cetera. When you start to expose some of the internal mechanics to programmers, there is a person that will write a for loop that is an infinite for loop, and it'll just take down your cluster. It's just a matter of time. That's what engineers do; we're much better, I think, at breaking systems than building them. No, I'm just kidding.
We're also good at building, but the point is if you expose an API to a programmer, they'll
just find a way to break the system.
It's just what programmers do.
Because for fun, you're like, oh, well, what happens if I do this?
It's just how you discover products.
You have no idea.
And so you test it, and then you take down, you know, an entire system. It's like, how many of us, you know, locked the entire code base when we were all using Perforce 15 years ago, and then you go away for the weekend and there's a hot patch and you get called? And so, anyway, going back to WebAssembly: in theory it allows people to write in their favorite language, so we don't have to say you have to write in Rust
or Go or JavaScript or C++,
which is how our storage engine is written.
You could write in your favorite programming language.
As long as it transpiles
to this intermediate representation,
we can execute it safely.
And so as an engineer, you now get exposed.
I guess Red Panda becomes more like a Transformer, like Optimus Prime or Combiner Wars, where you have a little robot and then you add different pieces, and now you have a bigger robot that is fighting Megatrons or whatever. That's the idea behind WebAssembly. Can we teach the
storage engine new capabilities, domain-specific capabilities? An example of domain-specific
capabilities is GDPR compliance. Let me strip your social security number right before you write it on disk or right after you read it from disk.
Or maybe let me teach Red Panda data placement guarantees.
And so if you have a global cluster, can you, the programmer that has infinite business context, or definitely larger than our team's, write a data placement policy so data doesn't leave Germany, or data doesn't leave Paris, or data doesn't leave New York? It doesn't matter, right? So it's those kinds of things, exposing the business constraints. That's why Wasm. And so, one, it allows people to write in their favorite language, and two, it allows us to sort of give the developers
this like Transformers-like capability
where you just add domain context onto it.
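The data placement idea can be sketched the same way. The PlacementPolicy hook below is purely hypothetical, not a documented Red Panda API; it just illustrates what "the programmer with the business context writes the constraint" could look like: a small function mapping a record's origin to the regions where its replicas are allowed to live.

```go
package main

import "fmt"

// PlacementPolicy is a hypothetical hook: given where a record originated,
// return the regions its replicas may be stored in. A storage engine would
// consult something like this when assigning partitions and replicas.
type PlacementPolicy func(originCountry string) []string

// euDataStaysInEU encodes a business constraint the platform team cannot know
// on its own: German data never leaves Germany, other EU data stays in the EU,
// everything else may go anywhere.
func euDataStaysInEU(originCountry string) []string {
	switch originCountry {
	case "DE":
		return []string{"eu-central-1"} // data doesn't leave Germany
	case "FR", "IT", "ES", "NL":
		return []string{"eu-central-1", "eu-west-1", "eu-west-3"}
	default:
		return []string{"any"}
	}
}

func main() {
	var policy PlacementPolicy = euDataStaysInEU
	fmt.Println(policy("DE")) // [eu-central-1]
	fmt.Println(policy("US")) // [any]
}
```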
In practice, really, we're going to launch first with Go.
We tried to launch with JavaScript in the past
and that had some, you know, adoption,
but not the kind of adoption that I was hoping for.
I think Go strikes a good balance
between ease of use from a developer perspective and reasonably good performance when compiled to
WebAssembly. And so those are the practical limitations that we've been working on. And so
we've tested almost every available WebAssembly engine in the world by now. So anyways, that's
why WebAssembly, I think we have a lot of excitement
pent up for that. That's super exciting. I'd love to play around with it, to be honest.
All right, so we already mentioned some of the capabilities that Red Panda brings to the table
compared to the older generation of streaming platforms like Kafka. But okay, let's say WebAssembly is the new shiny toy.
You started doing things differently from the inception of Red Panda, exactly because you
saw the limitations that existed in these products. Tell us a little bit more about that.
If you had to summarize, let's say, in a foundational context,
what are the three, four, let's say, very different things
that Red Panda does compared to the other systems
to deliver the same service, the same value, but in a much better way?
Going back to the electric car analogy, I think electric cars deliver a different value.
They basically made the hypercars obsolete.
Like the zero to 60 no longer makes any sense as a selling point because electric cars are
so fast.
But I'll tell you the ideas that we focused on that are different from the focus of all
the platforms.
It's not that they couldn't technically do it.
It's just a different set of decisions early on that have had this huge ramification in
terms of what the final product looks like, right?
And so let's talk about them.
There are three core pillars to the product.
For all of them, the overall umbrella, the way I think about building companies and so on, is that if we make the engineer, hands on keyboard behind a terminal, the hero of our story, we will be a massive financial success. And so my job has always been to obsess maniacally over
like, is the engineer actually successful? I know that I can get a CIO to sign a check,
a large check if we make their, you know, product engineers actually super successful with the
platform.
So that's always been my obsession as a founder, as an engineer, and also because it's sort of how I grew up, technically speaking.
And so there are three core tenets.
One was simplicity.
And the analogy I like to use is used all the time, but it's the Apple experience.
You sort of expect your AirPods to work with your iPhone to connect with your tablet and passwords to be shared across all of the Wi-Fi devices.
And so that's just a natural expectation. And so to me, the last mile in systems as a systems
engineer, which is what I've been my whole life, was really the human experience. And so the first
core design principle is, can we make the best developer experience?
Can we make this super easy?
Compare that, or contrast it, with existing competitors, where to just print a hello world, you need something like a data broker like Kafka, ZooKeeper as a quorum service, a schema registry, and HTTP proxies.
You need four separate systems just to print hello world.
And I was like, that's insane.
I've worked on systems that are much easier to use and probably have the same capabilities.
And so for me, it was like, if we could deliver the user experience in a single file, so that the mental model for the operators is you put it on one, two, three computers and you're done, that's the deployment model,
That'd be a huge win.
And probably the reason why people have adopted us the most, right?
Like it is like, that's just one example.
And we have a huge portfolio of that kind of example, which is, if I wouldn't want to use it, then we simply don't build it.
Like I will block product releases unless I want to use it.
In fact, I time our product releases.
It's like, my time-to-wow for the console experience has to be 60 seconds, and for the Kubernetes experience, 130 seconds.
And like I could go through an entire portfolio and the job is to wow the engineer within
seconds of them touching the product.
And so that's the first one, simplicity. Two is performance. And, you know, the analogy to electric cars is the zero to 60, right? But for us, a working example is we just took a company from 400 physical computers of one type to 40 of the same computers, just because we could do more with less, full stop. That was the only change: they turned off basically 10x of the computers. And so that's really what performance
only change is they turn off basically 10x more computers. And so that's really what performance
is. And performance is really the sum of all your bad decisions. As a performance engineer,
you just think of latency as the sum of all your bad decisions. And so there isn't one trick.
There's a book of tricks.
One is pre-allocating the memory, using a thread-per-core architecture,
using different interfaces, thinking about memory allocation and pooling
and ownership semantics, blah, blah, blah.
We could talk about that for a really long time.
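For a rough feel of the thread-per-core idea (Red Panda itself is written in C++; this Go sketch is only a conceptual analogy), the pattern is to shard work by key onto one worker per core, so each shard owns its state exclusively and the hot path needs no locks:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"runtime"
	"sync"
)

// shard owns its slice of the keyspace exclusively: only one goroutine ever
// touches counts, so no locking is needed on the hot path.
type shard struct {
	in     chan string
	counts map[string]int
}

func main() {
	numShards := runtime.NumCPU() // one worker per core, in the spirit of thread-per-core
	shards := make([]*shard, numShards)
	var wg sync.WaitGroup

	for i := range shards {
		s := &shard{in: make(chan string, 1024), counts: map[string]int{}}
		shards[i] = s
		wg.Add(1)
		go func() { // the shard's single owner
			defer wg.Done()
			for key := range s.in {
				s.counts[key]++ // exclusive ownership: no contention
			}
		}()
	}

	// Route each event to the shard that owns its key.
	for _, key := range []string{"user-1", "user-2", "user-1", "user-3"} {
		h := fnv.New32a()
		h.Write([]byte(key))
		shards[int(h.Sum32())%numShards].in <- key
	}

	for _, s := range shards {
		close(s.in)
	}
	wg.Wait()

	for i, s := range shards {
		if len(s.counts) > 0 {
			fmt.Printf("shard %d: %v\n", i, s.counts)
		}
	}
}
```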
But so the second one was performance, and the impact is about 10x fewer computers.
And then the last one was cost.
In the context of a cloud-native era,
can we leverage S3 or Google Cloud Storage or Azure Blob Storage to be the true disaggregation of compute and storage?
And so if you could deliver something that's easy to use,
that's fast and relatively economical,
then why wouldn't you build your application
in a streaming mode, right?
You just never come across anyone that is like, oh, I want my reports to come at midnight, this is too fast. That never happens. It's really mostly
a historical context of this technology
being difficult to use and expensive.
Hopefully that gives you a sense.
Oh, 100%. That's awesome.
I have a
feeling we can have an episode for
just each one of the things you mentioned
there.
But I want to go back to simplicity and focus a little bit more on that.
And the reason is because that's also something that I have experienced with systems like
this.
Back in 2014, when we were building Blendo, we used Kafka.
To be honest, for our use case,
the performance and the cost at that point
were not that important.
But we cared deeply about
some specific guarantees that were coming
with a system like this, and some
capabilities that they were
delivering to us.
So we decided to go with it.
And to be fair,
anything that has to do with the guarantees themselves, like, they were delivered, right?
Like, it was great that, like, we managed to have that, like, especially for such, like, a small team that we had.
But obviously, like, the whole experience was, like, far from, like, being simple, right?
And in my mind, when we're talking about developer experience and simplicity, it's a very multifaceted kind of definition.
You have simplicity in terms of what does it mean for a new developer who is onboarding the team to go and build an environment where they can work and replicate that work.
Then there's the simplicity of operating this thing.
You have your SREs there; just because this thing is fault tolerant, it doesn't mean that it's on autopilot, right? Someone has to be there to babysit this thing.
And then that's the part that I want to talk more about with you: the architectural simplicity, right? It's all these different components that you need to have in place just to see in your logs that this thing is running, right? Like you mentioned, you need the schema registry, you need the brokers, you need the ZooKeeper.
Yep.
So let's focus a little bit on that.
Especially since, okay, ZooKeeper is not exactly, let's say... I'm sure many people have nightmares about operating ZooKeeper.
But regardless, what it does, right, is great; it's not an easy system to build, in any case.
And it's an important piece.
But how do you remove all that complexity with Red Panda?
Let's say I download the binary.
How do I get the things that each one of these components
give to me when I use Kafka?
Gosh, this is such a huge topic.
I'm just summarizing my words here.
I would say,
you can, here's the thing about complexity. You can't eliminate complexity. You can only
shift it around. And I can either make it your problem, or I can make it our problem.
By and large, from a company philosophy, from a company standpoint, us, the storage team,
you know, and I largely think of Red Panda as being a really sophisticated storage engine.
We are the experts in the trade-offs and understand a lot of the nuances around, you know, like
whatever, whether it's lock contention or CPU contention or like, you know, or whatever, memory contention, all of these details that manifest in different
ways, you know, at a high level, which is why you always over-provision, right?
So here's the thing about complexity.
Because you can't eliminate it, you have to make a choice.
Is it either your problem or my problem?
And by and large, we've said it is Red Panda's problems and it is our job to make it easy.
And so, you know, a big part of why we adopted the Kafka protocol for context is we knew
we could make a system fast.
That was, you know, sort of the company's DNA.
We've been writing C++ for, what, 15 years before we started the company, I guess now
20 or so.
And yeah, and so we could make it fast. We could do all of these
things, but the API, right? There's this huge ecosystem of existing applications. And if I
show up to one of our customers, let's pick Akamai, right, and I'm like, hey, we have this cool new technology, how about you throw away the billion-dollar revenue product? They'd just walk me out the door. That doesn't make any sense. And so being compatible was part of that simplicity stance.
But to answer your question directly,
when we first started working on this
and I authored a lot of the original code,
I tried other products actually,
or other approaches, right?
So I tried, you know,
first I took the FlatBuffers compiler,
I extended it with a couple of types.
It's a very Apache Arrow-like
format. It was our own thing, catered to super low latency: basically, you could assume a little-endian CPU and do a pointer cast and have a bit-array layout, and you could do microsecond-level latencies with a bunch of these things. And nobody wanted to use it. And it's like, okay, this is
great and this is really fast. But people didn't understand what space, what latency spectrum, you fit in.
You know, it was sort of like a much lower level thing.
It was the same thing with the replication protocol.
I first started with chain replication and then, you know, then you have to figure out,
okay, who watches the chain?
Like, you know, the who-watches-the-watchmen kind of thing.
And so you end up designing a system that looks a lot
like a consensus protocol, like a Paxos. And so, you know, then we looked at Raft as the protocol
implementation. I was like, okay, we could reason about these things. And you sort of start to look
at all of these ideas, but fundamentally it was taking a product stance and saying,
it is our problem and it is not your problem. So that when you go on installing Red Panda,
you don't have a thousand steps.
Like the idea and a big lesson learned
that I took from my time at Akamai was
they had very small teams
running massively large deployments, right?
Over like whatever,
half a million computers around the world.
And so how is that possible?
Well, believe me, no one reads a thousand steps.
Like you write code to run your code, to deploy code.
That's just kind of how it works.
And it's more mainstream now than it was, you know, maybe whatever, seven years ago,
you know, or five years when I started the company.
And so that was the core.
And so we onboarded the complexity.
We onboarded a bunch of the things.
We onboarded our own consensus protocol.
We based it off of Raft.
You know, we decided to onboard the leadership election and the bootstrapping methodology for the cluster. We onboarded our own Kubernetes operator, and so on. So we tend to onboard the complexity ourselves, so that we don't give you the complexity as a thousand steps that you have to follow.
And if you miss one, then you just have data corruption.
Like that idea doesn't make any sense to me.
Yeah, 100%. Makes total sense.
All right.
So we talked about the simplicity.
I have a question about
the technology and the experience
around the technology.
And the reason that I want to ask that is because one of the things that I find fascinating with these kinds of systems is the diversity of people that have to interact with them.
It's like middleware in a way, right?
You have your applications that write data to it, and then you have downstream applications that might be owned by completely different teams.
You have data engineers that have to read the data out of there just to store it somewhere else, right?
That creates a very interesting ecosystem of people inside the company that have to interact with this technology.
And that complicates things a lot because a systems engineer is a different kind of beast
compared to an application engineer
or a data engineer or an SRE.
Everyone speaks a slightly different language, right?
And they have slightly different needs.
And my question is,
and I would like to ask you that with a concrete example, actually.
Let's say I'm a company that has invested in having Kafka inside my system.
Obviously, all these different people are working with Kafka; one way or another, they've had to figure out how to do that.
What is it like for the organization to say, okay, now we're going to take Kafka out and put Red Panda there?
And I hear you about the compatibility, the API compatibility, and I get that a hundred percent; it's literally the only way you can do that with such complicated infrastructure-type products.
But give us a little bit more color of how does this translate for each one of these different personas that we have inside engineering?
Yeah, so first let me answer the last part. Let me answer in reverse, because it's easier.
The migrations are relatively straightforward,
and it really depends on how people use Kafka.
And so typically, let's take the example of a financial firm.
And I say that because we have a ton of financial firms.
And so the way they'll do it is that they'll put up the two systems, and then they'll run from 8 a.m. to 5 p.m. or 4 p.m., whenever the market closes.
Next day, Red Panda has the last day worth of data and they just continue running on Red Panda, right?
If you have stateful migrations, we support MirrorMaker 2.
To MirrorMaker, to all of these tools, literally, Red Panda looks exactly like the Kafka protocol.
No one could tell the difference.
And to date, as a company,
we haven't had a single customer that's touched their applications
to interact with Red Panda.
So that means years and years of code; they just simply point it at Red Panda and go.
The way, you know, I used to go on calls early on with the product and I said, hey, you can
change the container from this, from whatever you're using, and then just plug in Red Panda
and see if it works. And so in fact, our test container module, for those of you that use
test containers, you just change, you know, from Kafka test container to a Red Panda test container,
and your entire JVM application just continues to work. It's just faster. And so that compatibility was super strong, and it's something I take really seriously.
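As a hedged illustration of what "just point the application at Red Panda" looks like in practice, the snippet below uses the open-source segmentio/kafka-go client (any standard Kafka client would do); nothing in it is Red Panda specific, and the broker address, topic, and consumer group are placeholders. Migrating, in the simplest case, is changing the bootstrap address the application already reads from its configuration.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Before: brokers pointed at the old Kafka cluster, e.g. "kafka-1:9092".
	// After: the same code, same Kafka wire protocol, pointed at Red Panda.
	brokers := "redpanda-1:9092" // placeholder address

	w := &kafka.Writer{
		Addr:     kafka.TCP(brokers),
		Topic:    "orders", // placeholder topic
		Balancer: &kafka.LeastBytes{},
	}
	defer w.Close()

	if err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte("order-1"), Value: []byte(`{"amount": 42}`)},
	); err != nil {
		log.Fatal(err)
	}

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{brokers},
		Topic:   "orders",
		GroupID: "billing", // placeholder consumer group
	})
	defer r.Close()

	msg, err := r.ReadMessage(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %s = %s", msg.Key, msg.Value)
}
```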
If we onboard a customer and they see an issue, they're like, okay, it's an issue with the product.
It's not an issue with you. It's an issue with us. And we work really hard to make sure that we fix
it right away. And so that's the migration. Now, in terms of sharing, you talk about a really
challenging thing, which is the governance of this streaming data, who has access,
like, how do you interact with the 52 personas? And if you're a bank, you have the ML and AI engineers, and you also have your production engineers that are dealing with compliance and regulators across 52 countries. And then there's GDPR and data locality compliance.
And so it's just such a gnarly and rich problem. So let me give you just the Hollywood highlights
so that you can build on
primitives rather than specific answers.
And when I can, I'll just give you examples of, by and large, what people do.
We adopted the Kafka access control lists, right?
So with the default ACLs, you can sync an out-of-band policy mechanism.
So let's say Okta or whatever,
Active Directory or whatever it is. And we also integrate with Kerberos, right?
And so you can have a centralized place of identity for both users and applications.
And so it's the system-to-system communication that is really complicated. It's no longer,
you know, Kostas is going to make a query on this and maybe tail the logs, and you're using Kafka to see, you know, if the price is dropping. It's when you start to connect multiple systems, and each system has potentially a different security boundary and so on. And so the way most people do it today is you have some sort of centralized system, and that'll sync eventually. The lowest-level primitive is an ACL, and the ACL protects people from reading, writing, querying metadata and so on.
And so your applications are there.
Now, from an API perspective,
if you use any of the Kafka API, that continues to work.
And let me say one quick thing about the future,
which is fundamentally different from every other streaming engine. So it builds on the richness of the previous answer,
which is that's not enough.
And the reason is it doesn't meet the developer where they are.
And it is my job as a company builder to meet them there. It's like, well, you know, not everyone has
gone through the pain points or sophistication of truly understanding how to get value out
of streaming data.
So let me meet you where you are, which is you're using Red Panda to take data from your
API endpoints
into some form of database.
I was like, let me do that really well.
And the way we're going to do it really well
is we're going to integrate with Apache Iceberg
as an end storage format on S3
so that today you can bring Snowflake
and tomorrow Databricks
and next day, whatever is, you know,
whether it's, you know, ClickHouse or DuckDB or, you know, whatever.
There's like, you know, a really large set of choices
that the developer has on querying the data.
And so the way we meet those developers where they are today
is in the tiered storage.
This is something that we just announced today.
So it literally hasn't even been on any other podcast yet.
So if you're listening to this,
you're the first person that's ever listened to that. For me, the future of our tiered storage format is going to be Apache Iceberg, so that you can go from a stream to SQL, but not our SQL, your favorite SQL. And your favorite SQL today could be Snowflake, and tomorrow could be Databricks, and the next day could be whatever. And so hopefully that gives you an answer: when you're interacting in a rich ecosystem,
you just have so many stakeholders that largely it could be ML engineers, could be AI, could
be, I guess, you know, probably same department, but it could be your CIO looking at dashboards,
real-time dashboards and so on.
Does that make sense?
Absolutely.
And it's great to hear about the integration with Iceberg. It's a big pain for everyone; anyone who has ever had to be a data engineer in an environment with streaming data at scale ending up in a data lake knows how hard this is.
So being able to have good integrations with these table formats, and not having to worry every time you're on call that the pipeline will break and you'll have to go back and redo everything for the money reports,
I think that's going to be a huge value for the people who are managing the data infrastructure.
All right, we're getting closer to the end, and I have two small questions. First of all, I can't close this episode without asking about the name, right? You mentioned something at the beginning about Red Panda, but give us the story behind it. How did you end up with such an extremely cute animal? Is it like a family thing?
Okay, so when I started the project, I was living in Miami. And, you know, I had moved from New York; I lived there for a really long time, and I was in Miami and I just built it, right? Like, I didn't envision Red Panda becoming what it is. But I wanted the product to exist, and when you're an engineer, it's very freeing: you open up your laptop and you just code it, right?
Like, you don't need to ask anyone for permission.
You could just write the code.
So I did.
And then I sent it to a bunch of friends.
And this was at the time where, you know, the Uber for X or the app for X was super
popular.
And, you know, all of your family members were emailing you.
It's like, hey, can you help me build the company?
Like, I have this idea.
And you're like, yeah, not really.
And so I think we were all tired of getting emails from friends about names.
And so I sent a survey to a bunch of friends, and I embedded Red Panda as an Easter egg.
I had a bunch of nerd names in between, obviously Vectorized, which became the first name of the company, and, you know, a bunch I can't even remember. Anyways, I added Red Panda because I thought, you know, no one's going to pick this thing, and, you know, whatever, so I still sent it. Anyways, most people responded, and 80% of them chose Red Panda. So that became the project name. And of course, in my head, I didn't listen.
I just, like, named the company differently.
I started the company as Vectorized.
But, you know, Red Panda took on a life of its own.
And my partner at the time, she helped me chase a bunch of design firms around the world, from, you know, Europe and South America and the US, and like four or five firms worked on this really cute
8-bit inspired mascot that looked like Mario Bros. That's how I envisioned that. And so
that mascot took off. People loved it. And at some point, we just had to rebrand the company
Red Panda. No one knew what Vectorized was, but everyone knew what Red Panda was. And the mascot
was just so cute. It was impossible to not like it. So we just had to name it and here we are.
It's just, it took over the company.
All right.
That's awesome.
That's an amazing story about the power of symbols in general, and of language, as part of building a brand.
All right.
So we're here at the buzzer, as Eric usually
says. So one
last thing. I know that
you are
making some
very bold claims
around performance, especially compared to
Kafka. But you're also
one of these people that don't just make the claims; they're willing to be tested on them and prove them, right? So can you tell us a little bit more? Because I've seen on LinkedIn some messages that have been circulating about how this can be done.
Yeah, so first of all,
for those listening in if you're using Confluent, email me and I'll cut
your bill in half or I'll give you money. That's kind of the bottom line, and bottom line up front, it's usually easier. The TLDR is that our main competitor launched an attack on some
personal blog post. And I was like, I know how much money it costs to run this. It costs you
$90,000. It is impossible for you to run this. I was like, at least you should have the courage to put it on your main website so that we could talk about it in public. And so, you know, up until that point, we would never say that. And so I was like, okay, well, if you're going to spend ninety thousand dollars, I want to tell all of your customers that if you come to me, I'll cut your bill in half or I'll give you money. And I stand behind that claim for anyone that comes to me. We can post the link of the campaign at
the bottom of the podcast notes if people want to check it out. But yeah, super excited to be
compatible with all your Kafka workloads. And thanks for having me. It's been a fun show.
That's awesome. Thank you so much, Alex. And we're really looking forward to hosting you again
on the show in the future.
All right, Kostas, thanks for having me.
Okay, Kostas, I didn't make it back in time to hear the entire recording,
but looking at the internal chat, it seems like at a minimum,
they had some huge news about a fundraise, which is super exciting.
But tell me what you learned.
Yeah, first of all, Eric,
we have to say that even the best couples need some distance sometimes.
Well, then we can still keep a relationship there.
So I think, yeah, I mean.
Distance makes the heart grow fonder.
Did you miss me?
I mean, are you jealous if I say that I enjoyed talking with Alex without you?
I don't know.
I did miss you, yeah.
Obviously, you always give a very unique dimension to the conversation,
the conversations that we have.
That's why it's usually the two of us there.
But it was fun to talk with Alex, for sure.
He's a very deeply technical person,
obsessed with performance,
which I think is also a reason
why it made him such a good fit
to go after this problem
because it is a very critical system
where you need to have very strong guarantees
when it comes to both performance and resilience.
Well, Kafka is also such an important system.
And so it was fascinating to chat with him,
experience his passion about what he's building.
And it pays off, right?
They announced their next round of funding.
They raised like a hundred million.
Okay, not in the best market out there for fundraising at this stage,
which means that they are doing something right.
And it seems also that there were good reasons
why we didn't have more competition in this space
in the past 10 years.
But now it seems like it is the time for that. So I would recommend to our audience to tune in and listen to Alex talking about these technologies: how they were built, why they were built the way they were built, why we need a new paradigm today, what Red Panda can offer compared to a system like Kafka, and some very cool new technologies like Wasm, WebAssembly, that they are using, and how they are incorporating this new infrastructure paradigm to really create a unique new experience and a unique new product that is much better, let's say, at addressing the needs of today compared to other systems. So I would suggest everyone tune in. He's a bright person, very smart, obviously, with very deep knowledge that he's sharing on this episode. And yeah, it will be fun for everyone to listen to it.
Awesome. Well, I am so glad you learned about Kafka and Red Panda. I am so disappointed I missed it, but I'll be back on the next one.
Yeah, you will.
All right, well, subscribe if you haven't, tell a friend, and we have great episodes coming up, so stay tuned.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app
to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.