The Changelog: Software Development, Open Source - Reinventing Kafka on object storage (Interview)
Episode Date: August 29, 2024
Ryan Worl, Co-founder and CTO at WarpStream, joins us to talk about the world of Kafka and data streaming and how WarpStream redesigned the idea of Kafka to run in modern cloud environments directly on top of object storage. Last year they posted a blog titled, "Kafka is dead, long live Kafka" that hit the top of Hacker News to put WarpStream on the map. We get the backstory on Kafka and why it's so widely used, who created it and for what purpose, and the behind the scenes on all things WarpStream.
Transcript
What's up, friends? Welcome back.
This is The Changelog.
We feature the hackers, the leaders, and those who are building data streaming platforms inspired by Kafka.
Yes, today's conversation revolves around Kafka and data streaming.
We're joined by Ryan Worl, co-founder and CTO at WarpStream.
Last year, they posted a blog titled Kafka is dead. Long live Kafka. And that, of course,
hit the top of Hacker News and put WarpStream on the map. Today, we get the backstory on why Kafka
is so widely used, who created it and for what purpose, and more importantly, the story of WarpStream. And the question they asked themselves was this: what would Kafka look like if it was
redesigned from the ground up today to run in modern cloud environments directly on top of object storage with no local disks to manage, but still had to support the existing Kafka protocol?
Well, that's just the premise for today's conversation.
A massive thank you to our friends and partners over at Fly.io.
More than 3 million apps have launched on Fly, and we're one of them.
Scalable full stack without the cortisol.
No stress.
Learn more at fly.io.
Okay, let's Kafka.
What's up, friends?
I'm here with a new friend of mine, Sagar Batchu, co-founder and CEO at Speakeasy.
You know, I've had the pleasure of meeting several people behind the scenes at Speakeasy,
and I'm very impressed with what they're doing to help teams to create idiomatic SDKs,
enterprise-grade SDKs in nine languages, and it's just awesome.
So, Sagar, walk me through the process of how
Speakeasy helps teams to create enterprise grade idiomatic SDKs at scale.
You know, APIs are tough things to manage for a company. The OpenAPI spec, this great, widely adopted standard to describe and document APIs, is the best chance the company has towards documenting it,
understanding state, you know, point in time, what is the API, and also ownership. What are the APIs?
How are they grouped? Which teams own them? What services do they get deployed to? There's a lot
of questions there that often we see teams and companies kind of struggling to answer. So Speakeasy is a forcing function for them to invest in making that OpenAPI spec as great as possible.
Completely descriptive, fully enriched.
Speakeasy helps with those gaps.
We have deterministic and AI tools
to kind of fill in the gaps for them.
And so the better and better that OpenAPI spec gets,
the better chance you have at serving your community.
The end value is always to the end user who is actually integrating with your API. So if your OpenAPI spec has gaps in it, the more likely they are to run into errors. They don't understand what they're implementing. It gets tough to maintain, because it becomes institutionalized knowledge as opposed to being described in the document. So there's a lot of great reasons
why you invest in that OpenAPI spec. Any artifact like that, that you're going to invest a ton of time into, needs tooling to manage, right? And that's what Speakeasy is at its core. It's tooling to manage that OpenAPI spec, give you kind of very clear change management principles around it, version it, understand, you know, exactly what versions are used for what SDKs. If you invest in that spec and use Speakeasy, you'll have a good document. And the moment you
have a good document, you can have good or great SDKs, which make integration easy.
The way Speakeasy works there is you point us at your document, wherever it lives, in GitHub or maybe some other file storage or somewhere else.
We detect changes as it evolves, as different people contribute to it.
And we send you new updated code every time that happens.
And so the moment we send you code,
there's an opportunity for you to review that
and say, you know, yes or no.
Like this is new code we want to ship to our customer.
We do that heavy lifting of generating that code,
giving you kind of provenance of your spec,
but leave you as human in the loop to decide,
okay, am I going to serve my ecosystem
with a new version of the spec in SDK?
So that's the kind of core workflow that we're built around.
And that's really the point of collaboration
between us and companies that we work with.
Okay, friends, the next step is to go to speakeasy.com.
Try today for free.
You get one free SDK in your language of choice on them.
Enjoy it.
Robust, idiomatic SDKs in nine plus languages.
Your API, the OpenAPI spec available everywhere.
Again, go to speakeasy.com.
Once again, speakeasy.com.
All right, today we are joined by Ryan Worl from WarpStream.
Ryan, welcome to the ChangeLog.
Thanks, it's great to be here.
Great to have you.
Shout out to listener Vladimir for requesting this episode.
Also, shout out to your co-founder, Richard,
who unfortunately couldn't be here today,
but hey, Richard.
What's up, Richard?
Yeah, but you're here,
so let's talk to you
and not to Vladimir and to Richard.
That being said,
Vladimir requested this episode.
You too, listener, can request episodes.
Head to changelog.fm slash request.
Let us know what you would like to
hear about on the pod, and we might just fulfill your every desire. Vlad wanted to hear about
WarpStream, and so that's why Ryan is here. Just so happens that Adam and I both would also like
to hear about WarpStream. So here we are. Let's start with Kafka, though, because it sounds like WarpStream's story starts with Kafka's story.
What is Kafka, besides an author from the early 1900s?
But the open source thing, what is that thing all about?
Yeah, Kafka is both a very interesting and a very boring system. The easiest way to think about it is it lets you create topics and you can have producers
that write messages into these topics and consumers that consume messages out of the topics.
It's kind of like a publish and subscribe type deal. But the thing that makes it interesting
is the fact that once you consume those messages, they're not deleted. So they're
still stored inside the system and another consumer can go and read them again for a
different purpose. Like if you have two different applications that are consuming the same
data set, they can both equally consume those messages. You know, let's say that you have
one application that does machine learning training and another that does alerting based on the same messages; you want to process the same data, but in two different applications.
Kafka is a useful tool for that.
It also provides ordering for those messages, so if you need to implement an application where you send messages in a certain order and you want that order to be retained on the other side, Kafka also does that: you'll get them back in the same order every time.
So you can implement something like state machine replication
or that type of thing where the ordering matters.
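A rough way to picture those semantics (the append-only, ordered log where consuming does not delete anything, and each consumer tracks its own position) is a toy in-memory sketch. This is purely illustrative, not the real Kafka API:

```python
class TopicLog:
    """Toy model of a Kafka topic: an append-only, ordered log.

    Consuming does not delete messages; each consumer just advances
    its own offset, so many consumers can read the same data.
    """
    def __init__(self):
        self.messages = []   # retained messages, in order
        self.offsets = {}    # consumer name -> next offset to read

    def produce(self, message):
        self.messages.append(message)

    def consume(self, consumer, max_messages=10):
        start = self.offsets.get(consumer, 0)
        batch = self.messages[start:start + max_messages]
        self.offsets[consumer] = start + len(batch)
        return batch

log = TopicLog()
for i in range(3):
    log.produce(f"event-{i}")

# Two independent applications read the same messages, in the same order.
assert log.consume("ml-training") == ["event-0", "event-1", "event-2"]
assert log.consume("alerting") == ["event-0", "event-1", "event-2"]
```

The real system adds partitions, replication, and retention policies on top, but the replayable-log core is the part that distinguishes it from a delete-on-consume queue.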
Okay, so what are some use cases for this?
Lots of, it sounds like, well-funded companies
use it, larger companies.
And I think that some of that is because
of the operational complexities
and the love-hate relationship with it.
But why are people grabbing this particular tool often?
Yeah, the reason why it's useful is there just isn't a lot out there that fulfills those, you know, the two main things.
It's like a publish and subscribe mechanism that's scalable.
And then also that lets you have different consumers process the same set of messages without, you know, one of the consumers deleting it.
Like, there's a lot of queuing systems where the messages, once you consume them, are just gone forever at that point.
Like the purpose is to consume the message and then have it go away, not to reprocess it again in the future. There are a lot of use cases for it. I'd say that the most broadly popular
one is for moving data from point A to point B, kind of like a dump pipe. It's used a lot in
observability and security-related workloads where you have a lot of application servers that are generating logs
and you want to temporarily put those logs somewhere
before you put them in something else.
Like you say, you want to put them in Elasticsearch
or something like that.
Elasticsearch can be a little finicky.
So you want to have Kafka,
which is a much simpler system in place,
as a temporary buffer
to hold those log messages
that you want to write to Elasticsearch
in case that Elasticsearch cluster is down
or you're doing an upgrade or something like that.
There's a lot of different reasons for it,
but Kafka is pretty much the de facto standard
for those kinds of workloads.
And then when you get outside of observability and security,
there's a lot of people that are building custom applications on top of Kafka, like an inventory management system for a warehouse, where you want to keep track of the real-time status of everything going on in the warehouse. You might send messages to say, oh, this new batch of inventory has been added onto the shelves of the warehouse, or I'm taking things out. And then you're computing some type of a live application based on that inventory data to say, you know, that you need to replenish the stock when it goes below a certain amount. But you want to do that in real time, so that you can react faster than just doing this once a day.
So Vladimir pointed us to a post, which I think Adam and I, we had both already read this post because it was last year.
Last summer, I believe.
Kafka is dead. Long live
Kafka. This was your big coming out
party, it seems. A great way
to introduce WarpStream.
And in that post,
you said that Kafka is one of the most
polarizing technologies in the data
space. And then, whether
it was you or Richard who wrote that,
then you just moved on and kept going,
assuming that we all just knew why or how,
or agreed that that was just true.
I assume it's true, it's probably polarizing.
But why is it polarizing?
My guess is because it's useful but difficult to use.
And so people love it and hate it,
but maybe there's more to it than that.
So I think that there are probably two main criticisms that people have of Kafka. The first is that it's hard to run. As the operator, you have to have a lot of knowledge about how to use the open source project appropriately. And the second major issue is the cost. You know, I'm sure we'll get into this, but the cost of running open source Kafka in the cloud is pretty high compared to what people expect it to be. If you think of it as a dumb pipe, you would expect to pay dumb-pipe-type rates for it. But given the fact that it requires triply replicating the data onto local disks,
and you'd have to pay, most of the cloud providers are charging you money for interzone replication,
you end up paying a lot more than you expect, even if you're just storing the data temporarily.
If you're using open source Kafka on AWS, for example, the minimum cost for a highly available
3AZ setup for the cluster
is 5.3 cents per gigabyte,
compressed gigabyte written into the cluster.
That's just to do the replication part.
The storage part is all another story.
It depends on how long you want to store the data for,
but you're like, if you're starting out,
that's your baseline cost.
It can get pretty expensive pretty quickly.
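As a back-of-the-envelope sketch of where a figure like that can come from: assume, illustratively, about $0.02/GB charged in total across the two sides of an inter-AZ transfer, producers spread evenly across three AZs, and replication factor 3. Actual AWS rates and topologies vary, so treat the numbers as an approximation, not a quote:

```python
# Rough estimate of inter-AZ transfer cost per compressed GB written
# to a 3-AZ, replication-factor-3 Kafka cluster (assumed rates).
INTER_AZ_RATE = 0.02  # $/GB total: ~$0.01 egress + ~$0.01 ingress

# A producer's partition leader sits in a different AZ 2 out of 3 times.
produce_cost = (2 / 3) * INTER_AZ_RATE          # ~$0.0133/GB

# The leader then replicates every write to 2 followers in the other AZs.
replication_cost = 2 * INTER_AZ_RATE            # $0.04/GB

total_per_gb = produce_cost + replication_cost  # ~$0.0533/GB
print(f"~{total_per_gb * 100:.1f} cents per compressed GB written")
```

Under those assumptions the total lands at roughly 5.3 cents per gigabyte, before any storage costs.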
Is anyone building or using Kafka, open source Kafka, as you said, in a scenario where they're not on public cloud, where they're building out their own infrastructure, where it's probably maybe even harder because you're literally managing the disks?
You're now ordering the disks, you're RMA-ing the disks.
You're literally managing the disks.
Is that a scenario that happens or is it less likely?
So that's definitely a thing that happens.
I know of companies that do that,
but just as the migration to public cloud
over the last 10 years has only increased in velocity,
essentially, that is becoming less and less popular, because it is indeed hard, and it's even harder when it's in your own data center, as opposed to the cloud, where you can just ask for more disks and you get them right away. The cost situation is a little different there too, because typically the way that you're provisioning network in your own data center would not end up with a per-gigabyte cost. I mean, obviously, you amortize everything over how much data you're transferring inside your data center, but you're buying it in terms of hardware, and your per-gigabyte rate, if your traffic goes up, doesn't correlate linearly the same way as it does with Amazon.
But that's, it's definitely still a thing people do,
but it's less and less popular every day.
Continue with the polarizing.
What else is polarizing about Kafka?
Some people have strong opinions about the actual developer programming model of Kafka, and that it's a little hard to use sometimes.
I think that's less of a big deal these days
as more tools have integrated with Kafka.
It just makes it even easier to use Kafka. There are some other systems that might have a theoretically easier-to-use programming model, but everything speaks Kafka now. So those concerns are mostly trumped by the fact that it's the de facto standard.
I think really what most people are concerned about when, like, if you don't use Kafka today
and you're thinking about bringing it in to your company, the two things that you're going to be
concerned about are how hard is it to run and how much is it going to cost?
Those are typically people's two big blockers.
It doesn't have anything to do with the fact that conceptually they have an issue with Kafka.
It's those more practical things.
What makes it so difficult to run?
Is it the SSDs? I think that post also called it finicky.
Is it poorly architected? Like, why is it finicky?
It's a number of different things. I think the first one is, yes, being responsible for anything that stores data on local disks, if you want to achieve high availability and high durability of your data, is challenging. It requires experienced SREs to handle those types of failures when they do occur. But that, I think,
that can be dealt with because people do that with other systems all the time. But I think that
most people's problems with Kafka come when they
want to scale up and scale down the cluster in response to load. The open source project doesn't
really give you much tooling when it comes to helping you manage that process. Like for example,
in the open source project, there is no automated tool to rebalance the data among the machines
when you add or remove machines. That's kind of a table-stakes feature in a lot of systems; if you're thinking about a distributed relational database, you know, it would seem kind of silly if you had to run a script to move data between the nodes in the database. But that is true of open source Kafka. And there are now other tools that you can use alongside of it that
can take some of this work off of you, but they're not always the easiest to use either.
It's not like a self-balancing, self-managing thing like a lot of the distributed relational databases are.
It's something that takes a little bit more hands-on work. The other thing that goes along with that is, if you're storing data for a long period of time, they didn't add a tiered storage feature until very recently in the open source product. And the time that it takes just to copy the data around from machine to machine when you're scaling up or scaling down the cluster can be hours or days, depending on how dense you're running the machines.
Some of that is alleviated with the new tiered storage stuff where the older data is moved
to object storage, but that part doesn't alleviate the inter-AZ networking costs.
And there's another post on our blog about tiered storage and Kafka if people are interested
in learning more about that topic.
It is open source though, right?
Apache Kafka?
Yeah.
Yeah.
The project is managed by the Apache Foundation
and has a variety of contributors across a ton of companies.
And I would say it's a fairly healthy example of an open source project
in terms of having a big community around it.
There's a margin of haters, let's just say, towards Kafka. And it is open source. And I'm
just curious, you know, you may be in that bucket of margin of haters because you've created
WarpStream, right? So you're kind of not for, you're kind of against, at least from an economic
standpoint, and maybe a DX standpoint and many other standpoints. The point I'm getting
to is, why not just improve Kafka?
There are a lot of practical challenges with improving
a large open source project with a lot of users and a lot of
dependent parties, I should say. Not even necessarily
just users, but stakeholders
of all kinds.
Making large sweeping changes is essentially impossible. The amount of code churn required to take open source Kafka and get it to something resembling the architecture of WarpStream is just not going to happen in any reasonable amount of time. That's the first part. Even if you just wanted to do it abstractly, no financial interests involved, how would you do this? It would be very hard, practically.
The second reason is that WarpStream makes a pretty different set of trade-offs than the
open source project does in terms of the environment that we expect users to run in. Now, I think those trade-offs are correct for the world
that exists today.
But in the abstract, it is different than the open source
project.
So WarpStream stores data only in object storage.
That's step one.
You need an environment that has object storage.
And then step two is that we run a control plane
for the cluster, which in the open source, the comparison would be kind of like somebody running ZooKeeper or KRaft, which is their replacement for ZooKeeper inside of the open source project. It's kind of as if we're running
that for you remotely, and then you're running the agents as we call them, which is the replacement
for the Kafka broker inside your
cloud account. So there's a very specific topology that we're prescribing to our customers
as well. That's different. Probably wouldn't fly in an open source environment, or at least would
make it even more challenging to run potentially. I think those are probably the two biggest reasons
of why we couldn't just improve Kafka: it would be too hard practically to make improvements. And then also, we're making trade-offs around how we see the world existing today and how we think it's going to continue to exist in the future, and a lot of the stakeholders in the open source product may not agree with our assessment there, basically.
Good answer.
I was expecting a version of that.
I was not suggesting that you should just not have started WarpStream and, by all means, just go contribute to Kafka and bail. But it's always good to get
that perspective because Kafka's got history. It's 13-ish years old. It was developed inside
of LinkedIn for different purposes. That's why I started off with the question, which was
their own infrastructure. Because LinkedIn designed this for a different purpose than
everybody else today uses it. It was not designed to be used in a cloud environment where there's a lot of egress fees and a lot of fees for moving data around. And so it was not really designed for its actual use case, or the usage space that it's in. And LinkedIn did not charge its users those transaction fees, I assume, potentially because, I don't know LinkedIn's infrastructure history, but I assume because they had far more control over their own environment, to not have to deal with those costs, than maybe everyone else who's become a Kafka user has had to take on.
Yeah, the way that I like to explain that, the networking cost side, is that when you're renting space in a colo
or you have your own data center,
you're implicitly paying for
what is kind of a fixed capacity resource.
It has a very high fixed capacity,
but you are essentially paying for a resource
that has a fixed capacity
without doing a bunch of capital improvements
to your data center.
Whereas if you're in the public cloud, you can show up and put a credit card down and start moving gigabytes a second across the network without asking anybody for permission, nothing. So you're paying kind of a tax for that flexibility of being able to show up without asking anybody and, all of a sudden, start moving a ton of data, and especially in terms of how spiky you can do it. Like, you can write 100 gigabytes a second for one minute and then never pay Amazon any money again, and they have to do some capacity planning on their end, just like they do for every other service, and that's why they charge, you know, higher on-demand rates for EC2 instances than if you go and buy a random server off the internet and put it in your house.
The cost looks very different.
Now, whether that cost is right, whether that reflects real economic realities, I don't think anybody can say without being inside of Amazon. But I think there's a pretty logical rationale for why it exists that way, because there are people that will consume bandwidth in a very different way. You have to think about the worst-case-scenario users of your service, basically, the people that you might even call abusers of your service in terms of your cost profile. So I think that's why, as you're saying,
you're correct that LinkedIn can just decide to use Kafka in a different way internally to match their ability to provision infrastructure.
And Amazon can't really force you to do that in any way other than just charging you more money for it.
So that's why they do.
So you and Richard, did you guys meet at Datadog?
Is that where you guys connected or was he at Datadog?
Tell us a little bit of the history of you two.
Yeah, so Richard and I met a little over five years ago now at a conference.
We met at Percona Live, I think it was 2019 in Austin.
And he was working at Uber at the time.
And yeah, so we did eventually both end up joining Datadog, but that was a little later.
Gotcha.
And while you were there, you had put some sort of Datadog infrastructure on S3 or on object storage.
Husky, I think.
I'm going from memory now.
Yeah.
So my co-founder, Richie, and I, after he left Uber, we started working on a prototype of a system where the idea was basically a Snowflake for observability data. That was the elevator pitch. And we were going around pitching that to investors at the time, and that's how we got to know some of our investors in WarpStream today; we met them back in those days. And that eventually caught
Datadog's attention. And we ended up joining Datadog together to build that system, Husky. Some of our current colleagues at WarpStream were also there at Datadog building that system
with us. Basically, the idea there was to replace the legacy system inside of Datadog for basically anything that you can think of that's not, like, pre-aggregated time series metrics. We think of it as timestamp plus JSON; that was the data model, basically. And we wanted to move all that data to object storage.
There's a ton of different reasons for it,
similar to the reasons why WarpStream is useful.
But over the three and a half years that my co-founder and I were there,
we migrated all of the products that were using the legacy system over to Husky.
Yeah, I mean, that's why I ask about it
because it seems like it's a precursor
to this very similar move with Kafka, right?
Like what if we took Kafka,
ripped out the local storage aspect of it?
Sounds easy enough.
And built something, I mean, by ripped out,
conceptually ripping out, right?
You didn't fork Kafka and write this, right?
You started over?
Yeah, we started from scratch and writing it in Go.
Right, so conceptually rip it out,
but actually rewrite something that's Kafka compatible
in terms of features and API, I assume,
and all that kind of stuff.
But no local storage, object storage.
And your success with what happened to Datadog
probably led the way for you to say,
well, if we did that, it would be a lot cheaper basically
and way easier to operate
because hello, Amazon Web Services, right?
Like it's their problem now.
Yeah, there's definitely a lot of
like high level conceptual overlap.
The systems are extremely different because one looks more like an OLAP database,
and the other is, I mean, Kafka is more like a log.
So there's some very high-level conceptual similarity.
And I think the thing that we really got the most experience with there was learning about object storage. So that's about where the similarities stop. The deep experience of understanding how object storage works at scale in all of the major public clouds was a hugely valuable learning experience for us, such that when we left and we were doing the back-of-the-envelope math on could we make this thing work, the experience with object storage that we learned there was pretty helpful. Now, people talk a lot about object storage nowadays, so I think that's not an unknown thing, to understand the characteristics of working with it. But I'd say in 2019,
that was a fairly different story. I think the only people that would know a lot about building
high performance systems on top of object storage, they were probably all either inside the public
cloud providers themselves, or they were working at Snowflake or a similar company. The knowledge
was not super well distributed at that time. Most people, when they think of object storage, they think of something that's super slow.
Like, they're thinking about it in terms of seconds of latency to do anything. But the numbers around it are very different than what people might think of off the top of their head.
And that opens up a lot of design possibilities that you don't think of immediately.
Okay, friends, here are the top 10 launches from Supabase's launch week number 12.
Read all the details about this launch at supabase.com/launchweek.
Okay, here we go.
Number 10, Snaplet is now open source.
The company Snaplet is shutting down, but their source code is open. They're releasing three tools under the MIT license for copying data, seeding databases, and taking database snapshots. Number nine, you can use pg_replicate to copy data, full table copies and CDC, from Postgres to any other data system. Today it supports BigQuery,
DuckDB and MotherDuck, with more sinks to be added in the future. Number eight, vec2pg, a new CLI utility for migrating data from vector databases to Supabase or any Postgres instance with pgvector. You can use it today with Pinecone and Qdrant. More will be added in the future.
Number seven, the official Supabase extension for VS Code and GitHub Copilot is here. And it's here
to make your development with Supabase and VS Code even more delightful. Number six, official Python support is here.
As Supabase has grown, the AI and ML community have just blown up Supabase. And many of these folks are Pythonistas. So Python support expands.
Number five, they released log drains, so you can export logs generated by your Supabase products to external destinations like Datadog or custom endpoints.
Number four, authorization for real-time broadcast and presence is now public beta.
You can now convert a real-time channel into an authorized channel using RLS policies in two steps.
Number three, bring your own Auth0, Cognito, or Firebase.
This is actually a few different announcements.
Support for third-party auth providers,
phone-based multi-factor authentication,
that's SMS and WhatsApp,
and new auth hooks for SMS and email.
Number two, build Postgres wrappers with Wasm.
They released support for a Wasm (WebAssembly) foreign data wrapper. With this feature, anyone can create an FDW and share it with the Supabase community. You can build Postgres interfaces to anything on the internet.
And number one, Postgres.new.
Yes, Postgres.new is an in-browser Postgres with an AI interface. With Postgres.new, you can instantly spin up an unlimited number of Postgres databases
that run directly in your browser and soon deploy them to S3.
Okay, one more thing.
There is now an entire book written about Supabase.
David Lorenz spent a year working on this book, and it's awesome.
Level up your Supabase skills and support David and purchase the book.
Links are in the show notes.
That's it.
Supabase launch week number 12 was massive.
So much to cover.
I hope you enjoyed it.
Go to supabase.com/launchweek to get all the details on this launch, or go to supabase.com/changelogpod for one month of Supabase Pro for free. That's S-U-P-A-B-A-S-E dot com slash changelogpod.
What are some lesser-known things about object stores that you know that we don't know, or maybe nobody knows besides you?
Yeah, it's not really one secret trick. I think it's just a conceptual framing
that you have to think of it as if you have access
to a very large oversubscribed array of spinning disks.
If you think about it like that, then the conceptual framing of how it works, like how you design a system around it, will make a lot more sense. So there are a couple different pieces of that. Really large: way bigger than your individual application. You have the world's biggest RAID 0 of all the disks ever. It's effectively unlimited, so think about it that way. But also oversubscribed.
The latency characteristics are highly variable. One request might take 10 milliseconds and another takes 50, and there's no discernible reason to you why that is the case. It's just how it works. So you have to design around that a little bit
in terms of retrying requests speculatively
and that type of thing.
But if you have that framing
of it's very large,
cheap storage
with variable latency characteristics,
you can,
if you rework your application
to think about how it would make it work
on top of that,
then you've got the right framing.
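One common way to design around that variable latency is a hedged (speculative) request: issue the read, and if it hasn't come back within some deadline, issue a duplicate and take whichever finishes first. The sketch below simulates the idea with a fake fetch function and made-up latencies; it is not any real object-storage client API:

```python
import concurrent.futures
import threading
import time

def hedged_get(fetch, hedge_after=0.05):
    """Run fetch(); if it hasn't returned within hedge_after seconds,
    issue a second speculative copy and return whichever result
    arrives first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(fetch)
        done, _ = concurrent.futures.wait([first], timeout=hedge_after)
        if done:
            return first.result()        # fast path: no hedge needed
        second = pool.submit(fetch)      # hedge: speculative retry
        done, _ = concurrent.futures.wait(
            [first, second],
            return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()

# Simulated object-store GET whose latency varies per request:
# the first call is slow, the hedged copy is fast.
latencies = iter([0.2, 0.01])
lock = threading.Lock()

def fake_fetch():
    with lock:
        delay = next(latencies)
    time.sleep(delay)
    return delay

result = hedged_get(fake_fetch)
print(f"served by the request that took {result}s")
```

The trade-off is a small amount of duplicate work in exchange for cutting off the long tail of slow requests.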
The reason why it's so challenging for people today is that they spend all their time thinking about the fastest storage that's available today.
They spend a lot of time thinking about persistent memory or NVMe SSDs, stuff like that.
They think about that first when they're designing their application.
How do I get the lowest possible latency?
Making your application work on that first
and then trying to add object storage on top
is a very popular thing that people try to do.
They always call it tiered storage.
Basically every system that has that calls it tiered storage.
And it's very hard to match the characteristics
of those two things together going top down,
whereas going bottom up the other direction,
starting with object storage and then layering stuff on top,
it seems like it should be the same, but it's not.
You don't end up making the same design decisions along the way.
And that has a big influence on the overall characteristics
of the system.
And I can explain specifically what
that means for Kafka in terms of tiered storage.
So they were thinking about disks first, like local NVMe SSDs. That's usually what people are running it on these days
in the cloud. The way that that influences the design is that the way that they implement tiered
storage is they just take those log files on disk that have all the records in them, and they copy
them over to object
storage. That solves a cost problem. If you never want to read that data again, you're good. Like
that's cool. It's much cheaper now. When you want to come back and read it, let's say that you
wanted to read all of it, like all of the data you've ever tiered off into object storage. The way that that works in the open source project is that all of that data, you're going to have to pull back through one of the brokers. There's no way for you
to like parallelize that processing because they just view it as this bunch of log files that I
put into object storage. And with WarpStream, we've kind of decoupled the idea
of the local storage being owned by one machine
to now there's a metadata layer that says,
these are all the files that exist.
And then we have all these stateless agent things
that can actually pull the data out of object storage for you.
So you can scale up and down as quickly as you need to, to read all that data out
of object storage. So you wanted to pull it all out. You can scale up temporarily for the hour
that you wanted to run some big batch job and then scale back down at the end. With the open
source tiered storage in Kafka, that's a lot harder because they started with the local disk
part, which makes sense because that's what existed before.
It just means that when you're adding stuff on afterwards, the tiered storage, the lower layer of storage, is a secondary concern.
It doesn't get as much love and attention as the primary storage gets.
And you end up with a very different system at the end.
For us laymen, can you describe how the brokers work and contrast that again with these stateless agents?
I understand that you can scale the agents horizontally because they are stateless versus a broker, which seems to have kind of a lock on some data.
But what do Kafka brokers do exactly?
Yeah, so let's start with topics. A topic is basically just a name for mapping consumers and producers together. They agree on the name of a topic for where they're going to send the data to and where they consume the data from. And within the topic there are partitions, and a partition is basically just a shard to make that topic scalable.
There are a lot of different ways to decide which shard you're going to write the data to,
but let's just say for now,
you do it by hashing the key of the message
and then routing it to the shard
based on the hash of that key.
So if you have a record with the same key, you'll end up going to the same broker that owns that partition every time.
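That key-to-partition routing can be sketched in a few lines. Note this is an illustration only: Kafka clients default to a murmur2 hash of the key, while this sketch uses MD5 purely because it's in the standard library.

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the message key and route it to a shard (partition).
    # Kafka's default partitioner uses murmur2; MD5 stands in here
    # just to demonstrate the idea with a stable hash.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, which is what
# gives you per-key ordering in Kafka.
print(partition_for(b"user-42", 8) == partition_for(b"user-42", 8))  # prints True
```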
So that's how it works in the open source product. The brokers own some set of partitions from a
leadership perspective. And then there's also replicas of that that are just copying the data.
And it's just other brokers that are the replicas for those partitions. So the broker will write
that data that it receives from a client,
producer client down to the local disk and replicate it out to the followers. And then a consumer can come along and read either from a replica or the leader, the data that producer
wrote. But they're all coordinating on essentially one of those brokers owns the partition specifically
that I'm interested in and reading. So that's how it works in the open source project. And in
WarpStream, we've decoupled the idea of ownership of a partition from the broker itself.
We have a metadata store that runs inside our control plane that has a mapping of: here are all the files in object storage, and within those files, the data for this partition for this offset is here, in some section of a file in object storage.
So any of our agents, which are like the stateless broker that speaks the Kafka protocol to your clients, any one of those agents can consult the metadata store and ask, I want to read this topic partition at offset X.
Where do I have to go in object storage, and potentially multiple places in object storage, to read that data? But because the metadata store inside the control plane is handling the ordering aspect of it, essentially, you get the same guarantees as Kafka
plane is handling the ordering aspect of it, essentially, you get the same guarantees as Kafka
in terms of, I have this message with this key that's routed to this topic partition,
and I want them to stay in the same order because I'm writing them in a specific order.
That ordering part is enforced by the metadata store inside the control plane,
but the data plane part of actually moving all of those messages around is only inside the agents and object storage.
So it lets you do that thing that I was saying before, where if you want to scale up and down, it's very easy to do that because you don't have to rebalance those partitions, which take up space on the local disk amongst the brokers in order
to facilitate that.
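A toy sketch of that decoupling might look like the following. Everything here, the class names, fields, and bucket key, is invented for illustration; it is not WarpStream's actual metadata schema, just the shape of the idea: a control-plane index from topic-partition offsets to byte ranges in object storage files, which any stateless agent can consult.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileSpan:
    object_key: str  # file in the object storage bucket
    byte_start: int
    byte_end: int

class MetadataStore:
    """Control-plane index: (topic, partition, offset) -> where the
    bytes live in object storage. Hypothetical, for illustration."""

    def __init__(self):
        # (topic, partition) -> list of (first_offset, last_offset, FileSpan)
        self._index = {}

    def record_file(self, topic, partition, first, last, span):
        self._index.setdefault((topic, partition), []).append((first, last, span))

    def locate(self, topic, partition, offset):
        # Any stateless agent can ask this question, then read the
        # byte range straight out of object storage itself.
        for first, last, span in self._index.get((topic, partition), []):
            if first <= offset <= last:
                return span
        return None

store = MetadataStore()
store.record_file("logs", 0, 0, 99, FileSpan("bucket/file-0001", 0, 4096))
print(store.locate("logs", 0, 42))
```

Because ownership lives in this index rather than on any broker's disk, scaling readers up and down never requires moving partition data around.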
So you're reading metadata versus reading the real data, basically.
And that's what makes it faster.
In terms of being faster, it's faster at the fact that there is no rebalancing that happens
because the data is always just in object storage somewhere.
You don't have to do any rebalancing for it.
That part of it is faster.
There's obviously a trade-off when you do this, in that the latency
of writing to object storage is higher than writing to the local disk. So if you want your
data to be durable, you have to wait for the data to be written to object storage first.
So that's the primary trade-off somebody that's using WarpStream would be making,
is that they're comfortable with around 500 milliseconds at the p99 of latency to write
data to the system, and then the end-to-end latency of, like, a producer sends data and then it's consumed by a consumer is somewhere between one to one and a half seconds, again at the P99.
What percentage of the Kafka population does that rule out? Because it seems like many of them are highly real-time oriented.
So it's interesting that you say that you use that word real-time
because we've talked to a ton of different Kafka users.
And when you ask them,
what is your end-to-end latency of your system today?
A lot of them don't know the answer.
They know, like, they think that they know the answer.
Well, it's real time.
Yeah.
They're either not measuring it
or they're measuring it in a weird and incorrect way.
There's a lot of different ways that that can happen,
but typically the way that we've experienced
is that if you ask an executive
at the company that uses Kafka heavily,
ask them,
is your application latency sensitive? They'll say, of course, we're an extremely high performance
organization. We love high performance systems. Obviously, the end-to-end latency couldn't be anything
more than 50 milliseconds. That would be crazy if it were anything more than that.
And then you make it a little bit further down the chain in the organization. You ask the application developer
or the SRE who's actually on call for the thing
or wrote the code.
You ask them and they're like,
I don't know.
I hope that it's fast, but I'm not really sure.
Or you ask them and you get an explicit answer
that's very different than the answer
that the executive gave you.
And realistically, there are a few
applications that we come across that do need that low latency. And the primary example of that,
I mean, there's a lot of this kind of application out there in different domains, but the good
example that demonstrates it is credit card fraud detection. The way that, you know, there are people out in the real world
using credit cards
and you want to make a determination
about whether a charge is fraudulent
at the point of time
that they're swiping the card.
So that is like a,
necessarily a real time thing.
You know, like there's a user
who's waiting on the real world
and if Kafka is in the critical path,
especially multiple hops through Kafka in the critical path, then a system that has higher latency
like WarpStream would be harder to adopt. And there are other applications that meet this
criteria. But basically, if the user is in the critical path of the request,
then WarpStream is harder to adopt, like in the abstract. You can obviously, some specific applications
might be okay with higher latency than others,
but that's the one that we see from time to time.
When you strip all those out though,
the things that you have left
are the more analytical type applications.
Like the example I was talking about before,
moving application logs around.
Developers are pretty used to some
delay between the log print statement running inside their application and being searchable
inside wherever they're consuming their logs from. So the additional one second of latency
there is typically a non-issue. And the reason why that's useful for us as a company at Workstream is that
those workloads are typically really high volume and they cost the user a lot of money. So our
solution being more cost-effective really resonates with them, because usually there's also a curve of the more data you're generating, the less valuable that data is per byte.
So there's budget pressure to increase the efficiency of processing that data.
And Kafka sticks out like a sore thumb in terms of that processing cost.
So we can come in and say, hey, the, you know, because of the way the cloud providers don't charge you for bandwidth between VMs and object storage, and we store all the data in object storage, that means you're going to save this many hundreds of thousands of dollars a year on sending the dumb application logs that you're generating into the eventual downstream storage.
That makes a lot of sense to them.
So while we understand that we can't hit every possible application in the market
with the shape that WarpStream is today,
we're pretty happy with the set of use cases
and workloads that we can target
because there are just so many of them out there
and they happen to align with the budget-sensitive ones.
Those reads and writes, can you restate those?
Did you say writes are at most in P99 500 milliseconds and reads are one to two seconds
in P99?
Is that correct?
So the writes are around 500 milliseconds at the P99.
That's tunable.
By default, we have the agent buffer the records that your clients are sending
in memory for 250 milliseconds before writing them to object storage so that you just write
fewer files to object storage, which is the primary determinant of the cost of the object
storage component of the system if you're not retaining the data for very long. But you can
shrink that down all the way to 50 milliseconds, in which case the produce latency at that point
would be probably ballpark 300 milliseconds at the P99.
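A rough back-of-the-envelope shows why that buffer interval is the primary cost knob. The numbers here are assumptions for illustration: S3-style PUT pricing of $0.005 per 1,000 requests, ten agents, and the simplification that each agent writes exactly one file per flush interval.

```python
# Shorter buffer interval => lower produce latency, but more PUT
# requests (more files) against object storage, which drives cost.
PUT_COST_PER_REQUEST = 0.005 / 1000  # assumed S3-style PUT pricing

def monthly_put_cost(flush_interval_s: float, num_agents: int) -> float:
    # Simplification: each agent writes exactly one file per flush.
    puts_per_second = num_agents / flush_interval_s
    seconds_per_month = 30 * 24 * 3600
    return puts_per_second * seconds_per_month * PUT_COST_PER_REQUEST

for interval in (0.25, 0.05):  # the 250 ms default vs the 50 ms floor
    print(f"{interval * 1000:.0f} ms buffer: ${monthly_put_cost(interval, 10):,.2f}/month")
```

Going from the 250 ms default down to the 50 ms floor multiplies the file count, and therefore the PUT cost, by five; that is the trade the extra couple hundred milliseconds of latency is buying back.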
I said end-to-end instead of read
because that's typically what people talk about in Kafka terms
because they want to know, like, a producer sends a message,
how long does it take until a consumer can consume that message successfully? So that's what I mean by end-to-end. And that is one to one and a half seconds at the P99 for most of our users.
So latency aside, what are the other downsides of this approach?
So there really aren't that many downsides other than the latency. The latency trade-off is what actually enables all of the benefits of WarpStream, basically. The object storage is what enables a lot of the benefits.
We have a couple of interesting features that are based on the fact that all of the data is in object storage.
One of them we call agent groups. And agent groups let you take one logical cluster and split it up physically amongst a bunch of different domains.
They could be different VPCs within the same cloud account.
They could be different cloud accounts, or the same cloud account but across regions.
All by just sharing the IAM role for the object storage
bucket between those different accounts. The alternative to this with open source Kafka is
setting up something crazy like VPC peering, which is extremely hard to do. And your security team
will probably not be super happy if you try to ask them to peer a bunch of VPCs together because it
introduces more security risks.
So we have customers in production using this feature today
where the example that we usually give is
there's a games company that splits their production games account,
where all the game servers run, from the analytics account
where they run a bunch of Flink jobs
to process the data generated from the production account.
And they run agents that just do produce.
So just writes, they run that in the production account
and they run agents that just do fetch
inside their analytics account.
So they've kind of flexed the cluster
across those two different environments.
And all they had to do to set that up
was share the IAM role on the object storage bucket instead of peering the VPCs together. So the fact that everything is in object storage
opens up a ton of new possibilities, actually. Basically, the only downside of WarpStream is
the fact that the latency is higher. Now, obviously, we're a new company. The product does not have the 13-year maturity of the open source Kafka project.
But just to speak of the operational stuff and the cost stuff, WarpStream is a huge win on both of those.
Does it have any of the hosting flexibility?
I suppose you're putting everything in object storage anyway,
so there's probably people running
their own object storage clusters,
but that might be crazy.
I don't know.
Yeah, so there are a number of projects
and products out there that you can buy
to give you an object storage interface
in essentially any environment.
Like there's the open source project,
MinIO, and then basically
every storage vendor on the market
will sell you something with an S3-compatible interface
if you're running in a data center environment.
And because we work with S3, GCS, and Azure Blob Storage,
we can essentially connect to anything.
If you had an NFS server,
we can even make it work on that too.
We don't have anybody in production doing that,
and I wouldn't recommend it.
I would recommend using the object storage interfaces,
but we're pretty flexible
in terms of the deployment topology.
What about R2?
Would you have even more savings,
or would that not matter
because nothing's going outbound
from the virtual network
there?
So I think it would depend on where you're running the compute. If you were storing the data in R2, but you were running compute in AWS, you would get charged a lot for internet transfer as part of that. If you're running your compute in one of the providers that has free peering with R2, then yeah, you would get a nice savings there, and you'd be able to move data reliably across, let's say, multiple regions of whatever providers have peered for free with R2, using WarpStream.
I was thinking about getting started, really, or just trying it out. I do like your curl demo. I did play with it. I had no idea what I was doing, but it was cool. The command is on your home page. It's curl and a URL to an install script. I did not review that script prior to running it. I just trusted you.
You're admitting that to everybody?
Well, you know, it was a VM on Proxmox.
I didn't care that I could just throw away.
It wasn't my own machine.
I was safe.
That's a good layer.
It did spin up, and then it gives you a URL you can go to to log in.
And next thing you know, you're looking at a cluster.
So I like that aspect about it.
Whose idea was it to come up with that demo?
I mean, it's very hacker.
It's very developer, right?
Like no pain whatsoever.
If you've got a VM or you want to spin up a VM
or you have Proxmox and you can do it safely like I've done,
or you can spin up a droplet on DigitalOcean
or pick your own if you've got a VPC, whatever.
You could do it in a more safe manner and have some fun.
What do you expect people to do with that?
What are people saying about that?
And whose idea was it to produce that demo?
This is very hacker.
I like it.
Yeah, I think the demo was Richie's idea.
It basically just starts up a producer and a consumer so that you can just see something happening in the console.
Like, yeah, it provides you a link.
If you would have run that locally on your laptop,
we would have opened the link automatically in your browser for you.
It said it had a problem and I had to click it.
We even designed the little niceties like that.
The idea behind the demo is basically just to show people that it does something.
Kafka is not an exciting technology to demo.
We're kind of limited there.
It's even
more boring than doing a demo for a relational database or something.
But there is another mode that you can run that in that's called Playground, and Playground will let you start a cluster that doesn't have a fake producer and consumer running on it as a demo. It just starts a cluster for you temporarily
and makes an account that expires in 24 hours.
And you can take that Playground link
and you can start multiple nodes,
like say one on my laptop and one on yours
and point it at R2,
and we can have a cluster that spans our two laptops together.
My co-founder and I did that before
and posted a video of it on Twitter or something like that.
But because the data is all in object storage
and the compute part is stateless,
it's actually not that complicated to do.
It's basically the thing we were talking about a second ago with R2,
just connecting two laptops instead of two different regions or something like that.
So to get to the Playground version of it, is it like dash dash Playground?
How do I get there?
Yeah, so there's three different commands primarily that people would run.
There's WarpStream Demo, there's WarpStream Playground, and then there's WarpStream Agent.
The Agent is like the one you would run for production to start an agent, and the Playground one is how you start a playground. I think the Playground even spits out in the output the command that you would copy and send to somebody else to start it in another terminal. But it's been a long time since I've played with it, so I may be remembering wrong. The reason why people like the demo, or I should say the Playground, is that it makes it
easy if you're a developer to just start a cluster and use it for local development instead of having
to run. If you use WarpStream in production, you want to use the same thing in your development environment just to ensure consistency.
You can use Playground Mode to create a cluster and it will just go away when you stop using it.
And there's no cost.
Yeah, I dig it.
I kind of wish there was more documentation.
If there is, then I would go find it.
Or maybe a video or something like that.
Because that's kind of cool. I like this demo because for those who just want to tinker without having to
spin it up in the EC2 or just whatever,
you know,
go the extra mile.
I love that you can just sort of do this on your own,
but I had no idea the playground version was there or the agent version was
there to go a little further.
And there's some room for you to make some content around that, to give people more guidance. And you should do that.
Yeah, totally.
A lot of people have found a lot of joy in the Playground and the demo, because they're just cool.
We also have a serverless
version of the product that basically just gives
you a URL that you can
connect to over the internet
to fulfill a similar purpose, basically for people, if they want to try it out without actually doing anything locally on their machine.
I think we give new accounts like $400 of credit
when they sign up. So you can do a lot with that if you just want to play around without actually starting the infrastructure.
And I guess while I'm on your home page perusing, just under this demo that is so cool, there is a mention of plug and play. Part of your angst, I suppose, to get to where you're at was, let's rethink what this meant in a modern time, which is what you've done, but then also to be a drop-in swap. So one thing it says is there's no need to rewrite your application to use a proprietary SDK; you just literally change a URL. How did you get there? It's fine to not want to contribute to Kafka and make your own way, and I'm totally cool with that, and with WarpStream reinventing or rethinking this model. But how do you get to this point where you're like, let's make this as frictionless as possible
to focus on the DX of what it might actually be like
to say, okay, well, if this is,
like Jared said earlier, that subset of folks
that maybe they're not doing credit card transactions
and fraud detection where that needs to be
literally real time, where the latency
cannot be absorbed.
In a scenario where it can be absorbed
and it's a large population of Kafka users to say,
listen, we're here and this is how easy it is to swap.
How did you get to that design, that idea?
We got there by just talking to people, basically.
The number of developers out there who are using Kafka,
it's really high and we talked to a lot of them. And
when we asked them, basically, what do you not like about Kafka, they would give us a bunch of different answers. But when we would ask them, if we could fix those problems for you, would you want to do that, and it would involve essentially rewriting large parts of your application?
It's a non-starter for people.
And there are a bunch of other things out there in the world that integrate with Kafka, like Spark and Flink.
And there's a bazillion open source tools out there that integrate with Kafka.
We have no influence on any of those things either, really.
So it was kind of a choice that was forced upon us.
There's really no way.
Kafka has so much momentum behind it that it's pretty much impossible to get broad adoption of something that would be a replacement for it without having the exact same wire protocol.
So you can use the exact same clients and stuff like that.
It's a lot of work to maintain that compatibility.
Thankfully, a lot of that work is front loaded.
It's just you do it once and Kafka is not a particularly fast moving open source project.
So they're not changing the protocol every day.
The backwards compatibility is very good with Kafka. So thankfully it was mostly a one-time cost, but being compatible has opened up a lot of opportunities, even just doing basic stuff for the company, like being able to do co-marketing with other vendors of products that are compatible with Kafka.
If we weren't compatible with Kafka, you know, we wouldn't be able to do that. And a lot of the open source tools that we would want to integrate with, like let's say
the OpenTelemetry Collector or Vector, these kind of observability agent tools, they all
can write data to Kafka.
And we inherit that benefit right out of the box.
So it's been super important for us basically to have that compatibility. And do you think that, I know
you're sort of young-ish, but do you think that, I suppose, how are you winning? Are you winning
the market? That's what I'm trying to get to is like, are you truly absorbing a lot of the Kafka
user base? Is there, is there a major demand for WarpStream?
What's the state of product market fit
and are you winning?
Yeah, so we have a number of large use cases
in production today.
I can't talk about very many of them, unfortunately,
but there are WarpStream clusters out in the world
processing multiple gigabytes a second of
traffic through, and not just like one of them, like there's, there's a decent number of them
at this point. And where we're having success in the market is basically the large open source
users who are, you know, they feel like the open source project is a bit too challenging for them to run.
And there's budget pressure all over the industry today, especially in the, you know, in the corners that we're interested in, like in the observability and security and analytics side, there's a lot of budget pressure.
So we're a pretty natural fit for those folks who are both tired of running the open source project and they're getting budget pressure to decrease their cost. We're having a lot of success there. What about
Greenfield? Is there anybody that's like, okay, we need to adopt Kafka or something like it, but
what is out there before we go and write a lot of code or flesh out our infrastructure model or
make any plans.
What about those that are not migrating?
What's the path, I suppose?
What's the inbound of those folks and what's the path to like the DX?
Because one of the things you mentioned is that you solve a few problems.
You solve cloud economics, you solve operational overhead. And one thing that you mentioned, at least the article that was from last year,
was a major problem with Kafka, which was developer user experience.
And that's what I'm trying to get to there. Those were coming on green, brand new.
What is that user experience like and what is the path like for them?
I think that for Greenfield products, there's
two different branches of those. There's Greenfield products that
are only greenfield in
the sense that they're trying to adopt Kafka for some goal. They're not greenfield like the
application didn't exist before. There's that aspect of it where they're just new users of
Kafka. And then there are truly greenfield projects where the project itself is new.
And also the choice to choose Kafka is new. And usually those products don't have a super high volume of data.
It's the existing initiatives or applications within a company that process a lot of data
but are not using Kafka for cost reasons where we are having more success.
There's a product that I would love to talk about that won't quite
be public by the time this episode is posted. But they're in that first category, where it's a large existing workload, but they were not using Kafka for a bunch of different reasons, cost being one of them. And they're now a big WarpStream customer
because they saw that there are benefits
to using Kafka for their application,
but they just couldn't use the open source project
for cost reasons.
And now essentially they can.
And there's a lot of cool stuff that they can do now
that they couldn't do before that Kafka enabled them to do.
And WarpStream is
their Kafka-compatible
product of choice for
those cost reasons.
They're starting to get some benefits from it now.
So I guess the obvious question
to me at this point is
Kafka is
not dead. It is alive.
It is open source.
WarpStream is not open source. To my knowledge, I don't think it is.
Was there a conversation about licensing?
Was there a conversation about
being a commercial open source company?
Just to follow in the footsteps of the predecessors
that you at least, from a conceptual standpoint,
copied and improved upon, right? You stood on the shoulders of giants. Where are you at with that? What have you thought about in terms of licensing and open source, and what's y'all's stance on open source as your core or not?
Yeah, so we had a lot of back and forth initially when we were thinking about this specific issue.
And the conclusion that we came to is that we wanted to avoid the kind of dishonest move of the way a lot of commercial open source projects have evolved in the last five years, in terms of either relicensing or changing the focus of the project drastically to benefit the primary commercial backer. And we just didn't think that it was worth it. We're providing a lot of value by providing a solution that is dramatically lower cost
and also compatible with the existing ecosystem. And the way that that works in practice means that you can switch away from WarpStream
because you're not locked into it from an application perspective or a protocol perspective.
So we're not locking you into something proprietary from an interface perspective.
So it's actually relatively easy to switch away from WarpStream if you decided to in the future
because you didn't like something that we did. But we're hopeful that the fact
that we provide something
that's dramatically lower cost
and easier to use
means that you won't switch away
and you'll continue to have
the best of both worlds, so to speak,
where there is an open source thing out there
that obviously is going to continue to exist
because it has a ton of users.
But if you want to use our product to save money
and have something easier to use, you can as well.
And we will be able to continue to invest
in making that product better and better over time
because we are not stuck in these kind of middle-of-the-road
outcome issues that a lot of commercial open-source companies have
where they're forced a few years down the line
to cash in all of their brand goodwill on a relicense
in order to gain that commercial success that they wanted.
We're hoping to be able to, by sticking to this model,
we're hopeful that we'll be able to be a good citizen
of the Kafka ecosystem in terms of making a product
that's not incompatible and proprietary and steering everybody away.
And we do put a lot of effort into testing clients.
We find bugs in Kafka clients that are typically open source and make improvements there.
But the core part of the product is not going to be open source.
What's interesting about those re-licenses is that they all were
commercially successful companies, even at the time of the re-license.
They had arrived. And at a certain
size and scale, it seems that the growth curve has to continue to go vertical
to satisfy investors, to satisfy public demand
in the case of Hashi.
But I'm sure, I don't actually know the state of Redis Labs
or the commercial success or not of Redis,
but many of them were large, successful commercial companies,
bigger than most companies ever get
before they actually went ahead and did that not cool rug pull.
But I wonder if the pressures on them,
because it's other people's money,
similar in your situation,
like you have VC behind you.
And I'm just curious about that decision
from your guys' perspective.
Because you're a small team,
probably well-funded in terms of
you guys are highly successful software people, so you're probably making good money.
Runway well into the next decade.
Yeah, so why not bootstrap?
Why not bootstrap and then not have any of that VC pressure that you currently have?
That's a really good question.
And, to take a step back from that question for a second, let me talk about the commercial open source stuff.
This is obviously a little bit inside baseball, but as a part of going through that decision process, we talked to the founders of a lot of commercial open source companies.
And we asked them, let's say you were starting our company today, what would you do? And without hesitation,
the answer we got was,
I would not start it
as a commercial open source company today.
And there are a lot of different reasons
that they gave for that.
And I can't really give some of those reasons
without potentially identifying
who those people are.
And I don't want to do that.
But the challenges of a commercial
open source company today
are many. It's not even just the hyperscaler cloud providers taking your stuff and running it anymore. That's obviously a concern, but you can get around that; the AGPL does a decent job of preventing some flavors of it.
The other issue is that the competition within the category they're building their product in is extremely high. With your source code out there in the wild, letting everybody know your secrets about how you made your product better, you lose a lot of the juice behind why you have these huge staffs of developers working on interesting things. It's not to say you can't protect that in other ways, like with software patents, but the appetite for software patents isn't there; it would do a lot of damage to brand reputation if these commercial open source companies created a bunch of software patents and started enforcing them against each other, for example. It's a very challenging situation today. A lot of the companies that you might view as successful commercial open source projects might be successful in the iteration they exist in today, or, yesterday, in the case of the recent relicenses, where they have good adoption in the developer community and good success in the VC-funded startup segment of the world. But there is an inevitable push to go upmarket and go after larger and larger customers, because it's effectively the only way to support growth.
If your customers are all small startups,
even medium-sized startups,
and developers playing around
in their personal
capacity
or stuff like that,
the revenue opportunity
is just really small, unfortunately,
for a lot of these businesses.
It's much easier to sell a million-dollar-a-year contract to an enterprise than it is to get a million dollars of revenue out of a bunch of small and medium-sized businesses.
So the temptation when the growth starts to slow down is, I need to go do that now. That's the first thing your investors are going to tell you: you need to go upmarket and get enterprise customers.
If the product that you're selling them is support or a couple of features on top of an open source project, your ability to exert pricing pressure on that enterprise buyer, to get them to pay a higher price or to get them to pay at all, is limited. A lot of these open source projects spent so much time making it good that the enterprise can just hire one person to maintain it internally, move on with their life, run the open source forever, and maybe pay you a peanuts support contract, essentially not enough to actually support the business. It's just really hard. I completely understand where you're coming from, and that it might have felt as if these companies were successful from the outside. Some of them definitely were, but there is that inevitable pressure to keep the growth rate up, and the only way to do that is to go upmarket.
And when you're going upmarket,
you need to provide something that looks valuable.
And if your project is open source
and the alternative is hiring a developer
or two to maintain it internally,
you kind of have a cap on how much you can charge.
And it's the same thing if you're offering
a cloud version of an open source project, for example.
The premium someone will pay for your cloud version, it may be lower than you expect if they can self-host.
Because they're always looking at that.
They're looking at both sides of the coin.
How much will it cost me to self-host this versus how much does it cost to use your cloud-hosted version?
And that calculus does not always come out in your favor as a
vendor. And you may want to charge, you may have to charge significantly more to make the numbers
work on your side than what they think they can run it for internally. It's really challenging
stuff. We wanted to provide the best product possible with the best product experience possible, and we didn't feel like the shape of a commercial open source company was the right way to do it without a lot of these distractions I'm talking about coming up along the way. And we didn't feel like it would be right to do the bait and switch thing that people are doing these days. We wanted to be honest, basically, from day one.
That makes sense to some degree. I don't fully agree with all of your sentiment, though that's a very deep and lengthy conversation that doesn't necessarily fit here. But even given that I don't fully agree with all of your reasons, the one thing I think you've done well, or I suppose the most positive thing, is you've made it easy to get in and get out.
So if for some reason WarpStream is of great benefit,
and let's just say a year down the road,
somebody does WarpNotStream,
and it's commercially open source,
and they eat your lunch,
because they decided to be open source first, and they can get into that just as easily as they can get out of you.
Then that's a whole different story.
I'm not suggesting that's going to happen,
but it's possible.
It's totally possible.
Yeah.
And you're exactly right. If one of our competitors came up with a better implementation tomorrow and it was...
The exact implementation.
They can literally copy everything you do, and the world would be okay with that because they made it open source.
That's a version or at least a subset of a conversation we had at length on this podcast
a few weeks back with JJ, Joseph Jaxx.
He was like, yeah, I'm totally cool with somebody,
a founder going out there and literally copying X
and saying this is now X as open source.
He was totally cool with that.
I'm not saying that makes sense completely to me too,
but the world now believes that's an okay thing.
And it's an okay thing because at the core,
it is meant to be an open source commons good
Yeah, I would not harbor any ill will towards someone who decided to do that. I would be like, come on, man, don't do that.
Well, someone's gonna do it. Yeah. I mean, as you guys have success now, whether or not they can actually pull it off is the question, right? But there will be at some point, as WarpStream continues to grow, a Hacker News number one story:
X is like WarpStream, only it's open source and self-hosted.
And it'll get 500 to 1,000
and maybe it gets adoption,
maybe it doesn't.
Maybe by then you guys are
so far ahead it doesn't matter.
There's tons of what ifs,
but like it will happen
from somewhere in the world
if you're successful.
And the reason why that doesn't bother me so much, basically, is that the portion of the Kafka market that has been commercialized, meaning somebody is paying a licensing fee or some other fee to use the product, not just hiring somebody to run it for them, is very small. So there's so much greenfield market out there for us to commercialize, along with this constant, ever-increasing trend of things becoming more real time, and these other tailwinds of more observability and security data being generated in the world. This market is just going to be so big in the future that I think it's unlikely to have a winner-takes-all dynamic, similar to the way that there are multiple large public cloud hyperscalers that exist and are very profitable. There's just so much of this market out there that we're not super concerned about any particular competitor.
Even if one were open source,
there's a lot of other dimensions
that we would hopefully be better at competing on
that you don't get out of just the fact
that the product is open source.
That combined with the fact that the market is so huge
that we're pretty happy with our position as it is today.
Hey, friends, I'm here with Brandon Fu, co-founder and CEO of Paragon.
Paragon lets B2B SaaS companies ship native integrations to production in days with more than 130 pre-built connectors, or configure your own custom integrations. So, Brandon, talk to me about the friction developers feel with integrations, SSO, dealing with rate limits, retries, auth, all the things.
Yeah, so there's a lot of aspects to the different problems that you have to solve in the
integration story in building these integrations and also providing them in a user-friendly way
for your customers to self-serve and onboard and consume those integrations. So part of what the Paragon SDK provides is that embedded user experience, what we call our Connect Portal. That's going to provide the authentication for your users to connect their accounts; that's going to be the initial onboarding. But in addition to that, your users may also want
to configure different options or settings for their integrations. A common example that we see
for Salesforce or for CRM integrations in general is that your users may want to select some type of
custom object mapping. Every CRM can be configured differently, so your users might want to map
objects to some different type of record in their Salesforce
or different fields in their Salesforce.
And typically, that's what developers would have to build on their own,
is this UI for your users to configure these different settings for every single integration.
That's also going to be what's provided by the Paragon SDK,
is not just that initial onboarding and authentication experience, but also the configuration end user UX
for different settings like custom field mapping,
selecting which types of features on your integration
that your user might want to configure.
And that's also going to be provided fully out of the box
by Paragon SDK.
With integrations, different APIs
might have different rate limits.
They might have different policies
that you have to conform with,
and your developers typically have to learn
these different nuances for every API
and write code individually
to conform to those different nuances.
With Paragon, because we build and maintain the connector
with each of the integrations that we support in our catalog,
we're automatically gonna handle for things like retries,
things like rate limits.
And so we look at this as sort of the backend
or infrastructure layer of the integration problem
that we have spent the last five years
essentially building and optimizing
the Paragon infrastructure to act
as the integration infrastructure for your application.
Okay, Paragon is built for product management.
It's built for engineering.
It's built for everybody.
Ship hundreds of native integrations into your SaaS application in days.
Or build your own custom connector with any API.
Learn more at useparagon.com slash changelog.
Again, useparagon.com slash changelog. That's U-S-E-P-A-R-A-G-O-N dot com slash changelog.
And I'm also here with Dennis Pilarinos, founder and CEO of Unblocked.
Check him out at getunblocked.com.
It's for all the hows, whys, and WTFs.
Unblocked helps developers to find the answers they need to get their jobs done.
So Dennis, you know we speak to developers.
Who is Unblocked best for?
Who needs to use it?
I think if you are a team that works with a lot of coworkers,
if you have like 40, 50, 60, 100, 200, 500 coworkers, engineers,
and you're working on a code base that's old and large,
I think Unblocked is going to be a tool that you're going to love. Typically, the way that works is you can try it
with one of your side projects. But the best outcomes are when you get comfortable with the
security requirements that we have. You connect your source code, you connect a form of documentation,
be that Slack or Notion or Confluence. And when you get those two systems
together, it will blow your mind. Actually, every single person that I've seen on board with the
product does the same thing. They always ask a question that they're an expert in. They want to
get a sense for how good is this thing? So I'm going to ask a question that I know the answer to
and people are generally blown away by the caliber of the response. And that starts to build a relationship of trust where they're like, no,
this thing actually can give me the answer that I'm looking for. And instead of interrupting a
coworker or spending 30 minutes in a meeting, I can just ask a question, get the response in a
few seconds and reclaim that time. The next step to get unblocked for you and your team is to go to getunblocked.com.
Yourself, your team can now find the answer they need
to get their jobs done
and not have to bother anyone else on the team,
take a meeting or waste any time whatsoever.
Again, getunblocked.com.
That's G-E-T-U-N-B-L-O-C-K-E-D.com.
And get unblocked.
So let's go back to bootstrapping then.
It seems like the kind of thing you could bootstrap.
I mean, it's just you and Richie
coding it up on nights and weekends, you know?
Get it rocking and rolling.
Keep all that equity.
No one to answer to.
You're going to get customers pretty quick.
Then you can start hiring based off of your customers.
Why that decision to raise?
So let me put it up front: the right reason to raise money is that you want to go faster. That's basically why someone should raise venture capital: they have something that's working and they want it to go faster. My co-founder and I had so much conviction in what we were doing, in terms of it being commercially successful, that we knew on day one we would be able to go much faster if we raised money.
So that's why we did it.
There was never a period of time where we were guessing, like, oh, do people need this? It was very obvious to us from day one that we wanted to go as fast as possible, and raising money is the way to do that, because we were able to hire, relative to the two of us, many more people, and pay them very well and make them happy. Hiring people who are good at distributed systems stuff is very expensive. Those types of people also really appreciate job security, so being able to have a bunch of cash in the bank, even if we're not spending it, is very important to those folks.
So our internal stakeholders, you know, as employees and founders and stuff,
it makes it very comfortable to have that cushion
and allows us to hire people
that will make things go faster.
And then on the complete other side of the coin,
if you want to sell products to enterprise buyers
as two people without having raised any money, it's going to
raise a lot of eyebrows if they want to put that in production as the backbone of their multi-billion
dollar business. That makes a lot of sense. It's really hard. Yeah. Whereas if we can walk into a
meeting and say, hey, we've raised roughly $20 million from Greylock and Amplify Partners, who are our Series A and seed
investors, respectively, that sidesteps a lot of really awkward conversations about like,
what's going to happen to you founders if you get hit by a bus tomorrow or something?
Obviously, that'll be very bad for the company, but there is at least somebody else who cares and would like to continue to hopefully see their
investment succeed. So the dilution stuff is really, obviously it's a good point, but you
just have to think, are the odds of success higher? And will the eventual outcome be bigger
if I raise VC? And if that is true, then I think it's worth doing. But if you're in a position
where you don't know if your product is going to be commercially successful, it closes a lot of
doors to raise VC. Like every further round that you raise, it makes it harder and harder
to explore different kinds of exit opportunities that you might personally view as a success, but your venture investors may not
view as a success. So it's definitely a balancing act; you just have to understand the game you're playing, basically, and walk into it with your eyes open.
Had you played this game before?
Yes. Very briefly, a long time ago, unsuccessfully. I did, yeah.
And in between that and starting WarpStream,
my co-founder and I were considering raising money
for the thing that we were doing before we joined Datadog.
And that's how we got to know our seed investors
at Amplify Partners.
And we didn't have that conviction at the time to say, let's go raise
money. This is going to be huge. In hindsight, we probably would have done very well with that.
Had we chose to raise VC and like remain as an independent thing and all that instead of joining
data dog. But because we didn't have that conviction, we took the quote unquote exit
opportunities that were available to us at that moment because we hadn't yet raised money. We're
very flexible. So we were able to join Datadog and it worked out super well. We got to meet a
bunch of interesting people and the project we were going on was successful and super fun and
all that stuff. But because we did have that conviction this time around, and we wanted to go as fast as possible, that's why we chose to raise money this time.
I think your reasons are sound. I don't disagree. And I will not argue.
Good answer. I'll give it to you.
I will not argue. While we love open source, I can see how going the route of venture capital avoids, as you had said, some of the burden of open source in terms of distraction (that was your actual word). I can understand that, and that's your prerogative, right? Bobby Brown is dated in terms of an artist; nobody knows Bobby Brown anymore. But "it's my prerogative" is still a true phrase. Ryan, do you know Bobby Brown?
It's been a long time since I've heard any Bobby Brown,
but I do indeed a little bit.
I grew up on Bobby Brown, so I can't help but bring it up. It's my prerogative.
Yeah, it's my prerogative.
Yeah, great song.
You know, so it's your prerogative.
And Richie, slash Richard, is a great name, by the way: Silicon Valley. I mean, I had to bring it up. He was called Richie, and his name was Richard Hendricks, but he was called Richie by his attorney. I don't disagree with the reasoning
for your direction. I hope it works out for you. I think it seems like it's going to, but I do agree
with what Jared said, which was there is probably going to be, if you hit critical mass and enough scale, somebody who copies what you've done and simply just says, okay, literally copy and now it's open source and they'll be okay with that.
I don't think that you should operate in a state of fear of that and make choices because of it because that's free market, man.
That's going to happen.
But good on you for being able to answer these hard questions.
I think you did well on that front. I don't have any argument, really. That's all I'll say.
And that's only because we spent a lot of time thinking about it, and a lot of time talking to folks who are day-to-day building commercial open source businesses. That really brought our perspective to where it is today. And it's not to say that
there are no possible opportunities to start a commercial open source company that would
be successful today. There obviously are. It's just that for our particular market and the
strategy that we were pursuing, it just wasn't going to be, I think I can put it a little bit
more crisply. The segment of the market that we're going after is already price and cost sensitive. If we offered them the opportunity
to run our product for free, the odds that we will be able to charge them almost any money
would be pretty low. There are other markets out there that have completely different dynamics in this, especially if you're not trying to provide the low cost solution.
So I didn't mean to denigrate commercial open source companies.
I was just saying that when we explained our strategy, basically, to these other commercial open source founders, they said, that's going to be hard.
It's going to be very hard for you.
So you should think about it before you choose to go down that path. And we chose this path
because we think it's most likely to be successful for us while also, I would be personally very
upset if I had to do one of those license change rug pulls. It would make me very sad because I know it
causes a lot of consternation and heartburn for people
when those things happen. So we just wanted to be straight up with
people from day one.
I also think that you are a particularly easy target for the hyperscalers to reclone
and host and offer because of the nature of what you're doing.
Yeah, I mean it's a general purpose infrastructure building block
and Amazon has a product.
Amazon has MSK as a competing product with WarpStream, so they very directly could just offer a new SKU of MSK that is the WarpStream one, if it were open source, right? That would be very challenging for us.
Ride your coattails. Are there other competitors out there? Are there other people, as you said, that are putting Kafka on object storage?
Yeah, I mean, there are a number of companies out there that have talked about how they're doing this.
I think the most notable of them would probably be Confluent's announcement, where they're taking a similar direct-to-S3 approach as WarpStream does. The product isn't available today for anybody to just go sign up for and do a comparison, but they've made an announcement, and I'm sure that's going to progress more in the future. I'm sure essentially every one of our competitors, if they haven't started working on a similar storage engine already, will. So I have no doubts that the cat is out of the bag, so to speak, on the idea.
Well, that does make sense then, why you went venture capital: so that you can go fast. I think from a visual standpoint, and you've done well from a brand standpoint, your marketing site is pretty awesome.
I mean, there's obviously always room for improvement, but it's pretty solid. I do want to bring up the idea of pricing because I don't disagree there either.
There's large corporations, enterprises, so to speak, Fortune 500s, that if you're not charging them $10,000, $20,000 a year, they're like, what's wrong with you?
We can't use you.
We literally need to give you a lot of money to trust you.
And that's just the nature of the beast there. But when you land on your page for pricing out the gate,
the TCO, the total cost of ownership, at least with the default numbers that are put there, is $2,295 per month.
So you're not even scaring people away.
I mean, you're literally putting your fist in their face and saying, like, it costs a lot, y'all.
Yeah, but that's the cheap version.
These people are probably used to paying more than that, right?
Yeah, I mean,
there's a little slider
that lets you turn on the breakdown mode
of the comparison
to open source Kafka
running in three AZs or one AZ
or comparing to AWS
MSK.
And we didn't even put a particularly big workload
as the default on the pricing calculator.
I think it's a pretty standard workload.
And people are used to looking at big numbers when it comes to running Kafka for these kinds of observability and telemetry workloads.
They just cost a lot.
If you look a little bit further down the pipeline there,
if they're sending the data to Elasticsearch or Snowflake or Clickhouse,
they're probably paying significantly more for those things.
So Kafka looks cheap in comparison
and then WarpStream looks cheap compared to Kafka.
So we're very open about the fact that our product is designed to be more cost effective.
But we do offer additional tiers, we call them account tiers, basically, because the thing that enterprises want from you is to be able to file a support ticket and have somebody reply to it extremely quickly. That's the thing that they're basically paying you for. That's the stuff that doesn't scale: as you get bigger, or your product gets better, obviously you might have fewer support tickets, but you still need humans to be able to respond quickly when somebody does file those support tickets.
So our account tiers for pro and enterprise give customers a support response time SLA that they can count on, which today is backed by the engineering team.
If you're an enterprise customer and you file a priority-zero support ticket, which is just like, my production cluster is down, I need help right away, that pages the engineering on-call rotation and gets you help as quickly as somebody can respond to the page.
So that's the type of stuff that people would be paying for basically on top.
And that's how we make enterprises trust us.
Another reason to raise venture capital: you need to hire people so you can have a 24/7, follow-the-sun on-call rotation in order to back those support response time SLAs.
So if you needed five gigabit write throughput,
which I imagine is quite high,
but let's say that you do,
14-day retention,
so that's two weeks retention.
Not that much.
We're talking 97 grand per month going to WarpStream
and $1.76 million a month using Kafka?
These are numbers that blow my little mind.
Sorry, I didn't hear your throughput number at first.
It was the highest.
It was five gigabits.
Five gigabits.
Yeah.
Yeah.
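To put that write rate in perspective, here's a quick back-of-the-envelope sketch (treating the 5 gigabit/s and 14-day figures quoted above as assumptions) of how much data that workload keeps in retention:

```python
# Back-of-the-envelope: data retained by a Kafka-style workload.
# Assumed figures from the conversation: 5 gigabit/s writes, 14-day retention.
GIGABITS_PER_SEC = 5
RETENTION_DAYS = 14
SECONDS_PER_DAY = 86_400

gb_per_sec = GIGABITS_PER_SEC / 8            # 8 bits per byte -> 0.625 GB/s
gb_per_day = gb_per_sec * SECONDS_PER_DAY    # GB written per day
total_tb = gb_per_day * RETENTION_DAYS / 1_000

print(f"{gb_per_day:,.0f} GB/day, {total_tb:,.0f} TB retained")
# prints: 54,000 GB/day, 756 TB retained
```

That's roughly three quarters of a petabyte sitting in retention, before any replication, which helps explain why the monthly figures above get so large.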
I mean, obviously, as you get up into these larger and larger scales... well, first of all, I'll say 14 days is pretty long retention for most people for Kafka. Usually, because it's transitory, I'd say three to seven days is a pretty typical one.
And if you're at these kinds of scales,
you're probably not paying your cloud provider retail price
for cross-AZ networking anymore.
If Kafka was a big part of your bill,
that would be probably one of the items
that you would want to negotiate with your cloud provider.
So the comparison doesn't get nearly as rosy if you've negotiated some discounts.
But the way you can kind of estimate what those would be is if you switch it from Kafka three-AZ to Kafka one-AZ, which will reduce the inter-zone networking dramatically, and turn on the single-zone consumers flag. So the comparison doesn't look quite as good anymore.
Still 10x. Still looking pretty good.
Yeah.
Then you turn it to one-day retention.
Turn it to one-day retention. And then it goes to 86% savings versus 60% savings.
So it's still big,
but we understand that there are a lot of
big Kafka workloads out there.
The savings don't always come out at 90% like that example does, but if we can deliver 75% to 80% savings, it's a compelling enough reason. There's a little bit of activation energy it takes to get people to do anything, and we're confident that being 75% to 80% cheaper is enough of that activation energy to get people to at least give us a shot.
I want to point out that these are just dollars too.
This is not developer friction or operational burden
or enhanced developer experience,
which are the hallmark of any conversation today
with dev tools, right?
Like you could be a 13-year-old tool like Kafka and get away with a lot, and I have no idea.
So no skin in the game.
I've never used Kafka personally.
So if there's some haters out there,
those marginal haters I mentioned earlier,
don't hate on me.
But there may be some warts and blemishes and burdens within the Kafka ecosystem that just make it challenging to operate and stand up. Obviously there's cost, we've already talked about that literally at length, but I think there's something to be said about a modern take, given today's cloud infrastructure, with some of the developer user experience attributes I've seen you already put in place.
So cost is one thing,
but then happy developers is retained developers,
morale boosts, you know, maybe freedom on weekends,
less pager duty, you know,
less whatever from anybody
who might be competing with pager duty.
That's a good thing.
Yeah, at WarpStream, we know that's a very important part of what we do. But it's always easier to walk into a sales conversation with the hard-facts numbers. A lot of vendors use those exact attributes to attribute a lot of savings to their product, which is probably true, but they feel a little bit more wishy-washy compared to the hard-facts numbers. So that's why we lead with those in our pricing calculator.
And obviously those are still things that we highlight when we're talking to potential
customers to help them understand the value of the product.
But we like to think of that as more like the icing on the cake stuff.
And the cost savings is what we're promising them, basically.
Everything else is just icing on the cake.
Icing on the cake.
What's a good next step?
I mean, I feel like we've really just gone through all of it, Jared.
You got anything else?
I think we have.
We've covered it all, man.
I think we've covered every ounce of Warpstream.
Ryan, thank you for being patient with our questions
and going through everything and filling in all the blanks too.
I think you did a great job with this conversation.
I'm happy.
I'm impressed.
I think there's a lot of things I can see as quality in you as a person
and also the thing that you're trying to do.
I think you guys have led with some wisdom.
I like a lot that you went out and talked to folks
rather than just shooting from the hip, so to speak,
with your choices and letting it be opinion-based.
You seem to have leaned into the wisdom of those who have come before you
with your particular target market,
which I think is key to your choices. And so I'm stoked that you were able to answer the questions
we asked. So thank you.
Yeah, this has been very fun. I was not expecting to talk about raising money at all during this conversation, but that was something we spent a lot of time on. When you're building a company, you have to spend a lot of time thinking about strategic stuff that's not just writing code. That one was a lot of back and forth between my co-founder and me about how we were going to do things, and we're very happy with our direction now.
But it took the input of a lot of people to arrive at this conclusion.
And we're very thankful for those people that made themselves available for learning more about commercial open source stuff, because we had never really even considered it before, and it was super important to learn along the way.
Very cool. Well, warpstream.com is where you can go. We'll obviously put links in the show notes. Ryan, thank you. It's been awesome.
Thanks, man. Thanks.
Okay, so WarpStream seems to be what Kafka would look like if it was redesigned from the ground up to run in modern cloud environments.
They chose not to open source, and I think Ryan had a pretty solid argument for why not.
But time will tell if an open source copycat comes along to sniff out their lunch and eat it.
Until then, good for them for putting in the work to gain the conviction they have for
their choices and their position.
Later this week, our game show Pound to Find is back on Changelog & Friends, and it was
epic.
This is the closest I've come to winning, and I was still pretty far off, and that's
this Friday.
Okay, big thanks to our sponsors this week. Speakeasy, love them. New domain: speakeasy.com. Also our friends over at Supabase, celebrating launch week number 12, supabase.com. And our friends over at Paragon, all the B2B SaaS integrations
you want in a single platform, useparagon.com. And also our friends over at Unblocked for all those whys, hows, and WTFs.
Check them out, getunblocked.com.
And of course, to our partners over at fly.io.
That is the home of changelog.com.
Check them out, fly.io.
And to the beatmaster in residence,
Breakmaster Cylinder. Bringing those beats. Okay, that's it. This show's done. We'll see you on
Friday. Game on.