Screaming in the Cloud - Yugabyte and Database Innovations with Karthik Ranganathan
Episode Date: September 21, 2021

About Karthik
Karthik was one of the original database engineers at Facebook responsible for building distributed databases including Cassandra and HBase. He is an Apache HBase committer, and also an early contributor to Cassandra, before it was open-sourced by Facebook. He is currently the co-founder and CTO of the company behind YugabyteDB, a fully open-source distributed SQL database for building cloud-native and geo-distributed applications.

Links:
Yugabyte community Slack channel: https://yugabyte-db.slack.com/
Distributed SQL Summit: https://distributedsql.org
Twitter: https://twitter.com/YugaByte
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
You could go ahead and build your own coding and mapping notification system,
but it takes time, and it sucks.
Alternately, consider Courier, who's sponsoring this episode.
They make it easy.
You can call a single send API for all of your notifications and channels.
You can control the complexity around routing,
retries, and deliverability,
and simplify your notification sequences
with automation rules.
Visit courier.com today and get started for free.
If you wind up talking to them,
tell them that I sent you,
and watch them wince, because everyone does when you bring up my name.
That's the glorious part of being me.
Once again, you could build your own notification system, but why on God's flat earth would you do that?
Visit courier.com today to learn more.
This episode is sponsored in part by Yugabyte.
Distributed technologies like Kubernetes are great, citation very much needed,
because they make it easier to have resilient, scalable systems.
SQL databases haven't kept pace, though.
Certainly not like NoSQL databases have, like Route 53, the world's greatest database.
We're still, other than that, using legacy,
monolithic databases that require ever-growing instances of compute. Sometimes we'll try and
bolt them together to make them more resilient and scalable, but let's be honest, it never works out
well. Consider YugabyteDB. It's a distributed SQL database that solves basically all of this. It is 100% open source, and there's no asterisk
next to the open on that one. And it's designed to be resilient and scalable out of the box,
so you don't have to charge yourself to death. It's compatible with PostgreSQL, or Postgres-squeal,
as I insist on pronouncing it, so you can use it right away without having to learn a whole
new language and refactor everything. And you can distribute it wherever your applications take you, from across availability
zones to other regions or even other cloud providers should one of those happen to exist.
Go to yugabyte.com, that's Y-U-G-A-B-Y-T-E dot com, and try their free beta of Yugabyte Cloud,
where they host and manage it for you. Or see what the open source project
looks like. It's effortless distributed SQL for global apps. My thanks to Yugabyte for sponsoring this episode.
Welcome to Screaming in the Cloud. I'm Corey Quinn.
Today's promoted episode comes from the place where a lot of my episodes do. I loudly and
stridently insist that Route 53, or DNS in
general, is the world's greatest database. And then what happens is a whole bunch of people
who work at database companies get upset with what I've said. Now, please don't misunderstand
me. They're wrong. But I'm thrilled to have them come on and demonstrate that, which is what's
happening today. My guest is CTO and co-founder of YugaByte,
Karthik Ranganathan. Thank you so much for spending the time to speak with me today.
How are you? I'm doing great. Thanks for having me, Corey. We'll just go for YugaByteDB being
the second best database. Let's just keep the first one out of it. Okay, we're all fighting
for number two. And besides, number two tries harder. It's like that whole branding thing from years past. So you were one of the original database engineers at Facebook,
responsible for building a bunch of, well, nonsense, like Cassandra and HBase.
You were an HBase committer, early contributor to Cassandra,
even before it was open sourced.
And then you look around and said, all right, I'm going to go start a company,
roughly around 2016, if memory serves. And I'm going to go build a database and bring it to the world.
Let's start at the beginning.
Why on God's flat earth do we need another database?
Yeah, that's the question.
That's the million dollar question, isn't it, Corey?
So this is one, fortunately, that we've had to answer so many times from 2016 that I guess
we've got a little good at it.
So here's the learning
that a lot of us had from Facebook. We were the original team, like all three of us founders,
we met at Facebook and we not only built databases, we also ran them, right? And let me paint a
picture. Back in 2007, right, the public cloud really wasn't very common, right? And people were
just going into multi-region, multi-data center deployments. And Facebook was just starting to take off to really scale.
Now, forward to 2013, I was there through the entire journey.
A number of things happened in Facebook.
Like we saw the rise of the equivalent of Kubernetes, which was internally built.
We saw like, for example, microservices.
Tupperware equivalent there.
Tupperware, exactly.
You know the name.
Yeah, exactly.
So we saw how we went from two data centers to multiple data centers, nearby and faraway data centers, what you know today as zones and regions.
A number of such technologies come up.
And, you know, I was on the database side, and we saw how existing databases wouldn't work to distribute data across nodes, failover, et cetera, et cetera, right?
So we had to build a new class of databases, what we now know as NoSQL.
Now, back in Facebook, I mean, the typical difference between Facebook and an enterprise at
large is Facebook has a few really massive applications. For example, you do a set of
interactions, you view profiles, you add friends, you talk with them, etc. These are super massive
in their usage, but they are very few in their access patterns. At Facebook, we were mostly
interested in dealing with scale and availability. Existing databases couldn't do it, so we built
NoSQL. Now, forward a
number of years, I can't tell you how many times I've had conversations with other people building
applications that would say, hey, could I get a secondary index on this NoSQL database? Or how about
that transaction? I only need it a couple of times. I don't need it all the time, but could you,
like, for example, do multi-row transactions? And the answer was always no, because it was never
built for that. So today, what we're seeing is that transactional data and transactional applications are all
going cloud native, and they all need to deal with scale and availability, right?
And so the existing databases don't quite cut it.
So the simple answer to why we need it is we need a relational database that can run
in the cloud to satisfy just three properties, right?
It needs to be highly available.
Failures or no, upgrades or no, it needs to be available. It needs to scale on demand. So simply add or remove nodes and scale up
or down. And it needs to be able to replicate data across zones, across regions in a variety
of different topologies. So availability, scale, and geographic distribution, along with retaining
most of the RDBMS features, the SQL features, right? That's really the gap we're trying to solve. I don't know that I've ever told this story on the podcast,
but I want to say it was back in 2009. I flew up to Palo Alto and interviewed at Facebook.
And it was a different time, a different era. It turns out that I'm not as good on the whiteboard
as I am at running my mouth. So all right, I did not receive an offer. And I think everyone can agree at this point that was for the best. But I saw one of the
most impressive things I've ever seen during a part of that interview process. My interview was
scheduled for a conference room for, must have been 11 o'clock or something like that. And at
10:59, they're looking at their watch like, hang on 10 seconds. And then the person I was with
reached up to knock on the door to let the person know
that their meeting was over and the door opened.
So it's very clear that even in large companies, which Facebook very much was at the time,
people had synchronized clocks.
This seems to be a thing, as I've learned from reading the parts I could understand
of the Google Spanner paper.
When you're doing distributed databases, clocks are super important.
At places like Facebook, that is,
I'm not going to say it's easy. Let's be clear here. Nothing is easy, particularly at scale,
but Facebook has advantages in that they can mandate how clocks are going to be handled
throughout every piece of their infrastructure. You're building an open source database and you
can't guarantee in what environment and on what hardware that's going to run. And you must have
an atomic clock
hooked up, is not something you're generally allowed to tell people. How do you get around that?
That's a great question. Very insightful. Cutting right to the chase.
So the reality is we cannot rely on atomic clocks. We cannot mandate our users to use them,
or we wouldn't be very widely used in a variety of different deployments. In fact, we also work
in on-prem
private clouds and hybrid deployments where you really cannot get these atomic clocks.
So the way we do this is we come up with other algorithms to make sure that we're able to get
the clocks as synchronized as we can. So think about it at a higher level. The reason Google
uses atomic clocks is to make sure that they can wait to make sure every other machine is
synchronized with them.
And the wait time is about seven milliseconds, right? So the atomic clock service or the true
time service says no two machines are farther apart than about seven milliseconds. So you just
wait for seven milliseconds. You know, everybody else has caught up with you. And the reason you
need this is you don't want to write on a machine. You don't want to write some data and then go to
a machine that has a future or an older time and get inconsistent results. So just by waiting seven milliseconds, they can ensure that
no one is going to be older and therefore serve an older version of the data. So every write that was written, all the other machines see it.
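To make that commit-wait idea concrete, here is a minimal sketch in Python. It is purely illustrative, not Google's or Yugabyte's actual code; it assumes a clock service that bounds the skew between any two nodes by a known epsilon, roughly the seven milliseconds mentioned here.

import time

# Illustrative TrueTime-style commit wait. EPSILON_MS is the bound the clock
# service promises between any two nodes' clocks (hypothetical value).
EPSILON_MS = 7

def commit(write, apply_locally):
    """Apply a write, then wait out the uncertainty window before acking."""
    commit_ts = time.time() * 1000     # pick a timestamp from the local clock, in ms
    apply_locally(write, commit_ts)    # make the write durable under that timestamp
    time.sleep(EPSILON_MS / 1000)      # by now every node's clock has passed commit_ts
    return commit_ts                   # safe to acknowledge: no node can still serve
                                       # a version of the data older than this write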
Now, the way we do this is we only have NTP, the network time protocol, which does synchronization of time across machines, except it takes 150 to 200
milliseconds. Now, we wouldn't be a very good database if we said, look, every operation is
going to take 150 milliseconds. So within these 150 milliseconds, we actually do the synchronization
in software. So we replace the notion of an atomic clock with what is called a hybrid logical clock.
So one part using NTP and physical time and another part using counters and logical
time and keep exchanging RPCs, which are needed in the course of the database functioning anyway,
to make sure we start normalizing time very quickly. This, in fact, has some advantages
and disadvantages. Everything was a trade-off. But the advantage it has over a true time style
deployment is you don't even have to wait that seven milliseconds
and a number of scenarios you can just instantly respond. So that means you get even lower latencies
in some cases. Of course, the trade off is there are other cases where you have to do more work
and therefore more latency.
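For contrast with the commit-wait approach, here is a rough sketch of the hybrid logical clock idea in Python. Again, this is illustrative only and not Yugabyte's implementation: each node keeps a (physical, logical) pair, and folding in the timestamps that ride on the RPCs nodes exchange anyway keeps them ordered without atomic clocks or a fixed wait.

import time


class HybridLogicalClock:
    """Toy hybrid logical clock: a (physical_ms, logical_counter) pair.

    The physical part tracks the NTP-disciplined wall clock; the logical
    counter orders events that land within the same millisecond or while
    a node's wall clock lags behind what it has seen from its peers.
    """

    def __init__(self):
        self.l = 0  # highest physical time (ms) observed so far
        self.c = 0  # logical counter within that millisecond

    @staticmethod
    def _wall_ms():
        # NTP-synced wall clock; across nodes this can disagree by 100+ ms,
        # which is why the physical part alone is not enough.
        return int(time.time() * 1000)

    def tick(self):
        """Timestamp a local event, e.g. a write about to be replicated."""
        prev = self.l
        self.l = max(prev, self._wall_ms())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def merge(self, msg_l, msg_c):
        """Fold in a timestamp received on an incoming RPC."""
        prev = self.l
        self.l = max(prev, msg_l, self._wall_ms())
        if self.l == prev and self.l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


# Even if the reading replica's wall clock lags the writer's, merging the
# timestamp carried on the replication RPC orders the read after the write:
writer, reader = HybridLogicalClock(), HybridLogicalClock()
write_ts = writer.tick()
read_ts = reader.merge(*write_ts)
assert read_ts > write_ts  # lexicographic comparison of (physical, logical)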
The idea absolutely makes sense. You've started this as an open source project and it's thriving. Who's using it and for what purposes?
Okay, so one of the fundamental tenets of building this database, right, I think back to your question of why does
the world need another database, is that the hypothesis is not so much the world needs another
database API. That's really what users complain against, right? You create a new API and even if
it's SQL and you tell people, look, here's a new database, it does everything for you. It'll take
them two years to figure out what the hell it does and build an app, and they'll put it
in production, and then they'll build a second and a third. And then by the time they hit the
10th app, they find out, okay, this database cannot do the following things. But you're five
years in, you're stuck, you can only add another database. That's really the story of how NoSQL
evolved. And it wasn't built as a general purpose database, right? So in the meanwhile, databases like, you know,
Postgres, for example, have been around for so long
that they like, you know, absorb
and have such a large ecosystem and usage
and people who know how to use Postgres and so on.
So we made the decision
that we're going to keep the database API compatible
with known things.
So people really know how to use them from the get-go
and enhance it at a lower level
to make it cloud native, right?
So what does YugabyteDB do for people?
It is the same as Postgres and Postgres features at the upper half.
It reuses the code, but it is built on the lower half
to be shared nothing, scalable, resilient, and geographically distributed.
So, to use the public cloud managed-database context: the upper half is built like Amazon Aurora,
and the lower half is built like Google Spanner. Now, when you think about workloads that can benefit
from this, we're a transactional database that can serve user-facing applications and real-time
applications that have lower latency. So the best way to think about it is people that are building
transactional applications on top of, say, a database like Postgres, but the application
itself is cloud native, you'd have to do a lot of work to make this Postgres piece be highly available and scalable
and replicate data and so on in the cloud. Well, with YugabyteDB, we've done all that work for you,
and it's as open source as Postgres. So if you're building a cloud native app on Postgres that's
user-facing or transactional, YugabyteDB takes care of making the database layer behave like Postgres, but become cloud
native.
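As a concrete illustration of that compatibility claim, a stock PostgreSQL driver can talk to a YugabyteDB cluster unchanged. Here is a minimal sketch, assuming a local cluster listening on the default YSQL port 5433 with the default yugabyte database and user, and the psycopg2 driver; the table and rows are made up for illustration.

# A stock PostgreSQL driver pointed at YugabyteDB's YSQL port (5433 by default).
# Nothing Yugabyte-specific in the code itself; that is the point of keeping
# the upper half wire- and feature-compatible with Postgres.
import psycopg2

conn = psycopg2.connect(
    host="127.0.0.1",   # any node in the cluster can take connections
    port=5433,          # YSQL default; vanilla Postgres would be 5432
    dbname="yugabyte",  # default database in a fresh cluster
    user="yugabyte",    # default user in a fresh cluster
)

# DDL outside an explicit transaction.
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id       BIGSERIAL PRIMARY KEY,
            customer TEXT NOT NULL,
            total    NUMERIC(10, 2)
        )
    """)
    cur.execute("CREATE INDEX IF NOT EXISTS orders_by_customer ON orders (customer)")

# An ordinary multi-row transaction: both inserts commit atomically, even though
# the rows may land on different nodes of the cluster underneath.
conn.autocommit = False
with conn, conn.cursor() as cur:
    cur.execute("INSERT INTO orders (customer, total) VALUES (%s, %s)", ("ada", 42.50))
    cur.execute("INSERT INTO orders (customer, total) VALUES (%s, %s)", ("grace", 17.25))

with conn, conn.cursor() as cur:
    cur.execute("SELECT customer, total FROM orders ORDER BY id")
    print(cur.fetchall())

conn.close()

The same statements would run against vanilla Postgres; the difference is only where the rows physically live.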
Do you find that your users are using the same database instance, for lack of a better
term?
I know that instance is sort of a nebulous term.
We're talking about something that's distributed.
But are they having database instances that span multiple cloud providers?
Or is that something that is more talk than you're actually seeing in the wild?
So I'd probably replace the word instance with cluster, just for clarity, right?
So a cluster has a-
I concede the point, absolutely.
Okay, so we'll still keep Route 53 on top, though, so it's good.
At that point, the replication strategy is called a zone transfer,
but that's neither here nor there. Please, by all means, continue.
Okay, so a cluster of a database like YugabyteDB has a number of instances.
Now, I think the question is, is it theoretical or real, right?
What we're seeing is it is real, and it is real perhaps in slightly different ways than
people imagine it to be.
Okay.
So I'll explain what I mean by that.
Now, there's one notion of being multi-cloud where you can imagine there's like, say, the
same cluster that spans multiple different clouds, right?
And you have your data being written in one cloud and being read from another, right?
This is not a common pattern, although we have had one or two deployments that are attempting
to do this.
Now, a second deployment shifted once over from there is where you have multiple instances
in a single public cloud and a bunch of other instances in a private cloud.
So it stretches the database
across public and private, right? You'd call this a hybrid deployment topology. That is more common.
So one of the unique things about YugabyteDB is we support asynchronous replication of data,
just like your RDBMSs do, the traditional RDBMSs. In fact, we're the only one that straddles both
synchronous replication of data as well as asynchronous replication of data. We do both. So once shifted over would be a cluster that's deployed in one of the clouds, but with an
asynchronous replica of the data going to another cloud. And so you can keep your reads and writes,
even though they're a little stale, you can serve it from a different cloud. And then once again,
you can make it an on-prem private cloud and another public cloud. And we see all of those
deployments. Those are massively
common, right? And then the last one over would be the same instance of an app or perhaps even
different applications, some of them running on one public cloud and some of them running on a
different public cloud. And you want the same database underneath to have characteristics of
scale and failover, right? Like for example, if you built an app on Spanner, what would you do
if you went to Amazon and wanted to run it for a different set of users? That is part of the reason I tend to avoid the idea of
picking a database that does not have a least theoretical exodus path, because reimagining
your entire application's data model in order to migrate is not going to happen. So come hell or
high water, you're stuck with something like that where it lives. So even though I'm a big proponent
as a best practice, and again, there are exceptions where this does not make sense, but as a general piece of guidance,
I always suggest pick a provider, I don't care which one, and go all in. But that also should
be shaded with the nuance of, but also at least have an eye toward theoretically, if you had to
leave, consider that if there's a viable alternative. And in some cases, in the early days
of Spanner, there really wasn't. So if you needed that functionality, okay, go ahead and use it,
but understand the trade-off you're making. That's really what it comes down to from my
perspective, understand the trade-offs. But the reason I'm interested in your perspective on this
is because you are providing an open source database to people who are actually doing
things in the wild. There's not much agenda there in the
same way among a user community of people reporting what they're doing. So you have,
in many ways, one of the least biased perspectives on the entire enterprise.
Oh, yeah, absolutely. And like I said, I started from the least common to the most common. Maybe
I should have gone the other way. But we absolutely see people that want to run the
same application stack in multiple different clouds for a variety of reasons. Oh, if you're a SaaS vendor, for example, you say, oh, we're only in this one cloud.
Potential customers who are in other clouds say, well, if that changes, we'll give you money. Oh,
money! You say other cloud. I thought you said something completely different. Here you go.
Yeah, you've got to at some point. But the core of what you do, beyond what it takes to get that
application present somewhere else, you usually keep in your primary cloud provider. Exactly. Yeah, exactly. Crazy things sometimes dictate
or have to dictate architectural decisions, right? Like, for example, you're seeing the
rise of compliance. Different countries have different regulatory reasons to say,
keep my data local or keep some subset of data local. And you simply may not find the right
cloud providers present in those countries. You may be a PaaS or an API provider that's helping
other people build applications. And the applications that the API provider's customers
are running could be across different clouds. And so they would want the data local. Otherwise,
the transfer costs would be really high. So a number of reasons dictate or like a large company
may acquire another company that was operating in yet another cloud. Everything else is great,
but they're in another cloud. They're not going to say no because you're operating on another cloud. It still does what they want,
but they still need to be able to have a common base of expertise
for their app builders and so on.
So a number of things dictate
why people start looking at cross-cloud databases
with common performance
and operational characteristics
and security characteristics,
but don't compromise on the feature set, right?
That's starting to become super important
from our perspective.
I think what's most important is the ability to run the database with ease
while not compromising on your developer agility or the ability to build your application, right?
That's the most important thing. When you founded the company back in 2016,
you are VC-backed. So I imagine your investor pitch meetings must have been something a little
bit surreal. They ask hard questions such as,
why do you think that in 2016, starting a company to go and sell databases to people is a viable business model? At which point you obviously corrected them and said, oh,
you misunderstand. We're building an open source database. We're not charging for it. We're giving
it away. And they apparently said, oh, that's more like it. And then invested, as of the time
of this recording, over $100 million in your company. Let me be the first to say there are aspects of money that I
don't fully understand, and this is one of those. But what is the plan here? How do you wind up
building a business case around effectively giving something away for free? And I want to be clear
here. Yugabyte is open source, and I don't have an asterisk next to that.
It is not one of those source available licenses, or anyone can do anything they want with it
except Amazon, or you're not allowed to host it and offer it as a paid service to other
people.
So how do you have a business, I guess, is really my question here.
You're right, Corey.
We're 100% open source under Apache 2.0, I mean, the database.
So our theory on day one, I mean, of course, this was a hard question and people did ask us this. And I'll take you guys back to 2016. It was unclear, even as of 2016, if open source companies were going to do a great job. Do you really need open source to succeed? There were a lot of such questions, right? And every company, every project, every space has
to follow its own path, right? Just applying learnings. Like for example, Red Hat was open
source and that really succeeded, but there's a number of others that may or may not have succeeded,
right? So our plan back then was to tread the waters carefully, in the sense that we really had to make sure open source was the business model we wanted to go for. So under the advisement from our VCs, we said we'd take it slowly. We want open source on day one. We talked to a number of our users and customers and made sure, you know, that was indeed the path we wanted to go.
The conversations pretty clearly told us people wanted an open database that was very easy for them to understand.
Because if they are trusting their crown jewels, their most critical data, their systems of record,
this is what the business depends on, into a database, they sure as hell want to have
some control over it and some transparency as to what goes on, what's planned, what's
on the roadmap.
Look, if you don't have time, I will hire my own people to go build it. They want to be able to invest in the database.
So open source was absolutely non-negotiable for us.
We tried the traditional technique for a
couple of years of keeping a small portion of the features of the database itself closed. So it's
what you'd call open core. But on day one, we were pretty clear that the world was headed towards
DBaaS, database as a service, and making it really easy to consume. And then there are the bad patterns as well, like, oh, if you want security, that's a paid feature. No, that is not optional.
And the list then of what you can wind up adding as paid versus not gets murky. And you're
effectively fighting your community when they try and merge some of those features in. And it just
turns into a mess. Exactly. So it did for us for a couple of years. And then we said, look, we're
not doing this nonsense. We're just going to make everything open and just make it simple, right?
Because our promise to the users was we're building everything that looks like Postgres,
so it's as valuable as Postgres, and it'll work in the cloud, right?
And people said, look, Postgres is completely open, and you guys are keeping a few features
not open.
What gives, right?
And so after that, we had to concede the point and just do that.
But one of the other founding theses of the company, the business side, was that DBaaS
and ability to consume the database is actually far more critical than whether
the database itself is open source or not, right? I would compare this to, for example, MySQL and
Postgres being completely open source, but Amazon's Aurora being actually a big business. And
similarly, it happens all over the place. So it is really the ability to consume and run business
critical workloads that seem to be more important for our customers and enterprises that paid us.
So the day one thesis was, look, the world is headed towards DBaaS.
We saw that already happen with inside Facebook.
Everybody was automated operations, simplified operations, and so on.
But the reality is, we're a startup.
We're a new database.
No one's going to trust everything to us, the database, the operations, the data.
Hey, why don't we put it on this tiny company?
And oh, it's just my most business-critical data, so what could go wrong, right? So we said, we're going to build a
version of our DBaaS that is in software. So we call this Yugabyte Platform, and it actually
understands public clouds. It can spin up machines. It can completely orchestrate software installs,
rolling upgrades, turnkey encryption, alerting, the whole nine yards. That's a completely different
offering from the database.
It's not the database.
It's just on top of the database
and helps you run your own private cloud.
So effectively, if you install it on your Amazon account
or your Google account,
it will convert it into what looks like a DynamoDB
or a Spanner or what have you
with YugabyteDB as the database inside.
So that is our commercial product.
That's source available, right?
And that's what we charge for. The database itself, completely open. Again, the other piece of the
thinking is if we ever charge too much, our customers have the option to say, look, I don't
want your DBaaS plan. I'll go to the open source database and we're fine with that. So we really
want to charge for value. And obviously we have a completely managed version of our database as
well. So we reuse this platform for our managed
version. So you can kind of think of it as portability, not just of the database, but also
of the control plane, the DBaaS plane. They can run it themselves. We can run it for them. They
can take it to a different cloud, so on and so forth. I like that monetization model a lot better
than a couple of others. I mean, let's be clear here. You've spent a lot of time developing some
of these concepts for the industry when you were at Facebook.
And because it's Facebook, the other monetization models are kind of terrifying.
Like, okay, we're going to just monetize the data you store in the open source database
is terrifying.
Only slightly less terrifying would be the Google approach of, ah, every time you wind up running a SQL
query, we're going to insert ads.
So I like the model of being able to offer features that only folks who already have expensive problems, and money to burn to solve those problems, will gravitate towards.
You're not disadvantaging the community or the small startup who wants it but can't afford
it.
I like that model.
Actually, the funny thing is we are seeing a lot of startups also consume our product
a lot.
And the reason is because we only charge for the value we bring, right?
Typically, the problems that a startup faces are actually much simpler than the complex requirements of an enterprise at scale, right? They are different. So the value is also proportional
to what they want and how much they want to consume, and that takes care of itself.
So for us, we see that startups, equally so as enterprises, have only limited amount of bandwidth.
They don't really want to spend time on operationalizing the database, especially if they have an
out to say, look, tomorrow this gets expensive.
I can actually put in the time and money to move out and go run this myself.
Why don't I just get started because the budget seems fine and I couldn't have done it better
myself anyway because I'd have to put people on it and that's more expensive at this point.
So it doesn't change the fundamentals of the model.
I just want to point out both sides
are actually gravitating to this model.
This episode is sponsored in part
by our friends at Jellyfish.
So you're sitting in your office chair,
bleary-eyed, parked in front of a PowerPoint,
and oh, my sweet feathery Jesus,
it's the night before the board meeting
because of course it is.
As you slot that
crappy screenshot of traffic light colored Excel tables into your deck or sift through endless
spreadsheets looking for just the right data set, have you ever wondered why is it that sales and
marketing get all this shiny, awesome analytics and insight tools, whereas engineering basically
gets left with the dregs? Well, the founders of Jellyfish certainly did.
That's why they created the Jellyfish Engineering Management Platform, but don't you dare call it JEMP.
Designed to make it simple to analyze your engineering organization,
Jellyfish ingests signals from your tech stack, including JIRA, Git, and collaborative tools.
Yes, depressing to think of those things as your tech stack, but this is 2021.
And they use that to create a model that accurately reflects
just how the breakdown of engineering work aligns with your wider business objectives.
In other words, it translates from code into spreadsheet.
When you have to explain what you're doing from an engineering perspective
to people whose primary IDE is Microsoft PowerPoint, consider Jellyfish. That's jellyfish.co and tell them Corey sent you. Watch for the wince. That's my favorite part.
There's a claim that companies prefer open source databases, and this is waved around as a banner of victory by a lot of,
well, let's be honest, open source database companies. I posit that that is in fact crap and also bad data, because the open source purists, of which I admit I used to be one, and now I solve business problems instead, believe that people are talking about freedom and choice and the rest. In practice, in my experience,
what people are really distilling that down to is that they don't want a commercial database. And it's not even that they're not willing to pay money for it, but they don't want to have a per-core licensing challenge, or even have to track licensing of where it is installed and how, and wind up having to cut checks for folks. For example, I'm going to dunk on someone, because why not?
Azure, for a while, has had this campaign that it is five times cheaper
to run some Microsoft SQL workloads in Azure than it is on AWS.
As if this was some magic engineering feat of strength or something.
It's absolutely not.
It's that it is really expensive licensing-wise to run it on things that aren't Azure.
And that doesn't make customers feel good. That's the thing they want to get away from. And what open source license it is, and in many
cases, until the source available stuff starts trending towards, oh, you're going to pay us,
or you're not going to run it at all, that scares the living hell out of people, then they don't
actually care about it being open. So at the risk of alienating, I'm sure, some of the more vocal
parts of your constituency,
where do you fall on that? We are completely open, but for a few reasons, right? Like multiple different reasons. On the debate of whether it is purely open or just completely permissive, I tend to think people care more about the openness than just the ability to consume at will without worrying about the license, but for a few different reasons, right? And it depends on which segment of
the market you look at. If you're talking about small and medium businesses and startups, you're
absolutely right. It doesn't matter. But if you're looking at larger companies, they actually care
that, for example, if they want a feature, they are able to control their destiny because you
don't want to be half wedded to a database
that cannot solve everything,
especially when the time pressure comes
or you need to do something.
So you want to be able to control
or to influence the roadmap of the project.
You want to know how the product is built,
the good and the bad.
You want a lot of people testing the product
and the feedback to come out in the open
so you at least know what's wrong.
Many times people feel like saying, hey, you know, my product doesn't work in these areas, is actually a bad thing, right? It's actually a good thing
because at least those people won't try it
and it'll be safe, right?
Like customer satisfaction is more important
than just the apparent whatever it is
that you want to project about the product, right?
At least that's what I've learned in all these years
working with databases.
But there's a number of reasons
why open source is actually good.
There's also a very subtle reason that people may not understand, which is that, like, you know, legal teams,
engineering teams that want to build products don't want to get caught up in a legal review
that takes many months to really make sure, look, this may be a unique version of a license,
but it's not a license the legal team has seen before. And there's going to be a back and forth
for many months, and it's just going to derail their product and their timelines, not because
the database didn't do its job or because the team wasn't ready, but because the
company doesn't know what risk it will face in the future. There are a number of these aspects
where open source starts to matter for real. I'm not a purist, I would say. I'm a pragmatist,
that's how I've always been. But, you know, I might be sounding like a purist, but there are a number of reasons why true open source is actually useful, right?
And at the end of the day, we've already established, at least at Yugabyte, we're pretty
clear about that.
The value is in the consumption and is not in the tech, right?
Like if you're pretty clear about that, because if you want to run a tier two workload or
a hobbyist app at home, would you want to pay for a database?
Probably not.
I just want to do something for a while and then shut it down and go do my thing, right?
I don't care if the database is commercial or open source. In that case, being
open source doesn't really take away. But if you're a large company betting, it does take away, right?
Oh, it goes beyond that, because it's not even in the large company story whether it costs money,
because regardless, I assure you, open source is not free. The most expensive thing that we see in
all of our customer accounts (again, our consultancy fixes AWS bills, an expensive problem that hits everyone) is not the infrastructure itself. The environment in AWS is always less expensive
than the people who are working on the environment. Payroll is an expense that dwarfs the AWS bill
for anyone that is not a tiny startup that is still not paying a market rate salary to its
founders. It doesn't work that way. And the idea for those folks is not about the money.
It's about the predictability.
And if there's a 5x price hike from their database vendor,
that suddenly completely disrupts their unit economic model,
and they're in trouble.
That's the value of open source, in that it can go anywhere.
It's a form of not being locked into any vendor where it's hosted, as well
as no one company that has put it out there into the world. Yeah, and the source available license,
right, we considered that also. The reason we voted against that was you can get into scenarios where the company gets competitive with its open source side, where the open source user wants a couple of other features to really make it work for their own use case, like, you know, case in point, a startup, but the company wants to hold those features for the commercial side. And now the startup has that 5X price jump anyway. So at this point, it comes to a head,
where the company or the startup is being charged, not for value, but because of the monetization
model or the business model, right? So we said, you know what, the best way to do this is to truly
compete against open source. And if someone wants to operationalize the database, great, but we've already done it for you. If you think that you
can operationalize it at a lower cost than what we've done, great, that's fine.
I have to ask, there has to have been a question somewhere along the way during the investment
process of, well, what if AWS moves into your market? And I can already say part of the problem
with that line of reasoning is, okay, let's assume that AWS turns Yugabyte into a managed database offering. First,
they're not going to be able to articulate for crap why you should use that over anything else,
because they tend to mumble when it comes time to explain what it is that they do.
But it has to be perceived as a competitive threat. How do you think about that?
Yeah, this absolutely came up quite a bit. And like I said, in 2016, this wasn't news back then. This is something that was happening in the world already.
So I'll give you a couple of different points of view on this. The reason why AWS got so successful
in building a cloud is not because they wanted to get into the database space. They simply wanted
their cloud to be super successful and it required value-added services like these databases, right?
Now, every time a new technology shift happens, right,
it gives some set of people an unfair advantage, right?
In this case, database vendors probably didn't recognize
how important the cloud was and how important it was
to build a first-class experience on the cloud on day one, right,
as the cloud came up because it wasn't proven
and they had 20 other things to do, and it's rightfully so.
Now, AWS comes up and they're trying to prove a point
that the cloud is really useful and absolutely valuable for their customers. And so they start putting
value-added services. And now suddenly you're in this open source battle, right? At least that's
how I would view that it kind of developed. With Yugabyte, obviously, the cloud's already here,
we know on day one, so we're kind of putting out our managed service. So we'd be as good as AWS or
better. The database has its value, but the managed service has its own value. And so we'd
want to make sure we provide at least as much value as AWS, but on any cloud, right anywhere.
So that's the other part. And we also talked about the mobility of the DBaaS itself,
so moving it to your private account and running the same thing, as well as for public, right? So
these are some of the things that we have built, that we believe makes us super valuable.
It's a better approach than a lot of your predecessor companies who decided,
oh, well, we built the thing.
Obviously, we're going to be the best at running it in the end
because they dramatically sold AWS's operational excellence short.
And it turns out they're very good at running things at scale.
So that's a challenging thing to beat them on.
And even if you're able to, it's hard to differentiate among the differences
because at that caliber of operational
rigor, it's one of those you can only tell in the very niche cases. It's a hard thing to
differentiate on. I like your approach a lot better. Before we go, I have one last question
for you. And normally it's one of those positive, uplifting ones of what workloads are best for
YugaByte, but I think that's boring. Let's be more cynical and negative.
What workloads would run like absolute crap on YugaByteDB?
Okay.
We do have a thing for this, because we don't want to take on workloads and, you know, have everybody end up with a bad experience.
So we're a transactional database built for user-facing applications, real-time and so
on, right?
We're not good at warehousing and analytic workloads.
So like, for example, if you were using a Snowflake or a Redshift, those workloads are
not going to work very well on top of YugabyteDB. Now, we do work with other external systems like,
you know, Spark and Presto, which are like real-time analytic systems, but they translate
the queries that the end user has into a more operational type of query pattern. However, if you're using it straight up for analytics, we're not a good bet.
Similarly, there are cases where people want a very high number of IOPS by using a cache or even a persistent cache. You know, Amazon just came out with a number of persistent caches that do very high throughput
and low latency serving.
We're not good at that.
We can do reasonably low latency serving and reasonably high IOPS and scale,
but we're not the use case where you want to hit that same lookup over and over and over
millions of times in a second. That's not the use case for us. And the third thing I'd say is
we are a system of record. So people care about the data they put and they absolutely don't want
to lose it and they want to show that it's transactional. So if there's a workload where
there's a lot of data and you're okay if you lose some of it, it's just some sensor data, and your reasoning is like, okay, if I lose a few data points, it's fine. I mean, you could still use us, but you know, at that point you'd really have to be a fanboy or something for Yugabyte. I mean, there are other databases that probably do it better.
Yeah.
That's the problem is whenever someone says, oh yeah, database or any tool that they've
built, like this is great.
What workloads is it not a fit for?
And their answer is, oh, nothing. It's perfect for everything. Yeah, I want to
believe you, but my inner bullshit sense is tingling on that one because nothing's fit for
all purposes. It doesn't work that way. Honestly, this is going to be, I guess, heresy in the
engineering world, but even computers aren't always the right answer for things. Who knew?
As a founder, I struggled with this answer a lot initially. I think the problem is when you're
thinking about a problem space, that's all you're thinking about. You don't know what other problem
spaces exist. And when you are asked the question, what workloads is it a fit for? At least I used to
say initially everything, because I'm only thinking about that problem space as the world, and it's
fit for everything in that problem space, except I don't know how to articulate the problem space.
Right. And at some point too, you get so locked into one
particular way of thinking of the world that people ask about other cases. Oh, that wouldn't
count. And then your follow-up question is, wait, what's a bank? And it becomes a different story.
It's how do you wind up reasoning about these things? I want to thank you for taking all the
time you have today to speak with me. If people want to learn more about Yugabyte, either the company or the DB, how can they do that? Yeah, thank you as well for having me. I think to learn
about Yugabyte, just come join our community Slack channel. There's a lot of people. There's like
over 3,000 people. They're all asking interesting questions. There's a lot of interesting chatter on
there. So that's one way. We have an industry-wide event. It's called the Distributed SQL Summit. It's coming up September 22nd, 23rd, I think, a couple of days. It's a two-day event.
That would be a great place
to actually learn
from practitioners
and people building applications
and people in the general space
and as adjacencies.
And it's not necessarily
just about Yugabyte, right?
It's generally about
distributed SQL databases
in general, right?
Hence, it's called
the Distributed SQL Summit.
And you can ask us on Twitter
or any of the usual
social channels as well, right?
So we love interaction.
So we are a pretty open and transparent company.
We'd love to talk to you guys.
Well, thank you so much
for taking the time to speak with me.
We'll, of course, throw links to that
into the show notes.
Thank you again.
Awesome. Thanks a lot for having me, Corey.
It was really fun. Thank you.
Likewise.
Karthik Ranganathan,
CTO and co-founder of YugaByteDB.
I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star
review on your podcast platform of choice. Whereas if you've hated this podcast, please
leave a five-star review on your podcast platform of choice, along with an angry comment halfway
through, realizing that I'm not charging you anything for this podcast and converting the angry comment into a term sheet for a $100 million investment.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.