Screaming in the Cloud - Yugabyte and Database Innovations with Karthik Ranganathan
Episode Date: September 21, 2021

About Karthik
Karthik was one of the original database engineers at Facebook responsible for building distributed databases including Cassandra and HBase. He is an Apache HBase committer, and also an early contributor to Cassandra, before it was open-sourced by Facebook. He is currently the co-founder and CTO of the company behind YugabyteDB, a fully open-source distributed SQL database for building cloud-native and geo-distributed applications.

Links:
Yugabyte community Slack channel: https://yugabyte-db.slack.com/
Distributed SQL Summit: https://distributedsql.org
Twitter: https://twitter.com/YugaByte
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
You could go ahead and build your own coding and mapping notification system,
but it takes time, and it sucks.
Alternately, consider Courier, who's sponsoring this episode.
They make it easy.
You can call a single send API for all of your notifications and channels.
You can control the complexity around routing,
retries, and deliverability,
and simplify your notification sequences
with automation rules.
Visit courier.com today and get started for free.
If you wind up talking to them,
tell them that I sent you,
and watch them wince, because everyone does when you bring up my name.
That's the glorious part of being me.
Once again, you could build your own notification system, but why on God's flat earth would you do that?
Visit courier.com today to learn more.
This episode is sponsored in part by Yugabyte.
Distributed technologies like Kubernetes are great, citation very much needed,
because they make it easier to have resilient, scalable systems.
SQL databases haven't kept pace, though.
Certainly not like NoSQL databases have, like Route 53, the world's greatest database.
We're still, other than that, using legacy,
monolithic databases that require ever-growing instances of compute. Sometimes we'll try and
bolt them together to make them more resilient and scalable, but let's be honest, it never works out
well. Consider YugabyteDB. It's a distributed SQL database that solves basically all of this. It is 100% open source, and there's no asterisk
next to the open on that one. And it's designed to be resilient and scalable out of the box,
so you don't have to charge yourself to death. It's compatible with PostgreSQL, or Postgres-squeal,
as I insist on pronouncing it, so you can use it right away without having to learn a whole
new language and refactor everything. And you can distribute it wherever your applications take you, from across availability
zones to other regions or even other cloud providers should one of those happen to exist.
Go to yugabyte.com, that's Y-U-G-A-B-Y-T-E dot com, and try their free beta of Yugabyte Cloud,
where they host and manage it for you. Or see what the open source project
looks like. It's effortless distributed SQL for global apps. My thanks to Yugabyte for sponsoring this episode.
Welcome to Screaming in the Cloud. I'm Corey Quinn.
Today's promoted episode comes from the place where a lot of my episodes do. I loudly and
stridently insist that Route 53, or DNS in
general, is the world's greatest database. And then what happens is a whole bunch of people
who work at database companies get upset with what I've said. Now, please don't misunderstand
me. They're wrong. But I'm thrilled to have them come on and demonstrate that, which is what's
happening today. My guest is CTO and co-founder of YugaByte,
Karthik Ranganathan. Thank you so much for spending the time to speak with me today.
How are you? I'm doing great. Thanks for having me, Corey. We'll just go for YugaByteDB being
the second best database. Let's just keep the first one out of it. Okay, we're all fighting
for number two. And besides, number two tries harder. It's like that whole branding thing from years past. So you were one of the original database engineers at Facebook,
responsible for building a bunch of, well, nonsense, like Cassandra and HBase.
You were an HBase committer, early contributor to Cassandra,
even before it was open sourced.
And then you look around and said, all right, I'm going to go start a company,
roughly around 2016, if memory serves. And I'm going to go build a database and bring it to the world.
Let's start at the beginning.
Why on God's flat earth do we need another database?
Yeah, that's the question.
That's the million dollar question, isn't it, Corey?
So this is one, fortunately, that we've had to answer so many times from 2016 that I guess
we've got a little good at it.
So here's the learning
that a lot of us had from Facebook. We were the original team, like all three of us founders,
we met at Facebook and we not only built databases, we also ran them, right? And let me paint a
picture. Back in 2007, right, the public cloud really wasn't very common, right? And people were
just going into multi-region, multi-data center deployments. And Facebook was just starting to take off to really scale.
Now, forward to 2013, I was there through the entire journey.
A number of things happened in Facebook.
Like we saw the rise of the equivalent of Kubernetes, which was internally built.
We saw like, for example, microservices.
Tupperware equivalent there.
Tupperware, exactly.
You know the name.
Yeah, exactly.
So we saw how we went from two data centers to multiple data centers, nearby and faraway data centers, what you know today as zones and regions.
A number of such technologies come up.
And, you know, I was on the database side, and we saw how existing databases wouldn't work to distribute data across nodes, failover, et cetera, et cetera, right?
So we had to build a new class of databases, what we now know as NoSQL.
Now, back in Facebook, I mean, the typical difference between Facebook and an enterprise at
large is Facebook has a few really massive applications. For example, you do a set of
interactions, you view profiles, you add friends, you talk with them, etc. These are super massive
in their usage, but they are very few in their access patterns. At Facebook, we were mostly
interested in dealing with scale and availability. Existing databases couldn't do it, so we built
NoSQL. Now, forward a
number of years, I can't tell you how many times I've had conversations with other people building
applications that would say, hey, could I get a secondary index on this NoSQL database? Or how about
that transaction? I only need it a couple of times. I don't need it all the time, but could you,
like, for example, do multi-row transactions? And the answer was always no, because it was never
built for that. So today, what we're seeing is that transactional data and transactional applications are all
going cloud native, and they all need to deal with scale and availability, right?
And so the existing databases don't quite cut it.
So the simple answer to why we need it is we need a relational database that can run
in the cloud to satisfy just three properties, right?
It needs to be highly available.
Failures or no, upgrades or no, it needs to be available. It needs to scale on demand. So simply add or remove nodes and scale up
or down. And it needs to be able to replicate data across zones, across regions in a variety
of different topologies. So availability, scale, and geographic distribution, along with retaining
most of the RDBMS features, the SQL features, right? That's really the gap we're trying to solve. I don't know that I've ever told this story on the podcast,
but I want to say it was back in 2009. I flew up to Palo Alto and interviewed at Facebook.
And it was a different time, a different era. It turns out that I'm not as good on the whiteboard
as I am at running my mouth. So all right, I did not receive an offer. And I think everyone can agree at this point that was for the best. But I saw one of the
most impressive things I've ever seen during a part of that interview process. My interview was
scheduled for a conference room for, must have been 11 o'clock or something like that. And at
10:59, they're looking at their watch like, hang on 10 seconds. And then the person I was with
reached up to knock on the door to let the person know
that their meeting was over and the door opened.
So it's very clear that even in large companies, which Facebook very much was at the time,
people had synchronized clocks.
This seems to be a thing, as I've learned from reading the parts I could understand
of the Google Spanner paper.
When you're doing distributed databases, clocks are super important.
At places like Facebook, that is,
I'm not going to say it's easy. Let's be clear here. Nothing is easy, particularly at scale,
but Facebook has advantages in that they can mandate how clocks are going to be handled
throughout every piece of their infrastructure. You're building an open source database and you
can't guarantee in what environment and on what hardware that's going to run. And you must have
an atomic clock
hooked up, is not something you're generally allowed to tell people. How do you get around that?
That's a great question. Very insightful. Cutting right to the chase.
So the reality is we cannot rely on atomic clocks. We cannot mandate our users to use them,
or we wouldn't be very widely used in a variety of different deployments. In fact, we also work
in on-prem
private clouds and hybrid deployments where you really cannot get these atomic clocks.
So the way we do this is we come up with other algorithms to make sure that we're able to get
the clocks as synchronized as we can. So think about it at a higher level. The reason Google
uses atomic clocks is to make sure that they can wait to make sure every other machine is
synchronized with them.
And the wait time is about seven milliseconds, right? So the atomic clock service or the true
time service says no two machines are farther apart than about seven milliseconds. So you just
wait for seven milliseconds. You know, everybody else has caught up with you. And the reason you
need this is you don't want to write on a machine. You don't want to write some data and then go to
a machine that has a future or an older time and get inconsistent results. So just by waiting seven milliseconds, they can ensure that
no one is going to be older and therefore serve an older version of the data. So every write that was written, all the other machines see it.
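To make that commit-wait idea concrete, here is a minimal sketch in Python. It is purely illustrative, not Google's or Yugabyte's actual code; it assumes a clock service that bounds the skew between any two nodes by a known epsilon, roughly the seven milliseconds mentioned here.

import time

# Illustrative TrueTime-style commit wait. EPSILON_MS is the bound the clock
# service promises between any two nodes' clocks (hypothetical value).
EPSILON_MS = 7

def commit(write, apply_locally):
    """Apply a write, then wait out the uncertainty window before acking."""
    commit_ts = time.time() * 1000     # pick a timestamp from the local clock, in ms
    apply_locally(write, commit_ts)    # make the write durable under that timestamp
    time.sleep(EPSILON_MS / 1000)      # by now every node's clock has passed commit_ts
    return commit_ts                   # safe to acknowledge: no node can still serve
                                       # a version of the data older than this write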
Now, the way we do this is we only have NTP, the network time protocol, which does synchronization of time across machines, except it takes 150 to 200
milliseconds. Now, we wouldn't be a very good database if we said, look, every operation is
going to take 150 milliseconds. So within these 150 milliseconds, we actually do the synchronization
in software. So we replace the notion of an atomic clock with what is called a hybrid logical clock.
So one part using NTP and physical time and another part using counters and logical
time and keep exchanging RPCs, which are needed in the course of the database functioning anyway,
to make sure we start normalizing time very quickly. This, in fact, has some advantages
and disadvantages. Everything was a trade-off. But the advantage it has over a true time style
deployment is you don't even have to wait that seven milliseconds
and a number of scenarios you can just instantly respond. So that means you get even lower latencies
in some cases. Of course, the trade off is there are other cases where you have to do more work
and therefore more latency.
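For contrast with the commit-wait approach, here is a rough sketch of the hybrid logical clock idea in Python. Again, this is illustrative only and not Yugabyte's implementation: each node keeps a (physical, logical) pair, and folding in the timestamps that ride on the RPCs nodes exchange anyway keeps them ordered without atomic clocks or a fixed wait.

import time


class HybridLogicalClock:
    """Toy hybrid logical clock: a (physical_ms, logical_counter) pair.

    The physical part tracks the NTP-disciplined wall clock; the logical
    counter orders events that land within the same millisecond or while
    a node's wall clock lags behind what it has seen from its peers.
    """

    def __init__(self):
        self.l = 0  # highest physical time (ms) observed so far
        self.c = 0  # logical counter within that millisecond

    @staticmethod
    def _wall_ms():
        # NTP-synced wall clock; across nodes this can disagree by 100+ ms,
        # which is why the physical part alone is not enough.
        return int(time.time() * 1000)

    def tick(self):
        """Timestamp a local event, e.g. a write about to be replicated."""
        prev = self.l
        self.l = max(prev, self._wall_ms())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def merge(self, msg_l, msg_c):
        """Fold in a timestamp received on an incoming RPC."""
        prev = self.l
        self.l = max(prev, msg_l, self._wall_ms())
        if self.l == prev and self.l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


# Even if the reading replica's wall clock lags the writer's, merging the
# timestamp carried on the replication RPC orders the read after the write:
writer, reader = HybridLogicalClock(), HybridLogicalClock()
write_ts = writer.tick()
read_ts = reader.merge(*write_ts)
assert read_ts > write_ts  # lexicographic comparison of (physical, logical)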
The idea absolutely makes sense. You've started this as an open source project and it's thriving. Who's using it and for what purposes?
Okay, so one of the fundamental tenets of building this database, right, I think back to your question of why does
the world need another database, is that the hypothesis is not so much the world needs another
database API. That's really what users complain against, right? You create a new API and even if
it's SQL and you tell people, look, here's a new database, it does everything for you. It'll take
them two years to figure out what the hell it does and build an app, and they'll put it
in production, and then they'll build a second and a third. And then by the time they hit the
10th app, they find out, okay, this database cannot do the following things. But you're five
years in, you're stuck, you can only add another database. That's really the story of how NoSQL
evolved. And it wasn't built as a general purpose database, right? So in the meanwhile, databases like, you know,
Postgres, for example, have been around for so long
that they like, you know, absorb
and have such a large ecosystem and usage
and people who know how to use Postgres and so on.
So we made the decision
that we're going to keep the database API compatible
with known things.
So people really know how to use them from the get-go
and enhance it at a lower level
to make it cloud native, right?
So what does YugabyteDB do for people?
It is the same as Postgres and Postgres features at the upper half.
It reuses the code, but it is built on the lower half
to be shared nothing, scalable, resilient, and geographically distributed.
So, to use the public cloud managed-database context: the upper half is built like Amazon Aurora,
and the lower half is built like Google Spanner. Now, when you think about workloads that can benefit
from this, we're a transactional database that can serve user-facing applications and real-time
applications that have lower latency. So the best way to think about it is people that are building
transactional applications on top of, say, a database like Postgres, but the application
itself is cloud native, you'd have to do a lot of work to make this Postgres piece be highly available and scalable
and replicate data and so on in the cloud. Well, with YugabyteDB, we've done all that work for you,
and it's as open source as Postgres. So if you're building a cloud native app on Postgres that's
user-facing or transactional, YugabyteDB takes care of making the database layer behave like Postgres, but become cloud
native.
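As a concrete illustration of that compatibility claim, a stock PostgreSQL driver can talk to a YugabyteDB cluster unchanged. Here is a minimal sketch, assuming a local cluster listening on the default YSQL port 5433 with the default yugabyte database and user, and the psycopg2 driver; the table and rows are made up for illustration.

# A stock PostgreSQL driver pointed at YugabyteDB's YSQL port (5433 by default).
# Nothing Yugabyte-specific in the code itself; that is the point of keeping
# the upper half wire- and feature-compatible with Postgres.
import psycopg2

conn = psycopg2.connect(
    host="127.0.0.1",   # any node in the cluster can take connections
    port=5433,          # YSQL default; vanilla Postgres would be 5432
    dbname="yugabyte",  # default database in a fresh cluster
    user="yugabyte",    # default user in a fresh cluster
)

# DDL outside an explicit transaction.
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id       BIGSERIAL PRIMARY KEY,
            customer TEXT NOT NULL,
            total    NUMERIC(10, 2)
        )
    """)
    cur.execute("CREATE INDEX IF NOT EXISTS orders_by_customer ON orders (customer)")

# An ordinary multi-row transaction: both inserts commit atomically, even though
# the rows may land on different nodes of the cluster underneath.
conn.autocommit = False
with conn, conn.cursor() as cur:
    cur.execute("INSERT INTO orders (customer, total) VALUES (%s, %s)", ("ada", 42.50))
    cur.execute("INSERT INTO orders (customer, total) VALUES (%s, %s)", ("grace", 17.25))

with conn, conn.cursor() as cur:
    cur.execute("SELECT customer, total FROM orders ORDER BY id")
    print(cur.fetchall())

conn.close()

The same statements would run against vanilla Postgres; the difference is only where the rows physically live.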
Do you find that your users are using the same database instance, for lack of a better
term?
I know that instance is sort of a nebulous term.
We're talking about something that's distributed.
But are they having database instances that span multiple cloud providers?
Or is that something that is more talk than you're actually seeing in the wild?
So I'd probably replace the word instance with cluster, just for clarity, right?
So a cluster has a-
I concede the point, absolutely.
Okay, so we'll still keep Route 53 on top, though, so it's good.
At that point, the replication strategy is called a zone transfer,
but that's neither here nor there. Please, by all means, continue.
Okay, so a cluster of a database like YugabyteDB has a number of instances.
Now, I think the question is, is it theoretical or real, right?
What we're seeing is it is real, and it is real perhaps in slightly different ways than
people imagine it to be.
Okay.
So I'll explain what I mean by that.
Now, there's one notion of being multi-cloud where you can imagine there's like, say, the
same cluster that spans multiple different clouds, right?
And you have your data being written in one cloud and being read from another, right?
This is not a common pattern, although we have had one or two deployments that are attempting
to do this.
Now, a second deployment shifted once over from there is where you have multiple instances
in a single public cloud and a bunch of other instances in a private cloud.
So it stretches the database
across public and private, right? You'd call this a hybrid deployment topology. That is more common.
So one of the unique things about YugabyteDB is we support asynchronous replication of data,
just like your RDBMSs do, the traditional RDBMSs. In fact, we're the only one that straddles both
synchronous replication of data as well as asynchronous replication of data. We do both. So once shifted over would be a cluster that's deployed in one of the clouds, but with an
asynchronous replica of the data going to another cloud. And so you can keep your reads and writes,
even though they're a little stale, you can serve it from a different cloud. And then once again,
you can make it an on-prem private cloud and another public cloud. And we see all of those
deployments. Those are massively
common, right? And then the last one over would be the same instance of an app or perhaps even
different applications, some of them running on one public cloud and some of them running on a
different public cloud. And you want the same database underneath to have characteristics of
scale and failover, right? Like for example, if you built an app on Spanner, what would you do
if you went to Amazon and wanted to run it for a different set of users? That is part of the reason I tend to avoid the idea of
picking a database that does not have a least theoretical exodus path, because reimagining
your entire application's data model in order to migrate is not going to happen. So come hell or
high water, you're stuck with something like that where it lives. So even though I'm a big proponent
as a best practice, and again, there are exceptions where this does not make sense, but as a general piece of guidance,
I always suggest pick a provider, I don't care which one, and go all in. But that also should
be shaded with the nuance of, but also at least have an eye toward theoretically, if you had to
leave, consider that if there's a viable alternative. And in some cases, in the early days
of Spanner, there really wasn't. So if you needed that functionality, okay, go ahead and use it,
but understand the trade-off you're making. That's really what it comes down to from my
perspective, understand the trade-offs. But the reason I'm interested in your perspective on this
is because you are providing an open source database to people who are actually doing
things in the wild. There's not much agenda there in the
same way among a user community of people reporting what they're doing. So you have,
in many ways, one of the least biased perspectives on the entire enterprise.
Oh, yeah, absolutely. And like I said, I started from the least common to the most common. Maybe
I should have gone the other way. But we absolutely see people that want to run the
same application stack in multiple different clouds for a variety of reasons. Oh, if you're a SaaS vendor, for example, you say, oh, we're only in this one cloud.
Potential customers who are in other clouds say, well, if that changes, we'll give you money. Oh,
money! You say other cloud. I thought you said something completely different. Here you go.
Yeah, you've got to at some point. But the core of what you do, beyond what it takes to get that
application present somewhere else, you usually keep in your primary cloud provider. Exactly. Yeah, exactly. Crazy things sometimes dictate
or have to dictate architectural decisions, right? Like, for example, you're seeing the
rise of compliance. Different countries have different regulatory reasons to say,
keep my data local or keep some subset of data local. And you simply may not find the right
cloud providers present in those countries. You may be a PaaS or an API provider that's helping
other people build applications. And the applications that the API provider's customers
are running could be across different clouds. And so they would want the data local. Otherwise,
the transfer costs would be really high. So a number of reasons dictate or like a large company
may acquire another company that was operating in yet another cloud. Everything else is great,
but they're in another cloud. They're not going to say no because you're operating on another cloud. It still does what they want,
but they still need to be able to have a common base of expertise
for their app builders and so on.
So a number of things dictate
why people start looking at cross-cloud databases
with common performance
and operational characteristics
and security characteristics,
but don't compromise on the feature set, right?
That's starting to become super important
from our perspective.
I think what's most important is the ability to run the database with ease
while not compromising on your developer agility or the ability to build your application, right?
That's the most important thing. When you founded the company back in 2016,
you are VC-backed. So I imagine your investor pitch meetings must have been something a little
bit surreal. They ask hard questions such as,
why do you think that in 2016, starting a company to go and sell databases to people is a viable business model? At which point you obviously corrected them and said, oh,
you misunderstand. We're building an open source database. We're not charging for it. We're giving
it away. And they apparently said, oh, that's more like it. And then invested, as of the time
of this recording, over $100 million in your company. Let me be the first to say there are aspects of money that I
don't fully understand, and this is one of those. But what is the plan here? How do you wind up
building a business case around effectively giving something away for free? And I want to be clear
here. Yugabyte is open source, and I don't have an asterisk next to that.
It is not one of those source available licenses, or anyone can do anything they want with it
except Amazon, or you're not allowed to host it and offer it as a paid service to other
people.
So how do you have a business, I guess, is really my question here.
You're right, Corey.
We're 100% open source under Apache 2.0, I mean, the database.
So our theory on day one, I mean, of course, this was a hard question and people did ask us this. And I'll take you guys back to 2016. It was unclear, even as of 2016, if open source companies were going to do a great job. Do you really need open source to succeed? There were a lot of such questions, right? And every company, every project, every space has
to follow its own path, right? Just applying learnings. Like for example, Red Hat was open
source and that really succeeded, but there's a number of others that may or may not have succeeded,
right? So our plan back then was to tread the waters carefully, in the sense that we really had to make sure open source was the business model we wanted to go for. So under the advisement from our VCs, we said we'd take it slowly. We want open source on day one. We talked to a number of our users and customers and made sure, you know, that was indeed the path we wanted to go.
The conversations pretty clearly told us people wanted an open database that was very easy for them to understand.
Because if they are trusting their crown jewels, their most critical data, their systems of record,
this is what the business depends on, into a database, they sure as hell want to have
some control over it and some transparency as to what goes on, what's planned, what's
on the roadmap.
Look, if you don't have time, I will hire my own people to go build it. They want to be able to invest in the database.
So open source was absolutely non-negotiable for us.
We tried the traditional technique for a
couple of years of keeping a small portion of the features of the database itself closed. So it's
what you'd call open core. But on day one, we were pretty clear that the world was headed towards
DBaaS, database as a service, and making it really easy to consume. And then there are the bad patterns as well, like, oh, if you want security, that's a paid feature. No, that is not optional.
And the list then of what you can wind up adding as paid versus not gets murky. And you're
effectively fighting your community when they try and merge some of those features in. And it just
turns into a mess. Exactly. So it did for us for a couple of years. And then we said, look, we're
not doing this nonsense. We're just going to make everything open and just make it simple, right?
Because our promise to the users was we're building everything that looks like Postgres,
so it's as valuable as Postgres, and it'll work in the cloud, right?
And people said, look, Postgres is completely open, and you guys are keeping a few features
not open.
What gives, right?
And so after that, we had to concede the point and just do that.
But one of the other founding theses of the company, the business side, was that DBaaS
and ability to consume the database is actually far more critical than whether
the database itself is open source or not, right? I would compare this to, for example, MySQL and
Postgres being completely open source, but Amazon's Aurora being actually a big business. And
similarly, it happens all over the place. So it is really the ability to consume and run business
critical workloads that seem to be more important for our customers and enterprises that paid us.
So the day one thesis was, look, the world is headed towards DBaaS.
We saw that already happen with inside Facebook.
Everybody was automated operations, simplified operations, and so on.
But the reality is, we're a startup.
We're a new database.
No one's going to trust everything to us, the database, the operations, the data.
Hey, why don't we put it on this tiny company?
And oh, it's just my most business-critical data, so what could go wrong, right? So we said, we're going to build a
version of our DBaaS that is in software. So we call this Yugabyte Platform, and it actually
understands public clouds. It can spin up machines. It can completely orchestrate software installs,
rolling upgrades, turnkey encryption, alerting, the whole nine yards. That's a completely different
offering from the database.
It's not the database.
It's just on top of the database
and helps you run your own private cloud.
So effectively, if you install it on your Amazon account
or your Google account,
it will convert it into what looks like a DynamoDB
or a Spanner or what have you
with YugabyteDB as the database inside.
So that is our commercial product.
That's source available, right?
And that's what we charge for. The database itself, completely open. Again, the other piece of the
thinking is if we ever charge too much, our customers have the option to say, look, I don't
want your DBaaS plan. I'll go to the open source database and we're fine with that. So we really
want to charge for value. And obviously we have a completely managed version of our database as
well. So we reuse this platform for our managed
version. So you can kind of think of it as portability, not just of the database, but also
of the control plane, the DBaaS plane. They can run it themselves. We can run it for them. They
can take it to a different cloud, so on and so forth. I like that monetization model a lot better
than a couple of others. I mean, let's be clear here. You've spent a lot of time developing some
of these concepts for the industry when you were at Facebook.
And because it's Facebook, the other monetization models are kind of terrifying.
Like, okay, we're going to just monetize the data you store in the open source database
is terrifying.
Only slightly less terrifying would be the Google approach of, ah, every time you wind up running a SQL
query, we're going to insert ads.
So I like the model of being able to offer features that only folks who already have expensive problems, and money to burn to solve those problems, will gravitate towards.
You're not disadvantaging the community or the small startup who wants it but can't afford
it.
I like that model.
Actually, the funny thing is we are seeing a lot of startups also consume our product
a lot.
And the reason is because we only charge for the value we bring, right?
Typically, the problems that a startup faces are actually much simpler than the complex requirements of an enterprise at scale, right? They are different. So the value is also proportional
to what they want and how much they want to consume, and that takes care of itself.
So for us, we see that startups, equally so as enterprises, have only limited amount of bandwidth.
They don't really want to spend time on operationalizing the database, especially if they have an
out to say, look, tomorrow this gets expensive.
I can actually put in the time and money to move out and go run this myself.
Why don't I just get started because the budget seems fine and I couldn't have done it better
myself anyway because I'd have to put people on it and that's more expensive at this point.
So it doesn't change the fundamentals of the model.
I just want to point out both sides
are actually gravitating to this model.
This episode is sponsored in part
by our friends at Jellyfish.
So you're sitting in your office chair,
bleary-eyed, parked in front of a PowerPoint,
and oh, my sweet feathery Jesus,
it's the night before the board meeting
because of course it is.
As you slot that
crappy screenshot of traffic light colored Excel tables into your deck or sift through endless
spreadsheets looking for just the right data set, have you ever wondered why is it that sales and
marketing get all this shiny, awesome analytics and insight tools, whereas engineering basically
gets left with the dregs? Well, the founders of Jellyfish certainly did.
That's why they created the Jellyfish Engineering Management Platform, but don't you dare call it JEMP.
Designed to make it simple to analyze your engineering organization,
Jellyfish ingests signals from your tech stack, including JIRA, Git, and collaborative tools.
Yes, depressing to think of those things as your tech stack, but this is 2021.
And they use that to create a model that accurately reflects
just how the breakdown of engineering work aligns with your wider business objectives.
In other words, it translates from code into spreadsheet.
When you have to explain what you're doing from an engineering perspective
to people whose primary IDE is Microsoft PowerPoint, consider Jellyfish. That's jellyfish.co and tell them Corey sent you. Watch for the wince. That's my favorite part.
There's a claim that companies prefer open source databases, and this is waved around as a banner of victory by a lot of,
well, let's be honest, open source database companies. I posit that that is in fact crap and also bad data, because the open source purists, of which I admit I used to be one, and now I solve business problems instead, believe that people are talking about freedom and choice and the rest. In practice, in my experience,
what people are really distilling that down to is that they don't want a commercial database. And it's not even that they're not willing to pay money for it, but they don't want to have a per-core licensing challenge, or even have to track licensing of where it is installed and how, and wind up having to cut checks for folks. For example, I'm going to dunk on someone, because why not?
Azure, for a while, has had this campaign that it is five times cheaper
to run some Microsoft SQL workloads in Azure than it is on AWS.
As if this was some magic engineering feat of strength or something.
It's absolutely not.
It's that it is really expensive licensing-wise to run it on things that aren't Azure.
And that doesn't make customers feel good. That's the thing they want to get away from. And what open source license it is, and in many
cases, until the source available stuff starts trending towards, oh, you're going to pay us,
or you're not going to run it at all, that scares the living hell out of people, then they don't
actually care about it being open. So at the risk of alienating, I'm sure, some of the more vocal
parts of your constituency,
where do you fall on that? We are completely open, but for a few reasons, right? Like multiple different reasons. On the debate of whether it is purely open or just completely permissive, I tend to think people care more about the openness than just the ability to consume at will without worrying about the license, but for a few different reasons, right? And it depends on which segment of
the market you look at. If you're talking about small and medium businesses and startups, you're
absolutely right. It doesn't matter. But if you're looking at larger companies, they actually care
that, for example, if they want a feature, they are able to control their destiny because you
don't want to be half wedded to a database
that cannot solve everything,
especially when the time pressure comes
or you need to do something.
So you want to be able to control
or to influence the roadmap of the project.
You want to know how the product is built,
the good and the bad.
You want a lot of people testing the product
and the feedback to come out in the open
so you at least know what's wrong.
Many times people feel like saying, hey, you know, my product doesn't work in these areas, is actually a bad thing, right? It's actually a good thing
because at least those people won't try it
and it'll be safe, right?
Like customer satisfaction is more important
than just the apparent whatever it is
that you want to project about the product, right?
At least that's what I've learned in all these years
working with databases.
But there's a number of reasons
why open source is actually good.
There's also a very subtle reason that people may not understand, which is that, like, you know, legal teams,
engineering teams that want to build products don't want to get caught up in a legal review
that takes many months to really make sure, look, this may be a unique version of a license,
but it's not a license the legal team has seen before. And there's going to be a back and forth
for many months, and it's just going to derail their product and their timelines, not because
the database didn't do its job or because the team wasn't ready, but because the
company doesn't know what risk it will face in the future. There are a number of these aspects
where open source starts to matter for real. I'm not a purist, I would say. I'm a pragmatist,
that's how I've always been. But, you know, I might be sounding like a purist, but there are a number of reasons why true open source is actually useful, right?
And at the end of the day, we've already established, at least at Yugabyte, we're pretty
clear about that.
The value is in the consumption and is not in the tech, right?
Like if you're pretty clear about that, because if you want to run a tier two workload or
a hobbyist app at home, would you want to pay for a database?
Probably not.
I just want to do something for a while and then shut it down and go do my thing, right?
I don't care if the database is commercial or open source. In that case, being
open source doesn't really take away. But if you're a large company betting, it does take away, right?
Oh, it goes beyond that, because it's not even in the large company story whether it costs money,
because regardless, I assure you, open source is not free. The most expensive thing that we see in
all of our customer accounts (again, our consultancy fixes AWS bills, an expensive problem that hits everyone) is not the infrastructure itself. The environment in AWS is always less expensive
than the people who are working on the environment. Payroll is an expense that dwarfs the AWS bill
for anyone that is not a tiny startup that is still not paying a market rate salary to its
founders. It doesn't work that way. And the idea for those folks is not about the money.
It's about the predictability.
And if there's a 5x price hike from their database vendor,
that suddenly completely disrupts their unit economic model,
and they're in trouble.
That's the value of open source, in that it can go anywhere.
It's a form of not being locked into any vendor where it's hosted, as well
as no one company that has put it out there into the world. Yeah, and the source available license,
right, we considered that also. The reason we voted against that was you can get into scenarios where the company gets competitive with its open source side, where the open source user wants a couple of other features to really make it work for their own use case, like, you know, case in point, a startup, but the company wants to hold those features for the commercial side. And now the startup has that 5X price jump anyway. So at this point, it comes to a head,
where the company or the startup is being charged, not for value, but because of the monetization
model or the business model, right? So we said, you know what, the best way to do this is to truly
compete against open source. And if someone wants to operationalize the database, great, but we've already done it for you. If you think that you
can operationalize it at a lower cost than what we've done, great, that's fine.
I have to ask, there has to have been a question somewhere along the way during the investment
process of, well, what if AWS moves into your market? And I can already say part of the problem
with that line of reasoning is, okay, let's assume that AWS turns Yugabyte into a managed database offering. First,
they're not going to be able to articulate for crap why you should use that over anything else,
because they tend to mumble when it comes time to explain what it is that they do.
But it has to be perceived as a competitive threat. How do you think about that?
Yeah, this absolutely came up quite a bit. And like I said, in 2016, this wasn't news back then. This is something that was happening in the world already.
So I'll give you a couple of different points of view on this. The reason why AWS got so successful
in building a cloud is not because they wanted to get into the database space. They simply wanted
their cloud to be super successful and it required value-added services like these databases, right?
Now, every time a new technology shift happens, right,
it gives some set of people an unfair advantage, right?
In this case, database vendors probably didn't recognize
how important the cloud was and how important it was
to build a first-class experience on the cloud on day one, right,
as the cloud came up because it wasn't proven
and they had 20 other things to do, and it's rightfully so.
Now, AWS comes up and they're trying to prove a point
that the cloud is really useful and absolutely valuable for their customers. And so they start putting
value-added services. And now suddenly you're in this open source battle, right? At least that's
how I would view that it kind of developed. With Yugabyte, obviously, the cloud's already here,
we know on day one, so we're kind of putting out our managed service. So we'd be as good as AWS or
better. The database has its value, but the managed service has its own value. And so we'd
want to make sure we provide at least as much value as AWS, but on any cloud, right anywhere.
So that's the other part. And we also talked about the mobility of the DBaaS itself,
so moving it to your private account and running the same thing, as well as for public, right? So
these are some of the things that we have built, that we believe makes us super valuable.
It's a better approach than a lot of your predecessor companies who decided,
oh, well, we built the thing.
Obviously, we're going to be the best at running it in the end
because they dramatically sold AWS's operational excellence short.
And it turns out they're very good at running things at scale.
So that's a challenging thing to beat them on.
And even if you're able to, it's hard to differentiate among the differences
because at that caliber of operational
rigor, it's one of those you can only tell in the very niche cases. It's a hard thing to
differentiate on. I like your approach a lot better. Before we go, I have one last question
for you. And normally it's one of those positive, uplifting ones of what workloads are best for
YugaByte, but I think that's boring. Let's be more cynical and negative.
What workloads would run like absolute crap on YugaByteDB?
Okay.
We do have a thing for this, because we don't want to take on workloads and, you know, have everybody end up with a bad experience.
So we're a transactional database built for user-facing applications, real-time and so
on, right?
We're not good at warehousing and analytic workloads.
So like, for example, if you were using a Snowflake or a Redshift, those workloads are
not going to work very well on top of YugabyteDB. Now, we do work with other external systems like,
you know, Spark and Presto, which are like real-time analytic systems, but they translate
the queries that the end user has into a more operational type of query pattern. However, if you're using it straight up for analytics, we're not a good bet.
Similarly, there are cases where people want a very high number of IOPS by using a cache or even a persistent cache. You know, Amazon just came out with a number of persistent caches that do very high throughput
and low latency serving.
We're not good at that.
We can do reasonably low latency serving and reasonably high IOPS and scale,
but we're not the use case where you want to hit that same lookup over and over and over
millions of times in a second. That's not the use case for us. And the third thing I'd say is
we are a system of record. So people care about the data they put and they absolutely don't want
to lose it and they want to show that it's transactional. So if there's a workload where
there's a lot of data and you're okay if you lose some of it, it's just some sensor data, and your reasoning is like, okay, if I lose a few data points, it's fine. I mean, you could still use us, but you know, at that point you'd really have to be a fanboy or something for Yugabyte. I mean, there are other databases that probably do it better.
Yeah.
That's the problem is whenever someone says, oh yeah, database or any tool that they've
built, like this is great.
What workloads is it not a fit for?
And their answer is, oh, nothing. It's perfect for everything. Yeah, I want to
believe you, but my inner bullshit sense is tingling on that one because nothing's fit for
all purposes. It doesn't work that way. Honestly, this is going to be, I guess, heresy in the
engineering world, but even computers aren't always the right answer for things. Who knew?
As a founder, I struggled with this answer a lot initially. I think the problem is when you're
thinking about a problem space, that's all you're thinking about. You don't know what other problem
spaces exist. And when you are asked the question, what workloads is it a fit for? At least I used to
say initially everything, because I'm only thinking about that problem space as the world, and it's
fit for everything in that problem space, except I don't know how to articulate the problem space.
Right. And at some point too, you get so locked into one
particular way of thinking of the world that people ask about other cases. Oh, that wouldn't
count. And then your follow-up question is, wait, what's a bank? And it becomes a different story.
It's how do you wind up reasoning about these things? I want to thank you for taking all the
time you have today to speak with me. If people want to learn more about Yugabyte, either the company or the DB, how can they do that? Yeah, thank you as well for having me. I think to learn
about Yugabyte, just come join our community Slack channel. There's a lot of people. There's like
over 3,000 people. They're all asking interesting questions. There's a lot of interesting chatter on
there. So that's one way. We have an industry-wide event. It's called the Distributed SQL Summit. It's coming up September 22nd, 23rd, I think, a couple of days. It's a two-day event.
That would be a great place
to actually learn
from practitioners
and people building applications
and people in the general space
and as adjacencies.
And it's not necessarily
just about Yugabyte, right?
It's generally about
distributed SQL databases
in general, right?
Hence, it's called
the Distributed SQL Summit.
And you can ask us on Twitter
or any of the usual
social channels as well, right?
So we love interaction.
So we are a pretty open and transparent company.
We'd love to talk to you guys.
Well, thank you so much
for taking the time to speak with me.
We'll, of course, throw links to that
into the show notes.
Thank you again.
Awesome. Thanks a lot for having me, Corey.
It was really fun. Thank you.
Likewise.
Karthik Ranganathan,
CTO and co-founder of YugaByteDB.
I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star
review on your podcast platform of choice. Whereas if you've hated this podcast, please
leave a five-star review on your podcast platform of choice, along with an angry comment halfway
through, realizing that I'm not charging you anything for this podcast and converting the angry comment into a term sheet for a $100 million investment.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.