Software Huddle - Faster & Cheaper on PlanetScale Metal with Sam Lambert
Episode Date: March 11, 2025

Today, we have Sam Lambert back on the show! Sam is the CEO of PlanetScale, and if you follow him on X, you know he’s one of the sharpest voices in the database space—cutting through the hype with deep experience and a no-nonsense approach. In this episode, we dive into PlanetScale’s new Metal offering, which has been battle-tested with PlanetScale’s high-scale cloud business partners and is now GA. Sam also shares why staying profitable is crucial—not just for the business but for the stability and reliability it guarantees for customers. While many cloud infrastructure companies chase the next hype cycle, Sam prefers to keep it boring—delivering rock-solid performance with no surprises. Finally, we close with Sam's thoughts on other happenings in the database space -- Aurora DSQL, Aurora Limitless, MySQL benchmarks, and multi-region strong consistency. Tune in for a deep dive into databases, cloud infrastructure, and what it takes to build a sustainable, high-performance tech company.

Timestamps
01:34 Start
06:42 PlanetScale Metal
11:15 The problem with separation of storage and compute
15:02 EBS Tax
17:32 How does Vitess handle durability
22:58 Metal recommended for all PlanetScale users?
27:20 The hidden expense of IOPS for cloud databases
37:41 Timeline of creating PlanetScale Metal
41:32 Focus on profitability
47:52 Removal of hobby plan
57:45 Deprecation of PlanetScale Boost
01:00:24 DSQL
01:01:51 Aurora Limitless
01:04:15 AWS as a partner
01:07:00 The spectacle of AWS re:Invent
01:12:22 Benchmarks and benchmarketing
01:15:51 AWS Databases + multi-region strong consistency
Transcript
First of all, you learn how creatively people can insult you on Twitter.
Pretty funny, most of them actually, even the bad ones.
But in seriousness, I get about an email every day from people wanting a free version
of Cloud at scale.
Still.
Okay.
Yeah.
Yeah.
And I think it's extremely unlikely we'd ever do it again.
What about just like the economics?
Is it hard to make the economics work when you're paying so much infra to AWS?
The thing that has been amazing for our customers is with PlanetScale Managed, it runs inside their account.
So if you're a significant Amazon customer, you get to negotiate incredible commits with savings plans against these machines and just save extreme amounts.
One database on PlanetScale, its daily operational cost went down by $20,000.
You have really lived it in a super interesting way. What's up, everybody? This is Alex. I'm
really excited about the show today because we have Sam Lambert back and he's the CEO at
PlanetScale. He was the first guest I ever had on the show. I just think he's got a really good,
interesting opinion on database stuff.
Like he sees a lot of database stuff all the time.
He knows like what's real, what's not real,
what big companies actually need.
So we talk about a lot of that.
We talk about PlanetScale Metal,
which is a cool new release they have today,
which I think is really interesting.
Some cool engineering stuff and like things
that they're able to do that a lot of people
aren't able to do.
So I think it's really fascinating to go through that.
So check it out.
If you have any guests you want on,
if you have any questions, things like that,
feel free to reach out to me or to Sean.
But with that, let's get to the show.
Sam, welcome to the show.
Thank you.
Thank you for having me again.
Yeah, absolutely.
And I should say welcome back because yeah,
you were my very first guest on the show
and I love that conversation.
And I think you're the first person
that I've had back as well.
So honor on both accounts.
And yeah, I mean, you're just like one of my favorite people
in the space.
I love chatting with you last time
and you're a good Twitter follow, where I feel like
in the database space, there's a lot of hype.
There can be a lot of hype type stuff.
And I think you tell it like it is
and kind of cut through the noise a lot in a great way.
So I'm excited to have you on.
I guess maybe for people that don't know you,
maybe get a little background on you
and PlanetScale and all that.
I didn't realize I was your first guest, first of all.
You were my first guest, yeah.
I feel so honored.
Wow, wow, that's amazing.
Yeah, no, I mean, I love the show
and your audience specifically,
it seems like a really great bunch.
And again, same, thanks for having me on today.
Yeah, so my name is Sam.
I'm the CEO of a company called PlanetScale.
We specialize in kind of building
the world's most scalable performant cloud database.
Lucky enough to support some of the world's
largest consumer brands,
ranging from Block with Cash App and Blizzard
and all these really cool companies building cool things.
It's very fun to kind of exist in the world
and know that the products you're using every day
are, in turn, using your products.
It's very, very cool.
Before that, I was at Facebook running
the traffic production engineering team,
which is a very small group of people responsible for about 12% of the internet's traffic.
And then, yeah, it's amazing still.
And then before that, I was VP of engineering at GitHub,
a small little code hosting website that happened to have every developer on planet Earth using it.
And yeah, before that, I kind of worked on a bunch of database systems at scale doing
various different things. That's kind of how I learned to do what I'm doing now and went to
GitHub to do the same stuff. Yeah, I've seen database problems at every company I've ever
worked at and, you know, at all of our customers and it's good. It's fun to be building a company
that kind of scratches the itch of paying down all those problems so that people don't have to run into them in the future.
Yeah, yeah, for sure. I mean, I love learning about and reading about databases and all this sort of stuff. Like you have really lived it in like a super interesting way from GitHub to Facebook to now PlanetScale. And like you're saying all these huge companies that run on PlanetScale. Like, really, I imagine you just see some very interesting stuff
day to day, which is a lot of fun.
You do.
The database space is actually really strange in the sense
that it's meant to be boring.
And it's actually probably so boring that people get bored
and then do ridiculous things with databases.
And you're kind of like, yeah, we can't do those things.
And nor would we. So yeah, it's a kind of, it's a
very interesting space to be in to try and do something dynamic and new, but also there's
the rules. Right? You know, I was, I was kind of likening this to running an airline the
other day, which is no one wants creativity kind of like in the engines and all of the
bits, you know, you have to, you have to build something that operates and works at the worst of times and predictably,
right? Like a 737 can land pretty much anywhere in the world and there's mechanics on call
at any point to fix any single issue. There's spare parts, there's everything, you know,
we try and operate that way. That's Ed. You know, you can have some fun with the new kind
of business class interior and make those things nice.
As long as the fundamental rules stay kind of respected and it works, you can build something
kind of fun and dynamic that has real customer impact.
Yep.
Yep.
And that was like one thing I remember us talking about last time too, is like, hey,
Vitess is super interesting and like built for this enormous scale.
But then like PlanetScale also has this focus on developer productivity and utility that way.
And that's where a lot of some of your interesting
innovation happens of just making that stuff easier
while still giving that rock solid reliability
from the database aspects itself.
Yeah.
Yeah, if you keep the boring stuff boring,
I think that's the actual essence of developer experience, which is if the database is like really scary, and we come to companies
where they haven't done a schema change for months, because of the last time it happened.
I remember at GitHub, the users table was un-migratable, like you just couldn't do a
migration against it without taking down the website because, you know, everything joined
on it was, you know, before we started functionally partitioning out. And it just slows development down. So we
just kind of created other tables to join on because it's the only thing to do. And
we run into companies like they're in this situation all of the time. And it's nice to
go in and get them kind of developing faster and shipping again. Again, it just comes down
to when the boring stuff stays boring, or you can make it as boring as possible, and
you embrace the eventuality of failure, you can then do some fun stuff around it and make
it usable and kind of highly dynamic.
Yep.
That's awesome.
Well, like sort of on that note, like I want to talk about a lot of database stuff and
sort of current events and what's happening and get your take on the database space.
But let's start off with PlanetScale Metal.
So you all have this new announcement today.
Tell me about Metal and I have like a bunch
of follow-up questions on how this is all working.
Yeah, so Metal is like a fundamental step change
in performance and databases in the cloud
and the cost profile associated with them.
And I can unpack that a little bit.
You know, we've always run in AWS and Google. We have to, right? It's really where the only serious cloud customers are. If you can't be one hop from the app layer, it's just not viable. We run inside those clouds and we also run inside VPCs inside people's accounts.
So it's a managed service.
Nobody has to configure or use it.
We wake up, we take the pager if anything goes wrong, but it still lives within someone's
cloud account.
It runs on EC2 and it means you get this highly scalable database inside your existing infrastructure.
And the thing that makes Metal really revolutionary is database companies are either doing one of two things.
They're either hosting in like an Equinix kind of self-hosted situation where they have their own hardware that they rack.
And that's kind of a few database companies have started doing this now.
It's really cool because you get great performance. You get like NVMe straight to the CPU through the motherboard and PCI buses and whatnot.
So it's awesome.
However, you're in your own data center.
You're multiple hops from where Lambda is and S3 is, and no one should be building
their own Lambda or S3, right? Like it's just, you shouldn't. We use those essential services
for the things we need to make boring. Like you can guess which service we put our backups on.
Right? I imagine it's going to be S3. Yep. Oh yeah, of course. Absolutely. Yep. Yep.
And we're actually gonna blog about this,
and our new backup system is incredibly cool.
Like, you know, parallelized backups,
we can restore at line rate of the machine.
It's very, very cool.
Anyway, I could go on forever about that stuff.
Anyway, so you're either like out in your own data center.
Yeah.
And let me stop you there.
I didn't know people were doing it.
Like which database companies are running in their own?
Prisma just announced a product where they're running.
And they're doing some really cool stuff
with like micro kernels
and very fast booting performance postgres,
which is really, really cool.
Issue being it's gonna be hops away from applications,
which is gonna cause latency,
but I think for sort of certain workloads,
that could be fine.
Or you're running in the cloud,
and this is where the major problem comes from,
is when you run in the cloud, everything's truly, like, ephemeral, unless you're paying for it not
to be ephemeral, and you're buying convenience from the cloud. So if you want a bare metal server
in AWS, they exist. They're very cheap, very, very cheap, and extremely fast. However, if you
terminate one, it's gone forever. Back in the data center, if you screw up, you have a raid controller.
You have raid.
You can physically get hands-on.
Like at GitHub, we ran in the data center.
Although we wanted to be highly available, we had the safety of knowing that truly, truly,
if something goes wrong, someone can pull out disks, put them into another blade and we're able to rebuild
and get that data back even if it's a nasty downtime. And so there's these beautiful ephemeral
machines inside the cloud that have incredible performance profiles and no one uses them for
databases. In fact, we told Amazon we were doing this, and it was just utter disbelief that we would
run databases on that.
There's a reason we can do this in a way that no one else can, and it's that Vitess,
our core technology, is on board, the system originally built at Google to scale YouTube.
It's fully ephemeral.
There's just no assumption that you'll ever see the drives or the disks again. So Vitess is durable and high-performing on ephemeral nodes and is extremely resilient,
meaning we can basically buy the same servers Amazon buys for their services
and build a full software stack. They give us like, it is Firecracker, I think, yeah,
but just a light kind of, you know,
which really is for their convenience too,
just to give you this like base operating system image
and everything else on top is built by us,
which is highly, highly unusual.
Everyone else normally has to do the separation
of storage and compute and use EBS
or their own like separated storage layer, which is fundamentally slow.
And that's actually where the big step change comes with Metal, which is we're going back
to how computers are meant to work, which is IOPS happen inside the machine rather than...
We can really dig into why separating storage and compute is slow
and what it's good for.
Like it's great for a certain type of database workload,
not the ones that we care about.
But yeah, so PlanetScale Metal, it's running,
it's extremely fast.
We've announced it as having unlimited IO
because you literally cannot exhaust the IO on these boxes.
You run out of CPU way, way sooner.
In the announcement, you've seen just incredible graphs,
just customer after customer after customer,
just P99 just falls off a cliff,
and they save a ton of money.
It's a big win.
Yeah, okay.
Okay, so I wanna back up a little bit.
So you talk about separation of storage and compute,
and it's like at a different layer
than like what we usually talk about, I feel like, with databases and separation of
storage and compute.
Cause I think of that like Aurora or a neon or something
like that, where you like, it's like different services sort
of running in different places.
But this is actually saying like, you know,
usually when you're spinning up a database RDS or something
like that, you also have this attached EBS volume and,
and data is traveling like over the network to that EBS.
And like you're saying, that adds latency itself
and things like that.
Whereas with PlanetScale Metal, you're spinning up an EC2
instance that has the NVMe SSDs attached to them locally.
It's the instance-based storage.
And you're reading and writing to that
rather than over the network to EBS.
Correct.
Correct.
And both forms of separation of storage and compute
are slow for OLTP.
Like just leaving the server to do a page read is slow.
Aurora and Neon, they do it by separating
kind of the query engine, right?
That takes the query and then they do IO
to their own storage layer,
which is again across the network. Aurora does it. The Aurora paper is
an exceptional paper. It's one of my favorites. I recommend everyone to read and they talk about
how they have done that. And it's very performant for the goals they had, but it's still nowhere
near the performance of Metal.
I'll tell you the story of how we soft-launched this later
if you're interested
in, like, the behind the scenes,
but we basically, we have not seen an Aurora workload yet
move that hasn't become faster,
and usually significantly faster as well.
And it's because we cut out
just an immense amount of variability and entropy, right?
If you leave a machine, you go out of the network stack,
even if it's like InfiniBand,
they've done all of the optimizations they can.
They use really proprietary, cool stuff, but whatever.
You're going out of your local operating system
across a network where you're gonna hit load balancers,
top of rack switches, rebalancing jobs.
Like fundamentally, entropy has gone way up and just the databases are so incredibly chatty
with disks: block reads, but also background threads, buffer pools, cache updates. There's
just millions of things happening all at once inside a database. Doing them over a network, you just pay a constant latency penalty
that is extremely bad for OLTP workloads specifically.
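To put rough numbers on that constant penalty, here's a toy model. Both latency figures are illustrative, order-of-magnitude assumptions for the comparison, not measured AWS or PlanetScale numbers:

```python
# Toy model of the per-read penalty for network-attached storage.
# Both latency figures below are assumed, order-of-magnitude values.

LOCAL_NVME_READ_US = 100   # assumed local NVMe page read (microseconds)
NETWORK_READ_US = 1000     # assumed network-attached page read (microseconds)

def extra_wait_ms(page_reads: int) -> float:
    """Extra milliseconds spent waiting when every page read
    crosses the network instead of staying on the local bus."""
    return page_reads * (NETWORK_READ_US - LOCAL_NVME_READ_US) / 1000

# One query that misses the buffer pool on 50 pages pays:
print(extra_wait_ms(50))  # 45.0 ms of added latency, before any
                          # load-balancer or switch variance on top
```

And that penalty is paid on every chatty interaction with the disk, which is why the variance compounds so badly for OLTP.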
Yeah. Yeah. Interesting.
And so you mentioned like that split between like, you know,
the people that are running Equinix data centers versus running in the cloud,
I guess, like for enterprises that are moving from having their own data center somewhere else to
AWS, are they often seeing like, hey, pretty significant latency hits just because of now
they got the EBS tax sort of? Massively, like massively. And it's not just latency, it's cost as well. It's just extremely expensive to run.
We had a customer that did basically that and saw,
first of all, a huge latency hit for their application.
They also had to then move to I02 volumes
to get over the lack of reliability for GP3.
And the 16,001st IOP on the io2 is like $2,000 more than the previous one.
It's just unbelievably expensive and very failure prone.
We had an AWS issue where we saw multiple customers across multiple accounts get a blip,
but Vitess did the right thing and handled it, threw out the nodes that had access to those volumes.
But because we've got about half a million EBS volumes provisioned at any one point,
we could see across multiple customers some issue that we eventually tracked down to being a top of
rack issue for EBS. And so not only are we dealing with these strange behavior patterns that don't exist in the data center failures become partial dead node is extremely easy to recover you just like don't you throw away and you get anyone actually a planet scale in our fundamental part of our shared nothing architecture is.
Everything to fix a shadow fix a cluster should always converge and just keep being
able to kill nodes. Like we don't try and kind of really get too introspective of what's going on
with the node because of the way our architecture is. Just kill it and then the test will always do
the right thing to converge back to a sane and highly available cluster state. But when nodes
kind of slow down, and if your readers really
want to entertain themselves, go and read the SLA docs for EBS, which only guarantees acceptable
performance for 90% of the day. So when nodes start to slow down, databases do really weird
things which cascades up the stack and becomes really painful. It's just nice to have the node
just fully die. These metal nodes, you know, these metal nodes
or data center nodes, usually just, in a very kind
of binary fashion, hit the floor very quickly.
And that's great.
Yeah. Yeah.
And so tell me about that
because you talk about killing the node
but also like you don't have that background of EBS
like where that volume is still available.
So I guess what are you doing to ensure, right?
How does Vitess handle durability
where you can shoot that node?
It has the instance store storage.
That storage is now gone.
Like, what do I have here?
Yeah, so we basically always make sure
there's three replicas for every shard,
always available to take writes
and we do semi-sync replication.
So when we acknowledge a write,
it has made it to another box.
It's not the same as like these kind of
full quorum write systems that are also again, very slow.
It's just enough to make sure that you get
data off of the box and that it's safe
and has made it to at least one node.
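A sketch of the trade-off being described here, with hypothetical per-replica ack times. The helper functions are illustrative, not Vitess APIs: semi-sync acknowledges as soon as the fastest replica has the write, while a majority-quorum system has to wait for the median replica.

```python
# Semi-sync: the write is acknowledged once ONE replica has it.
# Full quorum: the write waits for a majority of replicas.
# The ack times below are hypothetical, in milliseconds.

def semisync_commit_latency(ack_times_ms):
    """Commit completes when the fastest replica acknowledges."""
    return min(ack_times_ms)

def quorum_commit_latency(ack_times_ms, needed=2):
    """Commit completes when the `needed`-th fastest replica acknowledges."""
    return sorted(ack_times_ms)[needed - 1]

acks = [1.2, 3.5, 9.0]  # three replicas, one of them having a bad moment
print(semisync_commit_latency(acks))  # 1.2 ms: data is safely off the box
print(quorum_commit_latency(acks))    # 3.5 ms: a quorum write eats the median
```

The point is that semi-sync gets the data off the primary without paying for the slowest replicas on every commit.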
And then there's just
millions of lines of code in Vitess that make orchestration incredibly easy and seamless to bring nodes back. And so that's the actual only real trade-off of moving away from EBS, which is
EBS makes it really nice to just detach a volume, spin up a new pod and like just
reattach. So the way we do upgrades, and we kill every single node at PlanetScale,
the longest a node can live at PlanetScale is 29 days, and we do that by
getting rid of the pod, bringing up a new one, reattaching the volume. Now we
have to bring up a new node, restore from backup,
bring it back into the replication pool.
That is still a very quick process
because of these machines.
And because of our new backup system,
which does full parallelization,
we use the full NIC, saturate the entire NIC of the box,
bring it back online, and then we're rolling.
So it's very mature and it fails in the traditional way
all these databases should fail.
It's very easy to reason around. And because this code path has run hundreds of millions of times,
probably at scale. Yeah, it's very mature. I should also mention that, to date,
Metal has been online, you know, and you can see in our post that companies like Block with
Cash App and Intercom use it, and it has served around 5 trillion queries across 5 petabytes of data,
which for relational databases is obviously a very large amount of data. So it's already
well-worn battle tested. Yeah, and we're actually upping our SLA now.
You know, our committed contractual SLA now
has gone up to four nines, which exceeds
all of our competitors.
And then for multi-region, we'll now commit to five nines.
And that's because we're not relying on code
we have not written basically to provide the critical path.
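For reference, the standard availability arithmetic behind those "four nines" and "five nines" figures:

```python
# What an availability SLA means in allowed downtime per year.
# This is plain availability arithmetic, nothing vendor-specific.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability: float) -> float:
    """Maximum minutes of downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

print(round(downtime_minutes_per_year(0.9999), 1))   # four nines: ~52.6 min/yr
print(round(downtime_minutes_per_year(0.99999), 2))  # five nines: ~5.26 min/yr
```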
Yep, yep. Okay.
Wow, that's super interesting.
So basically, as I understand it,
a lot of people use EBS because of that durability.
EBS is gonna replicate, I believe,
a second time within the same AZ or something like that.
But you are already replicating
to at least one other box somewhere.
So it's like, hey, we don't need
that EBS durability quite as much.
We have our own durability there.
One of my questions was,
you talked about bringing up a new node. If I have
a 3 terabyte shard and one of those replicas fails, how long does that take to bring up a 3 terabyte
node or something like that? What does that look like?
It's as fast as, like I said, we go across the NIC. I can't do the math in my head.
It also depends how many changes have happened in the time from
the backup to...
Gotcha. So it's like backup plus changelog type stuff. Yeah.
To catch up. But it's very quick. It's like, it's acceptably quick considering you're
not relying on it to bring you back online. Like, this is a rare scenario. We did the modeling, based on all of our years and years of failure rates and the bugs we've encountered, of how likely it is we'd even just shut a shard down, let alone
ever risk data.
So it's just an incredibly low chance that this could ever happen.
And even then, we still stream the logs elsewhere anyway, so we can always recover you back.
It's essentially…
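For the three-terabyte question above, the back-of-envelope math looks like this. The 25 Gbps NIC is an assumed figure for illustration; actual instance NICs vary:

```python
# Restore time at NIC line rate: data size over bandwidth, plus
# however much replication catch-up has accumulated since the backup.
# The 25 Gbps figure is an assumption, not a stated PlanetScale spec.

def restore_minutes(data_tb: float, nic_gbps: float) -> float:
    """Minutes to stream a backup of `data_tb` terabytes at line rate."""
    bits = data_tb * 1e12 * 8          # decimal terabytes to bits
    return bits / (nic_gbps * 1e9) / 60

# 3 TB over a saturated 25 Gbps NIC:
print(restore_minutes(3, 25))  # 16.0 minutes, before replaying changes
```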
The reason it's revolutionary is because you previously had to make a flexibility trade-off.
You either had flexibility thanks to EBS because you couldn't trust your own storage.
Most database companies or startups didn't have Google building their tech for YouTube
like Vitess did.
That's like 100 million of R&D straight there,
supporting one of the largest websites.
We've added another 100 million on top in terms of our own R&D.
So now we have something extremely mature.
If you're starting a database startup from day one, you pick your battles, right?
You're going to use EBS or you're going to use your own kind of distributed
file system to get yourself that flexibility back and enable certain things.
The cool thing about Metal is we really don't make you make a flexibility trade-off. It feels exactly the same,
but you get data center performance right inside the cloud. It's very, very special in that regard.
Yeah. And so will this be the recommended setup for all PlanetScale users going forward or is it more
a certain class?
I know all the examples you have are huge users.
Is it more for them?
Is it great for everyone?
How does that break down?
Yeah, that's a really great question.
The general rule so far, and one of our companies we've moved that should have a video going live today, they are relatively
small.
They're not sharded, but they're seeing some success.
They're small in terms of the PlanetScale corpus of databases, but still very firmly
a small startup. They have three employees.
Metal has immediately impacted them. It's had a really positive impact. If you are tiny,
just doing a tiny volume of mostly reads, we're going to put you on our lowest tier.
Until we start chopping up Metal nodes for multi-tenant, it's like smaller PS10 type nodes,
you're gonna be on EBS.
And that's fine for most, for a lot of people
at very, very low volumes.
By the time you start to get to spending five, six hundred a month,
metal starts to take over as being the option.
Basically, nearly everyone that comes through
like our contact us form as a sales served customer,
we just put them straight on metal
because it's gonna be cheaper and it's gonna be faster
than pretty much anything they're running on.
And the general rule, if you're running on Aurora or Dynamo
and spending more than around a thousand a month,
you're probably gonna save money
and get faster by moving to
PlanetScale.
Unless you've got some really weird like, well, we just barely do any CPU but need to
store 30 terabytes of data, you can like put a Pico node, attach it to a giant EBS volume.
That's very cool.
It's just extremely niche.
And so most people doing anything serious, spending about a grand a month on Aurora, they're probably going to almost definitely get faster
and probably going to save some cash too. Gotcha. Gotcha. And are there any other,
I guess, changes or requirements or anything to my topology or in settings? You all do semi-sync
replication always anyway. Like, do you have
even the option to do like full async with Vitess? Or do you do... Yeah.
No. We don't support fully asynchronous replication. And even if we did, we would never
turn it on. It's been completely impractical at scale. Like, it just doesn't. It makes everything
slow. And it's just not necessary. We all kind of live on the spectrum from blockchain to...
I could be mean and say Mongo, but they have fixed all those problems. But you know what I mean,
blockchain to Redis or Memcached, you pick your trade-offs. I think Postgres and MySQL have got
it exactly right in the middle, which is it's durable. You're not going to lose data, but it's
also just not making crazy consistency guarantees
that become very slow and extremely hard to debug.
Yep, gotcha.
And also, what you were saying earlier about the modeling
and how often failures happen in recovery,
it's not like you need to put more replicas in a group
or something like that if you're going to be on metal.
It's still going to be three replicas, and that'll work.
There's no less durable way to run PlanetScale.
We will not let you run in a...
I actually get a lot of requests for this
and maybe, maybe, maybe one day for the tiniest,
but people want single node planet scale
and we just don't ship that.
That's the cool thing is,
and that's the whole promise of what we do,
is we know this stuff inside out.
We're a very small company that has 660 years of combined experience running databases at scale.
We default in the right things.
And, you know, sometimes we run across people on Aurora or RDS and we have to tell them, you know, you are a node failure from data loss.
Do you know this? And they're like, no.
We just clicked the buttons, it's very expensive,
and we assumed that we were being protected.
It's not the case.
PlanetScale makes sure that you are in a highly available state
and that it does the right thing pretty much all of the time.
Yeah.
OK, one thing I want to go back to, you talked about IOPs,
the provisioned IOPs, being expensive.
And some of the charts I saw, I was
surprised to see how much the cost of running a database
was the provisioned IOPs, not the compute,
not even the storage itself of EBS,
but just the provisioned IOPs.
I was shocked.
Is that pretty common?
Is that one of those weird spaces
that cost a ton of money and a lot of people don't talk about?
Or what's going on there?
Yes, we see this a lot and it's just disgustingly expensive to run these. You know, 16,000 IOPS is not even that much.
But, you know, that's all you're getting before you start paying extreme amounts for anything above that.
We're talking about each node, each metal node being able to do from starting around 250,000 IOPS up to millions for specific node types.
We have certain clusters now that are provisioned with 40 million IOPS across the node pool.
Like it's just unbelievable.
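For a sense of scale on pricing, here's a sketch of tiered provisioned-IOPS billing in the style of EBS io2. The per-IOPS rates below are ballpark public figures and should be treated as assumptions; check current AWS pricing before relying on them:

```python
# Monthly cost of provisioned IOPS under a tiered, io2-style price
# sheet. All rates are assumed ballpark values, not authoritative.

IO2_TIERS = [
    (32_000, 0.065),        # first 32K IOPS, $/IOPS-month (assumed rate)
    (32_000, 0.046),        # next 32K IOPS (assumed rate)
    (float("inf"), 0.032),  # everything beyond 64K (assumed rate)
]

def iops_monthly_usd(iops: int) -> float:
    """Sum the cost of `iops` provisioned IOPS across the price tiers."""
    cost, remaining = 0.0, iops
    for tier_size, rate in IO2_TIERS:
        take = min(remaining, tier_size)
        cost += take * rate
        remaining -= take
        if remaining <= 0:
            break
    return round(cost, 2)

# Provisioning 64,000 IOPS, before any storage or instance costs:
print(iops_monthly_usd(64_000))  # 3552.0 USD/month just for the IOPS
```

Local NVMe instance storage, by contrast, bundles its (much higher) IOPS ceiling into the instance price, which is where the cost gap in the charts comes from.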
We have one customer, they're going to write a blog post soon, they save 70% on their bill and still have like 100x the amount of IOPS available to them
to read and write from.
We had a customer, a large one, have an outage of a service that isn't backed by Planet
Scale, but when it came back on, it hit the database that does run on PlanetScale. And it hit us with an additional unexpected 700,000
additional IOPS and we just tanked it. They didn't even like, no scaling up, nothing.
Just like metal just ate it. And they just jumped in the chat and were like, did you
see that? And we didn't. We didn't notice, I mean. And they were like, yeah, you just, you just, we just hit you with like
an immense amount more traffic. To put that 700,000 IOPS in
perspective, that would be, think of like a bunch of these small
database companies and startups. That's probably more QPS than
every single one combined, in one burst. It's pretty fun.
It's fun to see this stuff really is.
And just to put that amount of unbelievable horsepower behind a single connection string.
Yeah.
Yeah.
Whenever I talk to you, just like the numbers that you can say, it's pretty fun to hear
all these.
Okay.
So, Metal available today.
I can go sign up and get this if I want it.
Yeah, you can get cracking.
It's GA, fully available.
It's been running on, like we said,
extremely large databases with loved brands
that make a significant amount of money from their software
and be very upset if it was down.
So, it's more than appropriate and available
for anyone who wants to go out there and use it.
Cool.
Can I switch over if I have an existing cluster pretty easily?
Yeah, it's a fully online operation. You can just go into your cluster configuration,
select the metal class of nodes, and it will just happen online. The first time it does it,
it takes a little while because we're going to restore you from a backup rather than do the EBS
detach. And I had, in the room I'm in now, one of those startups that, you know, we've just been kind of trickling people in to see if they want to get involved with the launch.
And you'll have seen them all today, how exciting it is.
But just to see their faces when they looked at their graphs and just drops off a cliff.
It's just so funny. People just bursting out laughing. And we had one company whose product team pinged and was like,
the product's so much faster.
And they just took a load of performance work
off the roadmap.
They were just like, we just don't need to do this now.
We've had certain customers of ours
have their biggest customers notice
and thank them for fixing some of that.
Because your P99 is your best customers
having a terrible time.
That's the funniest thing: the people
that use you the most get the worst experience,
and it's because your P99 sucks.
This is a P99 killer.
Interesting. Do you see performance impact
at like P50 and stuff like that too?
Or is it mostly at that top end?
Oh, absolutely.
The latency of pretty much every query goes down
unless it's already cached in the buffer pool.
The P99 is just the funniest,
because it's the most extreme.
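As an aside for readers: why the P99 moves so much more than the median can be sketched with a toy latency distribution. All numbers here are made up for illustration, not measurements from PlanetScale or AWS.

```python
import random

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(p / 100 * len(s)))]

random.seed(0)

def latencies(storage_ms, n=10_000, hit_rate=0.9):
    """90% of queries hit the buffer pool (~0.5ms); misses pay storage latency."""
    return [0.5 if random.random() < hit_rate else storage_ms for _ in range(n)]

network_storage = latencies(storage_ms=5.0)  # network-attached-storage-like miss cost
local_nvme = latencies(storage_ms=1.0)       # local-disk-like miss cost

# p50 is a cache hit either way, so it barely moves...
print(percentile(network_storage, 50), percentile(local_nvme, 50))  # 0.5 0.5
# ...but p99 lands in the miss tail, so it tracks storage latency directly.
print(percentile(network_storage, 99), percentile(local_nvme, 99))  # 5.0 1.0
```

The median sits inside the cache-hit mass, while the 99th percentile sits squarely in the storage-miss tail, which is why the P99 is "the most extreme" indicator of disk latency.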
And the other thing, for those that are going and looking at these graphs now:
if you go and look at the announcement on our blog,
it'll link out to a little bit of other things.
The other thing to notice,
and in a post I'm gonna do in a couple of days' time
about the separation of storage and compute,
and why it's wrong for OLTP,
you'll also see that there's a much tighter band of variance for performance,
which again is extremely important
because a spike in database performance
can exhaust a front-end tier extremely quickly
from waiting. Oh, by the way,
we've also had people turn down front-end capacity
because they're just literally not waiting
for the database as long.
So it just releases pressure around the system in loads of interesting ways.
Like ETL jobs take less time.
All of these things just run the database pretty much as fast as you can make it.
All of these nice extra things happen.
But yeah, a big spike in P99 means you're holding front-end workers,
which piles up user requests in a queue that might trip some
other threshold elsewhere, and now you're cascading into an issue.
If you keep performance in a really narrow range,
you get really predictable results and you can
do more with your database and build an application
that performs significantly better.
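The back-pressure being described follows straight from Little's law (in-flight requests = arrival rate × time in system). A rough sketch, with hypothetical traffic numbers:

```python
# Little's law: average in-flight requests = arrival rate x time in system.
# A fixed front-end worker pool tips over when a database latency spike
# pushes in-flight requests past the pool size. Numbers are illustrative.

def workers_in_use(qps: float, db_latency_s: float, other_work_s: float = 0.002) -> float:
    """Average front-end workers occupied serving requests."""
    return qps * (db_latency_s + other_work_s)

POOL_SIZE = 64

steady = workers_in_use(qps=5000, db_latency_s=0.002)  # 2ms DB time -> 20 workers busy
spike = workers_in_use(qps=5000, db_latency_s=0.020)   # 20ms tail spike -> 110 workers

print(steady, spike)
print(spike > POOL_SIZE)  # pool exhausted: requests queue and trip other thresholds
```

A 10x database latency spike at the same traffic level quintuples worker occupancy, which is exactly how a tail-latency problem cascades into a front-end outage.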
Yeah, yeah.
What about like looking at other database providers
and things like that?
Like, is this gonna be hard slash impossible
for other ones to do for most of them?
I guess like, yeah, what would that look like
for your standard Postgres provider to do that?
They're just not gonna have like sort of the automation
to recover from this stuff.
I assume they are gonna work on this.
I assume everyone is gonna work to try and do this.
The fact no one has done it yet
tells you how difficult it is.
You really, really have to be very careful.
Like I'll put it this way,
if people start just flipping this out there,
you know, in the next five
to six months, I would be very cautious
of putting my data on it.
The database market is brutal.
The funding environment for database companies is brutal.
Companies are now reacting really quickly.
We saw one recently that shipped a new version
of their product
and didn't do backups and lost customer data. This is all a reaction to how jumpy the bottom end of
the database market is, this kind of net-new Postgres end of the market where you're really
just competing on glamour and how pretty your website is. If these companies just react quickly and start doing this stuff, I'd be very, very
wary because we knew it works on ephemeral storage.
It's proven running the second largest website on planet Earth, yet we spent four years before
we were ready to do it because you just have to be mature.
If, as the anesthesia is kicking in before surgery,
you heard the surgeon go,
oh, I hope this goes well, it's my first time,
terrifying, right?
We at least are coming at this from a position of maturity. We're as mature as Amazon.
You can trust Amazon; you're not getting fired if you trust Amazon.
You'll see from some of these blog posts, we've exceeded the reliability.
Intercom did a post previously about them moving away from Aurora, and they did it publicly
just to tell their users that we're getting out of this nightmare of downtime that Aurora
is causing for us.
We at least know we're more mature than that.
We know that we have these customers running on it.
If I was a little database startup building net-new infrastructure like this,
it's very risky and scary to do.
Are you surprised that the RDS team hasn't done this already?
They have ways of doing this if you use their DRBD cluster
replication, which is slow and has an incredible amount
of trade-offs.
Aurora runs on these machines, but does the network file
system on top.
And it enables really cool stuff.
There's things Aurora does that we don't do.
I just don't think they're important.
You can instantly add replicas to Aurora.
That's great.
Okay, cool.
But every single query at the P99 being twice as slow for the flexibility?
And actually, let me say nice things about separation of storage and compute,
so I don't seem just crazy biased.
It's awesome for flexibility.
It's just really awesome. You can scale to zero.
Not that anyone with a serious business needs to do that,
but it's doable.
You can expand storage continually
without having to upgrade nodes.
Like if you scale up to the next tier of Metal,
we roll through your nodes.
Fine, right?
You know, you can attach replicas instantly.
Aurora doesn't get replication delay.
If you do a crazy amount of silly stuff to your PlanetScale database,
you may delay your replicas.
It's a lot harder to do with PlanetScale Metal, that's for sure.
But you could do that.
That's possible.
Again, this is cool that you can do these flexible things.
None of it worth every single query you do being slower.
It's just not worth it.
It's the same way people use some databases
that auto shard and we make you explicitly shard.
Well, if you explicitly shard up front,
your data doesn't get randomly allocated around nodes
and every query is faster.
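The explicit-sharding point can be sketched like this: the application or routing layer derives the shard from the key up front, so one customer's rows and queries always land on one node. The shard list and routing function below are hypothetical illustrations, not PlanetScale's or Vitess's actual scheme.

```python
import hashlib

# Hypothetical shard topology; real systems map hash ranges to shards.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(customer_id: str) -> str:
    """Deterministically route a shard key to one shard via a stable hash."""
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# The same key always routes to the same shard, so a point query touches
# exactly one node: no scatter-gather, no cross-shard fan-out on the hot path.
print(shard_for("customer-42"))
print(shard_for("customer-42") == shard_for("customer-42"))  # True
```

Because the mapping is deterministic, related data stays co-located instead of being randomly allocated around nodes, which is the property being described.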
There's no magic here; we're still just dealing with physics.
If you could talk over a network and scale to zero
and do all the things that these distributed file systems do
and be as fast as physical NVMe, you would do that.
You just can't.
And our kind of operational superiority, I guess,
has led us to being able to provide you this level
of performance and speed
inside the cloud. And that's kind of the big unlock. So I'm not going to say that these
things are bad; they're wonderful feats of engineering. Like I said, I love the Aurora paper;
there's tons of flexibility that can be had with these systems. They're just not worth it at
pretty much any scale.
Yeah, yeah. Okay, last thing on Metal before we move on, I guess:
what's the story and timeline
of when you start thinking, hey, this could be a good idea, this might be worth it.
And we think we could pull it off.
Like, I mean, you said something about four years. Is it that long you've been
thinking about this, or what was that?
The four-year journey has just been a long maturity journey.
Right.
We knew we had Vitess. I always say that the true secret
sauce of PlanetScale is two things. Well, three actually. It's Vitess, which
is, I don't need to list all the companies that run on Vitess. We know this. It's the
top end of the internet. It's our operator.
So we run extreme amounts of state on Kubernetes.
Going to KubeCon is really funny, by the way, because some people don't actually realize
that.
It's like a controversial topic:
should you run state on Kubernetes?
Well, that ship has sailed, because there's petabytes, tens to hundreds of petabytes, running
on Kubernetes via Vitess.
And then it's our operational expertise, which is, if you go on
our about page, it's all of the companies we've worked at.
You know, the first infrastructure engineer, employee eight at Instagram,
works here; the team that built Clover, and earlier at Square, and earlier
at GitHub, and YouTube, and everywhere, has just worked on this stuff.
So it took four years, which is nothing in
the database world, absolutely nothing, to converge into a state that we're really happy
with, while ramping up these gigantic databases for these gigantic companies. And so there was so much
else to do, maturing everything else that goes around the database to make a really mature kind of service.
We're not just building a kind of storage engine that people install and run themselves. It's a
full service. You pay us for uptime and we take that trust very seriously. We started Metal in
anger about eight months ago knowing we could do it. We just knew that we were at the level of maturity,
we could model it well, and we were just in a time of complete autonomy to go and build this and put
our entire engineering effort towards it. So we did. We actually were ready to go into beta at reInvent, but we spoke to our customers
and every single one of them wanted it, so we made the agreement: we'll skip reInvent, we won't
go and hype it up at reInvent, we'll move you, and then you can announce it with us. That's why
this launch is essentially two paragraphs from me and then a link to our customers.
Because every other launch in tech is boring.
It's like the headline is,
company thinks their own product is good,
a ton of animations and renders and my browser's crashed.
It's that.
And it's just like, oh, cool, whatever.
What are you actually doing?
For us, it was like, who gives a shit about the database?
It's a database.
People are building cool companies on top of this stuff.
Drive to work every day, see the billboards of three of our customers.
Make them front of center.
If you're doing something consequential, it shouldn't be hard to find well-known brands
that are using your stuff.
So we did that.
We made the launch this way.
So it's been around for quite a while now, but in prod and working really, really well
at scale.
And that's why the launch is kind of the way it is.
It's just a very simple, it's our customers.
Just don't trust us, trust them.
It is a great launch.
I love use case studies and things like that,
especially the ones with the charts that are in some of these.
They're good ones, for sure.
So yeah, that's exciting.
I love it.
Cool.
All right.
I want to switch gears a little bit and talk about sort of just
catching up on PlanetScale in the last year and a half
and things like that.
So one thing I want to talk about is,
and you sort of mentioned this earlier
about all these sort of database companies
like fighting over the same like glamour customers
and seeming cool.
Whereas like a year ago, you all were like,
hey, we're gonna focus on profitability.
We're getting rid of the hobby plan.
Also change like the design of your website,
which Holly has some good stuff on.
Like tell me about that whole thing
and how like looking back a year,
how that's been like, did it go as you expected?
Like what do you think about that?
I did not enjoy letting go of a significant chunk of the company, and it was a terrible experience for them.
No one should forgive me for that or have any sympathy.
That's what you do in leadership. And it was way more painful for the people that were let go and who stayed, right?
It was a scary time.
Being a profitable company is the most incredible advantage, strategic advantage.
Even though it was a horrible moment, it was one of the best decisions we've made because
it gave us, it changed our future. It meant we could do things like
this. If you're on an 18-month runway, you have to just ship some junk to get out there
to get the hype cycles going, to build pipeline so your sales team has something to sell, because
if VCs don't like your last six months of growth, you're screwed. We were like, okay,
we're going to take our time. We're going to roll it out with giant customers. We're going to test it
so thoroughly. Because what would a safe database company do? A company that's safe from
The kind of ticking doomsday clock of the cash-out date. For those who don't run startups: if the company is not profitable, the CFO, the COO,
and the CEO, and a ton of the rest of the company know the day where you run out of money.
Having that in your head is really, really scary and sucks for your customers. It just really sucks. With the brands that
run on us, even our business practices have to be safe and durable. We've now built a
durable business. Our numbers are healthy. We grow. We ended up with millions of dollars
more in our bank account at the end
of last year than we actually expected we were going to have. And that's real money. Like,
I'm sat in our brand new office. It's beautiful. It wasn't just like blowing VC dollars up
the wall. It was paid for with money we earned. And there's more coming in every time someone
pays a bill. You know why our support is just the best out there?
If you write into our support team, shout out to Omar and Jay Greit, they write back
an incredible reply.
Do you know why?
It's not a waste of time, because you're paying us money.
You're a customer.
It's EV-positive for us.
It's the best thing.
And so I miss the people that aren't here anymore. And it was a year last week, actually,
and I thought a lot about it. It was really sad, and there are things I
would have definitely done differently.
But look, you can look customers in the eye and say, we're here,
we're going to be around, right?
And then being able to use all of that optionality,
like I think 50% of the external behavior or weirdness
you see from startups is just the fact
that they're dying slowly.
And they're constantly.
Just grasping at straws.
Grasping at straws, just desperate.
Like I said, we talked about that database company
shipping without backups. I don't think
that they're idiots. I don't think anyone would do that
unless they're under some form of pure fear of another
startup that's just got funding, or another hype cycle. And if
you're on that train, it's painful, and you don't want your
database provider going through that. We're not, and it's been
very special, and it means we have the optionality to
do the things we do. Yeah. Do you think you'll ever raise again, like, for anything? Or are you
just like, Hey, no, we're going to be profitable for the rest of time and go from there? Or is
that too hard to say? You never say never, right? You know, there's ways in which you could raise
that mean you don't risk profitability, right?
You can invest that money in net-new things and know that, you know, if those things
didn't work, then you can reallocate the spending, you can do various things. I think it's more about
being stable, reliable, and knowing you can always get back to a really, really healthy runway.
You don't need infinite life.
You need enough to get to a stage where the business just healthily prints money and your
customer base are always supported and enabled. So it may happen, it may happen,
but it'll be done in a way that doesn't risk the durability of the
business. And we hold on to that so, so dearly. We also have things
like secured lines of credit, for example,
that we don't take; we're likely never going to
take the option, but you have them just in case things happen.
And by the way, the amount of banks and
finance firms that just want to work with us because they see the health of our business,
they see our customers, they see who's coming on board, their own customers telling them about us.
It's really just about having a very healthy financial picture.
And all of these database startups that are raising these hype rounds, they're doing it
because they have negative gross margins.
Even their paying customers cost them money.
Growth is kind of toxic for them; it means something
good, but it doesn't really answer the fundamental question of how you invert that into profitability.
And so they raise money to stay alive.
We raise money to get bigger, to grow more,
knowing that the core is kind of durable and protected.
So, and tell me about getting rid of the hobby plan.
I know there was criticism of that and like,
hey, you're killing the low end of the market
or all the people are gonna grow with you.
I guess, what has it been like getting rid of the hobby plan
and what have you noticed business wise?
First of all, you learn how creatively people can insult you on Twitter.
Pretty funny, most of them actually, even the bad ones. But in seriousness, you know, I get about
an email every day from people wanting a free version of PlanetScale.
Still?
Yeah.
And I think it's extremely unlikely we'd ever do it again.
And I'll tell you why: it's because the conversion
rates are horrible. They're just absolutely horrible. And, you
know, my prediction is over the next two, three years, you're
just going to see bargain-basement acquisitions of
database companies, or any infrastructure companies that run much infrastructure for you,
because they can't outgrow the overwhelming need for the free tier, right?
You look at so many of these companies providing infrastructure for free and it's really like 99.9%
students that just learn and then abandon it.
You know, everything's a power law. And we saw this at GitHub, just the absolute extreme
power law of repos that get even one commit a year. It's just unreal. And then the rest
is just storage. And it's so much worse when you're running infrastructure and running databases.
Because there's a real cost, yeah, to all that stuff.
It's a real cost. It truly is a real cost. You can scale to zero, you can do all this stuff.
It's an optimization, but it's not an optimization that's indicative or useful at any form of scale.
It just makes your free tier cheaper, which I think is a real thing if that's your market,
right? But I love hobbyists. I don't want people to think I don't like hobbyists.
You know, I obviously worked at a company that gave me so much in terms of career,
learnings and love and happiness, and that was centered around the fact that so many new people
wake up and want to become software engineers and develop and learn. That said, I want to build something enduring that matters and backs the incredible
products that are being built in tech. And to do that, you need to have a sustainable
company. And a Fortune 500 company that may be a PlanetScale customer does not show up
with the inability to pay $39 for a month of a test database. And if a big Fortune 500 company
emails us and says, hey, we're interested in becoming a customer, we let them try for free anyway.
It doesn't change anything really. It just means that you don't have this incredibly aggressive
burn. And it means that you're competing on real value and real features. There's so many companies
and you see people talking to us on Twitter, they just love paying for a thing that really fuels their business
and they know they get great support and they know what they just value. It just feels like
it's such a little bar to clear for people that makes them really treasure what they're
purchasing and appreciate it.
And then at the bottom end,
it's not like we make loads of money
from you giving us $39.
You know, it gets easier the more we have, right?
You can just make a commitment
and do all this kind of stuff.
But it means that it doesn't hurt our runway
or the overall functioning of the business,
and we don't need more headcount
and all this sort
of stuff to manage it.
And it just converted terribly and it converts terribly everywhere else.
I know this.
I know a lot of people in tech and I talk to people.
It just doesn't change anything.
The other thing is we stopped doing these kinds of partnerships with other vendors,
because they want everything
free to begin with.
And at the end of the day, when you
start to hit actual database problems, they phone us anyway.
So it didn't really change anything.
Yeah.
Yeah, and you're talking about the additional runway,
or just the cost of it.
But there's the company cost too: you're talking about
how you provide really great, detailed support,
and if you have a ton of free-tier
customers, you're just overwhelmed
with support stuff.
It's harder to provide that really great support
to your actual paying customers and that sort of thing.
It also just changes your company focus, I think.
If you're just like,
how many free-tier customers can we get?
It changes the features you're building. It changes what people within the company care about. I've seen that sort of thing eat up a
company, I think.
Oh, sure. You think about that stuff, you know. And then when 99% of the users you're facing are only
that audience, you're just going to destroy yourselves in front of them by taking it
away. You just don't have a choice. So now it's just this kind of cancerous thing that
eats away at your bank balance. And yeah, you know, the support
requests when we had a free tier were unbelievable. I still remember
someone wrote in and they just pasted the SQL from some log and they're like,
what does this error mean? And we said, that's the SQL your ORM is actually sending us. They just didn't know.
And it's fine. We were all new. I'm really bad at so many things in tech, and I do equally dumb things. It's just that I'm here to build a business, build something for a company.
I'm just not here to solve those problems.
Like I see people who want to post their growth charts, whatever.
Post revenue charts if you really want to brag; otherwise just leave it alone.
If the goal is to burn tens of millions of dollars giving $3 away for $1, I still think I can do better at that, but I just don't want to.
You know, I wanna build a business,
and that is a next-level set of requirements
that does not involve
cheapening what you build to the point where,
like you said, it becomes all you build.
Yeah, yeah.
It's tough with that, like the JavaScript ecosystem
on Twitter and just that whole thing.
It can really feel like you're making awesome progress and stuff.
But it's like, are you making progress
towards the things you want?
Which, like you're saying,
is making a sustainable business over time.
It can be deceptive in terms of
what you're actually doing.
Yeah.
GitHub stars, I can tell you,
as someone who was there for eight years,
they don't mean shit, and 99% of them are spam.
You can just buy them.
You know what I mean?
Just like you can buy Twitter followers.
What does it mean? It's quite cheap to hit a star. It doesn't mean
an enterprise contract's coming from that. It's all mirages and
smoke and mirrors. I think it's getting worse where you see these kind of people
tweeting a picture of a mattress on the floor in a $5,000-a-month San Francisco apartment,
living the kind of grindset or whatever. Just charge more for
something. Charge an amount for something people
value that is above your costs. Then we can
talk about it.
Yeah, this is funny. JavaScript Twitter
is interesting. And I sometimes get into debates with people.
And the overall thing that I've realized with the Hacker News and the Twitter crowd is
every tech works at a certain scale.
You can do SQLite on the server if you want.
You can do anything; you can do flat files.
It just doesn't matter.
There's no point arguing with people
whose entire world view is so small, in terms of the problem set
you actually have to solve at scale, that anything works. You know,
when you see Facebook scale, you see they've got hundreds of lawyers that just do power
and light contracts for data centers, right? And teams of thousands of people
building data centers, or the team that lays undersea
cables, because the actual internet doesn't have enough capacity
for you to do inter-data-center traffic.
Like that's what scale is about.
That's why, again, you can just shave so much money
out of a customer's costs by doing the things
we've done with Metal.
That's the kind of problems I want to work on,
using the opinions of and getting into the fray
with those guys, because those people don't go on Twitter to talk about what they're doing.
They go and meet in private user groups.
Because they exist, by the way.
If you're not in them, you're not doing that stuff, right?
There was a DBA group that used to meet from 2013 to 2016, which was just the top probably
100 logos on the internet,
where you'd always get together at some of these events,
go and share actual problems and bitch about MySQL.
Or you can just go and listen to the opinions
of a lot of junior developers
that just want to be loud all day.
Winning in that crowd
is not indicative of any future success.
Yeah, exactly.
I still love you all though, you're still fun.
I mean, we can all have some fun.
I like the JavaScript Twitter community,
but yes, you gotta make sure you know
which opinions you're taking and which ones you're leaving.
There's some good ones as well.
There's some really responsible voices there.
I'm old and I don't wanna be old,
but the energy of this community: they are going to ship their future.
They'll rewrite it to run on, you know, other things, and they'll move away from
Postgres; that's fine, right?
But at the end of the day, the next internet is going to be built by these
people, and we can't become cynical to those things either.
There's so much to teach us.
And, you know, I like to think that we market with taste.
I like to think that we have a brand that is kind of cool-ish.
It takes watching what younger people are doing, and I find them incredibly inspiring. The
same way, you know, I made paper planes; no one should ask me to make an airplane.
Yeah, absolutely. Yeah, cool. I appreciate that introspection and looking
back on that.
I wanna ask another one in that same area,
and I don't mean to make you rehash all that sort of stuff,
but one feature I thought was really cool
was PlanetScale Boost,
this Noria dataflow type thing.
I saw that you all deprecated it.
Can you walk me through it?
Was it that customers just didn't need it as much as you
initially thought? I guess, what's the story behind Boost?
Yeah, so Boost is incredibly cool tech.
Yeah, it's super cool.
Yeah.
And again, Vitess makes doing Boost really, you know, really simple.
There are some fundamental issues, though. Well, one, we've realized since that Vitess materialized views are just
so good and continually updatable that we should just ship that, and we will, and it'll be
significantly better and will fit most of the use cases. What we found was there were so
many caveats to which queries could be supported, which ones were non-deterministic, that
it became this horrible user-experience nightmare and really just let people down.
And it just wasn't our full focus.
One of my learnings of growing into doing the job I do is that
you can only focus on so much.
Honestly, you just can't do too many side projects, too many things.
If it comes back, it will have our full force, but it just wasn't
the time, and there were other things to be done, bringing big customers on.
And leaving something in place that sets people up for a crappy experience is not
good. Either support it and do it well, or don't.
And you know, we just decided,
rather than live with all the caveats and whatnot,
to just deprecate it and say goodbye.
I think something that serves that need
will be back one day. Also, Metal is so fast
that you can just rip through it all,
who cares, just ride the lightning, it's great.
But yeah, we could have done better listening to our users, we could have done
better taking projects on in that time. I'm sure something like that will be back
one day.
Yep. Yep. Very cool. Okay. I want to switch to the broader database space, now that we've gotten through the startup-y type things.
I wanna talk about AWS, which is one of the big elephants
in the room for databases, right?
It's the default.
One of them, yeah.
Yeah, v big.
Yeah, exactly.
So there's always the standard stuff,
and we got the new stuff too at reInvent.
I think one of the talks of the town was DSQL.
I guess, what are your thoughts on DSQL?
Embarrassing launch, truly embarrassing.
Slideware, honestly.
The engineers behind it are phenomenal engineers,
but it just speaks to the position
that large clouds are in now, which is spread so thin,
like the innovator's dilemma to the extreme degree: 400 services, going down every single rabbit
hole.
The slides looked amazing.
It's cool as a Spanner killer, which, again, is fundamentally not the right architecture for
most workloads.
Amazing tech, but just not something really anyone outside of Google needs. So then DSQL is this reaction, and
the slides just looked too good to be true. The docs the
next day, you know, fully highlighted that it was just not
even true, or good. So it was just a very bad launch,
honestly. I mean, I don't know if they've fixed this now, but we found out the day after that you can't add
an index on a table that has data.
So it's a non-production database then, right?
Or what, we copy our database out and do an online migration every time we need to change it?
It's just botched as a launch, essentially, I think, even though the engineers working
on it are extremely, extremely good engineers.
I just get the feeling it shipped a year too early as a reaction to something.
Yeah, yeah, that is interesting. I guess like on that same note of, I mean, I guess something
that's actually shipped now, Aurora Limitless is now GA, which is like probably the closest thing
to Vitess and PlanetScale, I'd say. I guess, what happened with Aurora Limitless? I don't know.
It's been a gift. It's been such a nice thing for us because one of our customers was like, we really enjoyed finding its limits in production.
Oh, really? Okay.
Yeah, it was like, it has limits. I mean, again, it shipped without the ability to do read replicas.
What is that? This is what happens when you just become massive and PM-driven. I think they've fixed that now.
That's the thing, right?
Eventually you just become a massive company
with product managers everywhere just ticking boxes.
And it never really completes the job to be done, right?
And because we've run our stuff on behalf of our customers,
it kind of changes how we build software, right?
It's very practical. We're still waking ourselves up if it goes wrong, right? And yeah, I think it showed
people that the model is right. It's just not executed in a way that is viable. And so we have
a bunch of customers paying us a lot of money that tried it. They
had the pitch. The pitch was great. And they came to us for the delivery.
Yeah. Yeah. Yeah. And it's also just interesting, I think, how they've confused that Aurora
product line, I think, by naming all these things Aurora, but they're like, very different.
You can't, like, it's not like you can just switch from one to the other.
Correct. You know?
Yeah. Yeah.
Yeah, why don't you allow online upgrades,
the most absolutely basic thing you
need to do with a database, before you
start doing things like Limitless or DSQL or whatever.
It's easier to go from MySQL 5.7 to 8 by leaving Aurora
and coming to PlanetScale than it is doing it on Aurora.
Like all this blue-green deployment stuff,
it's technically possible if you read like 15 blog posts,
you still screw it up. So why are you paying that money
to then go and build this knotted ball of hell
just to upgrade your database?
It's craziness.
It's great.
It's amazing for us.
I think it's indicative of a trend.
I think if you are building a startup that competes
with Amazon, it's the absolute best time to be doing that, for all of these reasons. And you just have
to study tech companies over the last 40 years. It's not even more nuanced
than the innovator's dilemma.
Yep. Yep. What about just like the economics? Is it hard to make the economics work when
you're paying so much for infra to AWS? I mean, obviously you all are doing it with the profitability,
but like that seems just like
amazing to me that you can like build on top of AWS and still like eat out enough margin to make
that work. Well, this is where metal is transformational for us and our customers. Because
if you're buying convenience from Amazon, like EBS, yeah, I mean, you can't beat them, right? How
do you eventually win?
When you're buying the same metal servers they are, like they get a little
margin on top of that, but the price is set by N number of massive vendors.
Like, you know what I mean?
Like at the end of the day, if you're just buying metal machines off them, you know,
aside from their market advantage and ecosystem, Equinix are doing similar amounts for you,
right?
S3 is cheap and good to build against.
The thing that has been amazing for our customers
is with PlanetScale Managed, it runs inside their account.
So if you're a significant Amazon customer,
you get to negotiate incredible commits
with savings plans against these machines
and just save extreme amounts of money.
Look, just one database on PlanetScale,
its daily operational cost went down by $20,000. Wow. Yep. Yep.
Because of Metal. So this is why it's so good. I mean, most of the time you're competing inside their ecosystem using their tools, and the house wins. We are really just getting a server from them. And the better they make the tools around us, like S3 and Lambda,
the more reason there is to, like, buy the best database, and buy all of
those services you should never run yourself. And that's why
Metal is so exciting, which is we can just compete in a way
that is very unusual, to be able to compete with
Amazon. And it
takes might, it takes software might, that's it. It's actually just a fair fight for once.
It's like, you write code, we write code, we're better at databases, so that's how that goes.
Yeah, interesting. Are they a good partner to work with, generally?
Yes, I would say so. Sometimes it's obvious that we compete. Sometimes it's obvious that we can do really good things together, and you just duke it out. I met the head of partnerships
for one of, who personally runs two of the largest Amazon partnerships.
And he was like, I don't want to leave you with the impression
that there are not daily spats between the two of us
over small things.
Like, that's just what two giant partners that kind of compete do.
But, you know, I think the thing that's made it a lot better,
and I do love working with them.
It's a business that I respect so much.
They've built an incredible company.
You go to re:Invent, and if you haven't been to re:Invent,
I truly recommend it.
I tell just random software engineers,
people that barely like leaving the house,
let alone going to a conference
with 70,000 people across five casinos,
to go stand inside
the physical representation of the commercial side
of the tech industry.
It's gigantic, gigantic,
just hundreds and hundreds of millions.
The largest booth there, for the floor space, is two and a half million.
It's unbelievable. And so they built this phenomenal business, just, you know,
up until the last three years, the cloud was Amazon and no one else mattered.
Now people are catching up. And
so I respect them. And I adore them in so many ways. Their
people have built incredible products and experiences. But
yeah, they're Amazon, right? They're the Goliath. And there's a lot of Davids. And
you have to, you have to try and win. But we work really well
together. And the thing that's really helped is them seeing how beloved we are for some of their
biggest customers, right?
People that are spending a lot of money with them, saying, you know, if PlanetScale still
likes running on you and works well, then we're good.
You know, like, it's kind of that.
It's just like, you know, they took notice and we had some giant marketplace kind of transactions go through and they're like, who are they, you know, and then that, that helps build a relationship.
And yeah, they've done marvelous things. And they are truly customer obsessed too. Like I said, the big difference is when our joint customers told them how much they love us
and that the pie can grow and get bigger for all of us.
It's good, it's good.
Yeah, yeah, for sure.
Yeah, on that point, going back to the biggest booth
at re:Invent being two and a half million,
I would love just to see the P&L just for re:Invent,
because it's got to be fabulously expensive,
all the stuff they're doing.
But like you're saying, the sponsor booths are crazy.
It's $2,000 a ticket and 70,000 people come,
which not all of them are paying full boat,
but half of them, that's a lot of money.
I don't know, I'd just be very curious just to see the P&L
for the conference itself.
It'd be very interesting.
It would be amazing to see those numbers.
And then I would just love to see the kind of economic impact on Vegas, because it's March right now, and if you're not trying to book
venues for re:Invent already, you're not getting one.
Like every restaurant. I do three dinners a day at re:Invent, like I just stack them
up because that's like where the customer is.
You get to see these amazing scenes just to go on a complete tangent. I once saw someone who was
probably a mid-market rep for a firewall company throwing up in a bar next to a fully bejeweled
cowboy because it happened to be at the same time there was a rodeo.
Yeah, a rodeo. Yeah.
Yeah.
Where else are you seeing that?
Like it's incredible.
It is just amazing.
Um, yeah, it's just a thing to behold, really. You know,
Google rented the big orb, which happened to be line of sight from my hotel room, and
it just shone GCP ads through my curtains the whole time.
It's just amazing. I
recommend it. Yeah, anyone who hasn't been should just go because it's, how's it for you? You must
be a bit of a celeb there. People must harass you a bit. Yeah, it's always fun to go and just like,
I mean, for me, like I live in Omaha, so that's like my one time a year to just see everyone on
the internet and just like all kinds of people. So like, I, yeah, I love going for that. It's very tiring, but it's, it's awesome.
Yeah, for sure.
Yeah. Yeah. Yeah. By the time you're at the, like the Kygo show at the end,
I'm just like dying, but at least it's over. I tell you, I casually
mentioned to someone, oh yeah, I go to Vegas every year.
And they're like, oh, cool. I'm like, no, not cool. Actually not.
I have to go. Yeah, the COVID year
was really strange too. I caught COVID while there.
Got the re:Invent-branded strain, you know, sponsored by Splunk. But yeah,
like, that was really strange, because the vendors, we all showed up, and the audience was like, hell no,
this is optional for us, and no one went.
So it was this kind of weird re:Invent where all the vendors
were actually just like hanging out
and going to each other's parties.
And I was at one party, they were like,
yeah, we're $5,000 underspent on the bar.
So just start ordering the top line drinks
and me and our CTO enjoyed ourselves.
But yeah, it's great, it's great fun.
Google Next is getting to similar scale too,
and we'll be there this year as well.
Interesting.
I've never been, I need to go to one of those too.
I just remember seeing that one,
that guy with like his,
I can't remember his name,
but like the goofy outfit last year,
like running around and yelling and screaming
and banging on drums.
Did you see that at Google Next?
Oh no, I did not.
I did not.
Oh man, I can't remember his name.
He's like some performance artist.
Oh, on stage.
Yeah, yes, yes, yes.
Mark Simpson, I think his name is on there.
It's like a scene from Silicon Valley, yeah.
Yeah.
Yeah, we'll be going.
Metal's available obviously on GCP as well.
Okay, yeah.
And so we'll be there with the presence going
and seeing everyone.
Should be fun.
Yeah, yeah, for sure.
Okay, two quick questions I wanna talk about
in the database space.
Number one, you talked about migration from 5.7 to 8. I know Mark Callaghan
and a few others have talked about perf regressions in MySQL going all the way
to 8. Are you seeing that too? Is it like some isolated workloads have issues, or
what do you think about that? So there are perf regressions in certain
ways, none that we've really seen in production.
This kind of reveals the difficulty with benchmarks,
which is they're not real.
I never know what's true and what's not.
Yeah.
So we had a pathological case with Metal where we were like,
wow, why is it benchmarking this way?
And we then shipped it with customers
and never saw the problem.
And like, you know, no one runs their databases at 100%,
but benchmarks benchmark at 100%.
And in that last 10%, things go wacky.
You know, like things are really crazy.
Where did wacky come from?
That's not a word I really use often.
But yeah, like benchmarks, I always
say to the team, because we truly know benchmarks are
mostly not useful for prod. And so does Mark, Mark will tell
you this. And Mark is, you know, if Mark's listening, I'll
say an absolute legend in the database world, who has seen scale
unlike any other, and you know, it's very interesting to see his work.
I think it puts pressure on the MySQL team. But from our lived experience of having a lot
of people running on 8, we're not really seeing it as much. But there are
real regressions, you know, that happen all the time that benchmarks
won't catch, right, as well.
You know, even then,
it takes a lot of sophistication to have a good synthetic environment or a
load testing environment to really test.
Really you need an architecture that allows you to incrementally roll your database out.
So if you're sharding, that's the best way we find regressions is that we can slowly
roll out things shard by shard.
Interesting.
Will you put shards onto...
Even within one deployment, you'll have some shards that are on higher versions and just
seeing how that works and things like that.
Correct.
Yeah.
And then it will roll slowly and we'll look for any deviations and metrics.
And we have to roll these changes out across a very large amount of computers. And so we catch
things very, very early. Like, I mean, if you're processing the millions of QPS that we're doing,
you'll find things pretty quick as you have these issues. We also do the core MySQL work ourselves.
So we can find and fix some of those issues
pretty quickly.
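A minimal sketch of the shard-by-shard rollout described here might look like the following; the function names (`upgrade`, `metric`) and the 10% deviation threshold are illustrative assumptions, not PlanetScale's actual tooling:

```python
def staged_rollout(shards, upgrade, metric, baseline, tolerance=0.10):
    """Upgrade shards one at a time, halting if a key metric (e.g. p99
    latency in ms) deviates from the fleet baseline by more than `tolerance`."""
    upgraded = []
    for shard in shards:
        upgrade(shard)                      # roll the new version onto one shard
        deviation = abs(metric(shard) - baseline) / baseline
        if deviation > tolerance:           # regression suspected: stop here
            return upgraded, shard
        upgraded.append(shard)              # healthy: continue to the next shard
    return upgraded, None                   # whole fleet upgraded cleanly

# Toy demo: shard "s3" regresses, so the rollout halts before touching "s4".
latencies = {"s1": 10.2, "s2": 9.8, "s3": 14.0, "s4": 10.0}
done, failed = staged_rollout(["s1", "s2", "s3", "s4"],
                              upgrade=lambda s: None,   # no-op stand-in
                              metric=latencies.get,
                              baseline=10.0)
# done is ["s1", "s2"] and failed is "s3"
```

The point is the blast radius: a bad version touches one shard's traffic, not the whole deployment, which is how regressions get caught that benchmarks never would.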
Man, I'm trying to think how you all could do
better benchmarks than some of these, you know,
fake benchmarks.
Just like, hey, this is what we're seeing
as we're rolling this out,
we're not seeing impacts across all these customers.
Like, yeah.
Yeah, we could do more to actually talk about that.
We just can't get on with it, I guess.
You know, like it's okay.
We just kind of fix the problem and just move on, you know?
Like it just, yeah.
But benchmarks have value,
just not as much as I wish they did,
because they're a very convenient way of doing things.
But benchmarketing, benchmarketing is a whole other thing. Yeah, for sure. Okay.
Last, like, database-related question, sort of going back to AWS. We had DSQL come out,
but also at re:Invent, Dynamo had multi-region strong consistency, and it seems like something
is underpinning both of those, like the same sort of underlying tech.
I guess, is that something you see a lot for customers with Vitess, having that sort of use case, whether
it's like zero RPO, you know, if a whole region goes down, or something
like that? I guess, how often do you see that?
Do you think it's useful?
I guess, like, where are you at with some of that?
So strong consistency cross-region is a bad idea.
Yeah.
Seems expensive, right?
That's a lot of latency to wait for.
Correct.
So we have lots of very large multi-region deployments
with extremely high RPO because we do the asynchronous style of replication across the country, meaning
it gets out of the data center and is able to catch up.
And a thing we see with a lot of folks is that all of the other complexities of doing a real
cross-region failover are so difficult that
even if the strongly consistent database is ready, it's just one piece.
I mean, there's just so many other things going on.
Again, you've just eaten that performance for the whole time.
Every single right, yeah.
Right.
We just won a customer against Spanner. They were using all of
that, because, like, why use Spanner if you're not going to do that stuff? What's the point
of single-region Spanner? They could not get a write down to less than 12 milliseconds. That's
ridiculously slow when we're sub-millisecond. Like, it's crazy. The extra capacity you have to provision
to handle this stuff is nuts. For what? I
mean, amazon.com still runs in one region. If they can do it, everyone else can. This
whole, even Vercel have kind of dialed back that whole, um, global stuff.
Yeah, the global stuff at the edge. Yeah, exactly. Because people just don't need it.
It's a good checkbox for, if us-east-1 really shits the bed, are we going to get to come
back the same day, which is essentially what most companies want. But even, like, the reason
cross-region failovers are not automatic on PlanetScale is because nobody wants them automatic. In
fact, they make sure that it's not going to happen, because if your database evacs and everything else doesn't, and then we're not solving cache
warming for you, we're not solving bringing front ends up. It's a requirement. I get it.
There's a lot of databases out there that are very niche that are set up to do this and make
it really easy, but with tiny, extremely low transaction volumes. You just can't do it any other way than our model,
really, at that scale.
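The latency trade-off behind this can be sketched with back-of-envelope numbers; the figures below are illustrative assumptions (the 12 ms roughly matches the Spanner anecdote above), not measurements:

```python
# A synchronous cross-region commit cannot acknowledge until at least one
# cross-region round trip completes; asynchronous replication acknowledges
# locally and lets the remote replica catch up afterwards (nonzero RPO).
local_ack_ms = 0.5          # assumed same-region fsync + ack
cross_region_rtt_ms = 12.0  # assumed coast-to-coast round trip

sync_write_ms = local_ack_ms + cross_region_rtt_ms   # every single write pays this
async_write_ms = local_ack_ms                        # replication is off the hot path

slowdown = sync_write_ms / async_write_ms            # 25x slower per write here
```

Under these assumed numbers, every write in the strongly consistent setup is about 25 times slower, which is the tax you pay continuously in exchange for a failover scenario that may never happen.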
Yep, yep, makes sense.
All right, well, Sam, I always appreciate it.
Again, all those sort of things
I've been curious about.
It's good to have someone with a reasonable, strong
perspective on it.
So thanks for weighing in, and congrats on the Metal launch.
This is super cool, And yeah, excited to see
people use it.
Thank you. And thank you for being a place where we can talk
about these things, right? You know, the
corners of tech where people are doing real things and, you know,
not just hyping everything up are getting smaller. And so it's
great to have a format for doing that. And I will also pass on,
when I mentioned in our marketing channel that I was
chatting with you today, you have a lot of fans out there
that are planning to listen to you.
Really?
I'm surprised, the MySQL community.
So I love hearing that.
That's good to hear.
So yeah, thanks for coming on.
I really appreciate it.
Yeah, we'll link to all the metal stuff
and of course your Twitter account and all that.
But yeah, best of luck.
Amazing. Thank you so much.
Thank you.
Thanks.