Software Huddle - Faster & Cheaper on PlanetScale Metal with Sam Lambert
Episode Date: March 11, 2025

Today, we have Sam Lambert back on the show! Sam is the CEO of PlanetScale, and if you follow him on X, you know he’s one of the sharpest voices in the database space—cutting through the hype with deep experience and a no-nonsense approach. In this episode, we dive into PlanetScale’s new Metal offering, which has been battle-tested with PlanetScale’s high-scale cloud business partners and is now GA. Sam also shares why staying profitable is crucial—not just for the business but for the stability and reliability it guarantees for customers. While many cloud infrastructure companies chase the next hype cycle, Sam prefers to keep it boring—delivering rock-solid performance with no surprises. Finally, we close with Sam's thoughts on other happenings in the database space -- Aurora DSQL, Aurora Limitless, MySQL benchmarks, and multi-region strong consistency. Tune in for a deep dive into databases, cloud infrastructure, and what it takes to build a sustainable, high-performance tech company.

Timestamps
01:34 Start
06:42 PlanetScale Metal
11:15 The problem with separation of storage and compute
15:02 EBS Tax
17:32 How does Vitess handle durability
22:58 Metal recommended for all PlanetScale users?
27:20 The hidden expense of IOPS for cloud databases
37:41 Timeline of creating PlanetScale Metal
41:32 Focus on profitability
47:52 Removal of hobby plan
57:45 Deprecation of PlanetScale Boost
01:00:24 DSQL
01:01:51 Aurora Limitless
01:04:15 AWS as a partner
01:07:00 The spectacle of AWS re:Invent
01:12:22 Benchmarks and benchmarketing
01:15:51 AWS Databases + multi-region strong consistency
Transcript
First of all, you learn how creatively people can insult you on Twitter.
Pretty funny, most of them actually, even the bad ones.
But in seriousness, I get about an email every day from people wanting a free version
of Cloud at scale.
Still.
Okay.
Yeah.
Yeah.
And I think it's extremely unlikely we'd ever do it again.
What about just like the economics?
Is it hard to make the economics work when you're paying so much infra to AWS?
The thing that has been amazing for our customers is with PlanetScale Managed, it runs inside their account.
So if you're a significant Amazon customer, you get to negotiate incredible commits with savings plans against these machines and just save extreme amounts.
One database on PlanetScale, its daily operational cost went down by $20,000.
You have really lived it in a super interesting way. What's up, everybody? This is Alex. I'm
really excited about the show today because we have Sam Lambert back and he's the CEO at
PlanetScale. He was the first guest I ever had on the show. I just think he's got a really good,
interesting opinion on database stuff.
Like he sees a lot of database stuff all the time.
He knows like what's real, what's not real,
what big companies actually need.
So we talk about a lot of that.
We talk about PlanetScale Metal,
which is a cool new release they have today,
which I think is really interesting.
Some cool engineering stuff and like things
that they're able to do that a lot of people
aren't able to do.
So I think it's really fascinating to go through that.
So check it out.
If you have any guests you want on,
if you have any questions, things like that,
feel free to reach out to me or to Sean.
But with that, let's get to the show.
Sam, welcome to the show.
Thank you.
Thank you for having me again.
Yeah, absolutely.
And I should say welcome back because yeah,
you were my very first guest on the show
and I love that conversation.
And I think you're the first person
that I've had back as well.
So honor on both accounts.
And yeah, I mean, you're just like one of my favorite people
in the space.
I love chatting with you last time
and you're a good Twitter follow, where I feel like
in the database space, there's a lot of hype.
There can be a lot of hype type stuff.
And I think you tell it like it is
and kind of cut through the noise a lot in a great way.
So I'm excited to have you on.
I guess maybe for people that don't know you,
maybe get a little background on you
and PlanetScale and all that.
I didn't realize I was your first guest, first of all.
You were my first guest, yeah.
I feel so honored.
Wow, wow, that's amazing.
Yeah, no, I mean, I love the show
and your audience specifically,
it seems like a really great bunch.
And again, same, thanks for having me on today.
Yeah, so my name is Sam.
I'm the CEO of a company called PlanetScale.
We specialize in kind of building
the world's most scalable performant cloud database.
Lucky enough to support some of the world's
largest consumer brands,
ranging from Block with Cash App and Blizzard
and all these really cool companies building cool things.
It's very fun to kind of exist in the world
and know that the products you're using every day
are, in turn, using your products.
It's very, very cool.
Before that, I was at Facebook running
the traffic production engineering team,
which is a very small group of people responsible for about 12% of the internet's traffic.
And then, yeah, it's amazing still.
And then before that, I was VP of engineering at GitHub,
a small little code hosting website that happened to have every developer on planet Earth using it.
And yeah, before that, I kind of worked on a bunch of database systems at scale doing
various different things. That's kind of how I learned to do what I'm doing now and went to
GitHub to do the same stuff. Yeah, I've seen database problems at every company I've ever
worked at and, you know, at all of our customers and it's good. It's fun to be building a company
that kind of scratches the itch of paying down all those problems so that people don't have to run into them in the future.
Yeah, yeah, for sure. I mean, I love learning about and reading about databases and all this sort of stuff. Like you have really lived it in like a super interesting way from GitHub to Facebook to now PlanetScale. And like you're saying all these huge companies that run on PlanetScale. Like, really, I imagine you just see some very interesting stuff
day to day, which is a lot of fun.
You do.
The database space is actually really strange in the sense
that it's meant to be boring.
And it's actually probably so boring that people get bored
and then do ridiculous things with databases.
And you're kind of like, yeah, we can't do those things.
And nor would we. So yeah, it's a kind of, it's a
very interesting space to be in to try and do something dynamic and new, but also there's
the rules. Right? You know, I was, I was kind of likening this to running an airline the
other day, which is no one wants creativity kind of like in the engines and all of the
bits, you know, you have to, you have to build something that operates and works at the worst of times and predictably,
right? Like a 737 can land pretty much anywhere in the world and there's mechanics on call
at any point to fix any single issue. There's spare parts, there's everything, you know,
we try and operate that way. That's Ed. You know, you can have some fun with the new kind
of business class interior and make those things nice.
As long as the fundamental rules stay kind of respected and it works, you can build something
kind of fun and dynamic that has real customer impact.
Yep.
Yep.
And that was like one thing I remember us talking about last time too, is like, hey,
Vitess is super interesting and like built for this enormous scale.
But then like PlanetScale also has this focus on developer productivity and utility that way.
And that's where a lot of some of your interesting
innovation happens of just making that stuff easier
while still giving that rock solid reliability
from the database aspects itself.
Yeah.
Yeah, if you keep the boring stuff boring,
I think that's the actual essence of developer experience, which is if the database is like really scary, and we come to companies
where they haven't done a schema change for months, because of the last time it happened.
I remember at GitHub, the users table was un-migratable, like you just couldn't do a
migration against it without taking down the website because, you know, everything joined
on it was, you know, before we started functionally partitioning out. And it just slows development down. So we
just kind of created other tables to join on because it's the only thing to do. And
we run into companies like they're in this situation all of the time. And it's nice to
go in and get them kind of developing faster and shipping again. Again, it just comes down
to when the boring stuff stays boring, or you can make it as boring as possible, and
you embrace the eventuality of failure, you can then do some fun stuff around it and make
it usable and kind of highly dynamic.
Yep.
That's awesome.
Well, like sort of on that note, like I want to talk about a lot of database stuff and
sort of current events and what's happening and get your take on the database space.
But let's start off with PlanetScale Metal.
So you all have this new announcement today.
Tell me about Metal and I have like a bunch
of follow-up questions on how this is all working.
Yeah, so Metal is like a fundamental step change
in performance and databases in the cloud
and the cost profile associated with them.
And I can unpack that a little bit.
You know, we've always run in AWS and Google. We have to, right? It's really where the only serious cloud customers are. If you can't be one hop from the app layer, it's just not viable. We run inside those clouds and we also run inside VPCs inside people's accounts.
So it's a managed service.
Nobody has to configure or use it.
We wake up, we take the pager if anything goes wrong, but it still lives within someone's
cloud account.
It runs on EC2 and it means you get this highly scalable database inside your existing infrastructure.
And the thing that makes Metal really revolutionary is database companies are either doing one of two things.
They're either hosting in like an Equinix kind of self-hosted situation where they have their own hardware that they rack.
And that's kind of a few database companies have started doing this now.
It's really cool because you get great performance. You get like NVMe straight to the CPU through the motherboard and PCI buses and whatnot.
So it's awesome.
However, you're in your own data center.
You're multiple hops from where Lambda is and S3 is, and no one should be building
their own Lambda or S3, right? Like it's just, you shouldn't. We use those essential services
for the things we need to make boring. Like you can guess which service we put our backups on.
Right? I imagine it's going to be S3. Yep. Oh yeah, of course. Absolutely. Yep. Yep.
And we're actually gonna blog about this,
and our new backup system is incredibly cool.
Like, you know, parallelized backups,
we can restore at line rate of the machine.
It's very, very cool.
Anyway, I could go on forever about that stuff.
Anyway, so you're either like out in your own data center.
Yeah.
And let me stop you there.
I didn't know people were doing it.
Like which database companies are running in their own?
Prisma just announced a product where they're running.
And they're doing some really cool stuff
with like micro kernels
and very fast booting performance postgres,
which is really, really cool.
Issue being it's gonna be hops away from applications,
which is gonna cause latency,
but I think for sort of certain workloads,
that could be fine.
Or you're running in the cloud,
and this is where the major problem comes from,
is when you run in the cloud, everything's truly, like, ephemeral, unless you're paying for it not
to be ephemeral, and you're buying convenience from the cloud. So if you want a bare metal server
in AWS, they exist. They're very cheap, very, very cheap, and extremely fast. However, if you
terminate one, it's gone forever. Back in the data center, if you screw up, you have a raid controller.
You have raid.
You can physically get hands-on.
Like at GitHub, we ran in the data center.
Although we wanted to be highly available, we had the safety of knowing that truly, truly,
if something goes wrong, someone can pull out disks, put them into another blade and we're able to rebuild
and get that data back even if it's a nasty downtime. And so there's these beautiful ephemeral
machines inside the cloud that have incredible performance profiles and no one uses them for
databases. In fact, we told Amazon we were doing this, and it was just utter disbelief that we would
run databases on that.
There's a reason we can do this in a way that no one else can, and it's that Vitess,
our core technology, is on board, the system originally built at Google to scale YouTube.
It's fully ephemeral.
There's just no assumption that you'll ever see the drives or the disks again. So Vitess is durable and high-performing on ephemeral nodes and is extremely resilient,
meaning we can basically buy the same servers Amazon buys for their services
and build a full software stack. They give us like, it is Firecracker, I think, yeah,
but just a light kind of, you know,
which really is for their convenience too,
just to give you this like base operating system image
and everything else on top is built by us,
which is highly, highly unusual.
Everyone else normally has to do the separation
of storage and compute and use EBS
or their own like separated storage layer, which is fundamentally slow.
And that's actually where the big step change comes with Metal, which is we're going back
to how computers are meant to work, which is IOPS happen inside the machine rather than...
We can really dig into why separating storage and compute is slow
and what it's good for.
Like it's great for a certain type of database workload,
not the ones that we care about.
But yeah, so PlanetScale Metal, it's running,
it's extremely fast.
We've announced it as having unlimited IO
because you literally cannot exhaust the IO on these boxes.
You run out of CPU way, way sooner.
In the announcement, you've seen just incredible graphs,
just customer after customer after customer,
just P99 just falls off a cliff,
and they save a ton of money.
It's a big win.
Yeah, okay.
Okay, so I wanna back up a little bit.
So you talk about separation of storage and compute,
and it's like at a different layer
than like what we usually talk about, I feel like, with databases and separation of
storage and compute.
Cause I think of that like Aurora or a neon or something
like that, where you like, it's like different services sort
of running in different places.
But this is actually saying like, you know,
usually when you're spinning up a database RDS or something
like that, you also have this attached EBS volume and,
and data is traveling like over the network to that EBS.
And like you're saying, that adds latency itself
and things like that.
Whereas with PlanetScale Metal, you're spinning up an EC2
instance that has the NVMe SSDs attached to them locally.
It's the instance-based storage.
And you're reading and writing to that
rather than over the network to EBS.
Correct.
Correct.
And both forms of separation of storage and compute
are slow for OLTP.
Like just leaving the server to do a page read is slow.
Aurora and Neon, they do it by separating
kind of the query engine, right?
That takes the query and then they do IO
to their own storage layer,
which is again across the network. Aurora does it. The Aurora paper is
an exceptional paper. It's one of my favorites. I recommend everyone to read and they talk about
how they have done that. And it's very performant for the goals they had, but it's still nowhere
near the performance of Metal.
I'll tell you the story of how we soft-launched this later
if you're interested
in, like, the behind the scenes,
but we basically, we have not seen an Aurora workload yet
move that hasn't become faster,
and usually significantly faster as well.
And it's because we cut out
just an immense amount of variability and entropy, right?
If you leave a machine, you go out of the network stack,
even if it's like InfiniBand,
they've done all of the optimizations they can.
They use really proprietary, cool stuff, but whatever.
You're going out of your local operating system
across a network where you're gonna hit load balancers,
top of rack switches, rebalancing jobs.
Like fundamentally, entropy has gone way up and just the databases are so incredibly chatty
with disks: block reads, but also background threads, buffer pools, cache updates. There's
just millions of things happening all at once inside a database. Doing them over a network, you just pay a constant latency penalty
that is extremely bad for OLTP workloads specifically.
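To put rough numbers on that constant penalty, here's a toy model. Both latency figures are illustrative, order-of-magnitude assumptions for the comparison, not measured AWS or PlanetScale numbers:

```python
# Toy model of the per-read penalty for network-attached storage.
# Both latency figures below are assumed, order-of-magnitude values.

LOCAL_NVME_READ_US = 100   # assumed local NVMe page read (microseconds)
NETWORK_READ_US = 1000     # assumed network-attached page read (microseconds)

def extra_wait_ms(page_reads: int) -> float:
    """Extra milliseconds spent waiting when every page read
    crosses the network instead of staying on the local bus."""
    return page_reads * (NETWORK_READ_US - LOCAL_NVME_READ_US) / 1000

# One query that misses the buffer pool on 50 pages pays:
print(extra_wait_ms(50))  # 45.0 ms of added latency, before any
                          # load-balancer or switch variance on top
```

And that penalty is paid on every chatty interaction with the disk, which is why the variance compounds so badly for OLTP.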
Yeah. Yeah. Interesting.
And so you mentioned like that split between like, you know,
the people that are running Equinix data centers versus running in the cloud,
I guess, like for enterprises that are moving from having their own data center somewhere else to
AWS, are they often seeing like, hey, pretty significant latency hits just because of now
they got the EBS tax sort of? Massively, like massively. And it's not just latency, it's cost as well. It's just extremely expensive to run.
We had a customer that did basically that and saw,
first of all, a huge latency hit for their application.
They also had to then move to I02 volumes
to get over the lack of reliability for GP3.
And the 16,001st IOP on the io2 is like $2,000 more than the previous one.
It's just unbelievably expensive and very failure prone.
We had an AWS issue where we saw multiple customers across multiple accounts get a blip,
but Vitess did the right thing and handled it, threw out the nodes that had access to those volumes.
But because we've got about half a million EBS volumes provisioned at any one point,
we could see across multiple customers some issue that we eventually tracked down to being a top of
rack issue for EBS. And so not only are we dealing with these strange behavior patterns that don't exist in the data center failures become partial dead node is extremely easy to recover you just like don't you throw away and you get anyone actually a planet scale in our fundamental part of our shared nothing architecture is.
Everything to fix a shadow fix a cluster should always converge and just keep being
able to kill nodes. Like we don't try and kind of really get too introspective of what's going on
with the node because of the way our architecture is. Just kill it and then the test will always do
the right thing to converge back to a sane and highly available cluster state. But when nodes
kind of slow down, and if your readers really
want to entertain themselves, go and read the SLA docs for EBS, which only guarantees acceptable
performance for 90% of the day. So when nodes start to slow down, databases do really weird
things which cascades up the stack and becomes really painful. It's just nice to have the node
just fully die. These metal nodes, you know, these metal nodes
or data center nodes, usually just, in a very kind
of binary fashion, hit the floor very quickly.
And that's great.
Yeah. Yeah.
And so tell me about that
because you talk about killing the node
but also like you don't have that background of EBS
like where that volume is still available.
So I guess what are you doing to ensure, right?
How does Vitess handle durability
where you can shoot that node?
It has the instance store storage.
That storage is now gone.
Like, what do I have here?
Yeah, so we basically always make sure
there's three replicas for every shard,
always available to take writes
and we do semi-sync replication.
So when we acknowledge a write,
it has made it to another box.
It's not the same as like these kind of
full quorum write systems that are also again, very slow.
It's just enough to make sure that you get
data off of the box and that it's safe
and has made it to at least one node.
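A sketch of the trade-off being described here, with hypothetical per-replica ack times. The helper functions are illustrative, not Vitess APIs: semi-sync acknowledges as soon as the fastest replica has the write, while a majority-quorum system has to wait for the median replica.

```python
# Semi-sync: the write is acknowledged once ONE replica has it.
# Full quorum: the write waits for a majority of replicas.
# The ack times below are hypothetical, in milliseconds.

def semisync_commit_latency(ack_times_ms):
    """Commit completes when the fastest replica acknowledges."""
    return min(ack_times_ms)

def quorum_commit_latency(ack_times_ms, needed=2):
    """Commit completes when the `needed`-th fastest replica acknowledges."""
    return sorted(ack_times_ms)[needed - 1]

acks = [1.2, 3.5, 9.0]  # three replicas, one of them having a bad moment
print(semisync_commit_latency(acks))  # 1.2 ms: data is safely off the box
print(quorum_commit_latency(acks))    # 3.5 ms: a quorum write eats the median
```

The point is that semi-sync gets the data off the primary without paying for the slowest replicas on every commit.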
And then there's just
millions of lines of code in Vitess that make orchestration incredibly easy and seamless to bring nodes back. And so that's the actual only real trade-off of moving away from EBS, which is
EBS makes it really nice to just detach a volume, spin up a new pod and like just
reattach. So the way we do upgrades, and we kill every single node at PlanetScale,
the longest a node can live at PlanetScale is 29 days, and we do that by
getting rid of the pod, bringing up a new one, reattaching the volume. Now we
have to bring up a new node, restore from backup,
bring it back into the replication pool.
That is still a very quick process
because of these machines.
And because of our new backup system,
which does full parallelization,
we use the full NIC, saturate the entire NIC of the box,
bring it back online, and then we're rolling.
So it's very mature and it fails in the traditional way
all these databases should fail.
It's very easy to reason around. And because this code path has run hundreds of millions of times,
probably at scale. Yeah, it's very mature. I should also mention that, to date,
Metal has been online, you know, and you can see in our post that companies like Block with
Cash App and Intercom use it, and it has served around 5 trillion queries across 5 petabytes of data,
which for relational databases is obviously a very large amount of data. So it's already
well-worn battle tested. Yeah, and we're actually upping our SLA now.
You know, our committed contractual SLA now
has gone up to four nines, which exceeds
all of our competitors.
And then for multi-region, we'll now commit to five nines.
And that's because we're not relying on code
we have not written basically to provide the critical path.
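For reference, the standard availability arithmetic behind those "four nines" and "five nines" figures:

```python
# What an availability SLA means in allowed downtime per year.
# This is plain availability arithmetic, nothing vendor-specific.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability: float) -> float:
    """Maximum minutes of downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

print(round(downtime_minutes_per_year(0.9999), 1))   # four nines: ~52.6 min/yr
print(round(downtime_minutes_per_year(0.99999), 2))  # five nines: ~5.26 min/yr
```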
Yep, yep. Okay.
Wow, that's super interesting.
So basically, as I understand it,
a lot of people use EBS because of that durability.
EBS is gonna replicate, I believe,
a second time within the same AZ or something like that.
But you are already replicating
to at least one other box somewhere.
So it's like, hey, we don't need
that EBS durability quite as much.
We have our own durability there.
One of my questions was,
you talked about bringing up a new node. If I have
a 3 terabyte shard and one of those replicas fails, how long does that take to bring up a 3 terabyte
node or something like that? What does that look like?
It's as fast as, like I said, we go across the NIC. I can't do the math in my head.
It also depends how many changes have happened in the time from
the backup to...
Gotcha. So it's like backup plus changelog type stuff. Yeah.
To catch up. But it's very quick. It's like, it's acceptably quick considering you're
not relying on it to bring you back online. Like, this is a rare scenario. We did the modeling, based on all of our years and years of failure rates and the bugs we've encountered, of how likely it is we'd even just shut a shard down, let alone
ever risk data.
So it's just an incredibly low chance that this could ever happen.
And even then, we still stream the logs elsewhere anyway, so we can always recover you back.
It's essentially…
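For the three-terabyte question above, the back-of-envelope math looks like this. The 25 Gbps NIC is an assumed figure for illustration; actual instance NICs vary:

```python
# Restore time at NIC line rate: data size over bandwidth, plus
# however much replication catch-up has accumulated since the backup.
# The 25 Gbps figure is an assumption, not a stated PlanetScale spec.

def restore_minutes(data_tb: float, nic_gbps: float) -> float:
    """Minutes to stream a backup of `data_tb` terabytes at line rate."""
    bits = data_tb * 1e12 * 8          # decimal terabytes to bits
    return bits / (nic_gbps * 1e9) / 60

# 3 TB over a saturated 25 Gbps NIC:
print(restore_minutes(3, 25))  # 16.0 minutes, before replaying changes
```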
The reason it's revolutionary is because you previously had to make a flexibility trade-off.
You either had flexibility thanks to EBS because you couldn't trust your own storage.
Most database companies or startups didn't have Google building their tech for YouTube
like Vitess did.
That's like 100 million of R&D straight there,
supporting one of the largest websites.
We've added another 100 million on top in terms of our own R&D.
So now we have something extremely mature.
If you're starting a database startup from day one, you pick your battles, right?
You're going to use EBS or you're going to use your own kind of distributed
file system to get yourself that flexibility back and enable certain things.
The cool thing about Metal is we really don't make you make a flexibility trade-off. It feels exactly the same,
but you get data center performance right inside the cloud. It's very, very special in that regard.
Yeah. And so will this be the recommended setup for all PlanetScale users going forward or is it more
a certain class?
I know all the examples you have are huge users.
Is it more for them?
Is it great for everyone?
How does that break down?
Yeah, that's a really great question.
The general rule so far, and one of our companies we've moved that should have a video going live today, they are relatively
small.
They're not sharded, but they're seeing some success.
They're small in terms of the PlanetScale corpus of databases, but still very firmly
a small startup. They have three employees.
Metal has immediately impacted them. It's had a really positive impact. If you are tiny,
just doing a tiny volume of mostly reads, we're going to put you on our lowest tier.
Until we start chopping up Metal nodes for multi-tenant, it's like smaller PS10 type nodes,
you're gonna be on EBS.
And that's fine for most, for a lot of people
at very, very low volumes.
By the time you start to get to spending five, six hundred a month,
metal starts to take over as being the option.
Basically, nearly everyone that comes through
like our contact us form as a sales served customer,
we just put them straight on metal
because it's gonna be cheaper and it's gonna be faster
than pretty much anything they're running on.
And the general rule, if you're running on Aurora or Dynamo
and spending more than around a thousand a month,
you're probably gonna save money
and get faster by moving to
PlanetScale.
Unless you've got some really weird like, well, we just barely do any CPU but need to
store 30 terabytes of data, you can like put a Pico node, attach it to a giant EBS volume.
That's very cool.
It's just extremely niche.
And so most people doing anything serious, spending about a grand a month on Aurora, they're probably going to almost definitely get faster
and probably going to save some cash too. Gotcha. Gotcha. And are there any other,
I guess, changes or requirements or anything to my topology or in settings? You all do semi-sync
replication always anyway. Like, do you have
even the option to do like full async with Vitess? Or do you do... Yeah.
No. We don't support fully asynchronous replication. And even if we did, we would never
turn it on. It's been completely impractical at scale. Like, it just doesn't. It makes everything
slow. And it's just not necessary. We all kind of live on the spectrum from blockchain to...
I could be mean and say Mongo, but they have fixed all those problems. But you know what I mean,
blockchain to Redis or Memcached, you pick your trade-offs. I think Postgres and MySQL have got
it exactly right in the middle, which is it's durable. You're not going to lose data, but it's
also just not making crazy consistency guarantees
that become very slow and extremely hard to debug.
Yep, gotcha.
And also, what you were saying earlier about the modeling
and how often failures happen in recovery,
it's not like you need to put more replicas in a group
or something like that if you're going to be on metal.
It's still going to be three replicas, and that'll work.
There's no less durable way to run PlanetScale.
We will not let you run in a...
I actually get a lot of requests for this
and maybe, maybe, maybe one day for the tiniest,
but people want single node planet scale
and we just don't ship that.
That's the cool thing is,
and that's the whole promise of what we do,
is we know this stuff inside out.
We're a very small company that has 660 years of combined experience running databases at scale.
We default in the right things.
And, you know, sometimes we run across people on Aurora or RDS and we have to tell them, you know, you are a node failure from data loss.
Do you know this? And they're like, no.
We just clicked the buttons, it's very expensive,
and we assumed that we were being protected.
It's not the case.
PlanetScale makes sure that you are in a highly available state
and that it does the right thing pretty much all of the time.
Yeah.
OK, one thing I want to go back to, you talked about IOPs,
the provisioned IOPs, being expensive.
And some of the charts I saw, I was
surprised to see how much the cost of running a database
was the provisioned IOPs, not the compute,
not even the storage itself of EBS,
but just the provisioned IOPs.
I was shocked.
Is that pretty common?
Is that one of those weird spaces
that cost a ton of money and a lot of people don't talk about?
Or what's going on there?
Yes, we see this a lot and it's just disgustingly expensive to run these. You know, 16,000 IOPS is not even that much.
But, you know, that's all you're getting before you start paying extreme amounts for anything above that.
We're talking about each node, each metal node being able to do from starting around 250,000 IOPS up to millions for specific node types.
We have certain clusters now that are provisioned with 40 million IOPS across the node pool.
Like it's just unbelievable.
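For a sense of scale on pricing, here's a sketch of tiered provisioned-IOPS billing in the style of EBS io2. The per-IOPS rates below are ballpark public figures and should be treated as assumptions; check current AWS pricing before relying on them:

```python
# Monthly cost of provisioned IOPS under a tiered, io2-style price
# sheet. All rates are assumed ballpark values, not authoritative.

IO2_TIERS = [
    (32_000, 0.065),        # first 32K IOPS, $/IOPS-month (assumed rate)
    (32_000, 0.046),        # next 32K IOPS (assumed rate)
    (float("inf"), 0.032),  # everything beyond 64K (assumed rate)
]

def iops_monthly_usd(iops: int) -> float:
    """Sum the cost of `iops` provisioned IOPS across the price tiers."""
    cost, remaining = 0.0, iops
    for tier_size, rate in IO2_TIERS:
        take = min(remaining, tier_size)
        cost += take * rate
        remaining -= take
        if remaining <= 0:
            break
    return round(cost, 2)

# Provisioning 64,000 IOPS, before any storage or instance costs:
print(iops_monthly_usd(64_000))  # 3552.0 USD/month just for the IOPS
```

Local NVMe instance storage, by contrast, bundles its (much higher) IOPS ceiling into the instance price, which is where the cost gap in the charts comes from.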
We have one customer, they're going to write a blog post soon, they save 70% on their bill and still have like 100x the amount of IOPS available to them
to read and write from.
We had a customer, a large one, have an outage of a service that isn't backed by Planet
Scale, but when it came back on, it hit the database that does run on PlanetScale. And it hit us with an additional unexpected 700,000
additional IOPS and we just tanked it. They didn't even like, no scaling up, nothing.
Just like metal just ate it. And they just jumped in the chat and were like, did you
see that? And we didn't. We didn't notice, I mean. And they were like, yeah, you just, you just, we just hit you with like
an immense amount more traffic. To put that 700,000 IOPS in
perspective, that would be, think of like a bunch of these small
database companies and startups. That's probably more QPS than
every single one combined, in one burst. It's pretty fun.
It's fun to see this stuff really is.
And just to put that amount of unbelievable horsepower behind a single connection string.
Yeah.
Yeah.
Whenever I talk to you, just like the numbers that you can say, it's pretty fun to hear
all these.
Okay.
So, Metal available today.
I can go sign up and get this if I want it.
Yeah, you can get cracking.
It's GA, fully available.
It's been running on, like we said,
extremely large databases with loved brands
that make a significant amount of money from their software
and be very upset if it was down.
So, it's more than appropriate and available
for anyone who wants to go out there and use it.
Cool.
Can I switch over if I have an existing cluster pretty easily?
Yeah, it's a fully online operation. You can just go into your cluster configuration,
select the metal class of nodes, and it will just happen online. The first time it does it,
it takes a little while because we're going to restore you from a backup rather than do the EBS
detach. And I had, in the room I'm in now, one of those startups that, you know, we've just been kind of trickling people in to see if they want to get involved with the launch.
And you'll have seen them all today, how exciting it is.
But just to see their faces when they looked at their graphs and just drops off a cliff.
It's just so funny. People just bursting out laughing. And we had one company whose product team pinged and was like,
the product's so much faster.
And they just took a load of performance work
off the roadmap.
They were just like, we just don't need to do this now.
We've had certain customers of ours
have their biggest customers notice
and thank them for fixing some of that.
Because your P99 is your best customers
having a terrible time.
That's the funniest thing: the people
that use you the most get the worst experience,
and it's because your P99 sucks.
This is a P99 killer.
Interesting. Do you see performance impact
at like P50 and stuff like that too?
Or is it mostly at that top end?
Oh, absolutely.
The latency of pretty much every query goes down
unless it's already cached in the buffer pool.
The P99 is just the funniest,
because it's the most extreme.
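As an aside for readers: why the P99 moves so much more than the median can be sketched with a toy latency distribution. All numbers here are made up for illustration, not measurements from PlanetScale or AWS.

```python
import random

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(p / 100 * len(s)))]

random.seed(0)

def latencies(storage_ms, n=10_000, hit_rate=0.9):
    """90% of queries hit the buffer pool (~0.5ms); misses pay storage latency."""
    return [0.5 if random.random() < hit_rate else storage_ms for _ in range(n)]

network_storage = latencies(storage_ms=5.0)  # network-attached-storage-like miss cost
local_nvme = latencies(storage_ms=1.0)       # local-disk-like miss cost

# p50 is a cache hit either way, so it barely moves...
print(percentile(network_storage, 50), percentile(local_nvme, 50))  # 0.5 0.5
# ...but p99 lands in the miss tail, so it tracks storage latency directly.
print(percentile(network_storage, 99), percentile(local_nvme, 99))  # 5.0 1.0
```

The median sits inside the cache-hit mass, while the 99th percentile sits squarely in the storage-miss tail, which is why the P99 is "the most extreme" indicator of disk latency.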
And the other thing, for those that are going and looking at these graphs now:
if you go and look at the announcement on our blog,
it'll link out to a little bit of other things.
The other thing to notice,
and in a post I'm gonna do in a couple of days' time
about the separation of storage and compute,
and why it's wrong for OLTP,
you'll also see that there's a much tighter band of variance for performance,
which again is extremely important
because a spike in database performance
can exhaust a front-end tier extremely quickly
from waiting. Oh, by the way,
we've also had people turn down front-end capacity
because they're just literally not waiting
for the database as long.
So it just releases pressure around the system in loads of interesting ways.
Like ETL jobs take less time.
All of these things just run the database pretty much as fast as you can make it.
All of these nice extra things happen.
But yeah, a big spike in P99 means you're holding front-end workers,
which piles up user requests in a queue that might trip some
other threshold elsewhere, and now you're cascading into an issue.
If you keep performance in a really narrow range,
you get really predictable results and you can
do more with your database and build an application
that performs significantly better.
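The back-pressure being described follows straight from Little's law (in-flight requests = arrival rate × time in system). A rough sketch, with hypothetical traffic numbers:

```python
# Little's law: average in-flight requests = arrival rate x time in system.
# A fixed front-end worker pool tips over when a database latency spike
# pushes in-flight requests past the pool size. Numbers are illustrative.

def workers_in_use(qps: float, db_latency_s: float, other_work_s: float = 0.002) -> float:
    """Average front-end workers occupied serving requests."""
    return qps * (db_latency_s + other_work_s)

POOL_SIZE = 64

steady = workers_in_use(qps=5000, db_latency_s=0.002)  # 2ms DB time -> 20 workers busy
spike = workers_in_use(qps=5000, db_latency_s=0.020)   # 20ms tail spike -> 110 workers

print(steady, spike)
print(spike > POOL_SIZE)  # pool exhausted: requests queue and trip other thresholds
```

A 10x database latency spike at the same traffic level quintuples worker occupancy, which is exactly how a tail-latency problem cascades into a front-end outage.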
Yeah, yeah.
What about like looking at other database providers
and things like that?
Like, is this gonna be hard slash impossible
for other ones to do for most of them?
I guess like, yeah, what would that look like
for your standard Postgres provider to do that?
They're just not gonna have like sort of the automation
to recover from this stuff.
I assume they are gonna work on this.
I assume everyone is gonna work to try and do this.
The fact no one has done it yet
tells you how difficult it is.
You really, really have to be very careful.
Like I'll put it this way,
if people start just flipping this out there,
you know, in the next five
to six months, I would be very cautious
of putting my data on it.
The database market is brutal.
The funding environment for database companies is brutal.
Companies are now reacting really quickly.
We saw one recently that shipped a new version
of their product
and didn't do backups and lost customer data. This is all a reaction to how jumpy the bottom end of
the database market is, this kind of net-new Postgres end of the market where you're really
just competing on glamour and how pretty your website is. If these companies just react quickly and start doing this stuff, I'd be very, very
wary because we knew it works on ephemeral storage.
It's proven running the second largest website on planet Earth, yet we spent four years before
we were ready to do it because you just have to be mature.
If, as the anesthesia is kicking in before surgery,
you heard the surgeon go,
oh, I hope this goes well, it's my first time,
terrifying, right?
We at least are coming at this from a position of maturity. We're as mature as Amazon.
You can trust Amazon; you're not getting fired if you trust Amazon.
You'll see from some of these blog posts, we've exceeded the reliability.
Intercom did a post previously about them moving away from Aurora, and they did it publicly
just to tell their users that we're getting out of this nightmare of downtime that Aurora
is causing for us.
We at least know we're more mature than that.
We know that we have these customers running on it.
If I was a little database startup building net-new infrastructure like this,
it's very risky and scary to do.
Are you surprised that the RDS team hasn't done this already?
They have ways of doing this if you use their DRBD cluster
replication, which is slow and has an incredible amount
of trade-offs.
Aurora runs on these machines, but does the network file
system on top.
And it enables really cool stuff.
There's things Aurora does that we don't do.
I just don't think they're important.
You can instantly add replicas to Aurora.
That's great.
Okay, cool.
But every single query at the P99 being twice as slow for the flexibility?
And actually, let me say nice things about separation of storage and compute,
so I don't seem just crazy biased.
It's awesome for flexibility.
It's just really awesome. You can scale to zero.
Not that anyone with a serious business needs to do that,
but it's doable.
You can expand storage continually
without having to upgrade nodes.
Like if you scale up to the next tier of Metal,
we roll through your nodes.
Fine, right?
You know, you can attach replicas instantly.
Aurora doesn't get replication delay.
If you do a crazy amount of silly stuff to your PlanetScale database,
you may delay your replicas.
It's a lot harder to do with PlanetScale Metal, that's for sure.
But you could do that.
That's possible.
Again, this is cool that you can do these flexible things.
None of it worth every single query you do being slower.
It's just not worth it.
It's the same way people use some databases
that auto shard and we make you explicitly shard.
Well, if you explicitly shard up front,
your data doesn't get randomly allocated around nodes
and every query is faster.
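The explicit-sharding point can be sketched like this: the application or routing layer derives the shard from the key up front, so one customer's rows and queries always land on one node. The shard list and routing function below are hypothetical illustrations, not PlanetScale's or Vitess's actual scheme.

```python
import hashlib

# Hypothetical shard topology; real systems map hash ranges to shards.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(customer_id: str) -> str:
    """Deterministically route a shard key to one shard via a stable hash."""
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# The same key always routes to the same shard, so a point query touches
# exactly one node: no scatter-gather, no cross-shard fan-out on the hot path.
print(shard_for("customer-42"))
print(shard_for("customer-42") == shard_for("customer-42"))  # True
```

Because the mapping is deterministic, related data stays co-located instead of being randomly allocated around nodes, which is the property being described.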
There's no magic here; we're still just dealing with physics.
If you could talk over a network and scale to zero
and do all the things that these distributed file systems do
and be as fast as physical NVMe, you would do that.
You just can't.
And our kind of operational superiority, I guess,
has led us to being able to provide you this level
of performance and speed
inside the cloud. And that's kind of the big unlock. So I'm not going to say that these
things are bad; they're wonderful feats of engineering. Like I said, I love the Aurora paper;
there's tons of flexibility that can be had with these systems. They're just not worth it at
pretty much any scale.
Yeah, yeah. Okay, last thing on Metal before we move on, I guess:
what's the story and timeline
of when you start thinking, hey, this could be a good idea, this might be worth it.
And we think we could pull it off.
Like, I mean, you said something about four years. Is it that long you've been
thinking about this, or what was that?
The four-year journey has just been a long maturity journey.
Right.
We knew we had Vitess. I always say that the true secret
sauce of PlanetScale is two things. Well, three actually. It's Vitess, which
is, I don't need to list all the companies that run on Vitess. We know this. It's the
top end of the internet. It's our operator.
So we run extreme amounts of state on Kubernetes.
Going to KubeCon is really funny, by the way, because some people don't actually realize
that.
It's like a controversial topic:
should you run state on Kubernetes?
Well, that ship has sailed, because there's petabytes, tens to hundreds of petabytes, running
on Kubernetes via Vitess.
And then it's our operational expertise, which is, if you go on
our about page, it's all of the companies we've worked at.
You know, the first infrastructure engineer, employee eight at Instagram,
works here; the team that built Clover, and earlier at Square, and earlier
at GitHub, and YouTube, and everywhere, has just worked on this stuff.
So it took four years, which is nothing in
the database world, absolutely nothing, to converge into a state that we're really happy
with, while ramping up these gigantic databases for these gigantic companies. And so there was so much
else to do, maturing everything else that goes around the database to make a really mature kind of service.
We're not just building a kind of storage engine that people install and run themselves. It's a
full service. You pay us for uptime and we take that trust very seriously. We started Metal in
anger about eight months ago knowing we could do it. We just knew that we were at the level of maturity,
we could model it well, and we were just in a time of complete autonomy to go and build this and put
our entire engineering effort towards it. So we did. We actually were ready to go into beta at reInvent, but we spoke to our customers
and every single one of them wanted it, so we made the agreement: we'll skip reInvent, we won't
go and hype it up at reInvent, we'll move you, and then you can announce it with us. That's why
this launch is essentially two paragraphs from me and then a link to our customers.
Because every other launch in tech is boring.
It's like the headline is,
company thinks their own product is good,
a ton of animations and renders and my browser's crashed.
It's that.
And it's just like, oh, cool, whatever.
What are you actually doing?
For us, it was like, who gives a shit about the database?
It's a database.
People are building cool companies on top of this stuff.
Drive to work every day, see the billboards of three of our customers.
Make them front of center.
If you're doing something consequential, it shouldn't be hard to find well-known brands
that are using your stuff.
So we did that.
We made the launch this way.
So it's been around for quite a while now, but in prod and working really, really well
at scale.
And that's why the launch is kind of the way it is.
It's just a very simple, it's our customers.
Just don't trust us, trust them.
It is a great launch.
I love use case studies and things like that,
especially the ones with the charts that are in some of these.
They're good ones, for sure.
So yeah, that's exciting.
I love it.
Cool.
All right.
I want to switch gears a little bit and talk about sort of just
catching up on PlanetScale in the last year and a half
and things like that.
So one thing I want to talk about is,
and you sort of mentioned this earlier
about all these sort of database companies
like fighting over the same like glamour customers
and seeming cool.
Whereas like a year ago, you all were like,
hey, we're gonna focus on profitability.
We're getting rid of the hobby plan.
Also change like the design of your website,
which Holly has some good stuff on.
Like tell me about that whole thing
and how like looking back a year,
how that's been like, did it go as you expected?
Like what do you think about that?
I did not enjoy letting go of a significant chunk of the company, and it was a terrible experience for them.
No one should forgive me for that or have any sympathy.
That's what you do in leadership. And it was way more painful for the people that were let go and who stayed, right?
It was a scary time.
Being a profitable company is the most incredible advantage, strategic advantage.
Even though it was a horrible moment, it was one of the best decisions we've made because
it gave us, it changed our future. It meant we could do things like
this. If you're on an 18-month runway, you have to just ship some junk to get out there
to get the hype cycles going, to build pipeline so your sales team has something to sell, because
if VCs don't like your last six months of growth, you're screwed. We were like, okay,
we're going to take our time. We're going to roll it out with giant customers. We're going to test it
so thoroughly. Because what would a safe database company do? A company that's safe from
The kind of ticking doomsday clock of the cash-out date. For those who don't run startups: if the company is not profitable, the CFO, the COO,
and the CEO, and a ton of the rest of the company know the day where you run out of money.
Having that in your head is really, really scary and sucks for your customers. It just really sucks. With the brands that
run on us, even our business practices have to be safe and durable. We've now built a
durable business. Our numbers are healthy. We grow. We ended up with millions of dollars
more in our bank account at the end
of last year than we actually expected we were going to have. And that's real money. Like,
I'm sat in our brand new office. It's beautiful. It wasn't just like blowing VC dollars up
the wall. It was paid for with money we earned. And there's more coming in every time someone
pays a bill. You know why our support is just the best out there?
If you write into our support team, shout out to Omar and Jay Greit, they write back
an incredible reply.
Do you know why?
It's not a waste of time, because you're paying us money.
You're a customer.
It's EV-positive for us.
It's the best thing.
And so I miss the people that aren't here anymore. And it was a year last week, actually,
and I thought a lot about it. It was really sad, and there are things I
would have definitely done differently.
But look, you can look customers in the eye and say, we're here,
we're going to be around, right?
And then being able to use all of that optionality,
like I think 50% of the external behavior or weirdness
you see from startups is just the fact
that they're dying slowly.
And they're constantly.
Just grasping at straws.
Grasping at straws, just desperate.
Like I said, we talked about that database company
shipping without backups. I don't think
that they're idiots. I don't think anyone would do that
unless they're under some form of pure fear of another
startup that's just got funding, or another hype cycle. And if
you're on that train, it's painful, and you don't want your
database provider going through that. We're not, and it's been
very special, and it means we have the optionality to
do the things we do. Yeah. Do you think you'll ever raise again, like, for anything? Or are you
just like, Hey, no, we're going to be profitable for the rest of time and go from there? Or is
that too hard to say? You never say never, right? You know, there's ways in which you could raise
that mean you don't risk profitability, right?
You can invest that money in net-new things and know that, you know, if those things
didn't work, then you can reallocate the spending, you can do various things. I think it's more about
being stable, reliable, and knowing you can always get back to a really, really healthy runway.
You don't need infinite life.
You need enough to get to a stage where the business just healthily prints money and your
customer base are always supported and enabled. So it may happen, it may happen,
but it'll be done in a way that doesn't risk the durability of the
business. And we hold on to that so, so dearly. We also have things
like secured lines of credit, for example,
that we don't take; we're likely never going to
take the option, but you have them just in case things happen.
And by the way, the amount of banks and
finance firms that just want to work with us because they see the health of our business,
they see our customers, they see who's coming on board, their own customers telling them about us.
It's really just about having a very healthy financial picture.
And all of these database startups that are raising these hype rounds, they're doing it
because they have negative gross margins.
Even their paying customers cost them money.
Growth is kind of toxic for them; it means something
good, but it doesn't really answer the fundamental question of how you invert that into profitability.
And so they raise money to stay alive.
We raise money to get bigger, to grow more,
knowing that the core is kind of durable and protected.
So, and tell me about getting rid of the hobby plan.
I know there was criticism of that and like,
hey, you're killing the low end of the market
or all the people are gonna grow with you.
I guess, what has it been like getting rid of the hobby plan
and what have you noticed business wise?
First of all, you learn how creatively people can insult you on Twitter.
Pretty funny, most of them actually, even the bad ones. But in seriousness, you know, I get about
an email every day from people wanting a free version of PlanetScale.
Still?
Yeah.
And I think it's extremely unlikely we'd ever do it again.
And I'll tell you why: it's because the conversion
rates are horrible. They're just absolutely horrible. And, you
know, my prediction is over the next two, three years, you're
just going to see bargain-basement acquisitions of
database companies, or any infrastructure companies that run much infrastructure for you,
because they can't outgrow the overwhelming need for the free tier, right?
You look at so many of these companies providing infrastructure for free and it's really like 99.9%
students that just learn and then abandon it.
You know, everything's a power law. And we saw this at GitHub, just the absolute extreme
power law of repos that get even one commit a year. It's just unreal. And then the rest
is just storage. And it's so much worse when you're running infrastructure and running databases.
Because there's a real cost, yeah, to all that stuff.
It's a real cost. It truly is a real cost. You can scale to zero, you can do all this stuff.
It's an optimization, but it's not an optimization that's indicative or useful at any form of scale.
It just makes your free tier cheaper, which I think is a real thing if that's your market,
right? But I love hobbyists. I don't want people to think I don't like hobbyists.
You know, I obviously worked at a company that gave me so much in terms of career,
learnings and love and happiness, and that was centered around the fact that so many new people
wake up and want to become software engineers and develop and learn. That said, I want to build something enduring that matters and backs the incredible
products that are being built in tech. And to do that, you need to have a sustainable
company. And a Fortune 500 company that may be a PlanetScale customer does not show up
with the inability to pay $39 for a month of a test database. And if a big Fortune 500 company
emails us and says, hey, we're interested in becoming a customer, we let them try for free anyway.
It doesn't change anything really. It just means that you don't have this incredibly aggressive
burn. And it means that you're competing on real value and real features. There's so many companies
and you see people talking to us on Twitter, they just love paying for a thing that really fuels their business
and they know they get great support and they know what they just value. It just feels like
it's such a little bar to clear for people that makes them really treasure what they're
purchasing and appreciate it.
And then at the bottom end,
it's not like we make loads of money
from you giving us $39.
You know, it gets easier the more we have, right?
You can just make a commitment
and do all this kind of stuff.
But it means that it doesn't hurt our runway
or the overall functioning of the business,
and we don't need more headcount
and all this sort
of stuff to manage it.
And it just converted terribly and it converts terribly everywhere else.
I know this.
I know a lot of people in tech and I talk to people.
It just doesn't change anything.
The other thing is we stopped doing these kinds of partnerships with other vendors,
because they want everything
free to begin with.
And at the end of the day, when you
start to hit actual database problems, they phone us anyway.
So it didn't really change anything.
Yeah.
Yeah, and you're talking about the additional runway,
or just the cost of it.
But there's the company cost too: you're talking about
how you provide really great, detailed support,
and if you have a ton of free-tier
customers, you're just overwhelmed
with support stuff.
It's harder to provide that really great support
to your actual paying customers and that sort of thing.
It also just changes your company focus, I think.
If you're just like,
how many free-tier customers can we get?
It changes the features you're building. It changes what people within the company care about. I've seen that sort of thing eat up a
company, I think.
Oh, sure. You think about that stuff, you know. And then when 99% of the users you're facing are only
that audience, you're just going to destroy yourselves in front of them by taking it
away. You just don't have a choice. So now it's just this kind of cancerous thing that
eats away at your bank balance. And yeah, you know, the support
requests when we had a free tier were unbelievable. I still remember
someone wrote in and they just pasted the SQL from some log and they're like,
what does this error mean? And we said, that's the SQL your ORM is actually sending us. They just didn't know.
And it's fine. We were all new. I'm really bad at so many things in tech, and I do equally dumb things. It's just that I'm here to build a business, build something for a company.
I'm just not here to solve those problems.
Like I see people who want to post their growth charts, whatever.
Post revenue charts if you really want to brag; otherwise just leave it alone.
If the goal is to burn tens of millions of dollars giving $3 away for $1, I still think I can do better at that, but I just don't want to.
You know, I wanna build a business,
and that is a next-level set of requirements
that does not involve
cheapening what you build to the point where,
like you said, it becomes all you build.
Yeah, yeah.
It's tough with that, like the JavaScript ecosystem
on Twitter and just that whole thing.
It can really feel like you're making awesome progress and stuff.
But it's like, are you making progress
towards the things you want?
Which, like you're saying,
is making a sustainable business over time.
It can be deceptive in terms of
what you're actually doing.
Yeah.
GitHub stars, I can tell you,
as someone who was there for eight years,
they don't mean shit, and 99% of them are spam.
You can just buy them.
You know what I mean?
Just like you can buy Twitter followers.
What does it mean? It's quite cheap to hit a star. It doesn't mean
an enterprise contract's coming from that. It's all mirages and
smoke and mirrors. I think it's getting worse where you see these kind of people
tweeting a picture of a mattress on the floor in a $5,000-a-month San Francisco apartment,
living the kind of grindset or whatever. Just charge more for
something. Charge an amount for something people
value that is above your costs. Then we can
talk about it.
Yeah, this is funny. JavaScript Twitter
is interesting. And I sometimes get into debates with people.
And the overall thing that I've realized with the Hacker News and the Twitter crowd is
every tech works at a certain scale.
You can do SQLite on the server if you want.
You can do anything; you can do flat files.
It just doesn't matter.
There's no point arguing with people
whose entire world view is so small, in terms of the problem set
you actually have to solve at scale, that anything works. You know,
when you see Facebook scale, you see they've got hundreds of lawyers that just do power
and light contracts for data centers, right? And teams of thousands of people
building data centers, or the team that lays undersea
cables, because the actual internet doesn't have enough capacity
for you to do inter-data-center traffic.
Like that's what scale is about.
That's why, again, you can just shave so much money
out of a customer's costs by doing the things
we've done with Metal.
That's the kind of problems I want to work on,
using the opinions of and getting into the fray
with those guys, because those people don't go on Twitter to talk about what they're doing.
They go and meet in private user groups.
Because they exist, by the way.
If you're not in them, you're not doing that stuff, right?
There was a DBA group that used to meet from 2013 to 2016, which was just the top probably
100 logos on the internet,
where you'd always get together at some of these events,
go and share actual problems and bitch about MySQL.
Or you can just go and listen to the opinions
of a lot of junior developers
that just want to be loud all day.
Winning in that crowd
is not indicative of any future success.
Yeah, exactly.
I still love you all though, you're still fun.
I mean, we can all have some fun.
I like the JavaScript Twitter community,
but yes, you gotta make sure you know
which opinions you're taking and which ones you're leaving.
There's some good ones as well.
There's some really responsible voices there.
I'm old and I don't wanna be old,
but the energy of this community: they are going to ship their future.
They'll rewrite it to run on, you know, other things, and they'll move away from
Postgres; that's fine, right?
But at the end of the day, the next internet is going to be built by these
people, and we can't become cynical to those things either.
There's so much to teach us.
And, you know, I like to think that we market with taste.
I like to think that we have a brand that is kind of cool-ish.
It takes watching what younger people are doing, and I find them incredibly inspiring. The
same way, you know, I made paper planes; no one should ask me to make an airplane.
Yeah, absolutely. Yeah, cool. I appreciate that introspection and looking
back on that.
I wanna ask another one in that same area,
and I don't mean to make you rehash all that sort of stuff,
but one feature I thought was really cool
was PlanetScale Boost,
this Noria dataflow type thing.
I saw that you all deprecated it.
Can you walk me through it?
Was it that customers just didn't need it as much as you
initially thought? I guess, what's the story behind Boost?
Yeah, so Boost is incredibly cool tech.
Yeah, it's super cool.
Yeah.
And again, Vitess makes doing Boost really, you know, really simple.
There are some fundamental issues, though. Well, one, we've realized since that Vitess materialized views are just
so good and continually updatable that we should just ship that, and we will, and it'll be
significantly better and will fit most of the use cases. What we found was there were so
many caveats to which queries could be supported, which ones were non-deterministic, that
it became this horrible user-experience nightmare and really just let people down.
And it just wasn't our full focus.
One of my learnings of growing into doing the job I do is that
you can only focus on so much.
Honestly, you just can't do too many side projects, too many things.
If it comes back, it will have our full force, but it just wasn't
the time, and there were other things to be done, bringing big customers on.
And leaving something in place that sets people up for a crappy experience is not
good. Either support it and do it well, or don't.
And you know, we just decided,
rather than live with all the caveats and whatnot,
to just deprecate it and say goodbye.
I think something that serves that need
will be back one day. Also, Metal is so fast
that you can just rip through it all,
who cares, just ride the lightning, it's great.
But yeah, we could have done better listening to our users, we could have done
better taking projects on in that time. I'm sure something like that will be back
one day.
Yep. Yep. Very cool. Okay. I want to switch to the broader database space, now that we've gotten through the startup-y type things.
I wanna talk about AWS, which is one of the big elephants
in the room for databases, right?
It's the default.
One of them, yeah.
Yeah, v big.
Yeah, exactly.
So there's always the standard stuff,
and we got the new stuff too at reInvent.
I think one of the talks of the town was DSQL.
I guess, what are your thoughts on DSQL?
Embarrassing launch, truly embarrassing.
Slideware, honestly.
The engineers behind it are phenomenal engineers,
but it just speaks to the position
that large clouds are in now, which is spread so thin,
like the innovator's dilemma to the extreme degree: 400 services, going down every single rabbit
hole.
The slides looked amazing.
It's cool as a Spanner killer, which, again, is fundamentally not the right architecture for
most workloads.
Amazing tech, but just not something really anyone outside of Google needs. So then DSQL is this reaction, and
the slides just looked too good to be true. The docs the
next day, you know, fully highlighted that it was just not
even true, or good. So it was just a very bad launch,
honestly. I mean, I don't know if they've fixed this now, but we found out the day after that you can't add
an index on a table that has data.
So it's a non-production database then, right?
Or what, we copy our database out and do an online migration every time we need to change it?
It's just botched as a launch, essentially, I think, even though the engineers working
on it are extremely, extremely good engineers.
I just get the feeling it shipped a year too early as a reaction to something.
Yeah, yeah, that is interesting. I guess like on that same note of, I mean, I guess something
that's actually shipped now, Aurora Limitless is now GA, which is like probably the closest thing
to Vitess and PlanetScale, I'd say. I guess, what happened with Aurora Limitless? I don't know.
It's been a gift. It's been such a nice thing for us because one of our customers was like, we really enjoyed finding its limits in production.
Oh, really? Okay.
Yeah, it was like, it has limits. I mean, again, it shipped without the ability to do read replicas.
What is that? This is what happens when you just become massive and PM-driven. I think they've fixed that now.
That's the thing, right?
Eventually you just become a massive company
with product managers everywhere just ticking boxes.
And it never really completes the job to be done, right?
And because we've run our stuff on behalf of our customers,
it kind of changes how we build software, right?
It's very practical. We're still waking ourselves up if it goes wrong, right? And yeah, I think it showed
people that the model is right. It's just not executed in a way that is viable. And so we have
a bunch of customers paying us a lot of money that tried it. They
had the pitch. The pitch was great. And they came to us for the delivery.
Yeah. Yeah. Yeah. And it's also just interesting, I think, how they've confused that Aurora
product line, I think, by naming all these things Aurora, but they're like, very different.
You can't, like, it's not like you can just switch from one to the other.
Correct. You know?
Yeah. Yeah.
Yeah, why don't you allow online upgrades,
the most absolutely basic thing you
need to do with a database, before you
start doing things like Limitless or DSQL or whatever.
It's easier to go from MySQL 5.7 to 8 by leaving Aurora
and coming to PlanetScale than it is doing it on Aurora.
Like all this blue-green deployment stuff,
it's technically possible if you read like 15 blog posts,
you still screw it up. So why are you paying that money
to then go and build this knotted ball of hell
just to upgrade your database?
It's craziness.
It's great.
It's amazing for us.
I think it's indicative of a trend.
I think if you are building a startup that competes
with Amazon, it's the absolute best time to be doing that, for all of these reasons. And you just have
to study tech companies over the last 40 years. It's not even more nuanced
than the innovator's dilemma.
Yep. Yep. What about just like the economics? Is it hard to make the economics work when
you're paying so much for infra to AWS? I mean, obviously you all are doing it with the profitability,
but like that seems just like
amazing to me that you can like build on top of AWS and still like eat out enough margin to make
that work. Well, this is where metal is transformational for us and our customers. Because
if you're buying convenience from Amazon, like EBS, yeah, I mean, you can't beat them, right? How
do you eventually win?
When you're buying the same metal servers they are, like they get a little
margin on top of that, but the price is set by N number of massive vendors.
Like, you know what I mean?
Like at the end of the day, if you're just buying metal machines off them, you know,
aside from their market advantage and ecosystem, Equinix are doing similar amounts for you,
right?
S3 is cheap and good to build against.
The thing that has been amazing for our customers
is with PlanetScale Managed, it runs inside their account.
So if you're a significant Amazon customer,
you get to negotiate incredible commits
with savings plans against these machines
and just save extreme amounts of money.
Look, just one database on PlanetScale,
its daily operational cost went down by $20,000. Wow. Yep. Yep.
Because of Metal. So this is why it's so good. I mean, most of the time you're competing inside their ecosystem using their tools, and the house wins. We are really just getting a server from them. And the better they make the tools around us, like S3 and Lambda,
the more reason there is to, like, buy the best database, and buy all of
those services you should never run yourself. And that's why
Metal is so exciting, which is we can just compete in a way
that is very unusual, to be able to compete with
Amazon. And it
takes might, it takes software might, that's it. It's actually just a fair fight for once.
It's like, you write code, we write code, we're better at databases, so that's how that goes.
Yeah, interesting. Are they a good partner to work with, generally?
Yes, I would say so. Sometimes it's obvious that we compete. Sometimes it's obvious that we can do really good things together, and you just duke it out. I met the head of partnerships
for one of, who personally runs two of the largest Amazon partnerships.
And he was like, I don't want to leave you with the impression
that there are not daily spats between the two of us
over small things.
Like, that's just what two giant partners that kind of compete do.
But, you know, I think the thing that's made it a lot better,
and I do love working with them.
It's a business that I respect so much.
They've built an incredible company.
You go to re:Invent, and if you haven't been to re:Invent,
I truly recommend it.
I tell just random software engineers,
people that barely like leaving the house,
let alone going to a conference
with 70,000 people across five casinos,
to go stand inside
the physical representation of the commercial side
of the tech industry.
It's gigantic, gigantic,
just hundreds and hundreds of millions.
The largest booth there, for the floor space, is two and a half million.
It's unbelievable. And so they built this phenomenal business, just, you know,
up until the last three years, the cloud was Amazon and no one else mattered.
Now people are catching up. And
so I respect them. And I adore them in so many ways. Their
people have built incredible products and experiences. But
yeah, they're Amazon, right? They're the Goliath. And there's a lot of Davids. And
you have to, you have to try and win. But we work really well
together. And the thing that's really helped is them seeing how beloved we are for some of their
biggest customers, right?
People that are spending a lot of money with them, saying, you know, if PlanetScale still
likes running on you and works well, then we're good.
You know, like, it's kind of that.
It's just like, you know, they took notice and we had some giant marketplace kind of transactions go through and they're like, who are they, you know, and then that, that helps build a relationship.
And yeah, they've done marvelous things. And they are truly customer obsessed too. Like I said, the big difference is when our joint customers told them how much they love us
and that the pie can grow and get bigger for all of us.
It's good, it's good.
Yeah, yeah, for sure.
Yeah, on that point, going back to the biggest booth
at re:Invent being two and a half million,
I would love just to see the P&L just for re:Invent,
because it's got to be fabulously expensive,
all the stuff they're doing.
But like you're saying, the sponsor booths are crazy.
It's $2,000 a ticket and 70,000 people come,
which not all of them are paying full boat,
but half of them, that's a lot of money.
I don't know, I'd just be very curious just to see the P&L
for the conference itself.
It'd be very interesting.
It would be amazing to see those numbers.
And then I would just love to see the kind of economic impact on Vegas, because it's March right now, and if you're not trying to book
venues for re:Invent already, you're not getting one.
Like every restaurant. I do three dinners a day at re:Invent, like I just stack them
up because that's like where the customer is.
You get to see these amazing scenes just to go on a complete tangent. I once saw someone who was
probably a mid-market rep for a firewall company throwing up in a bar next to a fully bejeweled
cowboy because it happened to be at the same time there was a rodeo.
Yeah, a rodeo. Yeah.
Yeah.
Where else are you seeing that?
Like it's incredible.
It is just amazing.
Um, yeah, it's just a thing to behold, really. You know,
Google rented the big orb, which happened to be line of sight from my hotel room, and
it just shone GCP ads through my curtains the whole time.
It's just amazing. I
recommend it. Yeah, anyone who hasn't been should just go because it's, how's it for you? You must
be a bit of a celeb there. People must harass you a bit. Yeah, it's always fun to go and just like,
I mean, for me, like I live in Omaha, so that's like my one time a year to just see everyone on
the internet and just like all kinds of people. So like, I, yeah, I love going for that. It's very tiring, but it's, it's awesome.
Yeah, for sure.
Yeah. Yeah. Yeah. By the time you're at the, like the Kygo show at the end,
I'm just like dying, but at least it's over. I tell you, I casually
mentioned to someone, oh yeah, I go to Vegas every year.
And they're like, oh, cool. I'm like, no, not cool. Actually not.
I have to go. Yeah, the COVID year
was really strange too. I caught COVID while there.
Got the re:Invent-branded strain, you know, sponsored by Splunk. But yeah,
like, that was really strange, because the vendors, we all showed up, and the audience was like, hell no,
this is optional for us, and no one went.
So it was this kind of weird re:Invent where all the vendors
were actually just like hanging out
and going to each other's parties.
And I was at one party, they were like,
yeah, we're $5,000 underspent on the bar.
So just start ordering the top line drinks
and me and our CTO enjoyed ourselves.
But yeah, it's great, it's great fun.
Google Next is getting to similar scale too,
and we'll be there this year as well.
Interesting.
I've never been, I need to go to one of those too.
I just remember seeing that one,
that guy with like his,
I can't remember his name,
but like the goofy outfit last year,
like running around and yelling and screaming
and banging on drums.
Did you see that at Google Next?
Oh no, I did not.
I did not.
Oh man, I can't remember his name.
He's like some performance artist.
Oh, on stage.
Yeah, yes, yes, yes.
Mark Simpson, I think his name is on there.
It's like a scene from Silicon Valley, yeah.
Yeah.
Yeah, we'll be going.
Metal's available obviously on GCP as well.
Okay, yeah.
And so we'll be there with the presence going
and seeing everyone.
Should be fun.
Yeah, yeah, for sure.
Okay, two quick questions I wanna talk about
in the database space.
Number one, you talked about migration from 5.7 to 8. I know Mark Callaghan
and a few others have talked about perf regressions in MySQL going all the way
to 8. Are you seeing that too? Is it like some isolated workloads have issues, or
what do you think about that? So there are perf regressions in certain
ways, none that we've really seen in production.
This kind of reveals the difficulty with benchmarks,
which is they're not real.
I never know what's true and what's not.
Yeah.
So we had a pathological case with Metal where we were like,
wow, why is it benchmarking this way?
And we then shipped it with customers
and never saw the problem.
And like, you know, no one runs their databases at 100%,
but benchmarks benchmark at 100%.
And in that last 10%, things go wacky.
You know, like things are really crazy.
Where did wacky come from?
That's not a word I really use often.
But yeah, like benchmarks, I always
say to the team, because we truly know benchmarks are
mostly not useful for prod. And so does Mark, Mark will tell
you this. And Mark is, you know, if Mark's listening, I'll
say an absolute legend in the database world, who has seen scale
unlike any other, and you know, it's very interesting to see his work.
I think it puts pressure on the MySQL team. But from our lived experience of having a lot
of people running on 8, we're not really seeing it as much. But there are
real regressions, you know, that happen all the time that benchmarks
won't catch, right, as well.
You know, even then,
it takes a lot of sophistication to have a good synthetic environment or a
load testing environment to really test.
Really you need an architecture that allows you to incrementally roll your database out.
So if you're sharding, that's the best way we find regressions is that we can slowly
roll out things shard by shard.
Interesting.
Will you put shards onto...
Even within one deployment, you'll have some shards that are on higher versions and just
seeing how that works and things like that.
Correct.
Yeah.
And then it will roll slowly and we'll look for any deviations and metrics.
And we have to roll these changes out across a very large amount of computers. And so we catch
things very, very early. Like, I mean, if you're processing the millions of QPS that we're doing,
you'll find things pretty quick as you have these issues. We also do the core MySQL work ourselves.
So we can find and fix some of those issues
pretty quickly.
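A minimal sketch of the shard-by-shard rollout described here might look like the following; the function names (`upgrade`, `metric`) and the 10% deviation threshold are illustrative assumptions, not PlanetScale's actual tooling:

```python
def staged_rollout(shards, upgrade, metric, baseline, tolerance=0.10):
    """Upgrade shards one at a time, halting if a key metric (e.g. p99
    latency in ms) deviates from the fleet baseline by more than `tolerance`."""
    upgraded = []
    for shard in shards:
        upgrade(shard)                      # roll the new version onto one shard
        deviation = abs(metric(shard) - baseline) / baseline
        if deviation > tolerance:           # regression suspected: stop here
            return upgraded, shard
        upgraded.append(shard)              # healthy: continue to the next shard
    return upgraded, None                   # whole fleet upgraded cleanly

# Toy demo: shard "s3" regresses, so the rollout halts before touching "s4".
latencies = {"s1": 10.2, "s2": 9.8, "s3": 14.0, "s4": 10.0}
done, failed = staged_rollout(["s1", "s2", "s3", "s4"],
                              upgrade=lambda s: None,   # no-op stand-in
                              metric=latencies.get,
                              baseline=10.0)
# done is ["s1", "s2"] and failed is "s3"
```

The point is the blast radius: a bad version touches one shard's traffic, not the whole deployment, which is how regressions get caught that benchmarks never would.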
Man, I'm trying to think how you all could do
better benchmarks than some of these, you know,
fake benchmarks.
Just like, hey, this is what we're seeing
as we're rolling this out,
we're not seeing impacts across all these customers.
Like, yeah.
Yeah, we could do more to actually talk about that.
We just can't get on with it, I guess.
You know, like it's okay.
We just kind of fix the problem and just move on, you know?
Like it just, yeah.
But benchmarks have value,
just not as much as I wish they did,
because they're a very convenient way of doing things.
But benchmarketing, benchmarketing is a whole other thing. Yeah, for sure. Okay.
Last, like, database-related question, sort of going back to AWS. We had DSQL come out,
but also at re:Invent, Dynamo had multi-region strong consistency, and it seems like something
is underpinning both of those, like the same sort of underlying tech.
I guess, is that something you see a lot for customers with Vitess, having that sort of use case, whether
it's like zero RPO, you know, if a whole region goes down, or something
like that? I guess, how often do you see that?
Do you think it's useful?
I guess, like, where are you at with some of that?
So strong consistency cross-region is a bad idea.
Yeah.
Seems expensive, right?
That's a lot of latency to wait for.
Correct.
So we have lots of very large multi-region deployments
with extremely high RPO because we do the asynchronous style of replication across the country, meaning
it gets out of the data center and is able to catch up.
And a thing we see with a lot of folks is that all of the other complexities of doing a real
cross-region failover are so difficult that
even if the strongly consistent database is ready, it's just one piece.
I mean, there's just so many other things going on.
Again, you've just eaten that performance for the whole time.
Every single right, yeah.
Right.
We just won a customer against Spanner. They were using all of
that, because, like, why use Spanner if you're not going to do that stuff? What's the point
of single-region Spanner? They could not get a write down to less than 12 milliseconds. That's
ridiculously slow when we're sub-millisecond. Like, it's crazy. The extra capacity you have to provision
to handle this stuff is nuts. For what? I
mean, amazon.com still runs in one region. If they can do it, everyone else can. This
whole, even Vercel have kind of dialed back that whole, um, global stuff.
Yeah, the global stuff at the edge. Yeah, exactly. Because people just don't need it.
It's a good checkbox for, if us-east-1 really shits the bed, are we going to get to come
back the same day, which is essentially what most companies want. But even, like, the reason
cross-region failovers are not automatic on PlanetScale is because nobody wants them automatic. In
fact, they make sure that it's not going to happen, because if your database evacs and everything else doesn't, and then we're not solving cache
warming for you, we're not solving bringing front ends up. It's a requirement. I get it.
There's a lot of databases out there that are very niche that are set up to do this and make
it really easy, but with tiny, extremely low transaction volumes. You just can't do it any other way than our model,
really, at that scale.
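The latency trade-off behind this can be sketched with back-of-envelope numbers; the figures below are illustrative assumptions (the 12 ms roughly matches the Spanner anecdote above), not measurements:

```python
# A synchronous cross-region commit cannot acknowledge until at least one
# cross-region round trip completes; asynchronous replication acknowledges
# locally and lets the remote replica catch up afterwards (nonzero RPO).
local_ack_ms = 0.5          # assumed same-region fsync + ack
cross_region_rtt_ms = 12.0  # assumed coast-to-coast round trip

sync_write_ms = local_ack_ms + cross_region_rtt_ms   # every single write pays this
async_write_ms = local_ack_ms                        # replication is off the hot path

slowdown = sync_write_ms / async_write_ms            # 25x slower per write here
```

Under these assumed numbers, every write in the strongly consistent setup is about 25 times slower, which is the tax you pay continuously in exchange for a failover scenario that may never happen.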
Yep, yep, makes sense.
All right, well, Sam, I always appreciate it.
Again, all those sort of things
I've been curious about.
It's good to have someone with a reasonable, strong
perspective on it.
So thanks for weighing in, and congrats on the Metal launch.
This is super cool, And yeah, excited to see
people use it.
Thank you. And thank you for being a place where we can talk
about these things, right? You know, the
corners of tech where people are doing real things and, you know,
not just hyping everything up are getting smaller. And so it's
great to have a format for doing that. And I will also pass on,
when I mentioned in our marketing channel that I was
chatting with you today, you have a lot of fans out there
that are planning to listen to you.
Really?
I'm surprised, the MySQL community.
So I love hearing that.
That's good to hear.
So yeah, thanks for coming on.
I really appreciate it.
Yeah, we'll link to all the metal stuff
and of course your Twitter account and all that.
But yeah, best of luck.
Amazing. Thank you so much.
Thank you.
Thanks.