The Changelog: Software Development, Open Source - Bringing Vitess to Postgres (Interview)
Episode Date: July 23, 2025

Sugu Sougoumarane, creator of Vitess, comes off sabbatical to bring Vitess to Postgres. We discuss what motivated Sugu to come off sabbatical, why now is the time, the technical challenges of doing so, and the implementation details of Multigres (Vitess for Postgres). We also discuss the state of Postgres at scale.
Transcript
What's up, friends? Welcome back.
This is your favorite podcast, The Changelog.
We feature the hackers, the leaders, and those building Vitess for Postgres, Multigres, today.
We're joined by Sugu Sougoumarane, the creator of Vitess, and you know we're Postgres for
life around here.
We talked to Sugu about the sabbatical he took, what Vitess is, taking it from side project at YouTube
to full-on company.
Now at Supabase, bringing Vitess to Postgres
with Multigres.
We cover what's required to take Vitess
to the Postgres world,
building the team to do it, implementing Multigres,
Multigres at Supabase, and so much more.
A massive thank you to our friends
and our partners at fly.io.
That is the home of changelog.com.
Learn more at fly.io.
Okay, let's multigress.
Well friends, it's all about faster builds. Teams with faster builds ship faster and win over the competition. It's just science. And I'm here with Kyle Galbraith, co-founder
and CEO of Depot. Okay. So Kyle, based on the premise that most teams want faster builds,
that's probably a truth. If they're using a CI provider with their stock configuration,
or GitHub Actions, are they wrong?
Are they not getting the fastest builds possible?
I would take it a step further and say,
if you're using any CI provider with just the basic things
that they give you, which is,
if you think about a CI provider,
it is, in essence, a lowest common denominator, generic VM.
And then you're left to your own devices to essentially configure that VM
and configure your build pipeline.
Effectively pushing down to you, the developer,
the responsibility of optimizing and making those builds fast.
Making them fast, making them secure,
making them cost effective, like all pushed down to you.
The problem with modern day CI providers is there's
still a set of features and a set of capabilities that a CI provider could give a developer that
makes their builds more performant out of the box, makes the builds more cost-effective out of the
box and more secure out of the box. I think a lot of folks adopt GitHub Actions for its ease
of implementation and being close to where their source code already lives
inside of GitHub and they do care about build performance and they do put in the
work to optimize those builds but fundamentally CI providers today don't
prioritize performance. Performance is not a top-level entity inside of generic
CI providers. Yes, okay friends, save your time,
get faster builds with Depot, Docker builds,
faster GitHub Actions runners,
and distributed remote caching for Bazel, Go,
Gradle, Turborepo, and more.
Depot is on a mission to give you back your dev time
and help you get faster build times
with a one line code change.
Learn more at depot.dev,
get started with a seven day free trial.
No credit card required.
Again, depot.dev.

Today we're joined by Sugu Sougoumarane, co-creator of Vitess, co-founder of PlanetScale, and now
the head of Multigres at Supabase. Sugu,
We are excited to have you here.
I'm very excited to be here too.
I'm excited to see you back at work.
You've been off work.
You just told me you took a three-year sabbatical.
Is that accurate?
Yeah.
Yeah.
I, it was, I even wondered if I was going to come back until, you know, I started feeling the itch.
And yeah, that's what brought me back.
Gardening, you know, ripe tomatoes.
What kind of vegetable did you start growing?
I actually went on self-reinvention of some sort.
I kind of, you know, did a full debug of myself
of some sort.
Did you climb a tall mountain or how do you do that?
I don't know, just sat at home, thought, took walks, and a lot of thinking.
And I even got into doing some voluntary work, because I realized, basically, I thought I should be more human,
and that led me into a bunch of voluntary work which I actually started to do.
And I even thought of going full time. So then I realized it's not something you can do full time. And that's when I realized that I'm starting to miss the fun of doing
technical work. So I started doing some advisory work and stuff and slowly I got pulled back
in and realized, oh my God, we gotta do this project.
Which is how I am where I am today.
Did it take, how long did it take
before you started getting that itch?
Because three years is a pretty long one.
I think by that point, I might be detached enough to
I agree.
Not get the itch again.
It was strange, like I think last year sometime,
late last year, so for about six to eight months, I've been
feeling the itch.
I think it started because people, because I'm still in touch with friends and stuff,
they used to ask me for questions and help about things.
And then as you answer, you kind of enjoy that you did it. And you say, okay, maybe I can,
then I talked to some people and they said,
oh, you know, someone like you is still useful.
And as you realize the last three years,
everything changed in the industry.
Right.
AI actually, the first GPT came out
just when I went on sabbatical,
around mid 2022, I think. And when I thought of coming back,
I thought, you know, maybe I'll be obsolete or something,
because I don't know anything about AI.
And like, but then it turns out that
infra is still way behind.
Yeah.
And that's, you know,
what people desperately need.
Databases still matter, turns out.
Databases matter, infrastructure matters,
nothing much has changed in that area.
So, I came back and slowly got back into things.
What pushed you away in the first place?
Why did you go on that sabbatical?
Had you accomplished all your goals?
Were you burning out?
What was the situation that made you step aside?
Both of those actually.
Both, okay.
So this was 2022.
I had been working on Vitess for 12 years straight.
We started Vitess in 2010 at YouTube.
And Vitess is an enormous project. For a while, I was the sole maintainer of Vitess and it's a pretty big load on your brain. After I founded PlanetScale, hired a team and trained them,
and finally the team was out producing what I could do.
That's when I realized, okay,
it's now time to recover from
this huge load that I've been putting myself through.
That's what led to my taking the sabbatical.
What year was that again?
Was it around the time that you...
I'm not familiar with the story behind the scenes really with Sam Lambert, but I'm, I
guess, tangential friends by having him on the show and hearing his backstory on PlanetScale's
early story, and mainly taking over the CEO role and what that did for PlanetScale at that
time. What time was it that you stepped away, and was that the time that you brought on
Sam as a CEO?
So I think my memory may be hazy here. I think he came on board around 2020. Sam has been
a fan of PlanetScale from the beginning. So he even did a Sequoia Scout investment when we launched.
And so he has been following us.
He's been actually advising us also
in the early couple of years.
And then in 2020, he came on board as, I think,
CPO, Chief Product Officer.
And he changed the whole image of PlanetScale.
Until then, we were very, what I would call,
a back-end, serious back-end database,
but in some respect, boring of some sort.
We were pretty serious.
We would do things. But now, he brought the excitement
in when he came on board.
And the first feature he launched was the schema branching and merging, which the dev
community completely loved. And that was around 22, Jiten at the time was CEO and Sam talked and
they came to an agreement that if Sam became the CEO, that would be really effective because
then he could control the whole story. He became the CEO probably much earlier, probably 2021 would be my guess.
I'll have to look it up.
There were probably announcements.
So he had been the CEO for a while and then in 22 is when I left also.
By that time the team was all ramped up,
like Vitess had a tech lead,
so everything was looking great.
So I almost felt like I was a warm body.
You know?
Yeah.
Well, that's a good feeling.
I mean, you've done a lot of the things you set out to do.
For those of us playing catch up,
can you describe Vitess, what it is
and how it helps scale databases?
Yeah, yeah, definitely.
Vitess actually, when we started Vitess,
we thought it was a six-month project.
So YouTube was actually in deep trouble scalability wise.
There were like more than a handful of outages every day,
like sometimes 10, 12 outages per day. Well, I wouldn't say outages, 10, 12 pages per day.
Many of them would lead to outages
and they were all caused by the database.
That's the time when
my co-creator, Mike Solomon and me kind of decided to solve this problem once and for all.
So we decided to take ourselves out of this firefighting mode and think of a solution that
will leap us ahead of where we are today.
Because it felt like what you are doing now was a losing battle.
It was like whatever we are doing, things
were getting worse by the day.
And that's how Vitess was born.
And we wrote code for about a year, actually.
It took us about a year to launch the first version.
And the first version that we launched
was just a connection pooler, which
was actually the biggest problem we were having with MySQL
at that time.
And that immediately gave some breathing room for YouTube.
And the way we solved it was we didn't solve it
like how you would solve a problem, which is let's solve
connection pooling.
Because we sat and brainstormed about all future problems
that we're going to have, we actually
built an intelligent proxy that would not just connection pool
but would also understand your query.
So we actually built a parser in that connection pooler.
And once this got launched,
people started noticing that this so-called
connection pooler is intelligent
and it can do other things beyond
just taking care of connections.
Like for example, fielding bad queries
or watching queries and killing them
if they run for too long.
And so we slowly started adding those features
and it became an intelligent connection pooler
that would protect the database completely.
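To make the idea concrete, here is a minimal Go sketch of a query-aware pooler like the one Sugu is describing: it bounds connections, refuses a pattern of "bad" queries, and kills queries that run too long. Everything here (the Pool type, the deny pattern, the timeout) is hypothetical and only illustrates the concept of inspecting queries at the proxy; it is not Vitess code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"regexp"
	"time"
)

// Pool is a hypothetical query-aware connection pool: besides limiting
// concurrent connections, it screens each query and enforces simple policies.
type Pool struct {
	conns      chan struct{}  // semaphore bounding concurrent connections
	deny       *regexp.Regexp // pattern for queries we refuse outright
	maxRuntime time.Duration  // queries running longer than this are cancelled
}

func NewPool(size int) *Pool {
	return &Pool{
		conns:      make(chan struct{}, size),
		deny:       regexp.MustCompile(`(?i)^\s*select\s+\*\s+from\s+\w+\s*$`), // e.g. unbounded full-table scans
		maxRuntime: 2 * time.Second,
	}
}

// Execute acquires a slot, screens the query, and cancels it if it runs too long.
func (p *Pool) Execute(ctx context.Context, query string, run func(context.Context) error) error {
	if p.deny.MatchString(query) {
		return errors.New("query rejected by pooler policy: " + query)
	}
	p.conns <- struct{}{}        // acquire a connection slot
	defer func() { <-p.conns }() // release it

	ctx, cancel := context.WithTimeout(ctx, p.maxRuntime) // acts like a query killer
	defer cancel()
	return run(ctx)
}

func main() {
	pool := NewPool(4)
	err := pool.Execute(context.Background(), "SELECT * FROM users", func(ctx context.Context) error {
		return nil // the real work would go to MySQL or Postgres here
	})
	fmt.Println("result:", err) // the unbounded scan is rejected before it reaches the database
}
```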
And that eventually evolved into,
then we added sharding solutions to it,
like how do you re-shard?
And then we added one more layer,
which is the routing layer.
And then we realized, oh my God,
this is now almost a fully distributed database.
Why don't we make it a fully distributed database,
which was the final step
where we just became a fully sharded solution.
We started in 2010, and
I think what we call V3 was around 2014 or so,
about four years later, when
Vitess could be adopted by anyone outside of YouTube.
And then the rest is history.
The rest is history, right?
Open source, mass adoption, PlanetScale,
build a business around it.
Yeah.
It's a very successful business to this day.
Really cool stuff.
Kind of an open source success story.
I was going to say prototypical or stereotypical,
but like not necessarily, but just a good one.
Just a good open source success story
with, you know, money involved in the business
that could be built and kind of what everybody wants
is like build some cool tech, open source it,
help companies scale their businesses,
like provide massive values once and for all, like you said.
And then also build a business around it
and be able to be successful
and then take a three year sabbatical
when you're all said and done.
You know like, all right, I'm done for now.
Yeah.
I think to extend from what you're saying there,
I think it's wild, you know,
so the one thing you mentioned was like
the nine month or six month you thought
that the project would take.
Yeah.
In terms of like it's done this,
but then it turned into a company and then obviously PlanetScale has been able to, and I think
others too have been able to hire and maintain full-time open source maintainers on the Vitess
project, not just to support PlanetScale, but to support the project.
I think that's just so wild how you think like it's just this little thing, or it's a certain amount of time
and solving these skill problems,
the next thing you know,
it's something much bigger than that.
Yeah, yeah, like the first adopter
was actually a company called Flipkart.
I believe they still use the older version of Vitess.
Then HubSpot came on board.
They still use Vitess and they have their own operator.
They were the first ones to launch,
among the first ones to launch Vitess in Kubernetes.
And then later Slack came on board and GitHub came on board.
So now it's like, they are still pretty substantial
contributors to the project at this point.
Well, you said once and for all,
and I said once and for all,
but the technical fact about Vitess is it's not for all,
it's for all who are running specific database systems.
And one of those alls who aren't part of the all
is the Postgres ecosystem,
which has been awesome for many years,
but has been exploding, I would say,
in the last five or 10 years,
in usage, in features, in capital support.
I mean, there's just lots of excitement
and growth in Postgres,
none of which Vitess could provide for, right?
It just wasn't, it was just MySQL specific, right?
Yeah, yeah.
So that was, I think the Postgres community
took note of Vitess a long time ago.
As far back as 2018 was the first conversation I had
with someone from the Postgres side, and they said,
hey, we need something like Vitess for Postgres.
And there have been a few false starts on this project.
I think the one where we got pretty serious about it
was actually in 2020.
Someone called, I don't know, he's
pretty well known in the Postgres community, Nikole.
So he and I talked.
And we felt that we should do this thing for, you know,
port Vitess for Postgres.
We even like formed a sub team and started talking about it.
But you know, in 2020 PlanetScale was on the rise.
It was just too busy.
And unfortunately, I had to make a call that, you know, like, I have this company to take
care of, I have this company to build, I can't afford this distraction at this time.
And I actually kind of had to back out saying that I don't think I can do this at this time.
I still feel bad about having done this.
And so Postgres has been in the back of my mind all along.
Even after I went on sabbatical, I reached out to some of the older contributors and
told them, you know, like, maybe you can start this project.
But no one was, no one at that time felt like they all had their jobs and this would all
be something on the side.
And until this year, when I said, you know,
it's now or never, you know?
And that's when I said, okay,
I'm going to take this plunge.
I think mimicry or ports or whatever you wanna call it
is kind of a tried and true part of the open source culture.
You know, take this good idea from this ecosystem,
which I appreciate, but I'm not a user of,
and bring it to my piece of the world
and reimplement it over here, or steal the ideas.
And it's kind of, Vitess has been so successful
and so helpful, especially to large companies
who are well funded.
And helping them scale, I'm surprised that all these years
there is not already a Vitess for Postgres.
Community built or coming out of a startup or something
like no, Sugu has to do it, right?
Like it's back to, the Vitess people are gonna do it.
I don't get this.
Yeah, I don't know why I had to come and start this.
Is it that hard?
Because usually you look at something and you think,
it can't be that hard.
I'll do it over here.
Is it that hard of a technical problem
or is MySQL and Postgres implementation wise
so dramatically different that even the concepts
don't necessarily come
across?
Like what's that chasm look like?
I think it is more of the first problem.
The fact is, like, sharding, if you go and ask in a university about sharding, they'll say sharding
is one of the oldest concepts in databases.
It's over 30 years old.
But the issue with sharding is that nobody ever worked on it.
Nobody ever developed it into something practical
that will actually work in production at scale
with performance, which is actually strenuously hard.
Like for me to make the leap from an application sharded system
to a system where the application does not care about sharding
and generically implement it in a database,
it took me about, like I literally sat and thought for three months straight, nothing other than
just pure thinking, understanding SQL, understanding query complexity, understanding relational
algebra, going back all the way because, yeah, okay, how do I shard it? You go look for,
you go hit the books, you go look at databases.
Nobody has ever done this before.
And so then I realized, you know,
this needs to be basically reinvented from the ground up.
So that was one of the hardest,
one of the most strenuous thinking things
that I have ever done in my life.
There were times when I used to get headaches thinking about it. It was that hard.
And finally,
one day I had the aha moment.
It looks like I have an answer to all the questions
to make this thing work as a sharded environment, as a sharded system.
And possibly it is that, like,
when you get a query, the typical way you break it down is different from how you would break it
down in a sharded system. Because you get a select statement, you parse it, and then you identify each of those nodes
and you just assign operations to it
in a traditional database.
And then you run the optimizer that rearranges these things.
But in a sharded system,
you have to identify groups of things
that actually don't need to be changed at all
because there is an underlying database
that can execute that entire query with no changes. Because what's underneath is
also a database. So being able to analyze queries and identify these subparts and
be able to outsource it to a database and then do the rest of it on your proxy
that's actually sending the queries,
required a different kind of thinking.
I think that is one reason that I could think of
why sharding is hard.
Can you describe your aha moment?
Like what your realization was?
Or did you just try to describe it in other words
and I missed it?
Because it sounds like you're describing the problem and the system with
the proxy in there, but like what was it that cracked the nut that said, aha, we can do
it this way?
Yeah. So how did I do it? The aha moment was actually, there was an early aha question.
Up until that day,
before this generic sharding was there,
the way the application worked was it would actually send a query
and along with the query it will tell you where to find the rows for this query.
We called it the keyspace ID, which basically helped you.
There's a lookup where you look up the keyspace ID.
It tells you, oh, this keyspace ID can be found in this shard.
The application had to guarantee that all the rows
for that query lived in that shard.
So that made it extremely simple.
And keyspace ID was also a value of a column
in every table that we had, which
made it safe and simple, which means that you can't go wrong.
And the question I asked myself is, this keyspace ID
is a crutch.
Can we move without it?
And it was scary because the application
knows the keyspace ID.
But how would this system know the keyspace ID?
Vitess does not know the keyspace ID.
So if you just sent me a query, what do I do with it?
I don't know where to find this.
And so I went to the application and the application
is figuring out what this key space ID is.
How is it figuring it out?
And that's when I realized, oh,
it's figuring it out based on the user ID.
So there is a source.
And it always turns out that that source is also
in the where clause of every query that it sent.
And that's when I realized, oh, well,
if the application is doing it, what if I did it for you?
You say where user ID equal to x,
and I will do this computation,
which is a hash function,
and figure out where it is and send it.
Then the question is, then in that case,
how about the other queries?
There were other complex ways by which application
was computing the keyspace ID.
And that's like, what if we internalized each and every one
of those things?
Then the application does not need
to ever compute the keyspace ID.
We will do it.
So that was the first aha moment.
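As a rough illustration of that first aha moment, here is a small Go sketch of the idea: pull the user ID out of the WHERE clause, hash it into a keyspace ID, and map that to a shard, so the application no longer supplies the keyspace ID itself. The hash choice, shard count, and the naive WHERE-clause matching are invented for this example; Vitess does this with a real SQL parser and pluggable hash functions.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"regexp"
	"strconv"
)

const shardCount = 4 // hypothetical number of shards

// keyspaceID hashes a user ID into a 64-bit value, standing in for the
// keyspace ID the application used to compute by hand.
func keyspaceID(userID uint64) uint64 {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], userID)
	sum := sha256.Sum256(buf[:])
	return binary.BigEndian.Uint64(sum[:8])
}

// shardFor maps a keyspace ID onto one of the shards (range-based in spirit,
// modulo here for brevity).
func shardFor(ksid uint64) int {
	return int(ksid % shardCount)
}

// userIDFromQuery is a deliberately naive stand-in for a SQL parser: it only
// recognizes "... WHERE user_id = <n>".
var whereUserID = regexp.MustCompile(`(?i)where\s+user_id\s*=\s*(\d+)`)

func userIDFromQuery(query string) (uint64, bool) {
	m := whereUserID.FindStringSubmatch(query)
	if m == nil {
		return 0, false
	}
	id, err := strconv.ParseUint(m[1], 10, 64)
	return id, err == nil
}

func main() {
	query := "SELECT name FROM users WHERE user_id = 42"
	if uid, ok := userIDFromQuery(query); ok {
		ksid := keyspaceID(uid)
		fmt.Printf("user %d -> keyspace ID %x -> shard %d\n", uid, ksid, shardFor(ksid))
	} else {
		// No routable column: a real system would fall back to another plan.
		fmt.Println("cannot compute keyspace ID; would need another plan")
	}
}
```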
And then the second aha moment was, the second question is,
but in that case, only these queries.
So that limited the number of queries
the application could send.
The next question was, but then if we took this over,
what if there's a query that came where a keyspace
ID could not be computed?
What would we do?
That's when I realized, OK, then I
have to understand the query.
What if it's a join?
What if it's a correlated subquery?
What if it is a non-correlated?
So that's when I had to go into relational algebra
and find a way to map any random query to a query plan,
where I would say, oh, this query, I need to send it to
all shards or this query, I can send it to just this one shard or this query is in multiple
shards but I cannot send it to all shards because I need to take data from here and
then data from somewhere else.
So the aha moment is when I identified all these primitives.
And I could prove to myself that using these primitives,
they are called in the route primitives or the sharded
primitives, if I used these, I can satisfy any query.
The full SQL language can be supported eventually.
We just need to do a, so the first implementation
only supported some constructs,
and then we slowly started adding more and more.
But we knew that theoretically,
all SQL can be supported in the sharded system.
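To sketch what those routing primitives could look like, here is a hypothetical plan-selection step in Go: classify a query as single-shard, scatter to all shards, or scatter-plus-merge when cross-shard work (a join or aggregate) is needed. The plan names and the toy classification logic are invented for illustration; they only echo the idea that any SQL statement maps onto some combination of such primitives.

```go
package main

import (
	"fmt"
	"strings"
)

// PlanKind names a routing primitive. Names are invented for this sketch.
type PlanKind int

const (
	SingleShard  PlanKind = iota // whole query satisfied by one shard
	ScatterAll                   // send as-is to every shard, concatenate rows
	ScatterMerge                 // send pieces to shards, combine results in the proxy
)

func (k PlanKind) String() string {
	return [...]string{"single-shard", "scatter-all", "scatter-merge"}[k]
}

// classify is a toy planner: it looks for a sharding-key equality and for
// constructs (joins, aggregates) that force work above the shard level.
func classify(query string) PlanKind {
	q := strings.ToLower(query)
	hasShardKey := strings.Contains(q, "user_id =")
	crossShardWork := strings.Contains(q, "join") || strings.Contains(q, "group by") ||
		strings.Contains(q, "count(")

	switch {
	case hasShardKey:
		return SingleShard
	case crossShardWork:
		return ScatterMerge
	default:
		return ScatterAll
	}
}

func main() {
	for _, q := range []string{
		"SELECT name FROM users WHERE user_id = 42",
		"SELECT * FROM orders",
		"SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id",
	} {
		fmt.Printf("%-60s -> %s\n", q, classify(q))
	}
}
```

A real planner would of course work over a parse tree rather than substrings, but the shape is the same: decide which fragments can be pushed down whole to a shard and which must be combined above.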
So to the keyspace ID point,
I understand the relational algebra
and the entire SQL dialect,
like supporting all the keywords over time.
That makes sense to me.
The keyspace ID, as a non-Vitess user,
I'm not sure how it works.
Do you have to go, does the infra person
or the team that sets this up,
do they have to go in and somehow configure it
to know how you should shard against
this particular application
so that it can do the Keyspace ID thing?
Or is it just, because different applications,
different databases have different, you know, as you know.
Yeah, so there's a cool story there.
The first Keyspace ID was actually a random assignment.
Okay.
So we would take a user and say, I
don't know if it was a salt, what kind of,
I actually don't remember how we initially computed.
We used to randomly assign them to shards.
Actually, now I remember.
A user was assigned a shard, and we
would put the shard number in a lookup table.
It's random.
So when a user gets created, they are assigned a shard,
and the shard number is put in a random table.
And then came the time when we had to re-shard it.
And we realized, oh my god.
How are you going to re-shard this?
Because now there is this table.
That's when we transitioned to Keyspace ID,
where we assigned a unique ID to a user.
And then we actually assigned that Keyspace ID.
We changed actually this lookup table
to replace the shard number by the Keyspace ID.
And at that time, we populated every table
with the keyspace ID.
And then we realized that we are wasting this lookup
for no reason at all.
Then we did a full resharding where
we did ranges based on keyspace IDs, in which case,
you just hash the user.
You have the keyspace ID. And then that goes to that shard.
So that's how we evolved at YouTube
from where we started to.
But at that time, the keyspace ID was a column.
And the reason the keyspace ID was important there
was because we were using MySQL replication to do the resharding, and
we had to use that value to decide where the row should go when we resharded.
And that was actually another hurdle I had to cross, which is, can you reshard if you
get rid of that row?
In which case, then we'd have to make
MySQL do that computation to do the replication, which means
that it had to be a function that MySQL supported.
The interesting story here is, at that time,
there were a large number of arguments about what
is the best sharding scheme?
Is range-based sharding good?
Or is a hash-based sharding good?
We spent hours fighting on it.
There was no, or a lookup-based sharding, right?
There's like so many sharding schemes.
And that's when I realized that there is no correct answer.
And anything you choose, somebody is going to hate it.
And that's when I came up with this idea of no first class
citizens in sharding, which means that Vitess will actually,
every sharding scheme is a plugin, is an extension.
Gotcha, so you pick the one you wanna use.
You pick the one you want to use.
And you don't have any opinions on that as Vitess.
Exactly, exactly.
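Here is a minimal Go sketch of what "every sharding scheme is a plugin" could look like: a small interface that maps a column value to a keyspace ID, with hash-based and lookup-based implementations registered by name. The interface and names are invented for the example (in Vitess the analogous concept is called a vindex), so treat this as the shape of the idea, not an actual API.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// ShardingScheme is a hypothetical plugin interface: given a column value,
// produce the keyspace ID used for routing. Each scheme is just another
// implementation; the proxy has no built-in favorite.
type ShardingScheme interface {
	Map(value string) uint64
}

// HashScheme: the keyspace ID is a hash of the value.
type HashScheme struct{}

func (HashScheme) Map(value string) uint64 {
	sum := sha256.Sum256([]byte(value))
	return binary.BigEndian.Uint64(sum[:8])
}

// LookupScheme: the keyspace ID comes from an external table (a map here).
type LookupScheme struct{ table map[string]uint64 }

func (l LookupScheme) Map(value string) uint64 { return l.table[value] }

// registry of schemes by name, so users pick one per table or column.
var registry = map[string]ShardingScheme{
	"hash":   HashScheme{},
	"lookup": LookupScheme{table: map[string]uint64{"42": 7}},
}

func main() {
	for _, name := range []string{"hash", "lookup"} {
		fmt.Printf("%-6s maps 42 -> keyspace ID %d\n", name, registry[name].Map("42"))
	}
}
```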
That actually goes back to actually a story from, I don't know if you've heard of Illustra.
So this is the 1990s, Michael Stonebraker had, I think, just launched Postgres.
And from that, he founded a company called Illustra, which is a commercialization
of Postgres.
And the claim to fame for Illustra is that indexes need not be defined by the database.
You can define your own index and do it as a plugin.
And at that time I was at Informix.
And Illustra was actually winning deals
against Informix as a startup.
And Informix actually ended up acquiring Illustra.
And I loved that concept so much that I actually transferred
myself into the Illustra team
to develop pluggable indexes.
And that inspiration remained with me all these years.
And when I said, you know, we have to develop indexing for sharding schemes, I said, well,
you know what, I'm going to use this idea where all indexes in the sharding scheme are externally defined.
Yeah.
It makes sense because, I mean,
oftentimes opinionated software is better.
Like we do have a golden path
or we do have a way that you should go.
You can swap it out if you want,
but this is what the framework or the tool
thinks you should do.
When it comes to this, like I said,
and like you know,
like the shape of people's data is so different and custom
that there isn't necessarily a best way of sharding,
as you guys found.
So it's what's best for your circumstance.
And so why have a first-class citizen
when you can just say, no,
you're gonna figure this one out yourself
because we can't actually figure it out
for you, I think that makes a lot of sense.
Yeah, the thing that I'm now starting to realize,
I'm now learning Postgres, like a little bit more deeply.
I find the same ideas still there in Postgres,
where indexes are pluggable. You can plug in data types.
You can define operators.
So I'm kind of feeling like, oh my god,
there's a lot of similarity between.
You've got a false protocol.
Yeah, so it's pretty exciting to see this.

Well friends, I'm here with a new friend of mine, Harjot Gill, co-founder and CEO of
CodeRabbit, where they're cutting code review time in half with their AI code review platform.
So Harjot, in this new world of AI generated code, we are at the perils of code review, getting good code into our code bases,
reviewed, and getting it into production.
Help me understand the state of code review
in this new AI era.
The success of AI in code generation
has been just mind-blowing, like how fast
some of the companies like Cursor and GitHub Copilot
itself have grown.
The developers
are picking up these tools and running with it pretty much. I mean, there's a lot more
code being written. And in that world, the bottleneck shifts: code review becomes like
even more important than it was in the past. Even in the past, like, yes, I mean, companies
cared about code quality, had all this pull request model for code reviews and a lot of checks.
But post-gen AI, now we are looking at, first of all, a
lot more code being written. And interestingly, a lot of this code being written is not perfect.
Right? So the bottleneck and the importance of code review is even more so than it was
in the past. You have to really understand this code in order to ship it. You can't just
vibe code and ship. You have to first understand what the AI did. That's where CodeRabbit comes
in. It's kind of like, think of it as a second-order effect, where the first-order effect has been Gen AI and code generation. Rapid success
there now as a second-order effect. There's a massive need in the market for tools like CodeRabbit
to exist and solve that bottleneck. And a lot of the companies we know have been struggling to run
with, especially the newer AI agents. If you look at the code generation AI, the first generation of
the tools were just tab completion,
which you can review in real time.
And if you don't like it, don't accept it.
If you like it, just press tab, right?
But those systems have now evolved
into more agentic workflows,
where now you're starting with a prompt
and you get changes performed on like multiple files
and multiple locations in the code.
And that's where the bottleneck
has now become the code review bottleneck. Every developer is now evolving into a code reviewer. A lot of
the code being written by AI. That's where the need for CodeRabbit started and that's
being seen in the market. Like CodeRabbit has been non-linearly growing. I would say
it's a relatively young company, but it's been trusted by a hundred thousand plus developers
around the world. Okay friends, well good. Next step is to go to coderabbit.ai.
That's C-O-D-E-R-A-B-B-I-T.AI.
Use the most advanced AI platform for code reviews
to cut code review time in half,
bugs in half, all that stuff instantly.
You got a 14 day free trial,
too easy, no credit card required,
and they are free for open source.
Learn more at codeRabbit.ai.
So what's different then? It seems like the concepts, the breakthroughs you had with Vitess,
the proxy with the parser and the smart way of handling these keyspaces and the bring-your-own sharding, like all this stuff seems like it would apply at least conceptually across to Postgres.
But I guess at the nitty gritty, it's just wildly different or?
So the part that I'm still trying to figure out is, in other words, whether every idea that exists in Vitess
will just smoothly port over to Postgres. But as you said, there may be details, you know,
where it may not port as well. So there's a few.
At this point we have made a few policy decisions.
Let's put it this way.
The biggest policy decision is this has to be Postgres native.
So that part, we do not want to compromise,
which means that the end product should
feel like it should be built for Postgres.
So that is one policy decision that we have made.
I don't know how far we are going to get there,
but that's our North Star of some sort.
Which means that from VITES,
there are some parts we will definitely leave behind.
For example, anything that's MySQL specific is not coming over.
What else will we leave behind?
We will leave behind anything that is legacy in Vitess.
Anything that we built in Vitess but continue to support because there are users using it.
But if you built it from scratch,
we wouldn't do it that way.
So that we want to leave behind.
And also anything that's not well implemented.
If something works, we left it alone.
But it was not well implemented.
Given a choice, I would do it differently.
So that we are not going to bring over.
And everything else we want to bring over, for sure.
So there are actually, and there is a fourth category,
which is if there are things we can improve, we should.
More specifically, I do want to improve the high availability
story in Postgres, because that is currently a big challenge
that users are facing.
There is no good high availability solution.
So that is something that definitely want to improve
on the Postgres side of things.
So we'll bring some components over from Vitess,
because Vitess already has some HA components.
So we'll bring some of those components,
but we'll also build something that
is very specific to Postgres.
And the place where I'm still confused about how
to solve the problem, but we will figure it out,
is how do we solve this sharding and query processing?
Because Vitess can do query routing.
Vitess has the concept of pluggability.
The part that I'm thinking about is the Postgres pluggability
is more at the binary level, where you actually
build extensions and link them into Postgres.
Whereas Vitess is built in Go, which if we brought that in,
how do we make these plugins work?
Do we have to translate them?
Or do we figure out a way to make them still work
if we ported the Go code over is the part that
I'm working through right now.
Actually we now have some people hired so I'm working with some of those people too.
Does that compromise your desire to be Postgres native, considering, you know, what you learned
with Vitess being written in Go, etc.? Like how does that compromise your...
maybe Postgres for life? I'm maybe putting words in your mouth, but like this native Postgres way.
So if you look at the Vitess design,
for the longest time we didn't want to implement anything specific to MySQL. We thought this should be its own database system.
And we actually restricted ourselves to just SQL 92.
Because SQL 92 is generic, all databases must support it.
Therefore, if we did SQL 92, then it'll work for anything.
So we could even swap in Postgres underneath.
But what we found out is that people
were hesitant to migrate over.
They said, oh, I want my application to run as is when I move over,
which is what forced us to add MySQL extensions.
So in other words,
we could port the SQL 92 part of Vitess directly over into Postgres and it'll work. And the problem is it's a subset.
So if you do anything Postgres specific, it's not going to work.
And so I'm beginning to think that then porting that means that, yes, we can port it,
but then we have a lot to build, not to catch up to everything in Postgres.
And looking at 10 years from now or five years from now,
when let's say we manage to get full compatibility
of Postgres, we will still face the issue
of these plugins, these extensions.
So that's the problem that I want to think about right now to make
sure that when we get there, there is a solution for these extensions.
How important is it to reuse a lot of what you learned with Vitess in this new world?
Because I can imagine like all this work you've put in and countless hours of open source,
etc. Like there's a lot in there. How hard is it? How important is it to bring a lot of it over?
The learning is totally valuable.
And without the learning, we wouldn't be able to build it.
If you say rebuild Vitess, to what it is,
it is a lot of code, but it won't take the 15 years
that it took us to build.
It may take us, even if we were to rebuild from scratch,
it may take us a year maybe.
Especially with AI, right?
I mean, that might be a big help there.
I mean, Go is really useful in codegen tooling
and stuff like that.
So I can imagine that that might even be a leg up
for you to speed up the process.
Yeah, we have been using Claude to analyze Postgres.
Yeah, so you're probably right.
We can probably do this much faster
than even what I'm talking about.
We have to rebuild this whole thing.
So there's this common movie trope, you know,
the retired badass.
I'm not sure what this is called.
You know, like the-
The retirement.
Exactly.
For one last-
John Wick comes to mind here.
I'm sorry.
John Wick, Bruce Willis often plays this role.
It's true.
Yeah.
Or the Tom Cruise.
Oh, Tom Cruise, yeah.
I'm not sure who would play you too,
but they're coming out of retirement for this
because like we got to get, what is it Armageddon?
We're like, it has to be these people, you know?
Right.
Which was funny because those were like oil drillers,
weren't they?
Yep, yep, yep.
Interesting setup.
But no, they come out of retirement
for that one last mission.
You know, I'm getting too old for this.
That's a common thing they say.
And here you are.
But you're coming out with SuperBase.
I mean, it's a different story.
Like, Supabase came from where?
Obviously the Postgres maxis were fans of Supabase
and, you know, Paul Copplestone, we had him on the show a few times.
And so we know about them.
We know what they're up to.
Obviously PlanetScale was on the MySQL side,
but why Supabase?
Why come out and hook up with this ragtag group
of Postgres Maxis to build this thing?
So I actually engaged with Copplestone on a whim, you know.
So when I said, you know, like, this problem needs to be solved, needs to be solved now,
my first thought was, you know, I should start a company, like, let
me do a startup. I actually started talking to some people, looking for co-founders,
you know.
Okay. I was even talking to Jiten, my PlanetScale co-founder.
And I was kind of starting to engage this.
Then I thought, you know, like, what am I trying to achieve this time?
Like, I'm not trying to maximize profitability. What I want to
achieve is give this project the best chance for it to succeed. And I was
thinking if I do it as a startup, I'm bringing so much risk and so much
distraction, right? Because then I will have to make sure that I raise money,
I have to keep people happy.
And instead like, what if I found a place where I don't have to worry about these
things, where I don't have to worry about raising money, where I don't have to
worry about... And even if you're a seed-stage startup,
it's also hard to hire people.
There are people who are really, really happy where they are.
They are being very highly paid.
How do you convince them to come and join here and take
the risk?
We had this struggle at PlanetScale, by the way, when we
left Google and we thought, oh, we know so many engineers, so hiring is not going to
be a problem. And then we start the company, nobody wants to join us. So I thought of that.
I said, you know, like, maybe there is value in not going all the way to a big company, but at least a company
that can pay people well, still give them a good upside.
And so I thought of, you know, I'll start talking to some people who are, you know,
in the startup stage and see where they are.
And that's how I engaged with Paul. And I was
like, pleasantly surprised to find out he's like almost a
perfect fit for what I was looking for. And the biggest
one is the open source part. Yeah. Like, of course, even
in YouTube, we kind of had to push hard to open source Vitess.
For me, the success of an open source project is adoption.
Open source and adoption is true success.
And that's exactly what Paul believed in, Copplestone.
So it was so perfect.
And talking to him about all the other values, he was like, at that time, it was
almost like a slam dunk obvious that now this is a place where I should be building this.
Nice.
So it's not exactly like the movies because in the movies, you know, he would have sent
out his henchmen to find you and you would have been like, Oh, I don't want to come back.
Yeah.
You would have been like fixing up some old boat or, you know, practicing Tai Chi
and like they would have found you and you wouldn't have wanted to, but you had
to, to save the planet, you know.
But in this case, you actually reached out.
Like you were, you were, you were ready to solve this problem once
and for all Postgres users.
And so that, that doesn't make sense.
I think that jives, Adam, with my impression of Paul and Supabase.
Does that make sense to you?
Yeah, I like the fact that it's, you know,
Postgres Native, open source.
I mean, that's the roots that Paul shared here.
He's like, I think he said,
the board of directors will have to fire me
or pry this role from my cold dead hand.
One of those, something elaborate like that
to express his love for open source and his love for Postgres, Postgres native.
So how'd it go from there?
So he hired you, he's given you a team now,
and he says, have this on my desk in six months,
or how does it work?
I'm giving myself about,
about three to six months to come up with an MVP.
It may vary.
The reason why it may vary is because of my,
I don't want to compromise on the long-term plan.
So if for the sake of delivery,
I feel like I'm deviating too much from where I want to go,
I would rather delay the delivery.
But I don't think it will take us longer than
three to six months to come up with an MVP.
We have a very well-defined set of
objectives to reach for the MVP,
and I think we'll hit it all in about three to six months.
Then iterate from there.
The idea is that from there, it should only be iterations, no major changes.
That's the reason why I want to make the right decisions
today so that it becomes iterative from then on.
So that is the oversimplified plan,
but I'm pretty sure we'll have,
there are about five big features
that we want to launch in this MVP.
So is Go still on the menu then,
or does that not quite get you Postgres native enough?
I'm not sure if you can write.
Which one?
Go is.
Can you write Go extensions,
like can you write Postgres extensions in Go,
or do you have to use cgo, or how does that work?
So that's the part that we are debating about.
I don't want to start a brainstorm here,
but I can tell you what we are talking about.
Sometimes what happens.
Yeah, just let me know.
So the one option we are looking at is cgo.
Like if you have an extension, can we link to it?
Call it through cgo is one thought that we are having.
The other thought we are having is, well,
AI is so good nowadays, can we like translate this for you
and run it as an extension within Multigres
is the other thing that we are debating.
So, there's the other crazy idea, which is not translate,
leave Go and do it in C.
Yeah.
Like, directly, so all cards are on the table.
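For readers who haven't used cgo, here is a tiny self-contained sketch of that first option: Go calling into C through cgo. A real Postgres extension exports functions against Postgres-internal types, so the inline C function below is purely a hypothetical stand-in; it only shows the linkage mechanism being debated, not an actual extension binding.

```go
package main

/*
#include <stdlib.h>

// Hypothetical C function standing in for something an extension might export.
// Real extension functions take Postgres-internal types; this only shows the bridge.
int ext_hash_shard(const char *key, int shard_count) {
    unsigned int h = 0;
    while (*key) { h = h*31 + (unsigned char)(*key++); }
    return (int)(h % (unsigned int)shard_count);
}
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func main() {
	key := C.CString("user:42") // copy the Go string into C memory
	defer C.free(unsafe.Pointer(key))

	shard := int(C.ext_hash_shard(key, C.int(8))) // call into the C side via cgo
	fmt.Println("route to shard", shard)
}
```

The trade-off being discussed is exactly what this hides: every call crosses the Go/C boundary, and the C side would have to be built and versioned alongside the Go proxy.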
Gotcha.
The last one is the least liked one at this point,
but it doesn't matter.
No, whatever gets the right solution
is what we're thinking.
So that's fun.
It's gotta be fun to be back at square one
to a certain extent with a technical problem
in front of you now, like a big gnarly technical problem
that's gonna take three to six months for an MVP
and much longer if it's successful.
Just to be back at where you started,
but in a different context,
you had a different part of your life.
Yeah, and the best part is I get to fix all the mistakes
I made in the previous project. Oh yeah.
But what about the mistakes you're gonna make this time?
That's the problem.
You know this, V2 is perfect always, right?
That's right.
You're not gonna make any bad decisions this time around.
Yeah, that's gotta feel good.
Cause there's so many land mines
that you just can avoid this time around.
Yeah, yeah.
That's awesome. What's your team looking like? Who are you looking for? You're looking for
hardcore Postgres people or did you hire from Vitess?
There are some, so I cannot name names yet because some of them are still in the pipeline,
but it's a pretty exciting, pretty high caliber team. Some are older Vitess contributors.
There's a diaspora of Vitess contributors.
Some are from there.
And some are actually people that were,
that have handled this type of scaling problem before.
So for whom Vitess is an obvious thing.
So it's a pretty high caliber team.
I think we'll be announcing the team very soon.
They're going to be kind of the founding team
of this Multigres project.
So we'll probably be announcing
in maybe about
a month or so. But hiring is looking very good at this point. I'm pretty optimistic.
What is implementation like in this new world? Is that still to be found out as you make
these choices that have long-term ramifications for maintainability, et cetera? Like how
will, like I know how, you know,
for the most part, how Supabase works as a service.
But if I wanted to use, you know,
what you're calling Multigres,
Vitess for Postgres essentially.
If I wanted to use that on my own outside
of a Supabase context, what will,
what do you know about the future thus far
to kind of showcase implementation details
for end users?
So in this, I should talk about something
that we did in Vitess, which we are not
going to do in Multigres.
Vitess, actually, is what you would
categorize as one of the most flexible
enterprise solutions you can ever think of.
If you were to ask a question of Vitess,
I want to change this very specific behavior of Vitess,
there is probably a command line option for it.
So that made Vitess extremely powerful and flexible.
That's the reason why Slack could adopt it and completely deploy it using their rules.
GitHub could use a completely different approach.
And I believe Etsy has an even crazier approach.
And Vitess works for all of them.
But the problem it created by solving these things is approachability.
If I came in and looked at Vitess, I wouldn't know, like as a layperson who was just, you know,
oh, my MySQL is getting too big.
I want to use Vitess.
It will take you a few months of learning how Vitess works before you can start to use
it confidently.
So I realized that's a problem.
And so what I want to do in Multigres is make it a lot more approachable.
Actually, at the expense of even reducing those flexibilities.
Because I think adaptability is more important.
Approachability is more important than flexibility.
And if we add flexibilities, we want
to hide them away from the user, that they shouldn't have
to find it until they need it.
So that is kind of the approach we are going to use.
And what that means is that it should be easy
for a user to just take Multigres
and just deploy it in their environment
and it should just work for them.
Do you know how that will manifest inside Supabase
or is that more on the Supabase product team to decide?
Like, will that be invisible to Supabase customers?
Yes, absolutely. Yeah. So the, um...
Makes sense.
The idea is, what we are going to be building is a Kubernetes operator.
So if you want to deploy it yourself without Kubernetes, you may have to do a few things for yourself.
But if you just took Multigres and used the operator,
you specify, this is what I want,
and then the entire cluster comes alive.
And we plan to do that within Supabase also.
So this operator will be deployed within Supabase.
And all Supabase has to do is, you know,
hey, bring up the cluster, and we'll just spin it up for you.
And they have to be operating at significant enough scale
that they can act as a test ground for you, I would think,
or maybe they can't do that with their customers.
But I'm just thinking like you had YouTube
to build Vitess with, and that was like awesome, right?
Like you had a real scale database that had
consistent users, etc. And you could build the system.
I guess in that case you built it incrementally over time as you figured it out.
But building this in a vacuum, you got to have some sort of real scaled things to test against.
Maybe these are some of your your core team that will be announcing later that I'm sure they're going to come from scaled
up companies.
Yes. Yes. So there is a philosophy in, so we had that luxury and that's the reason,
I think that's the main reason why Vitess succeeded, where others have failed, is because nothing
that could not scale could be even written in Vitess because the next day YouTube would
go down.
Right.
And we would be...
So we have created a few outages due to those things.
So if something was not going to scale, we would know right away.
So we had that luxury.
So everything that went into Vitess had to work at scale.
So that was one thing.
And the advantage we have is we have those learnings.
So we know what not to do.
Sure, but don't you want to test it as you go
and make sure that you're doing it right.
On the testing side, what we over time developed,
and that is something we had the luxury of not
having to do in the early days at YouTube, is we had to develop really, really, really,
really thorough tests.
The reason was because at some point in time, we would be making changes in the Vitess source code.
And we had to be confident that people are going to take this.
Slack is going to take this and run it.
GitHub is going to take this and run it.
It's going to come back into PlanetScale.
And there is no waiting until it goes live
to find out what is going wrong.
So we actually, in Vitess, we had an extremely rigorous
testing policy, extremely strict,
with 100% code coverage, performance tests,
backward compatibility tests.
So that's actually one of the, it also
makes it hard for an engineer to contribute,
because you know, you change five lines of code.
Now you have to write all these tests
to make sure that you won't break anybody.
But the confidence is people now have developed the trust
that if Vitess releases code, I can just take it and run,
and it will not fail.
So I'm hoping to bring that same philosophy here which is to test thoroughly to the extent
that we are confident that this will work and we'll obviously still ease that into production.
Could you actually bring the test suite over, or at least parts of it, from Vitess? I mean
because that's a pretty robust test suite. If you could bring, if you could port that somehow
relatively easily.
Anything, yeah.
Anything we can copy from Vitess we are going to.
Yeah.
Definitely, yeah.
There are some parts which are, which we'll just copy.
There's no reason to change anything there.
So those parts we'll just copy.
Yeah.
Have you looked at deterministic simulation testing
similar to what Turso is doing and
what was the other database that you had called out to me?
TigerBeetle.
TigerBeetle, yeah.
Have you looked into that, and is that translatable to this space?
I need to look this up.
Okay.
But in Vitess, we probably have the opposite, which is called the fuzzer.
Is that the same thing?
Similar.
Yeah, similar.
So we do run the fuzzer and the fuzzer does not miss.
As far as I know, it misses nothing.
I wasn't there when they implemented it at PlanetScale,
it was someone called Vicent who did that test suite.
He said that the fuzzer was so good
that it found bugs in MySQL.
That's good.
That's a pretty good fuzzer.
I have pretty high confidence in that,
but yeah, whatever it takes.
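Here is a sketch of the kind of differential check a query fuzzer can do, written with Go's built-in fuzzing: generate inputs, run the same logical computation through a "reference" path and a "sharded" path, and fail if the answers ever disagree. The two toy engines below are invented; the point is the shape of the oracle (two implementations must agree on arbitrary input), which is also the spirit of the deterministic simulation testing mentioned above.

```go
package shardfuzz

import "testing"

// referenceSum is the "unsharded" oracle: sum every value directly.
func referenceSum(values []int64) int64 {
	var total int64
	for _, v := range values {
		total += v
	}
	return total
}

// shardedSum mimics a scatter-gather plan: split rows across shards,
// aggregate per shard, then merge the partial results at the proxy.
func shardedSum(values []int64, shards int) int64 {
	partial := make([]int64, shards)
	for i, v := range values {
		partial[i%shards] += v
	}
	var total int64
	for _, p := range partial {
		total += p
	}
	return total
}

// FuzzShardedAgreesWithReference asserts the two paths always agree, whatever
// the input: the differential-testing idea behind a query fuzzer.
func FuzzShardedAgreesWithReference(f *testing.F) {
	f.Add([]byte{1, 2, 3, 4, 5}) // seed corpus
	f.Fuzz(func(t *testing.T, raw []byte) {
		values := make([]int64, len(raw))
		for i, b := range raw {
			values[i] = int64(b) - 128 // include negative values too
		}
		want := referenceSum(values)
		for shards := 1; shards <= 8; shards++ {
			if got := shardedSum(values, shards); got != want {
				t.Fatalf("shards=%d: got %d, want %d", shards, got, want)
			}
		}
	})
}
```

Run with `go test -fuzz=FuzzShardedAgreesWithReference` and the fuzzer keeps generating inputs until a disagreement, if any, is found.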
Well, we have a couple of past shows
we can recommend for you as well
that might give you some insights
to just some of the behind the scenes,
not exactly that and how to implement it,
but more fodder for your requests.
Yeah, yeah, definitely.
Well, it's all very exciting.
And I assume the licensing is gonna be similar to Vitess
or similar to what SuperBase normally does.
Is there any sort of gotchas in there
in terms of the open source side of it?
Just like-
No, Apache is what we are planning to do.
And yeah, that's,
Vitess was actually BSD once upon a time
until we joined the CNCF and we changed to Apache.
But yeah, it's going to be Apache.
Does Multigres make sense in the CNCF as well or no?
It's an option.
It's definitely an option.
We even thought of it as either a fork
or could be a sub-project of Vitess.
But then I felt that that will restrict its freedom right now.
Starting it as an independent project,
these are very difficult decisions, you see.
And I think once we make these decisions
and bring it to a good shape,
we could look at actually moving it back to CNCF
as its own project.
Yeah, I think you want the space to reinvent and rethink
without feeling at all burdened
by your past decisions with Vitess.
And I think, obviously copy and paste the stuff
that makes sense, but like don't make it a fork
where you're like basically starting at this foundation,
which you may end up wildly different from in implementation.
Or you might be the same,
but at least give yourself room to come there organically.
You know?
Yeah, I can't like,
that is something that was pretty obvious to me.
Although the fundamentals are the same.
So you can, even if you don't copy code,
you can copy ideas pretty easily.
So those don't change.
But when you get into the details,
there's so many small things.
And the thing is, if you don't do them the Postgres way,
that will keep nagging you.
You know?
You say, there is this thing where, you know,
I have written this, there's this if statement hanging there,
which is there only to differentiate
between Postgres and MySQL.
Right.
And so you don't want any of that.
None of that.
Yeah.
That kills the joy part, right?
It takes away the joy that you're trying to enjoy
what you guys are building here.
Well, it's very exciting.
Obviously you're just getting started.
So there's still a lot of question marks for us,
Postgres fans, it's just a lot of waiting,
seeing, you know, watching maybe the GitHub project
as you put it out there and the announcements as the core team comes together, you know, you get the Vitess
Avengers, you know, get the whole team together.
But what else?
What else is on your mind or things that are important for folks to know before we let
you go? So the thing I feel is I feel a bit anxious that there's not much activity in the open
source project.
But you know, we may seem calm, but we are paddling hard underneath.
Really, really hard.
So I will be actually publishing a few blogs about some of the thoughts.
There's one area which we didn't cover much of, which is the high availability and the
consensus part.
So actually my initial blogs will be covering those.
I feel like this is something, the database world kind of went a little bit astray with
mounted storage,
disaggregated.
I feel like that needs to be brought back in.
I feel that storage should be with the database
because databases are IOPS hungry
and having your data local to your database
is very important.
And that story can be complete only
if you have a good consensus story where your data will not
be lost if you lose your node.
So that is something that we want
to solve really, really well in Postgres.
So that people, because I have seen the good days
of the database, where the database was, like, screaming fast, you know. Like Slack and Wikita, they run like hundreds of
terabytes of database size, they run millions of QPS and their latency is
like you know millisecond, one millisecond, two milliseconds. That's like
that's the kind of latency they get.
And they can't tolerate any higher latency.
And I want to bring those days back into Postgres.
So that is kind of what I'm shooting for.
So are those things that Vitess tackled
or did not tackle, high availability?
Those are, yeah, those are like pretty,
very core to Vitess.
Yeah, I thought so.
So those parts we are going to bring
into Postgres. Interesting.
Adam, anything on your mind?
Any other questions?
I'm thinking about neon, honestly.
I'm thinking about, you know,
just this kind of bigger picture.
I'm curious if this shakeup,
this change in the database world,
like for a while there,
we had obviously Planet Scale leading,
you know, massively scale MySQL, which was great for MySQL,
but not so great for the Postgres folks like us.
Then we've had SuperBase, obviously we've had Neon.
Neon was acquired by Databricks.
I don't know how that's going to shake out for a product
that's usable by the public or if it's a Databricks thing, this seems like a good time for Supabase to do this
because there's change amongst the Postgres world.
I feel like this is the clincher really for Supabase
to finally have this kind of feature set
that wasn't available unless they built it,
and why build it in a non-Postgres-native way, in a non-open
source way, given Paul Copplestone's way of thinking as a leader there.
I'm just curious your thoughts on that.
So it is just the thought of, like, Neon, Postgres, Supabase, that shakeup, you coming back out
of retirement basically to avenge this scenario and shard
for life with Multigres. Like, help us unpack, you know, this shakeup, I suppose. Are
you excited? Obviously you came back, but how does Neon's change change the landscape for
Postgres specifically, to open the floodgates for Supabase's mission with Multigres?
Yeah, going back to why I started this project, right?
The reason why I started this project was because I believed, I still believe actually,
that there is no scalability solution for Postgres.
Not Neon, not Aurora.
I'm an advisor to Metronome, which is a Postgres shop.
They have one of the largest Aurora instances, and they are struggling.
They have no way out beyond where they are looking at moving data out of their Postgres
instance because they've hit the limit. And now they're saying,
oh, we need to do something about this scalability thing.
And that's the reason why I wanted to start this project.
And one of the things I thought is, with
Supabase, it's not just, like,
so this is a problem that is meant to be solved
for the industry overall.
In other words, Neon could use Multigres if they wanted.
If they hit the limit of the machine,
you could deploy Multigres on Neon and run it,
unless there were incompatibilities
for which you had to make changes,
but even in those cases, the changes would be minimal.
So that's what I came for.
And for Supabase specifically, it's that
if Supabase is going in its trajectory,
there are going to be users that are going to max out a single instance very, very quickly.
As a matter of fact, I won't be surprised if some are already hitting those limits.
And their way out is you could go to Aurora, maybe,
and that will extend your runway by maybe at best 2x.
I would be surprised even if you got a 2x runway.
As soon as your QPS doubles,
you pretty much are at your limit.
So the only way out of this is a sharded solution.
That is why I started this project.
I said that, you know, like in MySQL,
if you hit your limits, you could go to Vitess,
but there is nothing like that in Postgres.
And Vitess for Postgres,
at even mediocre level of functionality,
will allow these people to stay in Postgres.
Otherwise, you have to find a way out of Postgres
once you've hit that limit.
Well, after you accomplish Vitess for Postgres,
maybe you can build Vitess for SQLite.
No, I'm just kidding.
I'm just kidding.
You know, for your second, after your-
Yeah, once he punched the screen, Jerod.
He's like, I'll punch my screen right now.
Yeah, he's like, excuse me.
After your next sabbatical, you know,
the only thing that gets you back is
when Dr. Richard Hipp approaches you and says, Sugu, we need you.
We need you, Sugu.
Right?
Over here now.
That reminds me of a,
I think Airplane was it?
Where, where there's a,
there's a background scene with a movie poster
which says Rocky 83 and a really old guy standing there.
Yeah.
You're going back in the day into the 80s to mention the movie.
I haven't seen Airplane for years.
I mean, like, don't call me Shirley, okay?
Exactly.
Good stuff.
Yeah, good stuff.
So thanks for coming back from retirement for this. I know that, I mean, considering what you just said, that there's no way out, that sharding
is the only way.
I mean, you're coming back at the right time.
I think you're coming back to the right team with the right motives at the right time.
And, you know, somewhat full disclosure, Jerod and I are both angel investors in Supabase.
So we kind of have, you know,
So we kind of have, you know,
this desire for our own upside,
but like technologically,
this is gonna be amazing for Supabase
to essentially be one of the only leading at scale,
Postgres flavored, Postgres for life,
open source for life flavors out there to choose from.
And so I'm super excited about that for you and for them and for Postgres,
because we need it.
It's amazing.
I am excited to get building, man.
Get building. Get building.
All right. Thanks, Sugu.
Bye, Sugu. Thank you.
Well, friends, it was awesome talking to Sugu, diving deep into Vitess,
diving deep into what that means for Vitess for Postgres, AKA Multigres.
All the work that's going into it,
and what's to come for us Postgres maxis.
Postgres for life.
But yes, we are also in Denver this weekend.
If you're in Denver, around Denver,
wanna hang out with us on Friday or Saturday
for the live show, learn more at changelog.com slash live.
Big stuff is happening, don't miss it, come hang out.
And it's also free to attend for Plus Plus members.
It's better.
You know what, it's better.
That's why it's better.
changelog.com slash Plus Plus.
Big thank you to our friends at CodeRabbit.
Check them out at coderabbit.ai,
our friends at Depot, depot.dev, and of course Fly, check them out at fly.io. That's it. Thanks for watching!