The Infra Pod - Building a serverless full-text + vector search on S3! Chat with Simon from Turbopuffer
Episode Date: April 7, 2025

In this episode of Infra Pod, Ian and Tim chat with Simon, the CEO of Turbopuffer, about his journey from working at Shopify to building the fastest vector store known to man. They delve into the intricacies of scaling infrastructure, the challenges with traditional databases like Postgres and MySQL, and why Simon believes in building databases on top of object storage. They also explore the trade-offs Turbopuffer has made, including its unique approach to consistency and high write latency.

00:36 Scaling Shopify's Infrastructure
01:56 Challenges with Postgres
02:19 Building Turbopuffer
16:30 Trade-offs and Performance of Turbopuffer
22:32 Importance of Consistency in Databases
26:29 BYOC (Bring Your Own Cloud) Option
32:03 Customer Use Cases and Integration
35:11 Spicy Future!
Transcript
Welcome back to InfraPod.
This is Tim from Essence and let's go.
This is Ian, lover and builder of things that are secure and developer friendly.
I'm super excited Tim.
We've got Simon, CEO of TurboPuffer, the fastest vector store known to man, apparently, on the
podcast today.
Simon, why don't you introduce yourself and tell us a little bit about how you got
into building TurboPuffer and why?
Yeah, absolutely. It goes way back. Ian, you
and I were just reminiscing about being two Canadians. I
came to Canada back in 2013 to work at what was at the time a
little e-commerce startup called Shopify. And I spent almost a
decade there working on infrastructure, even before
there was much of an infrastructure team at all.
So basically just played bottleneck whack-a-mole for almost a decade, scaling
that platform as the Kardashians rolled through and sold merchandise, challenging
the platform on trial accounts and what have you.
So I spent a long time doing that.
When I joined, it was probably a couple hundred,
maybe a thousand requests per second.
And when I left, we had events in excess
of a million requests per second
and worked on more or less every single part of the stack.
Of course, the fundamental bottleneck
of most SaaS platforms like that is the database.
So I spent most of my time working somewhere
between the Rails level and MySQL,
building abstractions, sharding, moving shops around, balancing shards, like multi data
center, all of this kind of stuff. At some point we rewrote the entire storefront for performance,
for geographical distribution of all the shops and all this kind of stuff. So just used NGINX,
Redis and MySQL as my weapon for almost 10 years.
I left in 2021 and spent a couple of years kind of bouncing around, helping my
friends at their startups with various infrastructure scalability problems or
anything that they found interesting.
Turns out the biggest infrastructure challenge in 2021, 2022 was tuning Postgres
autovacuum. And I was led to believe from the Orange site
that Postgres was this holy grail
after spending a decade with MySQL.
But it turns out it's extremely difficult to tune.
And so, yeah, I spent a bunch of time doing that
and working with a bunch of other companies
on various scalability issues.
And that's where I discovered, I guess,
rediscovered that search is still difficult.
And there were sort of three separate events happening
around 2020 to 2022 that all made it seem like a good time
to maybe sit down and rethink how you might do
a search engine.
So one company I was working with,
actually also a Canadian founder,
is this company called Readwise.
And essentially what they allow you to do
is when you're reading on your Kindle,
then it will import the highlights,
you can do daily reviews,
and now they have a reader product as well
that allows you to save articles, PDFs, read them,
interact with AI with them, and all these kinds of things.
And of course, naturally for something like that,
you might want to have recommendations
for the type of content that you wanna see.
And so this company was spending maybe five grand a month
on their Postgres, storing hundreds of millions of articles
and all of this data for this application.
And so we started doing some experiments
when I was doing a little stint there
on doing recommendations.
And so we did some vector search, some embedding,
some chunking, and I was like, okay,
this actually, this is looking pretty cool
and it's pretty interesting. Let's run the back
of the envelope on doing this on some of the incumbent solutions
at the time. And it would have cost 25 to 30 grand a month. At
a big company, this is nothing. But for a company that's
spending five grand a month on Postgres, it's just, I feel
sort of whack that you're spending half an order of
magnitude more on the search, right, rather than the
canonical storage.
So we kind of ditched the idea and put it in the bucket of,
well, we'll return to this later when it gets cheaper.
Token costs are coming down, LLM costs are coming down.
Surely someone's gonna bring the vector search cost down.
Then I was traveling with my wife for a bit,
and I just couldn't stop thinking about this.
So I just started, I was just reading papers
on how does vector indexing work?
How does all of this work?
Because what seems so exciting to me about vector indexing
was that at Shopify and elsewhere
that I've worked on search,
it's always been this problem of how do we take these strings
and turn them into things?
I'm searching for red shoe
and they have a burgundy sneaker, right?
That used to be kind of a PhD level search problem
to do ergonomically across lots of different queries.
And now any off the shelf embedding model
would just do incredibly well at that.
So we were kind of moving from this world of
you're gonna write a thousand lines
of like very well massaged JSON to express
to Elasticsearch or OpenSearch or Solr
some of these Lucene based solutions,
how we're gonna turn all these strings into actual things.
But with vectors, it was just so simple. It was like, oh my God, these things that used to be
incredibly difficult, where we had to train thesauruses
to map burgundy and red,
and maybe you would even skip this part.
Suddenly it was just very easy.
So that was the first thing that seemed worth paying
attention to was that search maybe seemed due for a little
bit of a revamp because it was now easier
to do semantic search.
The second thing that happened, right,
is that lots of companies around this time
wanted to start connecting their LLMs to their data.
And that's always been interesting
because semantic search in itself
has never really had a super large scale outcome
other than Elasticsearch.
There's never been a ton of players,
but suddenly lots of people wanted to search
a lot more data.
And more importantly, now machines could do it, right?
A good database needs to be something
that's being queried by machines as well,
otherwise you're limited by the number of humans
that do searches.
And even Google is only doing tens of thousands
of searches per second.
Very few companies are doing much more than that,
including Shopify.
And then the third thing that had happened
is that the way that we can build databases today
is different.
There's like a series of events leading up to the architecture
that's working really well for TurboPuffer
and for a bunch of other types of databases.
But NVMe SSDs have gotten incredibly fast.
They came out around 2018 in AWS.
Of course, they've been around for longer,
but just absolutely phenomenal price performance and storage density.
The second thing that happened was that in 2020, S3 finally became consistent, which
was launched at re:Invent back then.
It's hard to even think about that now, but that actually had to happen, and it happened in
2020.
The other thing that needed to happen, which happened at the end of 2024 at that year's
re:Invent, was that S3 finally got compare-and-swap. And with those three dependencies, it means that you can build online databases with
object storage as the only dependency, just a bunch of binaries running and using object storage.
And so those three things, one, like being easier to do search, LLMs being connected to a lot of data
and then fundamentally being able to build databases with only
object storage as a dependency, seemed like maybe this is a
good time to write a search engine of the future.
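To make the compare-and-swap point concrete, here is a minimal sketch of how a conditional PUT on object storage can act as a database's commit point. The `ObjectStore` methods (`get`, `put`, `put_if_version`) are assumed interfaces for illustration, not a real client library and not TurboPuffer's actual code; S3's real mechanism is conditional writes keyed on ETags.

```python
import json
import time
import uuid

def append_write(store, namespace: str, batch: bytes) -> None:
    """Commit a batch of writes using only object storage as the dependency."""
    manifest_key = f"{namespace}/manifest.json"
    while True:
        raw, version = store.get(manifest_key)            # read current manifest + its version
        manifest = json.loads(raw)

        # Write the new log segment first. It is unreferenced until the swap
        # below succeeds, so losing the race leaves garbage, never corruption.
        segment_key = f"{namespace}/wal/{uuid.uuid4()}.bin"
        store.put(segment_key, batch)

        manifest["wal"].append(segment_key)
        new_raw = json.dumps(manifest).encode()
        if store.put_if_version(manifest_key, new_raw, version):  # compare-and-swap
            return                                         # committed: durable in object storage
        time.sleep(0.05)                                   # another writer won; retry on fresh state
```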
I mean, that's pretty compelling. And there's quite the story.
One of the things you sort of started at is you started at
Readwise, you're thinking about like the order of magnitude
around, like how much more expensive it was, the Postgres
vacuum compaction.
What is it about things like Elastic that make that the case? And Postgres, Postgres has things
like pgvector, it has like full-text search.
Like Postgres has this ecosystem and the whole idea or what the PG ecosystem has been
selling people for a very long time is there's a wire protocol and then there's like a storage
thing and then it's all extensible and you'll have like one database to rule the world. And so it sounds like you have a very specific
take on why that is that's not the case and what the problems with Postgres specifically are.
I have my own view on Postgres when it comes to like vertical scaling, when you get to like large
data sizes and when things fall apart specifically with the way that the write ahead log works and
like if you have high-velocity edited rows, it turns into like this massive vacuum
compaction problem. I'm curious, from your perspective,
what was it about Postgres that you were looking at
that made it cost so much and why it was like,
hey, this is actually the wrong tool
for this semantic search problem set in the first place?
Because I think everyone sits back,
and if you were to ask,
the common answer from the average developer
who isn't deep in the space like you are,
is probably like, oh, I can just use Postgres for vectors or semantic search.
What are the things you basically run up against, and when does it start to break?
Yeah, to be clear, the 25k to 30 grand a month was not PG Vector.
Back in 2022, when we were looking at that,
PG Vector just would not be able to handle that 100 million plus scale that we were looking at.
So that number is quoted from the cost calculator from what is now some of my colleagues in the space, right?
So let's talk a little bit about the fundamentals
and kind of the first principle observation that leads you
to build something like TurboPuffer.
I think my general belief, if I was on the SaaS side,
not building a database, but building a company,
is that I would also, pardon my French,
but abuse the shit out of MySQL slash Postgres, the canonical OLTP store, for as long as possible.
In fact, we're still doing that at TurboPuffer.
We just have a big like MySQL that we abuse for billing and analytics until TurboPuffer
can do that kind of thing.
And so I'm very much in favor of that.
But there comes a tipping point where, despite the complexity and overhead of adopting another database and ETLing into it, it ends up being the
right choice.
One of the first things that people rip out of Postgres as your company grows
is full-text search.
Um, the reason for that, and Ian, you're nodding and probably you've seen this
as well, is that fundamentally updating an inverted index, right,
The index that you build
for full-text search, is a pretty expensive operation to do under the tight transactional
guarantees that Postgres provides. So you can get pretty far with that, but you can't get into,
you know, storing many terabytes of full-text search data in Postgres without the complexity
of dealing with all of that, and the performance starts tipping the scales
towards another solution.
And that's why people adopt OpenSearch or Elasticsearch
or these other Lucene based solutions traditionally.
That and the concurrency of implementing a queue
on Postgres are typically the first two things
that you extract out to avoid having to shard
the Postgres instance.
So I think that's the first one.
And there are just various workloads like that.
Of course, the third thing that you would adopt out, other than queues and search,
is various types of caching, right?
So that's why you have a memcache or Redis or something like that.
But you kind of start with the big monolithic thing
and then you rip it out as the economics and the complexity justifies it.
So the other thing is cost.
Fundamentally, with something like Postgres,
you just need to store every bit on three disks, right?
A replication chain of two is a little gnarly.
I think very few companies have good enough backup mechanisms
that they're comfortable running with that.
So you're going to run on three replicas.
One disk, whether it's an EBS or a PD,
depending on your cloud provider,
costs somewhere between 10 and 12 cents per gigabyte.
You're not going to run these provisioned disks
at 100% capacity.
You're going to run them at closer to 50%.
So in reality, you're probably paying
somewhere between 20 and 24 cents per gigabyte
of data per disk.
But you don't have one disk.
You have three disks, right?
So it's three
times 20 cents or 24 cents. And so you just get close to 66 cents per gigabyte of data stored.
And of course, when you put in one byte, more than one byte is actually stored on the disk. You have
to build all these indexes and various other derivative data structures. So there's some
space amplification involved as well.
So every logical gigabyte,
it might turn into maybe 1.5 or something like that,
depending on how many indexes that you're building
and how compressible the content is.
So fundamentally, this just means that
if you're using any SaaS vendor,
they're gonna charge you close to $1 per gigabyte
to make a little bit of money
and to have a margin for the operational overhead
and all of that.
And that's on disk, right?
But generally, if you're churning really hard on this
in Postgres, which you need for something like vectors
and a lot of search, you're storing a lot of that in memory.
And memory is closer to one to $2 per gigabyte.
So that's the legacy cost structure.
And that's what the traditional search engines also do,
replicating this around.
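As a quick back-of-the-envelope in code, using the rough numbers above (illustrative assumptions, not exact cloud prices):

```python
# Triple-replicated provisioned disks at ~50% utilization, plus index/space
# amplification, per the reasoning above.
disk_cost_per_gb = 0.11      # $/GB-month, roughly EBS/PD territory (10-12 cents)
utilization = 0.50           # provisioned disks rarely run full
replicas = 3
space_amplification = 1.5    # indexes and other derived structures

effective = disk_cost_per_gb / utilization * replicas * space_amplification
print(f"~${effective:.2f} per logical GB-month on disk")  # ~$0.99
# ...which is why SaaS vendors end up charging around $1/GB on disk,
# and memory-heavy deployments land closer to $1-2/GB.
```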
So I approach it from the angle of,
well, what's the fundamentally cheapest way
that you can store data in a responsible way in the cloud,
given the primitives we have today?
Well, it's just to put all that data on object storage.
Object storage is two cents per gigabyte,
lower than that at lower storage classes.
Of course, that's quite slow.
Like a cold query on something like TurboPuffer to a vector index with a million vectors is maybe
around half a second when you've really optimized it. But at the end of the day every round trip is
going to have a p90 of around 200 milliseconds. You have to design around that. But then you can
pull the data that's actually warm onto an NVMe SSD. And most people don't even use NVMe SSDs for their Postgres
because if you restart the instance, often that's gone and it's operationally
just a little too scary unless you have a very mature operational
environment. Those NVMe SSDs are somewhere around 10 cents per gigabyte.
If you run them on a spot instance or commit to some usage, it can get all the way
down to 3 to 4 cents per gigabyte.
And you can utilize these disks at 100% utilization
with a multi-tenancy cache, right?
So you're not paying anything extra.
And then, of course, you can leverage your own buffer
pools in memory to also have 100% utilization
on the caching here.
And now you have this nice pufferfish architecture, right?
Where you can inflate the pufferfish as much as you want to get the performance you need.
But if you're not querying the data, which most companies have some Pareto distribution
of what data is actually being cached, it just stays in object storage.
And even if you do query it once a month, it's half a second.
And that's often acceptable.
So for searching a lot of data, this is a phenomenal architecture because search can
often tolerate a tail latency of half a second,
and then once it's in cache, after loading into the cache
at a gigabyte or whatever per second,
there's no reason why we can't be as fast as any other solution.
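A minimal sketch of that "pufferfish" read path, with memory, NVMe, and object storage as assumed cache/storage interfaces (this is illustrative, not TurboPuffer's actual internals):

```python
def read_block(key: str, memory_cache, nvme_cache, object_store) -> bytes:
    """Serve hot data from memory/NVMe; only cold data pays the object-storage trip."""
    block = memory_cache.get(key)           # buffer pool hit: microseconds
    if block is not None:
        return block

    block = nvme_cache.get(key)             # SSD cache hit: sub-millisecond
    if block is None:
        block = object_store.get(key)       # cold: p90 ~200 ms per round trip
        nvme_cache.put(key, block)          # hydrate the NVMe cache

    memory_cache.put(key, block)            # hydrate the in-memory cache
    return block
```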
Fascinating.
So we talked to the WarpStream team on our pod before,
and probably one of the most memorable things for me
was really about the trade-offs of actually building a system on top of S3
Because I'm sure you're not the very first one,
but it's probably the most recent memorable one that has been talked about a lot, right, Kafka with S3.
There were a bunch of things to consider, a bunch of things people hadn't really
thought of, like why building on S3 is even possible, and also why it's hard. And
I'm reading through sort of the limitations and trade-offs
sections of your docs.
And it's just super fascinating to me that
I think it's actually intuitively understandable now,
given the S3 nature, you have to choose certain things.
But I want to hear from you, right?
What are the trade-offs you guys are willing to take
or choosing to take?
Because I feel like this is not just an S3 trade-off here.
There are other trade-offs you're willing to kind of choose,
like configurable performances, open source or free tier,
like things you don't want to do at all.
So give us sort of like the belief here.
Why does TurboPuffer want to be known for these sorts of performance characteristics and these sorts of limitations?
And how does that fit finding the right customers for you as well?
Yeah, I wish every database had a trade-off section like we do in our docs.
I've tried to make our website feel more akin to a specification sheet than a marketing website.
Because I think it resonates with
the type of engineer.
Every time I go to a database website, there are just things I want to know, like, what
are you giving up?
What are you gaining?
Everyone's making fundamental trade-offs.
And as you sit in that five-dimensional set of trade-offs, every database sits a little
bit differently, right?
Our biggest trade-off is high write latency.
Every time you do a write and we return back
a successful response from the API,
that's committed to object storage.
Guarantees don't really get much stronger than that, right?
That's the fsync of 2025: committed to object storage.
But the latency of that can be a couple hundred milliseconds
depending on how much data you're writing
and what mood that S3 partition is in.
And that's the biggest trade-off with TurboPuffer.
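As a rough illustration of designing around that write latency, applications typically batch documents so one object-storage commit is amortized over many rows. The `client.upsert` call here is a hypothetical client interface for the sketch, not necessarily the actual API:

```python
def ingest(client, namespace: str, docs, batch_size: int = 10_000) -> None:
    """Amortize the few-hundred-ms commit latency over large batches."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            client.upsert(namespace, batch)   # returns only once durable in object storage
            batch.clear()
    if batch:
        client.upsert(namespace, batch)       # flush the final partial batch
```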
The other trade-off is that we do have tail latency, right?
Because not all the data is sitting on disks
and replicated disks at all times.
It would be very unusual to have a tail latency of 500
milliseconds on an actual persistent disk or EBS volume.
But with object storage, of course, you will have that.
So if you can tolerate that once in a while or you have some heuristic you can use in your application,
like pre-firing a query, which is, for example, what Notion will do, pre-fire a query to warm up the cache
and then the user is not waiting for as long because the cache is hydrated.
Those are the biggest trade-offs.
A third trade-off, also on the latency front,
is that by default, TurboPuffer is consistent.
It's a very unusual choice for a search engine.
I don't know of any other search engine that
is consistent out of the box.
And by consistent, what we mean is that if you do a write,
you insert Tim and Ian into the database,
and you do a query immediately after that write has returned,
it will be visible.
Most search engines don't do that.
They will refresh the inverted index on some periodic interval,
like a second or 30 seconds or whatever.
But we always return it immediately.
And in order to do that for a database
that only has
object storage as the dependency,
with no other dependency, no consensus plane,
no control plane, such as what WarpStream has,
is that we have to go to object storage
and make sure that what we have in the local cache
is the most up-to-date version of the index.
And that round trip on GCS, the P50 is about 16 milliseconds.
I know that P50 by heart because that is our P50.
And if you look at our traces, it's
maybe a millisecond of vector search time
and 16 milliseconds of waiting to make sure
that the cache was consistent so we return
a consistent response.
On S3, this latency is more like 7 to 8 milliseconds,
a little bit better.
I think that this will come down over time.
It's essentially that Spanner floor latency that sits
in front of Google Cloud Storage.
You can turn that off, and we can give you still very, very
strong guarantees.
Probably 99.999% of the time, you're
going to get a consistent read.
But I feel very strongly that you
should design a database for consistency
first by nature, because it's very hard to walk that back.
Those are the trade-offs as I see them.
The biggest ones.
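A small sketch of why a consistent read costs one object-storage round trip even on a warm cache; `store`, `cache`, and `build_index` are assumed helpers for illustration, not TurboPuffer internals:

```python
def consistent_query(store, cache, namespace: str, query):
    """Confirm the cached index is the newest committed version before answering."""
    manifest_key = f"{namespace}/manifest.json"
    latest = store.head(manifest_key)            # ~7-16 ms p50: the consistency round trip
    entry = cache.get(namespace)

    if entry is None or entry["version"] != latest:
        raw, latest = store.get(manifest_key)    # refresh from object storage
        entry = {"version": latest, "index": build_index(raw)}  # build_index: assumed helper
        cache.put(namespace, entry)

    return entry["index"].search(query)          # the actual search: ~1 ms
```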
I'm really curious, like to dig into the consistency thing for a second,
because, you know, broadly in our market, like we go to the top level
of the database space, you kind of have like operational data stores,
you know, ACID-compliant transactional stores, and you've got sort of like
OLAP analytical stores, and these have been in these different worlds.
And you have the like magical HTAP thing
that people have talked about, like the folks from CedarDB.
I'm like very curious.
It makes sense to me that oftentimes
people have thought of like full-text search.
So things you put in Elasticsearch
is like an analytics workload,
but vector stores, like in the use case of an LLM,
I'm building a chatbot, I'm doing something.
And especially if you're building like an app where you kind of have your ACID-
compliant, you know, store, then you have your specific thing like a TurboPuffer, and it's like,
well, actually when I talk to the agent and I update the agent with some state, that agent,
however it's getting that state, they ultimately want to like return to the user a consistent
experience from my perspective.
I would love to hear your perspective.
A lot of use cases on full-text search have been like long-term analytics workloads where it's like, it's okay if
it's out of date by a minute, 10 minutes, an hour, a day, 30 days, depending upon where you were
applying it. And so do you think this is like a net new thing that's unique to the application
of building like basically AI apps, where you're doing a lot of RAG that needs to be up to date,
needs to have very up-to-date data, so that you're actually returning
a consistent experience to the user.
What's driving your view on why consistency is really important?
Because the way you describe it sounds more like your view is
we're actually building a semantic search database
for a world that requires consistency for a world where
actually search is core to the operational experience of the user. Is that true? Because that's what I'm reading into.
I'm kind of curious to get your perspective.
Yeah, there's two reasons why we do the consistency
by default.
The first is that it's just too easy to imagine use cases
where it's very useful to have consistency.
An example in the LLM case, right,
is that someone is uploading a document or something
like that,
maybe a huge document or indexing an entire Google Drive or whatever into TurboPuffer.
How is the user supposed to know that that is searchable, other than "I've inserted it into TurboPuffer"?
Otherwise you have to have this whole separate API to do that.
And that's a very common use case. When you open up Cursor,
it will index the entire codebase into TurboPuffer.
And it's very useful for them to know that when all the responses have come back from TurboPuffer,
it is indexed and it is retrievable.
So if you're uploading a PDF and you want to chat with it,
well, you know, when you've uploaded the PDF,
you expect it to be searchable, right?
In the data set of what you're ragging over.
So that's one.
I think also it expands use cases.
So to give you an example from the Shopify days, of course, there's two big search use
cases in Shopify.
There's the shop app, and then there's searching on a storefront.
Searching on a storefront is probably the most common one.
And that's okay to be a little bit delayed.
A user or buyer doesn't really know if the merchant has added a product a minute ago
or 10 seconds ago, but it decreases the
utility of the solution if it's not consistent.
Because imagine, for example, that you have a collection and you define that collection
based on a search query, and you're using those collections as you're managing your
inventory and doing merchandising.
Well, if you can't really rely on, I've inserted the document and it's visible in this collection
as a core primitive of the database,
It limits what you're building and the experience of it.
So it really just boils down to from a product perspective,
it's just too easy to imagine use cases for consistency is really, really useful.
The second comes down to recall.
What we see ourselves as is a broader search engine doing both full-text and vector search.
For vector search in particular, a core metric is recall.
Recall is essentially a measure
of how accurate your index is.
The only way to know which vector is closest
in vector space to another vector for sure
is to exhaustively search through every single vector
in the corpus.
If you have a million vectors of maybe 768 dimensionality, that's around 3
gigabytes, right? You could search 3 gigabytes in probably around 100-200 milliseconds depending on
how fast your machine is, but that's maxing out a machine. So you'll be able to do just a couple
of queries per second there. So what you do instead, instead of exhaustively searching through all of
it, is that you build an approximate index. And that index essentially has a score between zero and one
for how accurate it is.
That accuracy is defined as, okay,
we know that these are the 10 closest vectors
in vector space to this query vector
by exhaustively searching through it
in those 200 milliseconds.
And here is what our faster index got back in one millisecond.
What is the percentage overlap?
And that percentage overlap is what we refer to as recall.
Recall is a very important metric for a vector database.
If your recall is 10%,
it means that you're doing a very poor job
and products built on top of you
are not going to show relevant results.
We generally see among our users
that they like something above 90%.
That's a good balance between performance and accuracy.
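For reference, a small sketch of how recall@k can be measured along these lines; `ann_search` stands in for whatever approximate index is being evaluated:

```python
import numpy as np

def recall_at_k(queries: np.ndarray, corpus: np.ndarray, ann_search, k: int = 10) -> float:
    """Overlap between exhaustive nearest neighbors and the approximate index's answer."""
    hits = 0
    for q in queries:
        # Ground truth by brute force: 1M x 768-dim float32 is ~3 GB, roughly
        # 100-200 ms per query on a fast machine.
        dists = np.linalg.norm(corpus - q, axis=1)
        truth = set(np.argsort(dists)[:k])

        approx = set(ann_search(q, k))           # the fast (~1 ms) approximate answer
        hits += len(truth & approx)
    return hits / (k * len(queries))             # e.g. aim for > 0.90
```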
And so to return back to the consistency point,
recall is a pretty difficult thing to tune for,
especially in the types of approximate near neighbor index
that we've chosen.
And we were all very scared, essentially,
that we would, if we saw recall in production that was below 90%,
just sort of sweep it under the rug as, oh yeah, it's inconsistent.
So probably what just happened was that it was as of that LSN in the WAL on one
node and the other node was evaluating it, it was probably out of date.
It's fine. But when it's consistent, there's a simple model and you know that
that is the accurate number. So you can't explain it away.
The third reason to be consistent is operational.
If you are consistent, it means that every time someone does a write, before you merge
it into the indexes, both the full-text search and the vector index, you have to exhaustively
search and apply these writes onto whatever is in the index.
This is how every database works.
If the indexing falls behind, you start getting errors, right?
Someone probably gets woken up
because the system is not working.
If you're eventually consistent,
and we've seen that with some of the other
newer entrants to this space,
you might have search results
that are hours and hours out of date
because no one's getting woken up
and it becomes a bit of an operational crutch, right?
Things get fixed if people get paged.
That's a lot of pain to induce to ourselves,
but we think we have a real responsibility
to our customers.
So to recap, one, it's easy to imagine use cases
where consistency is useful.
Two, you can't explain away recall
because it's very simple to reason about
that the results are consistent.
And three, operationally, it induces pain directly on us and people get
woken up if the results are not consistent, right?
And so coming from our previous episode with Chris, we're all debating around
BYOC, you know, and seeing that you have a bring-your-own-cloud
option, I was looking at other vector databases.
It seems like most people are offering sort of BYOC options.
I still think not everybody truly believes
in we have to offer BYOC,
or at least have to offer BYOC early, right?
And you're relatively so early in the database journey,
right, but you're choosing to kind of offer BYOC
pretty early right now.
Can you talk about like,
are you seeing that
from customer pull as a major demand?
And also are there any other trade-offs
you're willing to take here?
Because I think folks, you know,
sometimes think, I want to delay it as much as possible
because I want to have the fastest way to do managed,
multi-tenant, so I can have cost sharing and stuff like that.
What's your take on BYOC?
Like, is it something you were forced to do,
or do you actually believe this is probably the right choice
for your customers too?
Yeah, so I'd say that BYOC, and having to do it so early,
comes out of talking to our customers.
And I think that early on, vector databases had that success
because the vectors were sufficiently obfuscated
that even large companies were comfortable putting this data
into relatively small and newer companies
because it's very hard to go back to the origin data.
But as you expand into things like full-text search
and having real customer data and the actual text of it
there, it starts to become a more dicey customer
conversation for our customers to have with their customers
about what are you doing with our data exactly?
In plain text, that starts to get more tricky
before you've built up the type of trust of a GCP,
AWS or Snowflake or others.
I think that for analytical databases,
it can be a little bit easier to sort of separate
the data out, like what are we ETLing out
into Snowflake or something else. With text fundamentally, you're storing the data out, like what are we ETLing out into Snowflake or something else?
With text fundamentally, you're storing the plain text, right?
You're storing like the company secrets right there,
depending on the use case, right?
So that seemed like a fundamental
from first principle reasoning that resonated with me.
But I think BYOC is also a little bit
of a chicken and egg problem, right?
I come from a background of operating
very large scale multi-tenancy for Shopify,
which of course doesn't have a BYOC offering.
And that's what we're really good at.
And operating a big SaaS is something we live and breathe
and BYOC operations is quite different.
So we prefer to have everyone on the SaaS solution, sure.
But there's a chicken and egg here on the trust, right?
You know, Databricks also built up their trust with BYOC and then now I think are really pushing
people towards the serverless platform and they can do that, right? No one's getting scorned in
the public markets because you're trusting your customer data with Databricks anymore. They've
just been around long enough. So I think that's another component to it. A third, of course, is
that you can use your negotiated discounts and things like that
alongside it. That's a pro that we see as well. The fourth reason why we're doing it early is
because we can. And the reason we can is that I was on the last resort pager of Shopify for
six years or so. And that changes how you write software forever. Because you've been paged by
every conceivable system, every conceivable interaction
and you just write software differently. One of the things that we do differently is that
object storage is our only dependency. We're just a bunch of Rust binaries on stateless nodes
talking to object storage. With that model, it's been very easy for us to operationalize
TurboPuffer, right? If a node is about to run out of memory,
we can just put it onto a bigger node very, very quickly,
because there's no state on that box and it seamlessly moves.
We can auto scale very quickly. We can just double the number of nodes and
there's no rebalancing of partitions or tablets as in a traditional database.
It just starts hydrating from object storage. And so,
operationally,
this has just been one of the best systems to run
that I've ever run in my career.
And that gives us the confidence
to also let other people run it.
We do our BYOC in a way that is what I would have bought
when I was on the other side of the fence,
where we're also on call for our customers' clusters.
And we help them operate it as much as we possibly can.
I think that BYOC is still very much in its infancy.
I think most people, if you ask them, what is BYOC?
If you ask 10 people that,
you're going to get 10 different answers.
There are various solutions out there
that will try to package up the billing
and the control plane components of it.
We decided to do that completely on our own
for a variety of reasons.
And I think every flavor is gonna look slightly differently.
But fundamentally, what I do believe is that
because TurboPuffer has these stateless workers
and object storage is reliable and ubiquitous,
we're in a very, very good position for it.
I think complexity is the enemy of a good BYOC solution.
And that's why Ritchie and WarpStream did so well with that model
is because they didn't even use disks as far as I'm concerned, right?
So they're even less stateful than we are. I think it's more difficult once you start having to operate consensus and things like that in your customers' accounts.
That's super interesting.
I mean, let me pose another question I really want to ask you, going back to Postgres as
an example, right?
Because like if you typically think as a developer, it's like, I'm going to like have my giant
Postgres database, I'm going to have my pgvector, and I'm going to have all these add-ons
and like, I'll build all these triggers and all these things.
Like it's all consistent.
And like when I insert a document, when I make an insert, everything's all up to date
and it's all consistent.
Obviously you're outside of that.
You're building on top of an object store.
You've built this incredibly fast, consistent system
for the state that is in the system.
And you provide all these incredible guarantees.
What's the architecture that you see customers using
to deploy you to get data into TurboPuffer?
And what's the relationship between that
and all the different types of data sources?
I'm sure some data sources are cold storage,
they don't change very often.
Then you have other data sources that are high velocity
in terms of the high rate of change.
And how does that impact how TurboPuffer kind of sits
in the broader customer's data ecosystem?
I know you like Cursor as an example as a customer,
and you don't need to tell us anything
about Cursor specifically,
but some of these companies have very different use cases.
I'm kind of curious how TurboPuffer sits
into these different use cases
and how the parameters of those use cases kind of change.
Yeah, I think TurboPuffer right now
is still extremely opinionated in the simplicity
that we provide as a product.
The way we talk about this internally is that
I'm always reminding the team that we're not
in the business of ergonomics yet.
We're still too much in the infancy of our product
to start doing that.
And it's also far too early for us to start bundling
integrations and others.
It moves and dilutes the focus of me and Justin,
my co-founder and CTO,
and the rest of the engineering team.
Our customers are extremely smart, extremely capable, and one of the things that most companies
have an idiosyncratic version of is some kind of ETL pipeline.
That's not a business I'm particularly interested in getting into.
I just want to make sure that we plug in really nicely with that and they can pump in hundreds
of thousands of vectors per second, which is what's possible with TurboPuffer, right?
We store more than 100 billion vectors and we do almost half a million vector writes per second.
How you do that, we don't really get into the business of. You have to create your own embeddings.
You have to do all of that.
We're completely focused at this point in time on just creating a phenomenal
first stage retrieval search engine.
We are not in the business of running rerankers, running on GPUs, creating embeddings,
like helping you ingest the data and all of that.
We think that the right people adopting it right now are okay with just a sharp tool, not a framework that's trying to do everything.
I think that AI has had a lot of that and like a land grab of like,
we got to own the workflow.
No, we just want to do a really good job of building a dumb, cheap,
very simple solution that can take hundreds of billions of documents and full-text search
and vector search them. And I think TurboPuffer is one of the best solutions for that type of scale
for exactly that. But we don't have a simple language thing you just drop into Spark and it
just pumps things in. But generally, if you talk to the teams and our customers, that's not where
their pain is, right? Their pain is that they want to search across way more data than they can right
now, but they can't 10x their bill on OpenSearch.
They want to search through like just enormous amounts of data and connect it to
LLMs and connect them with their customers, and just nothing else works for the
economics of that scale.
There are very few companies in the world that can earn a return on storing full
text search and vector indexes in memory. Very, very few. But there's lots of companies that
can ship useful product, searching billions and hundreds of billions of documents from
their customers at a reasonable price.
Awesome. Well, here's our favorite section of this podcast; we call it the spicy future.
I think you already kind of talked at some level about your beliefs around S3 systems and stuff, but
we'll leave it open for you.
What is a spicy hot take, or just any take that you believe in, that you think the world
may not have fully believed in yet?
I think there's a lot of excitement right now
about building databases on top of object storage,
but I don't think that there's a ton more databases
to be built on top of object storage
as simply as you can do with WarpStream and TurboPuffer.
I think that search and streaming are useful.
Time series and OLAP, we've done it for a very long time.
And I think the nascent fifth category that we see a lot of are these hyper specific databases within companies that they write for a very singular workload.
And I have friends inside of companies where they just, OK, I just need to store this exactly thing.
I know exactly how to compact it.
It's going to be way simpler, way cheaper for us to just do this ourselves.
And I think that's a phenomenal thing that's unlocked from it.
I think there is a lot of hype around building databases on object storage, but I
don't think there's a ton more categories here. I would love to be proven wrong.
But that's my general take on this category.
That's super interesting.
I feel like I don't think the world even knows what is possible,
not possible, right or wrong. It just seems like, hey, everything should be S3 backed,
right? Everything should be sort of serverless, like all these terms are just being thrown
out there. And is there a particular type of example, like a database you saw, you don't
have to name any names, but like a type or nature of database being built with S3 right now,
that you feel is not going to go well?
I think I'm not calling any names and it's not something I've spent an enormous amount
of time thinking about.
But again, if we go back to the scar tissue of being on call for databases for as long
as I have, I think that there are certain databases where it's too hard for me to reason
about what's going on at 3 a.m. that I feel good about operating it.
That may be a bit of Stockholm syndrome to that database because maybe these big distributed
databases that are more complicated actually make up for it in that complexity.
But I think there's kind of like three types of relational databases.
There are the big distributed ones and some of them are so mature now
that they may actually be a really, really good fit.
But for me, as someone that has all the scar tissue
of operations, they can be a bit scary
because they're hard to reason about.
But I think they may have crossed the threshold now
when maybe they're so good that it doesn't matter anymore.
But it's not something I'm super educated on.
The second one for relational databases
that we see now is more relational databases that
are backed by object storage.
The performance profile of that is kind of scary to me,
again, in the 3 AM scenario.
That's why I like InnoDB, MySQL, just so much.
It's very simple.
I can reason about it.
I know exactly what's going on.
Yes, I have to do all this sharding and stuff,
but it's a predictable quantity.
So I think that relational storage is an area where S3 is being applied
to where it's not particularly appealing to me as an operator.
Awesome. Well, we have so much more we could ask. But as you know, we always want to go
over time. Where can people find you or TurboPuffer?
What is the best place for folks who are actually interested to try it out?
Yeah, turbopuffer.com.
If you have a good use case, you just hit the website and just apply and describe your use case.
Usually, if it's a good fit, we'll get back to you. We're better at Rust than React. But it's not in beta.
It's running some of the biggest production workloads in the world.
We just want to be very close to our customers.
The second is you can find me on X or Blue Sky or whatever.
@sirupsen is my handle, S-I-R-U-P-S-E-N, on those platforms.
And then my website is sirupsen.com.
You'll find lots of napkin math blog posts and things like that on there.
Amazing. It's been so interesting talking to you, Simon.
Super informative. And it's always great to meet another Canadian infrastructure founder.
Thank you so much for joining us.
Thank you so much for having me.