The Changelog: Software Development, Open Source - Typesense is truly open source search (Interview)
Episode Date: September 9, 2022

This week we're joined by Jason Bosco, co-founder and CEO of Typesense — the open source Algolia alternative and the easier to use Elasticsearch alternative. For years we've used Algolia as our search engine, so we come to this conversation with skin in the game and the scars to prove it. Jason shared how he and his co-founder got started on Typesense, why and how they are "all in" on open source, the options and the paths developers can take to add search to their project, how Typesense compares to Elasticsearch and Algolia, he walks us through getting started, the story of Typesense Cloud, and why they have resisted Venture Capital.
Transcript
This week on The Changelog, we're joined by Jason Bosco, co-founder of TypeSense, the open-source
Algolia alternative, and the easier-to-use Elasticsearch alternative.
For years, Changelog.com has used Algolia as its search engine, so we come to this conversation
with skin in the game and the scars to prove it.
Jason shared how they got started on TypeSense,
why and how they are all in on open source,
the options and the paths developers can take to add search to their projects,
how TypeSense compares to Elasticsearch and Algolia.
He walks us through getting started, the story of TypeSense Cloud, and why so
far they have resisted venture capital. For our Plus Plus subscribers,
there is a bonus six minutes
at the end of today's show for you.
If you're not a Plus Plus subscriber,
hey, head to changelog.com slash plus plus to join,
directly support us, drop the ads,
and get access to bonus content on our shows.
A big, big thanks to our friends and partners
at Fastly and Fly.io.
Our pods are fast to download globally
because, hey, Fastly is fast globally.
Learn more at Fastly.com.
And our friends at Fly let you put your app and your database closer to users all over the world.
It's like a CDN of your entire application.
Check them out at Fly.io.
This episode is brought to you by our friends at Fly.
Fly lets you deploy full-stack apps and databases closer to users,
and they make it too easy.
No ops are required.
And I'm here with Chris McCord,
the creator of Phoenix Framework for Elixir,
and staff engineer at Fly.
Chris, I know you've been working hard for many years
to remove the complexity of running full-stack apps in production,
so now that you're at Fly solving these problems at scale, what's the challenge
you're facing? One of the challenges we've had at Fly is getting people to really understand
the benefits of running close to a user, because I think as developers, we've internalized the CDN:
people get it. They're like, oh yeah, you want to put your JavaScript close to a user, and your CSS.
But then for some reason, we have this mental block when it comes to our applications. And I don't know why that is. And getting people past that block is
really important because a lot of us are privileged that we live in North America and we deploy 50
milliseconds, a hop away. So things go fast. Take GitHub: maybe they're deploying regionally
now, but for the first 12 years of their existence, GitHub worked great if you lived in North America.
If you lived in Europe or anywhere else in the world, you had to hop over the ocean and it was
actually a pretty slow experience. So one of the things with Fly is it runs your app code close to
users. So it's the same mental model of like, hey, it's really important to put our images and our
CSS close to users. But like, what if your app could run there as well? API requests could be
super fast. What if your data was replicated there? Database requests could be super fast. So I think the challenge for Fly
is to get people to understand
that the CDN model maps
exactly to your application code.
And it's even more important
for your app to be running
close to a user
because it's not just requesting a file.
It's like your data
and saving data to disk,
batching data for disk.
That all needs to live
close to the user
for the same reason
that your JavaScript assets
should be close to a user.
Very cool. Thank you, Chris. So if you understand why you CDN your CSS and your JavaScript,
then you understand why you should do the same for your full stack app code.
And Fly makes it too easy to launch most apps in about three minutes.
Try it free today at fly.io. Again, fly.io.
So Jason Bosco is here to school us, I guess, give us a glimpse into building a search engine,
the algorithms behind it, not taking venture, making it open source, a ton of fun stuff.
One of the co-founders behind TypeSense.
Jason, nice to see you.
Welcome to the show.
Thanks, Adam.
Thank you for having me.
This is exciting, to be on the show.
We are excited to get schooled about search engines, open source things, and all the stuff
Adam just listed. Specifically, what's going on in search engine land? It seems like there's lots of
interest and hype around open source search engines, Elasticsearch, etc. And I don't know
if I just, my thumb's not on the pulse of search,
like what's going on these days.
TypeSense looks cool.
I wonder what else is out there.
People are always working on making better wheels,
and we've had plenty of them along the years.
Jason, maybe tell us how you got into search,
and then give us maybe the lay of the land
of what's going on and what's kind of innovative
in the search space.
Yeah, so we got into TypeSense in 2015
and the lay of the land back then was
Elasticsearch was,
and maybe it still is,
the dominant player in the search space.
So pretty much think about anything related to search
and you will eventually land on Elasticsearch
because they have so much content out there.
And it's a super well-adopted product.
So that's where we were in 2015.
I was working at Dollar Shave Club.
My co-founder, Kishore, he was working at another company in the search space,
or in the space which required search as one of the tools that they needed.
And we were just quite frustrated with how complicated it was to get Elasticsearch up
and running and scaling it and fine-tuning it.
In my personal experience, I've had at least two engineers
spend a week or so every two to three months
fine-tuning our Elasticsearch clusters as we scaled it.
And it seemed like there was just too much machinery
that needed to be handled to get search working.
And our use case at Dollar Shave Club
was seemingly pretty simple,
which was to be able to search for customer names,
emails, addresses, phone numbers
when they write in or call in,
for our support agents to look up customers.
So it seemed like a pretty simple use case,
but then the amount of effort it involved
to get that going seemed out of whack with how simple
the feature was. So anyway, that was how we started
on the idea for TypeSense. It was more like, you know, what would it take to build our
own search engine? Like something that's simple to use. So it was a little naive at that point.
It was, you know, something like, can I build my own version of this huge piece of
software that people have spent decades working on?
Can we build our own?
But we stuck with it.
We started reading up papers on how search algorithms work
and what goes into building a search engine.
And now looking back, I see how much work is involved in building a search engine.
It's been a long time since 2015.
Oh, yeah.
It's been seven, eight years now.
So now I know how much work is involved,
so I'm glad that naivety is what helped us bridge the gap
of, okay, let's just stick with it.
And it started as more like an R&D project,
as a nights-and-weekends thing.
So there were no time commitments or deadlines we were trying to hit.
It was just, you know, chipping away little by little.
And so even though we started working on it in 2015, 2018 is when we actually got to a stage where we were like,
okay, now it's good enough to be used by maybe someone other than just the two of us.
And in 2018 is when we open sourced it. And one of the bets that we took at that point in time
was we wanted to put all the search indices in memory,
whereas in 2015,
disk-based search indices were the norm.
That's what Elasticsearch was doing.
And there's another search engine called Solr,
which actually predates Elasticsearch.
And everyone used disk-based indices
because disk was cheap, RAM was expensive.
But then what we figured at that point
was RAM was only going to get cheaper
as years rolled by.
And we said, let's put the entire index in memory.
So that, of course, the trade-off there
is that you get fast search
because you put everything in memory,
it's as good as it's going to get in terms of speed.
But the trade-off is if you have petabyte scale of data,
then there's no petabyte scale RAM available today
unless you fan it out across multiple machines.
Of course, AWS, for example, has a 24-terabyte RAM VM
that you can spin up, but it's still expensive
compared to a 24-terabyte disk.
So I think that's the sweet spot where we figured TypeSense would fit is if you have
massive volumes of data, like for example, logs, application logs, or tons of analytics
data that, you know, it would be very expensive to put on RAM, then use disk-based search.
And that's where, you know, Elasticsearch and Solr play in.
If you want instant search, what we call search-as-you-type,
that's where something like TypeSense fits in.
You can put everything in memory and get fast search results.
So that's how we started working on TypeSense.
And after that, once we open sourced it in 2018,
we were just listening to what users were telling us,
and just adding features one by one.
And another interesting thing that happened in parallel is that there's another search
player called Algolia, and they're a closed source SaaS product, but
they have very good visibility among developers, because many documentation sites
use Algolia, because they give it out for free for open source documentation sites.
If a developer searches any documentation,
you'll usually see a little powered-by-Algolia logo,
and that search does work very well.
And Algolia is a fantastic product,
but something that ended up happening
was they kept raising their prices.
And then Algolia users started discovering TypeSense
and started asking us for features that existed in Algolia.
And then we started adding those.
And then we eventually got to a stage where we were like, okay, I think now we have a sufficient number of features where we can call ourselves an open source alternative to Algolia.
And I think that resonated with a lot of people because Algolia is a very good product, very well known, and solves for actually many of the pain points that Elasticsearch has from a developer experience point of view.
So they essentially simplified search and then spread the word around that, hey, search
need not be this complicated.
And then once we started saying we're an open source alternative to Algolia, people quickly
connected that, okay, this is what we're trying to do with TypeSense as well, which is good developer
experience, fast search, and
easy to use
and to get up and running with the search experience.
So then we started seeing good traction.
And then people
started asking us for a paid
version, where we hosted it
for them, because they didn't want to host
it. That's when we realized
we have a business model in front of us,
and people are telling us that they will pay us if we had this.
And I just couldn't let the opportunity go by
and we quit our full-time jobs and started working on it full-time in 2020.
Okay.
Exciting.
Very exciting.
Yeah, yeah.
I compressed what I think is five years' worth of ups and downs.
That's a good compression algorithm you've got there on your history: about one minute per year. Good job.
But yeah, it was a fun journey.
Yeah. If your code is anywhere near as good as this, then you'll be in good hands with TypeSense. So gosh, where do we go
from there? First of all, I was just thinking back that you mentioned the naivety of, it couldn't
be that hard, you know? Like, how many businesses are started with such statements? Which reminded me,
I mean, before we started recording you were talking about the Hacker News crowd, and how often you
see those kinds of statements on Hacker News when somebody releases their product: I could build this in a weekend. And it's like, first of all, no, you couldn't. But second of all, we all understand that sentiment, because that core thing, of course you could do in 48 hours, maybe 72 hours or whatever it is, but you're so far from finished at that point. You almost have to have, whether it's arrogance or naivety or a combination or
whatever it is, to say, I'm actually going to try this, and get started and get going, and have
a different idea, like, hey, what if we put everything in memory, to even start on a journey
that's going to take seven years. And I'm sure it's just getting started, right? Like you
guys aren't at your finish line. You're like just barely off the starting line. So
it's always cool to see when a story like that comes to fruition,
even though it's like so often that is the story.
It's like, yeah, it couldn't be that hard.
And now seven years later, you're like, actually,
it was really hard and it still is.
Yeah, yeah.
I think what I realized, like you said,
maybe the core stuff you can get done in a weekend, or whatever smaller unit of time, is maybe a little closer to what you have in mind.
But the iterations on top of that to actually make it a product that someone other than you can use, that is what takes so much effort.
And it's not even just effort on your side.
Of course, you have to invest a lot of time, but it's also interacting with people
who are using the product other than you,
getting that feedback,
and then iterating based on that feedback.
I think that is what takes a lot of effort and time.
Because, so even if you were to iterate by yourself,
you know, for whatever X amount of years,
I don't think the product will be as mature
as when you iterate with actual people using it and giving you feedback. So case in point,
for example, for us, so at one point, we tried an open core model with TypeSense, where there
were some features that we held back from the open source version and said, you have to pay for
a premium version. And then eventually, we did with it because what we realized was the features that were
in the open source version, more people were using it and giving us feedback.
So it was generally more stable and more feature rich than the features that we held
back, because fewer people were actually paying for them and giving us feedback.
So ironically, the closed source features that people were paying for
ended up being the ones that had a little less stability and less maturity.
And that's when I realized, okay, this is hurting us by keeping some parts closed source
because people are just not adopting it as well as we'd like.
And at that point, we just open sourced, you know, 100% of TypeSense.
And after that, we uncovered a series of bugs
in what used to be the closed source features.
And we quickly addressed them
and people started asking us for more features
in line with what we already had,
like improvements on those features.
And it suddenly skyrocketed how useful
those formerly closed source features were,
because people kept asking for more things on top of them.
So I feel like that is actually a good example of where product maturity comes from actually talking to users
and iterating based on that rather than just building it yourself and thinking that it's going to be awesome.
I think that's needed in the beginning because you need to have a point of view on what it is you're building and define that.
But after that point, I think talking to your users
and getting feedback and building based on that,
I feel like that has been our superpower,
our not-so-secret superpower, I guess.
Yeah.
Since we're on the note of, I guess, licensing to some degree,
it's GPLv3 licensed.
Yeah.
Yeah.
Yeah, so we initially started with the GPLv2,
and then someone pointed out that GPLv2 was not compatible
with some of the licenses, so we changed it to GPLv3.
But still, we stuck with GPL instead of MIT or Apache
because, at least in my opinion, GPL is an open source license
which encourages other people modifying the source code
to contribute that back.
And of course, there's a big debate
over what counts as open source.
But my philosophy at least is that
if you're taking advantage of an open source software
and if you're modifying that software,
then it's only fair to ask you to contribute that back to the community
versus taking it closed source versus something like an MIT license or Apache.
What I've seen happen is open source projects end up getting modified,
and then those modified versions end up getting closed sourced,
which kind of goes against, you know,
it's almost like a take and not give back
model. So that's why we've kind of stuck with GPL. And of course, there's a more stringent
version of it, which is AGPL. And that, it seems like people tend to avoid as much as possible.
Like I've heard, for example, at Google, they just don't use AGPL-licensed anything.
I've heard that as well.
Yeah. And ironically, I was on that side of the table at Dollar Shave Club, for example.
I was the one who had to say no AGPL-licensed software.
Because just during every round of fundraising, for example, the lawyers would ask us, give us the list of all the open source software
you're using, and all the licenses.
And if there's anything that's AGPL, or anything that's, you know,
a little different from Apache or MIT, we'll get asked questions.
Why are you using this?
And then more discovery into, are you using it the right way?
Did you modify it?
Like just a lot of conversations need to be had when you use anything that's AGPL.
So that's one reason we haven't gone down the AGPL path.
So far, it's worked out well.
And I guess the best example of that is the Linux kernel.
It's about as popular as an open source project
is going to get, and they use GPL for a license, and it's worked out well for them.
And that's what I usually tend to point developers to sometimes,
when they say, hey, if it were MIT, we'd be more inclined to use it.
But then I point out that, hey, you're probably using something Linux-related,
and that is GPL.
So it's a very similar model.
I think there's a lot of misunderstanding about how GPL works in the industry,
and that is definitely a friction point.
But I mean, I think the benefits outweigh the risks,
I guess, so we haven't changed the license.
There's kind of like a freedom of ignorance
with the MIT license, where it's just like, this one, I know,
and the Apache 2,
and the BSD license, it's like the very permissive ones, where it's like,
I don't have to think about it.
I'm just good.
You know,
where it's like,
okay,
the GPL and the AGPL,
I need to understand what exactly I'm getting myself into.
And once you do,
it's not that hard to understand the implications.
I mean,
it can get hairy,
especially if you're trying to build businesses and stuff.
But I think the, I'll just MIT it and forget about it
kind of thing is kind of throwing caution to the wind.
And it's nice for adoption
because you can just green list or whatever.
Go ahead and all these MIT licensed projects
are just good to go.
You don't have to think about it.
So I can definitely understand that. You have a good example
of a GPL project that's massively adopted and popular.
I wonder how often we don't think about Linux in our infrastructure as much as we think about a database or a search engine.
You know, even though Linux is the underpinnings most of the time for all that stuff.
But for some reason, it's almost like so low level that you don't even consider like the licenses of your operating system, maybe.
Right, right.
Yeah, and I think that's probably the success of the GPL, you know, showing itself, where once a
project is so popular that it seems like it's everywhere, but then there are different flavors
of it, all coming from the core source, and it still didn't hurt the adoption
of the Linux kernel. So it kind of shows that GPL
And maybe, I mean, I'd say that maybe
also helped the model
or helped the core project mature much faster
because all these modifications that were done
were being contributed back into the open.
And that helped the product evolve much faster
versus a bunch of people
forking it into
private forks
and then making
their own modifications
without contributing back.
Who knows,
maybe,
you know,
that might have hurt
how fast the core
Linux project
evolved over time.
So, yeah.
But again,
this is just my hypothesis.
The hard part is
we can't fork history
and run both
experiments in parallel, you know? If we could just do that...
Would that be nice? That would be nice. We need some version control systems inside of our timelines.
This episode is brought to you by Sourcegraph.
Sourcegraph is universal code search that lets you move fast, even in big code bases.
Here's CTO and co-founder Beyang Liu explaining how Sourcegraph helps you get into that ideal state of flow in coding.
The ideal state of software development is really being in that state of flow.
It's that state where all the relevant context and information that you need to build whatever feature or bug that you're focused on building or fixing at the moment, that's all readily available.
Now, the question is, how do you get into that state where you don't know anything about the code necessarily that you're going to modify?
That's where Sourcegraph comes in. And so what
you do with Sourcegraph is you jump into Sourcegraph; it provides a single portal into
that universe of code. You search for the string literal, the pattern, whatever it is you're
looking for, you dive right into the specific part of code that you want to understand. And then you
have all these code navigation capabilities, jump to definition, find references that work
across repository boundaries that work without having to clone the code to your local machine and set up and mess around with editor
config and all that. Everything is just designed to be seamless and to aid in that task of code
spelunking or source diving. And once you've acquired that understanding, then you can hop
back in your editor, dive right back into that flow state of, hey, all the information I need
is readily accessible. Let me just focus on writing the code
that implements the feature
or fixes the bug that I'm working on.
All right, learn more at Sourcegraph.com
and also check out their bi-monthly virtual series
called DevTool Time,
covering all things DevTools
at Sourcegraph.com slash DevToolTime.
So let's go back to search. Now we're kind of on the licensing beat, but if we go back to
just thinking about search you know any organization that has interesting data,
like if it exists long enough, there's going to be a request for search, right? Otherwise,
the data is just not interesting, because everybody wants to poke at what they have and learn things from it. As an indie dev, and as like a small-team, small-customer developer,
most of my days, I kind of had two strategies for search. It was like strategy one was can I do it
inside of Postgres? You know, like I can get some full text search inside there. Is that good enough?
And for a lot of cases, that's just good enough. And then it's like, and then it gets really hard
from there. And I was never going to do an Elasticsearch, or like add another appendage
to my infrastructure. So from there, I'd go straight over to services. So I'd be like,
can I do it in Postgres?
Or is it going to be in Algolia?
Or there's one called Swiftype.
Not sure if they're still around, but you know.
They got acquired by Elastic.
Okay, so they're gone now. They were cool for a minute.
I liked what they were up to.
I think I actually had my blog on Swiftype
for a little while. They just provided
probably a lot of the stuff that TypeSense provides,
but that was basically it.
And I'm wondering like,
what are other options?
Like,
is that the fork in the road for most people is like,
well,
elastic search or Apache solar with infrastructure needs.
When I looked at it,
it's not like I'm just afraid of adding things to the infrastructure.
It's like,
you know,
I'm not a DBA or an Elasticsearch DBA. It seems hard. And one thing I'm liking about
TypeSense, just reading about it, is it seems pretty simple. No dependencies, C++, compile it.
It seems like it's pretty easy to run. But I'm just wondering how, from your vantage point,
working in larger companies than I usually work with. Is it basically that, like
Elasticsearch or Solr, or a service, or shove it in your RDBMS of choice, or what does Mongo have
built in, etc.? Yeah, yeah. So I think most people just start out with database search,
and you understand it, like SQL queries on Postgres and MySQL. And it works for relatively small data sets,
because when you use a LIKE query on this,
you know, for example, if you use starts-with queries,
those use the index, if you set an index on the field,
at least in MySQL.
But anything that, if you're trying to search
in the middle of the string in a field,
things like that are basically scanning the entire table
and you just start seeing performance issues.
So once your data set is large enough,
plus you need to do more standard things
that typically such industry,
like things like, you know,
what's called faceting.
So in the results,
if you want to say this many results
have a status of active,
this many results have a status of inactive,
or whatever your field is,
if you want to group like that.
So you combine that,
and then there will typically come a need
for doing some sort of fuzzy searching.
So you want it to account for typos,
to make sure that misspellings
still fetch the results that you're expecting.
So as you add each of these,
you can still do a lot of this with Postgres, for example, but performance is the key thing that starts taking a hit once you have a sizable amount of data.
And so that's the point when a search engine can help, where you do have to then build plumbing to take the data from your Postgres or MySQL or whatever database you have
and then sync a copy of that into your search engine.
And what a search engine essentially does
is it builds indices that are custom
or optimized specifically for full-text search
with typo tolerance and faceting
and the standard things that you need with search.
So because it's optimized for that,
it's going to return fast results,
whereas a database is more concerned about consistency, and making
sure your data never gets lost, and transactions, and, you know, making sure parallel writes still
end up with a consistent copy of the data, and things like that. Which is why we
say search engines are not your primary data store.
Instead, it's a secondary data store where you sync a copy of the data from your primary data store.
Now, interestingly, like you said, once you have data,
you eventually need to search on it or run some sort of aggregations on it.
And I think over time, databases also have realized that,
which is why you see something like Postgres add full-text support within it. And then I know, for example, MongoDB added full-text support within it. And even Redis added full-text support.
Really?
Yeah. So databases have realized that full-text support is a thing that they need, but then the type of indices that you need to build
to support both a full-text mode
and your standard data storage model
are different.
And that's where you have dedicated search engines
that do that one thing well
versus databases try to offer everything
that works reasonably well
for the full-text search use case as well.
But then again, it's not optimized specifically for fast full-text search.
So that's where, once you run into that, that's when you take the hit of, okay, I need to build some machinery to get the data from my primary data store into my search engine.
And then you hit your search engine for search results.
Another interesting use case, though, is, even though we call it a search engine,
search engines typically also have
filtering capabilities where you can say,
get me all records which have
this particular value for this field.
So I know some users for TypeSense, for example,
are using it as essentially like a
cache in a JSON store because
you can just push a bunch of JSON.
You can search on that JSON and you can also get JSON documents by ID. And since they're
any way replicating a copy of the data into a type sense to search on it, some users are actually
using it as just another JSON store to in front of their database so that they don't have to hit
the database for any heavy queries, which is another interesting use case for type sense.
That is interesting.
I have felt the pain of like marshaling,
I don't think marshaling is the right term here,
syncing data over to a search store.
And I'm wondering if there's ever been an effort
or other projects that just say,
don't send your data over to the search,
just point your search at your database
and then maybe configure it for what you want.
And it can exist in one place
and this could be a proxy.
Like you said, you could use it however you want.
And it has maybe read-only access or something,
so it's safe.
It's not going to destroy stuff.
Or does that have performance implications
that are massive?
So in fact, there are projects which do this.
For example, Airbyte is one company that I know is doing it.
They're actually building an open source way to transport data from one source to a different destination.
And I think Fivetran does it, too.
There's a bunch of different startups that have attempted to do this.
But when it comes to search engines,
usually it's not,
like if you replicate an exact copy of the data
into your search engine,
you're probably going to be replicating things
that you don't want to search on.
Or you might want to change the shape of the data
a little bit before putting it into your search engine
so that it's more optimized for the types of search queries you're running instead of replicating
a structure that works more for your application query the data so that's where what i've seen is
even though there are many of these frameworks out there another one is singer framework i think
and that's another open source product that does this. But even though there are a couple of these out there,
it seems like you eventually end up
having to transform the data a little bit
so that it's more optimized for your search use case.
So at that point, you have to customize that logic yourself.
And eventually people end up writing,
you know, writing their own transformation layer
and, you know, building it themselves,
maybe on top of one of these.
But so there is some customization needed.
So, given that the access patterns are different, just mirroring your entire
data set usually will mean that you're probably storing more in your search engine than is
actually needed, which might increase your costs.
You have to deal with more data going over the wire,
and consistency issues, for example.
So eventually people end up building their own custom sync scripts.
So it's sort of unavoidable; you're either going to do it up front,
or you're going to do it slowly, and probably not as well, eventually,
as you use it anyhow.
Right, right.
Okay, that's too bad.
It'd be great if you could just point it and be like,
hey, just index this thing differently and be awesome.
Oh, yeah. I wish maybe one of these frameworks allowed you to also set up transformation rules on the fly.
Yeah, exactly.
Especially if they allow you to join.
The most common transformation that I've seen is joining data from two different tables in your relational database and putting it into one flattened structure.
Because in a search engine, you typically flatten
out your data because if you do runtime
joins, it's going to slow down the search.
So if they allow you to
transform, set up joins
at transformation time,
I think that'll be an amazing
product.
Add it to your roadmap, Jason.
Yep, yep.
Yeah, I think we have a lot of such core use cases
or features. So, yep. So you said 2015 was your begin date. By my math it's 2022 now, so that's what,
how many years? Seven years. Good math, good math. You're compared to Algolia, you're compared to Elasticsearch...
How well do you think you compare to Algolia and to Elasticsearch? Do you think you're a pretty good one-to-one?
Do you win most cases? What makes you win? What's your differentiator?
Yeah, so I would say it depends on the use case. So if you're looking at feature parity,
I would say we're, because we're closer in spirit to Algolia, I would say we're 85% feature parity with Algolia.
You know, most of the features that we don't have today are things related to AI or any machine learning related features that Algolia has out of the box.
With TypeSense, you have to bring your own machine learning model and integrate that into the search engine.
So with Algolia, we're 85% feature parity.
And even with that, a good number of Algolia users
are switching over on a regular basis.
Elasticsearch, though,
is a different type of beast,
in that they do app and site search,
which is what TypeSense and Algolia do.
So a search bar on your website or apps.
They also do things like log search.
They also do anomaly detection.
They do security incident monitoring. They do analytics and visualizations, if you're using the Kibana stack. So they have a
whole umbrella of search-related use cases that's, of course, built on the core
Lucene engine, but it's still customized very well for a whole plethora of use cases.
So I wouldn't say we're feature-parity
with Elasticsearch by any stretch
because they do a whole bunch of different things.
What we've done with TypeSense is essentially
just taken the site and app search piece,
and we're trying to simplify that
and have an opinionated take on what sort of features
or parameters are allowed to be configured
and we'll choose defaults for you.
So it is an opinionated take on app and site search.
So our goal is not to be feature parity
with Elasticsearch, because
even if it's just site and app search,
if we became feature parity with Elasticsearch,
then we'd also invite the same level of complexity.
So that is not our end goal.
Instead, we want to see what use cases
people are using Typesense for
and then build an opinionated thing
that works
out of the box for, say, 80% of the
use cases.
I'd say we're nowhere close to
feature parity with Elasticsearch, to answer your question,
but that's by design because if we did
do that, then we'd end up becoming another
Elasticsearch and that's not what we want to do.
Yeah. You also said
the frustration you had early on
was maintaining the Elasticsearch instance,
not just the code behind it, what made the code work
and be able to be a great algorithm to search
and transform data and be real-time
or whatever the needs are for the engine.
You mentioned maintaining the actual Elasticsearch infrastructure.
It took hours every couple of months. Can you talk about
how you've changed, how you've used that pain to change things with TypeSense?
Yeah, so with Elasticsearch, part of the complexity comes with the fact that it runs on top of the
JVM. And fine-tuning the JVM itself is such a big task.
And then you have to configure Elasticsearch's parameters on top of that.
So I was recently, I actually grepped the Elasticsearch code base
for the number of configuration parameters that they have.
It's almost 3,000 different configuration parameters to do various things.
And you need to figure out which of those parameters apply in your specific use case to fine-tune that
on top of, of course, the JVM configuration parameters.
So that dependency on the JVM was one big thing
that we avoided with TypeSense
because we built it in C++,
so there are no other runtime dependencies.
It's a single binary,
so you just use a package manager
to install it or download and install the
binary with zero other
dependencies. So it's a single binary
that you start up and it scales
without any fine-tuning.
And that's something we've done in TypeSense:
set sane defaults for many
of the configuration parameters, so that
it scales out of the box without
you having to tweak some
parameters. So, for example, I've seen what users can do without any fine-tuning. There was one use case
where this one user did almost 2,500 requests per second on their particular
data set. It was only 4,000 records, but still, on a 2-vCPU node with just 512 MB of RAM, they were able to get almost 2,500 requests per second
from a TypeSense cluster without fine-tuning anything,
just installing it, indexing their records
and running a benchmark against it.
So that's what we optimized for, which is out of the box,
no finagling with all the knobs.
It just scales out of the box.
You throw more CPU at
TypeSense, it just takes advantage of it without
you having to do more work to take advantage
of all the cores. It uses all
the resources available that you
provide TypeSense. That's the model that we've gone
for with TypeSense versus
Elasticsearch.
In addition to adding resources, you need to
make sure it's configured to take advantage of them in the best way possible.
And with Algolia, you don't know.
Oh, right. Yep.
With Algolia, I don't think they allow you
to benchmark their services.
Plus, if you benchmark it,
because they charge by the number of requests
that you send them,
if you benchmark it, you're probably,
even if they allow benchmarking,
you'd probably have to pay a ton of money
just to run the benchmarks.
For example, if you're doing 2,500 requests per second,
you're paying $2.50 per second
for how long you run your benchmark,
at least based on their public pricing.
So it'll be very expensive to run benchmarks on Algolia there.
So let's say you yum install TypeSense, or deb package install it, or whatever it is, Homebrew.
Pick your distro choice and do the standard package management installation.
Then what do you do?
Is it like, does it provide an API that listens on a port?
And like, how do you start to use the thing?
Let's say I have a database.
Let's just say I have a typical 12 factor web app
with like a database.
What do I do from there?
I have a TypeSense now.
I'm sure it's registered as a service or something
on the operating system.
So it's going to start when the OS boots
and it's going to turn off and stuff.
What do I, how do I use it?
Yeah.
So TypeSense will start listening by default on port 8108,
which is the standard port that we've chosen.
And an API key is auto-generated for you,
if you use one of the package managers to start TypeSense.
So you get the API key from the config file,
and then you look at
the documentation and just use curl to first create a collection, and then you send JSON data into it
in another curl command. And then that's it. It's indexed. And then you can call the search API
endpoint again via curl. Or typically, you at that point start building a search UI and have the
search UI make search calls out to TypeSense with an API key that you generate just for search purposes.
So roughly, it's just two steps to get the data into TypeSense, create a collection, and then index your JSON data.
And then the third step can be as complicated or as simple as you need it to be.
But at that point, the data is ready to be searched, either via curl or through a UI that you build.
Okay, so it's all just JSON.
Let's say the data's in there already,
and I'm doing queries against it.
It's just going to send JSON back and forth.
Correct, yeah, it's all JSON and a RESTful-ish API.
RESTful-ish.
Isn't RESTful already has the ish in it?
That's the full part, right?
That's a good point, yeah.
I know what you mean, because
REST is not exactly what we all think it is when you look at the full thing. There's a lot
there. Yeah. Okay, cool. What about administration? Is there any sort of UI for TypeSense itself?
Is there an admin? I know it's supposed to be sane defaults, but what if I do decide
I want to save some RAM, or, I don't know, whatever.
I'm sure you have some configuration.
Yep. So on the self-hosted version, it's an API only thing.
We don't publish a UI,
but there is a community project where people have built a UI with which you can
basically hit all the API endpoints. So it's almost like a Postman,
but on top of that,
there's a nice UI to
look at the collection schema and things like that. And then on TypeSense Cloud,
we do have a UI that's built in, that's built by the TypeSense team. And that comes with things
like role-based access control, so you don't have to share API keys, and permissions, and all
the good stuff that might be useful if you're in a team setting. We put that in,
at least on the UI front, in TypeSense Cloud.
But we actually run the same open source binaries
that we publish on TypeSense Cloud as well.
So it's exactly the same versions that we publish
that we run on TypeSense Cloud.
Yeah, that's super cool.
I think hosting is an obvious business model.
Obviously, it's working so well so far, better than the open core,
which was probably giving you indigestion, as I think of it, having to decide where to put stuff.
That's how I think of it, to decide where to put stuff.
And then as you confessed to earlier,
the open source stuff was more solid than the proprietary stuff
because of the fact that more people were using it.
Have you considered on-prem as another way of going about it?
Because a lot of orgs, I would assume,
want search, but they don't want hosted search, because their data is precious, and they may have
regulations, and they have security concerns. Do you think you could make money with an on-premise
version? Even though I could just, you know, yum install and run it myself. But I don't know, maybe
there's like the tooling around it that you all are building for the hosted version that could be
value-add for larger orgs.
Yeah, we did
consider it. I guess we just didn't go down
that path because of
the complexity of maintaining
on-prem installations.
Because on TypeSense Cloud, we have full visibility
into the entire infrastructure,
and we've built monitoring
infrastructure.
Those are not really directly related to TypeSense,
but still monitoring tooling that helps us monitor the TypeSense cloud clusters.
Installing something like that on an on-prem environment,
I mean, it's possible we can probably set up VPCs
and private networks and all that stuff.
But it's just added complexity
that we didn't want to take on just yet.
So I think it's just maybe a matter of time if enough people ask us for it.
And today it seems like if people say, hey, we need to be HIPAA compliant, for example,
we're not HIPAA compliant on TypeSense Cloud, then the only option is to self-host. I tell them,
if you need additional support, we can do a support agreement
separately and help you. But then being on call and doing production-level support for
stuff running on someone else's infrastructure, where you don't have complete visibility...
I haven't yet come to a point where I can digest doing that, unless we figure out,
you know, more ways to make that efficient, I guess.
Right. Or the number has to be good enough, right? Like it's got to be.
True.
It's got to be worth it.
This episode is brought to you by InfluxData, the creators of InfluxDB.
InfluxDB is the open source time series platform where developers build IoT, analytics, and cloud applications.
And I'm here with Paul Dix, founder and CTO of Influx Data.
Paul, all the open source software that Influx Data creates
is either MIT licensed
or Apache 2 licensed.
These are very permissive licenses.
Why are you all for permissive licensing?
The thing is,
we like permissive licenses
because we want people
to do whatever they want.
Because of these three reasons,
freedom, evolution, and impact.
Freedom means being able
to create a business
and create your livelihood off of this code
regardless of what you want to do with it.
You can modify it, look at it, do whatever.
Evolution means you can create a derivative project
and rename it, put it out there in the world,
either as an open source project under a permissive license,
or you can relicense it under a copyleft license,
or you can create a
business off of that. And then the last bit is impact. We believe more people benefit from open
source when that code is permissively licensed. Despite the changes that the other infrastructure
vendors are making, Influx remains permissively licensed open source, with commercial software.
Well said. Thank you, Paul. That truly summarizes the spirit of open source.
So if you want the option to have freedom,
the option to have evolution and impact,
use InfluxDB for your time series application needs.
Check it out and start for free, of course,
at influxdata.com slash changelog.
Again, influxdata.com slash changelog.
And by our friends at Retool.
Retool helps teams focus on product development and customer value,
not building and maintaining internal tools.
It's a low-code platform built specifically for developers.
No more UI libraries.
No more hacking together data sources.
And no more worrying about access controls.
Start shipping internal apps that move your business forward in minutes
with basically zero uptime, reliability,
or maintenance burden on your team.
Some of the best teams out there trust Retool,
Brex, Coinbase, Plaid, DoorDash, LegalGenius,
Amazon, Allbirds, Peloton, and so many more.
The developers at these teams trust Retool
as their platform to build their internal tools,
and that means you can too.
It's free to try, so head to retool.com slash changelog.
Again, retool.com slash changelog.
So Jason, you mentioned TypeSense Cloud for the first time in this conversation.
Now, I assume... I see a pricing tab, and this is hosted. This is your ability to
make money, your ability to resist venture, potentially attract venture. This started as
a nights-and-weekends project. How did you get to... Did you ever think you'd be here, you know, launching a
cloud and self-funding? What's the story there? Yeah, I'd say TypeSense Cloud is a product that our users
essentially pulled out of us. Because, you know, when we started working on TypeSense, I mean, we
didn't think we'd build a company around it, you know, in 2015, if you had asked me. But
eventually, you know, once we open sourced it, let's say in 2018, '19, we figured, okay,
we probably need to figure out a business model here to make sure this is a sustainable open source project.
And then we tried the open core model
and that didn't go too well.
And then people eventually told us
that they will pay us if we hosted TypeSense for them.
So that's essentially people telling us
that they're ready to pay if only we had a hosted version.
So that is how this came about.
Then we started building TypeSense Cloud just based on people asking us for it,
which is, I'd say, a nice place to be in.
So me and my co-founder have probably built like 12 or 13 different products
in the last 15 years.
And some of them did well, some of them didn't get too much traction.
But every product in the past, we would build a product first and then hope it makes money.
And, you know, that used to be our operating model.
But with TypeSense, we were in a different place where people were telling us that, hey, do this and we will pay you.
And so it was nice that when we launched, you know, that week we had people paying us already once we launched Typesense
Cloud. So that's when we realized there is a real problem that people are willing to pay to have
solved for them. So, you know, we started, you know, just mentioning TypeSense Cloud in different
places in the documentation and on our landing page, saying that this exists. And people
kind of organically started signing up for it and using it.
And we also made sure that, you know,
the product is full featured in the open source and in the hosted version as well.
So it was nice to be able to tell people that,
hey, we're doing this only if you don't want to worry
about servers and if you don't have an infrastructure team,
we'll take care of that for you.
And that's what we're charging you for.
So it was very easy to explain to users
what the benefit we're giving with TypeSense Cloud is,
which is we're essentially like an extended
infrastructure team for them
so they don't have to worry about servers.
So that worked out pretty well.
I'd say I, you know, to answer your question,
I'm pleasantly surprised with the, you know,
how many folks opt to use TypeSense Cloud.
You know, especially it seems like serverless is a thing that
is getting a lot of
adoption these days. So people
generally don't have any other...
Where TypeSense Cloud fits in is if people don't have
any other VMs that they run in their infrastructure
and they don't want to deal with
hosting anything themselves, then TypeSense Cloud
is a nice sit there. So that
also means that we
now have revenue
to sustain ourselves off of while working on TypeSense.
With some of the attention we got on Hacker News, et cetera,
we had inbound interest from almost 30 different VCs
at this point asking us if we'd be interested,
if we're considering, et cetera.
But for me personally,
so I've worked at venture-backed companies in the past.
And so I kind of know the song and dance
of what it takes to run a venture scale business.
And the realization that I had eventually was that
in a venture-backed company,
you're essentially selling stock to your investors.
And stock is, if you think of it,
just like another product line that you have
and your customers here are your investors.
So in addition to selling your core product
to your customer, to your users of the core product,
you're also selling a new product line,
which is your company stock
to your investor group of customers.
So once I started seeing it that way,
the value that your investor group of customers
get from the product that they're buying,
which is the company stock,
is appreciating stock value.
So to keep them happy,
you have to do things to increase
your company's stock value.
And sometimes some of the things that you do there
might not sit well with the core group of your customers
who are buying your core product. And that tension is, you know... I've seen that play out in the past. I've seen,
I keep seeing, that play out in other SaaS companies that are VC-backed, where, you know,
the eventual cycle seems to be that, you know, they price their products super low
and subsidize it to gain massive adoption. And then eventually they work their way up to, like, you know,
the Fortune 5,000, Fortune 1,000 companies
and start looking at million-dollar deals.
And suddenly, once you have a million-dollar deal in your radar,
your, you know, $15 a month paying customers
seems like a tiny drop in your revenue bucket
and your priorities as a company completely shift.
So that is what I'd hate to have happen, you know, with a product like TypeSense,
because one of my goals with TypeSense is to make sure that it is available to as many
people as possible without cost being an issue.
And that's why it's open source.
It's really accessible.
And I felt like, or at least this is my
current thinking, I felt like the venture model kind of doesn't sit well with that
goal of making sure that as many people have access to TypeSense as possible. Or at least
it doesn't make that goal easy to achieve without conflicts of interest
here and there at different decision points as you grow the company.
So that's one big reason I've essentially said no
to all the VCs who've reached out so far.
And who knows?
I mean, at least that's my current state of mind.
Yeah.
If something changes.
And then we've been able to sustain with this model.
So it's working out very well for us, I'd say.
Jerod knows I've been one to say absolutes, what we will and won't do,
only to, years or days, potentially even, later,
change my mind or have my opinion change
and sort of walk back that hard absolute I'd said before.
One thing you said was the appreciation, right?
The appreciation of the stock to the investor.
Isn't that the name of the game for business anyways?
Don't you want your business to appreciate?
So how does the tension with an investor involved change the game for you?
Yeah, that's a good point.
So I would say the value of a business, there's building value into the core product that you're selling and providing that value to the customers
who are paying for that core product. That's one way to grow the value of the business.
Now, of course, if you're looking at it from the perspective of stock prices to be able to maybe
sell the company later on, then building value in the core product is not going to be as
financially rewarding as selling stock to investors,
but I'm wondering if maybe once you have a sufficient large adoption of your core product,
I'm wondering if that will help translate to also like,
you know,
not that we're looking to do this,
but you know,
if we were to do like a crowdfunded fundraising,
eventually maybe that core value that the product delivers is what determines the, you know, our stock prices if we ever were to do a crowdfunded round.
Whereas today, it feels to me like the way stock prices increase in a VC-backed model is only by raising your next round of funding. So once you get on that train, to keep your latest round of investors happy for the valuation that they paid,
you have to raise the next round of funding or go public or have some sort of a liquidity event
just so that the latest round of investors
make good returns on that investment.
So that's what leads to increasing valuations.
You just keep having to raise additional rounds of funding
to keep that group of, you know, quote unquote, customers happy.
Good point.
One more question on this front.
I mean, 2015 to 2018 isn't a far stretch.
Elasticsearch IPO'd in 2018.
You had to see the possibilities of this space in terms of a business, right?
Algolia was well-funded, Elasticsearch IPO'd.
You had to see the possibility of you taking a large portion
or even a small portion, of that large market share,
and capitalizing on it.
Oh yeah, for sure.
I think search, like we were discussing in the beginning,
is something that is an evergreen problem,
something that didn't start yesterday,
is not going to stop being a problem suddenly.
So I'd say definitely something that we consciously chose
is to choose a market that's big enough
so that even if we capture a very tiny portion of that market,
it's still a good investment of our time.
So the space was such that there are not actually
that many search players in the market.
Now there are a bunch of closed source SaaS search providers,
which more likely than not,
many of them are maybe using Elasticsearch, for example.
I'd say Algolia is at least one that I know of
that has built their own search algorithms.
But for the most part, people just use Lucene and build on top of that.
So the space didn't have too many players.
So that was the second thing.
So the first thing was a large evergreen problem that's not going to go away.
And the second thing was not many players in the market trying to solve this problem.
So I think that's why we were like, okay,
maybe we'll find our way through to making money, you know,
with some business model eventually.
Like that's the thought we had in mind.
If it was any other SaaS product, I would say a closed source SaaS product,
or even an open source product in a different market,
we'd probably have done a little bit more research
before jumping into building a business around it.
But I think this space was, again,
and I should say the third thing is search,
as we've learned, is also a very hard problem to solve,
which is why you don't see many search engines around
in the market.
So if you want to call it like the technical moat, I guess,
there's a huge gap to jump, you know,
to figure out search as a problem domain,
get up to speed with it and see what everyone else is doing
and then seeing where you can improve it.
That is a huge chasm to jump before you build a product.
And even if you do that, you know, you spend a couple of weeks polishing it,
and then bringing it to market,
and then telling developers about it.
That is why a product like this is expensive.
It's a lot of effort to cross that big gap.
So all of this was in our mind for sure.
And we thought this is a good bet worth taking.
And all of the other ideas we've had, our focus was always going after very niche things,
like things that no one else would probably have an interest in going after, mainly because, you know, it's so niche
and it's not really directly related to, you know,
day-to-day technology that you might be using.
We basically picked boring old spaces for all the other past products.
And this one was, you know, modern, cutting edge,
and the target audience happened to be developers. Both my co-founder and I are engineers,
so we were able to speak the same language as our target audience. So I think all of these put
together made it seem like this is like a once in a lifetime type of an idea that, you know,
we just have to execute on.
So I really dig your transparent pricing for the cloud and the way that it calculates out. You want to just tell folks how that works? And, you know, you mentioned you want to bring this to as many people as possible, and it seems like being able to pay as you go, get exactly what you need, and scale up as you need to is a great way of doing that. Of course, a lot of the public clouds have this kind of pricing as well, but you had a configurator right there on the pricing page. Do you want to tell us how you came up with this and how it all works?
Yeah, so we came up with it mainly to mirror the cost of running the service
with how much we charge users.
So that's one core principle that we held on to, because from a business perspective,
that's probably not the best idea,
since you're very closely tied to your costs.
But that's what we chose, in service
of trying to make sure that we offer something
that's as affordable as possible.
So if you were to run, for example,
TypeSense on your own cloud account,
we wanted the cost to be somewhat similar.
And where we get savings is from economies of scale,
essentially from running thousands of clusters ourselves,
both in the management effort involved
and the savings that you get with high spend.
So that's what we capitalize on.
And then we pass on some of the savings we get,
if you want to call it that,
some of that savings back
instead of trying to do value-based pricing,
which is what I've seen some other SaaS companies do.
Now, that does make the pricing a little bit more complicated
because people have to know how to calculate RAM,
how to calculate how much CPU they need.
And that's why we added a little calculator
which says just plug in the number of records you have
and the size of every record,
and then we'll roughly give you an estimate
of how much RAM you might need.
So that works out well for most use cases.
And so if people choose X as the size of their dataset, Typesense typically takes 2X to 3X RAM,
and that's given out as the recommendation in that calculator.
And then for CPU, we just tell people, pick the lowest CPU available for that RAM capacity.
And then as you start adding traffic, you'll see how much CPU is being used, and we can scale you up from there.
Or we say, run benchmarks.
If you already have high traffic in production, run benchmarks with similar kinds of traffic in a staging environment, see how much CPU you use, and then pick the CPU.
So that does make it a little bit more complicated to calculate CPU.
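To illustrate the arithmetic behind that calculator, here's a minimal sketch; the 2x-3x multiplier is the rule of thumb mentioned just above, and the function name is ours for illustration, not part of any Typesense API:

```python
def estimate_ram_gb(num_records: int, avg_record_bytes: int,
                    multiplier: float = 3.0) -> float:
    """Rough RAM estimate for a Typesense cluster.

    Typesense holds the search index in memory, and the rule of
    thumb from the conversation above is 2x-3x the raw dataset size.
    """
    dataset_gb = num_records * avg_record_bytes / 1024 ** 3
    return dataset_gb * multiplier

# Example: 5 million records at ~2 KB each is a ~9.5 GB dataset,
# so budget roughly 19-29 GB of RAM under the 2x-3x rule of thumb.
if __name__ == "__main__":
    print(f"{estimate_ram_gb(5_000_000, 2048):.1f} GB")
```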
And then the other configuration parameters,
like you can turn on high availability,
meaning that we'll spin up three nodes
in three different data centers
and automatically replicate the data between those,
and then load balance the search and write traffic that's coming in
across all three nodes.
So at the flick of a button, you have an HA service.
And then we have this thing called search delivery network,
which we built in TypeSense Cloud,
where we essentially replicate the data set
to different geographic regions.
So you could have one node running in Oregon,
one node running in Virginia,
one node running in Frankfurt,
another one running in Sydney, et cetera.
And anytime a request originates,
we will automatically route it to the node that's closest to the user.
So it's similar to a CDN, except that in a CDN,
they only cache like most frequently used data,
whereas here we replicate the entire search index
to each of those nodes sitting at the different locations.
So it's as good as it's
going to get in terms of reducing
latency for users.
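As an illustration of how a client might take advantage of a search delivery network like this, here's a hedged sketch using the typesense Python client; the hostnames and API key are placeholders, and the nearest_node option is our reading of the client's configuration, so verify against the current client docs:

```python
import typesense

# Sketch: point the client at an endpoint that resolves to the
# geographically nearest replica, plus the individual regional
# nodes as a fallback list.
client = typesense.Client({
    # Hypothetical SDN endpoint; requests get routed to the
    # replica closest to the caller.
    'nearest_node': {'host': 'xyz.a1.typesense.net',
                     'port': 443, 'protocol': 'https'},
    # Fallback: the individual regional replicas.
    'nodes': [
        {'host': 'xyz-1.a1.typesense.net', 'port': 443, 'protocol': 'https'},
        {'host': 'xyz-2.a1.typesense.net', 'port': 443, 'protocol': 'https'},
        {'host': 'xyz-3.a1.typesense.net', 'port': 443, 'protocol': 'https'},
    ],
    'api_key': 'REPLACE_WITH_SEARCH_ONLY_KEY',
    'connection_timeout_seconds': 2,
})
```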
In fact, this search delivery
network is what prompted some users to use
TypeSense as a distributed
caching JSON store.
So instead of having to replicate your primary database,
which is probably sitting in one location,
out to different regions, which is a hard thing to do,
they instead send a copy of the data into TypeSense,
have TypeSense replicate it to different regions,
and then hit TypeSense directly as a distributed cache.
So that's
an interesting use case that people use TypeSense for.
So yeah, these are the different pricing angles.
And I think when people realize, oh, if I were to host this on AWS or GCP,
this is how much incremental spend I'd have with TypeSense Cloud.
When that delta is tiny, when people realize that, that's when,
hopefully, it's a convincing case for people
to let us deal with the infrastructure stuff
rather than having to spend time on it yourself,
spending engineering time and bandwidth.
However tiny that might be, we still take care of that on an ongoing basis.
So for the true DIYers who are doing it at scale, is the clustering stuff in TypeSense, where you're implementing it in your cloud, but they could also go about doing it for themselves?
Or is that stuff that's outside of the binary
and is only in the cloud?
Oh, no.
The clustering is also something
that's available in the open source version.
So it's the same binary
that you can run multiple copies of in different machines
and set up a configuration file
to point each node to the IP addresses of the other nodes,
and it'll automatically start replicating the data.
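For a picture of what that multi-node setup looks like, here's a hedged sketch; the IP addresses are placeholders, and the nodes-file format and server flags reflect our reading of the Typesense docs, so verify them against the version you deploy:

```python
import subprocess

# Each entry is one Typesense node in ip:peering_port:api_port form.
# All three machines share the same nodes file so they can find
# each other and replicate via the built-in Raft clustering.
NODES = "192.168.0.1:8107:8108,192.168.0.2:8107:8108,192.168.0.3:8107:8108"

with open("typesense-nodes", "w") as f:
    f.write(NODES)

# On each machine, start the same binary pointed at the shared
# nodes file (flags per the Typesense docs; check your version).
subprocess.run([
    "typesense-server",
    "--data-dir", "/var/lib/typesense",
    "--api-key", "REPLACE_ME",
    "--nodes", "typesense-nodes",
    "--peering-port", "8107",
    "--api-port", "8108",
], check=True)
```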
So we, again, run the same TypeSense binary
in TypeSense Cloud as well.
In fact, any improvements that we do in TypeSense Cloud,
once we've observed people using it at scale,
that actually makes its way back into the open source version.
And that actually has helped in a nice little feedback loop
where, because we have firsthand visibility into TypeSense running in production at scale with TypeSense Cloud, we're able to then improve the open source product with that experience. It turns out, writing software is one thing, but watching the software run
in production, and observing how it works
with different data sets, different types of traffic patterns,
query patterns, shapes of the data,
you get so much more visibility
into how your software performs.
And I'd say that has been a nice side benefit of
TypeSense Cloud, besides, of course, the revenue,
to keep improving the open source product as
well through the hosted version.
Is that a commitment of yours to always
give back to the open source through cloud, or
is this just a natural
byproduct that's happened, but is it a
commitment, or is it just
sort of an accident, I guess? I don't want to
downplay it by any means.
So I guess
when we started out with TypeSense Cloud, we didn't
intend for this side effect that I mentioned to happen, which is us being able to use experience
from TypeSense Cloud also benefiting open source. But now that I see it happen and see how that
benefits the open source product, and I shouldn't even say open source product,
because it benefits TypeSense the product,
because TypeSense the core product, like the API, is fully open source.
And the fact that we're able to use our experience
from TypeSense Cloud to improve TypeSense,
the product is amazing to me.
So I don't think we'll ever stop doing that
because if the product improves,
whether you're self-hosting it or not,
I'd love for TypeSense to be adopted.
Like if people say, you know,
today if people think about search,
you know, most developers,
back-end developers,
they tend to think about Elasticsearch.
I'd love for TypeSense to be that thing:
when people think search,
especially for site and app search,
they think TypeSense.
And because that's one big goal that I have,
I'd hate to not contribute things back
into the product, open source or not,
because that does a disservice to what we're trying to do in the long term with TypeSense,
which is, you know, good adoption for a product that works well out of the box.
Right.
So in light of that, have you considered stealing or borrowing a page out of Algolia's playbook?
Because they've become that by being willing to
offer that open source free tier, and they've become kind of the starting place for many people who maybe have the
money in a business context, but on their personal site they don't, etc. Usually that's the
kind of move that, you know, VC money allows you to do. So I'm wondering where you stand on that,
because you get a whole lot of users. Now, they're not giving you any money, but if you want to be that default that people think about, that's one
move. Worked for them. Yeah. So people do ask us regularly for an unlimited free tier in TypeSense
Cloud. So right now we give out a free tier for 30 days and after that you have to start paying.
But I think the difference between Algolia and TypeSense is that TypeSense is open source.
So if you wanted it for free, you could definitely run it yourself.
And it's fully featured.
And there's a community UI.
You can basically run this whole thing.
If you're willing to put in a little bit of effort, you can get this for free, for an unlimited amount of time. So I'd say that is kind of equivalent to Algolia's free tier,
which does have a lot of restrictions: you can only put in so many records,
run so many searches.
With TypeSense's quote-unquote free tier, it's unlimited everything
except for, of course, the infrastructure costs
that you'll have to pay any cloud provider.
Or if you want to run it on your machine,
it's going to be completely free except for the electricity.
So that's how I think about it.
So if someone says they absolutely want
a free tier, I just tell them, you know,
maybe sign up for one of the cloud providers.
They offer a better free tier
for at least like a year or give you
free credits if you open new accounts
and then just run TypeSense
under your own cloud account,
and you get it for free.
Yeah.
No, I think that's logical, reasonable, and fair.
But what it's not is a go-to-market strategy
whereby TypeSense can become the default.
Like, it's just, I agree with you.
And that's a nice answer.
And that's probably what I would say as well.
But if you did have the VC money,
you could say, but we'll also, for open source, do this.
And people would just use it.
And then you would become so. Yeah, fair, fair, fair. I definitely understand what you're saying.
And that step of, like, host it yourself, is going to cut off like 80% of the people
that would use it. Not saying you can't get there, you can totally get there, but it limits
you in certain ways. Yeah, for sure. Yeah, I think being able to subsidize some of the,
I guess you could call this a form of marketing.
Yeah, it is.
That's definitely one downside of not having
marketing funds available as large as a VC would have,
or a VC-funded company would have,
but I guess that's the trade-off.
The upside is you don't lose sleep at night
while you're burning through somebody else's money.
Truth.
True, true.
And, I'd say, every morning, just waking up and saying that the only thing I need to
focus on today is what a majority of my users are telling me is the next important feature, and I just
need to focus on that and keep chugging through that, and everything else is falling into place,
that is such a satisfying feeling for me, I should say.
At some point, TypeSense is going to become ubiquitous enough
to gather the attention of the same somebody
that turned its attention to Elasticsearch.
And we had a conversation a year ago
when Elasticsearch changed their licensing
because of, you know, the Goliath in the room, basically.
What happens when AWS decides to offer TypeSense?
You know, what will happen then?
Are you prepared?
Are you ready for that day?
Have you business planned enough?
Have you license planned enough?
What are you going to do?
Yeah, I would say that if that happens, that would be a good thing, because AWS has already spent a ton of time and money on things like, for example, working with government agencies. So they've done a lot of this legwork. And if they were to offer TypeSense under that umbrella,
it only works well for TypeSense's adoption at that point.
From a revenue perspective, I think the mindset that maybe Elasticsearch has
is that they need to capture all the value
that they're creating, which is understandable, I guess.
I mean, I can see that point of view as well.
But my point of view on that is that we're creating value, but then we're also creating this value together with the community.
Even if it's just people, you know, asking us questions and giving us feedback and making feature requests and telling us, here's how we're using TypeSense, how best can we use it?
Like all this is feedback that has collectively gone into building TypeSense, the product.
So my opinion is that, and that's the nature of open source.
And so my opinion is that when you've built a product like that
standing on the shoulders of your community
and on the other dependencies that you're probably using,
we've already built this together with shared value.
Let's spread this value around
rather than trying to capture it all within one commercial entity.
So I would actually
love it if additional
cloud providers start
offering TypeSense as a service, because that's how
you get to be a Linux
and not an Elasticsearch, I should say.
Where there's so many
flavors of Linux, so many people use Linux
and it's become the foundation.
And I'd rather become a Linux than an Elasticsearch, at least from a
licensing adoption perspective.
We didn't ask if you were a listener of this show before we brought you on the show, but if you are not a listener of this show,
I would suggest you go back and listen to Adam Jacob talk about this because
I asked you that question thinking, what is he going to say? Because I kind of know what your answer might be, but I'm
kind of hoping that it is in light of what Adam Jacob said,
which is essentially they're your marketing funnel, right?
Why get upset when AWS offers your thing as a service?
Because they've just blessed you, essentially, as worthy.
Worthy to use, worthy to try, and let the tech,
the usefulness of the tech and the community behind it
and the people behind it and the people behind it
be the support mechanism to say
this is worth keeping on using, versus,
what is TypeSense?
Who are they when AWS hasn't chosen
you yet? You're just in a
sea of obscurity, essentially, of
search land. And if they blessed you
in that way, then it's like, wow, that's a better
go-to-market strategy potentially than the free
tier of Algolia.
True.
That's true.
Maybe.
Yeah.
Yep, yep.
Yeah, for sure.
I think AWS's breadth of adoption, you know, you're just riding on its coattails if they
end up offering you as a service.
100%.
Like you said.
So, yeah, that's exactly how I look at the world as well.
Let me bring a question over that I ask on Founders Talk often,
which I think I'll ask here
as a closer for the show,
which is what's on the horizon?
We've talked a lot about TypeSense Cloud,
your commitment to open source,
your commitment to the community,
the unintended consequence
of being so faithful
to the sturdiness and stability
of the open source, to give back from the advances you've made in cloud
and bring them back to the binary that everybody else gets.
What's on the horizon? What do we not know about today that you could share here on the show?
Yeah, so I think this is the first time I'm going to mention this publicly, but we've been
working on vector search in TypeSense. So essentially, you can bring your embeddings
from any ML model that you have into TypeSense
and have that be mixed with text relevance as well.
And you could do things like
in the context of e-commerce, for example,
you can say, get me all products
that are similar to this product
or get me products that I'd recommend
based on this user behavior in the past
or whatever you construct your model on,
you can bring your embeddings into TypeSense
and have TypeSense do a nearest neighbor search.
And this is actually another example of something
that users asked us for and essentially said,
you know, we'd have to start looking at using two different services
if it's not built into TypeSense.
And we started looking into it.
And we're essentially right now building it actively with users.
So I'm super excited about that.
And I think it's going to open up a whole...
So far, I've always had to tell people that,
hey, we don't have any AI or ML-related features,
and that is going to change very shortly.
So I'm super excited about that.
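To make the embedding workflow concrete, here's a hedged sketch using the typesense Python client; the collection name, field names, and tiny four-dimensional vectors are invented for illustration, and the exact vector_query syntax in the RC build may differ from this, so check the release notes:

```python
import typesense

client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': 8108, 'protocol': 'http'}],
    'api_key': 'REPLACE_ME',
})

# Hypothetical schema: a products collection whose 'embedding'
# field stores vectors produced by your own ML model.
client.collections.create({
    'name': 'products',
    'fields': [
        {'name': 'title', 'type': 'string'},
        {'name': 'embedding', 'type': 'float[]', 'num_dim': 4},
    ],
})

client.collections['products'].documents.create({
    'id': '1',
    'title': 'Trail running shoe',
    'embedding': [0.12, 0.98, 0.33, 0.47],
})

# "Products similar to this one": nearest-neighbor search against
# a query vector, returning the k closest documents.
results = client.collections['products'].documents.search({
    'q': '*',
    'vector_query': 'embedding:([0.11, 0.95, 0.30, 0.50], k: 5)',
})
print(results['hits'][0]['document']['title'])
```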
Awesome. Sounds cool.
When does it drop, Jason? When does it drop?
Oh, it's actually already available in an RC build. We just
selectively give it out to folks. So if anyone's listening, wants to try out Vector Search
in TypeSense, I'd love to get some feedback before we publicly launch it. But we don't have
like a fixed timeline for releases. That's another maybe unique thing we do.
We just essentially collect sufficient
volume of features and then
once we think, okay, this is a good chunk
of volume to put out as the next GA release,
we promote the latest RC
build as the GA release. So, you know,
it varies between like two
months to sometimes four months before we do
GA releases.
What's the best way to get in touch with you if they
want to try that out? I'd say
just sending an email
to support at typesense.org.
That'd be good. Or just DM me on
Twitter. I have my DMs open.
My Twitter handle is Jason Bosco.
I'd be happy to, or join our Slack
community, of course, and
mention it there. What's left?
What have we not asked you?
Is there anything we haven't asked you yet that you want to share before we let it go?
I think we've covered good ground here.
Yeah, I think we've covered everything.
I can't think of anything. We've covered everything to talk about.
So maybe we spoke about a good...
We did our job, Adam.
Nice.
A good breadth of topics.
No stones unturned, all the crevices examined.
Jason, thank you so much for your time.
Thank you for your commitment to open source.
Thank you for coming on the show and sharing your wisdom.
We appreciate you.
Of course.
Yeah.
Thank you for having me, Adam.
This is a great conversation, Adam.
Thanks, Jared.
Thank you.
That's it.
This show's done.
Thank you for tuning in.
What do you think about this truly open source
search alternative to Algolia, to Elasticsearch, to rolling your own with Postgres or MySQL?
Let us know in the comments. Links are in the show notes. And during the show, I mentioned our
conversation with Adam Jacob back on episode 353 here on The Changelog. Here's that clip.
Disruptive products, though, are not necessarily better.
They're usually actually worse, but they're good enough.
And the cost is disruptive.
And so in the case of an AWS version of Mongo,
yeah, it's not going to be as good or as maybe well-supported
or have as many features as Mongo's version of Mongo,
but it's satisfactory and it's way cheaper.
So it's disruptively cheap.
And then you add to the fact that there's no R&D, there's no development costs from Amazon's side.
So you're not competing with them on features. They're just free-riding all the features that you're building.
Well, but here's the thing. This is where we come back to the funnel. So now we're back to the business.
So sure, maybe Amazon, but this is why it's good business for Amazon to launch your stuff as a service instead of just competing with you directly.
So you've brilliantly elucidated why they would want to launch a Mongo service in the first place,
right? Brilliant. Good job, Jared. Yeah, it's good. But as soon as they do that,
if the top of the funnel was fixed, if that created no more interest in your product than
it did before, then you'd be right. But it doesn't. Instead, it turns out that the single
largest pool of software developers on the planet are the ones that use Amazon AWS, or Azure, or Google. How many of those developers
are using one of those platforms? And if your stuff is on all three of those platforms, and it's not
on the others, how many eyeballs do you get that Cockroach doesn't? The answer is a ton of eyeballs,
so many eyeballs. And so the size of that funnel, your possible monetization, gets bigger, hugely
bigger than it was before. And in that moment, your ability to capture that revenue... every single one
of those cut-rate DocumentDB users is a potential lead that's already using your product. So all you
have to do is go find them and be like, yo, did you see how much better our console is? How much better
our operation stuff is? How you can get on a Zoom with the dude that wrote that indexing feature
when it's broken? I dare you to get that out of Amazon. And next thing you know, Citibank is like,
you know, Atlas looks pretty good. You know? What you're describing, Adam, though, is a very
well-known business tactic, which is: turn your liabilities into assets.
Yes.
To your advantage, you know?
So using liabilities to your advantage
is very well known by many, let's say.
Yeah, it's not news.
Right.
Well, that's good though.
I think those kinds of ideas,
sometimes seem so logical,
but yet not everybody thinks like that, you know?
So I think this is a great idea
of how could you leverage the fact that these platforms are so massive that they actually become your marketing channel for you.
They are your marketing channel for you.
And the only thing you have to give up is that they're also going to monetize some number of your customers.
All right. Find that episode at changelog.fm slash 353.
That is episode 353 with Adam Jacob on the war for the soul of open source.
A big thank you to our friends at Fly and Fastly for having our back.
And of course to Breakmaster Cylinder for those awesome beats.
And of course to you.
Thank you so much for tuning in.
We will see you on Monday.