The Changelog: Software Development, Open Source - Typesense is truly open source search (Interview)

Episode Date: September 9, 2022

This week we're joined by Jason Bosco, co-founder and CEO of Typesense — the open source Algolia alternative and the easier to use Elasticsearch alternative. For years we've used Algolia as our search engine, so we come to this conversation with skin in the game and the scars to prove it. Jason shared how he and his co-founder got started on Typesense, why and how they are "all in" on open source, the options and the paths developers can take to add search to their project, how Typesense compares to Elasticsearch and Algolia, he walks us through getting started, the story of Typesense Cloud, and why they have resisted venture capital.

Transcript
Starting point is 00:00:00 This week on The Changelog, we're joined by Jason Bosco, co-founder of TypeSense, the open-source Algolia alternative, and the easier-to-use Elasticsearch alternative. For years, Changelog.com has used Algolia as its search engine, so we come to this conversation with skin in the game and the scars to prove it. Jason shared how they got started on TypeSense, why and how they are all in on open source, the options and the paths developers can take to add search to their projects, how TypeSense compares to Elasticsearch and Algolia.
Starting point is 00:00:37 He walks us through getting started, the story of TypeSense Cloud, and why, so far, they have resisted venture capital. For our Plus Plus subscribers, there is a bonus six minutes at the end of today's show for you. If you're not a Plus Plus subscriber, hey, head to changelog.com slash plus plus to join, directly support us, drop the ads, and get access to bonus content on our shows.
Starting point is 00:00:57 A big, big thanks to our friends and partners at Fastly and Fly.io. Our pods are fast to download globally because, hey, Fastly is fast globally. Learn more at Fastly.com. And our friends at Fly let you put your app and your database closer to users all over the world. It's like a CDN of your entire application. Check them out at Fly.io.
Starting point is 00:01:24 This episode is brought to you by our friends at Fly. Fly lets you deploy full-stack apps and databases closer to users, and they make it too easy. No ops are required. And I'm here with Chris McCord, the creator of Phoenix Framework for Elixir, and staff engineer at Fly. Chris, I know you've been working hard for many years
Starting point is 00:01:41 to remove the complexity of running full-stack apps in production, so now that you're at Fly solving these problems at scale, what's the challenge you're facing? One of the challenges we've had at Fly is getting people to really understand the benefits of running close to a user, because I think as developers, we internalize as a CDN, people get it. They're like, oh yeah, you want to put your JavaScript close to a user and your CSS. But then for some reason, we have this mental block when it comes to our applications. And I don't know why that is. And getting people past that block is really important because a lot of us are privileged that we live in North America and we deploy 50 milliseconds a hop away. So things go fast. Like when GitHub, maybe they're deploying regionally
Starting point is 00:02:18 now, but for the first 12 years of their existence, GitHub worked great if you lived in North America. If you lived in Europe or anywhere else in the world, you had to hop over the ocean and it was actually a pretty slow experience. So one of the things with Fly is it runs your app code close to users. So it's the same mental model of like, hey, it's really important to put our images and our CSS close to users. But like, what if your app could run there as well? API requests could be super fast. What if your data was replicated there? Database requests could be super fast. But if your data was replicated, their database requests could be super fast. So I think the challenge for Fly is to get people to understand that the CDN model maps
Starting point is 00:02:49 exactly to your application code. And it's even more important for your app to be running close to a user because it's not just requesting a file. It's like your data and saving data to disk, batching data for disk.
Starting point is 00:02:59 That all needs to live close to the user for the same reason that your JavaScript assets should be close to a user. Very cool. Thank you, Chris. So if you understand why you CDN your CSS and your JavaScript, then you understand why you should do the same for your full stack app code. And Fly makes it too easy to launch most apps in about three minutes.
Starting point is 00:03:14 Try it free today at fly.io. Again, fly.io. so decent bosco is here to school us, I guess, give us a glimpse into building a search engine, the algorithms behind it, not taking venture, making it open source, a ton of fun stuff. One of the co-founders behind TypeSense. Jason, nice to see you. Welcome to the show. Thanks, Adam. Thank you for having me. This is exciting to be on the show. Thanks Adam, thank you for having me. This is exciting to be on the show. We are excited to get schooled about search engines, open source things, and all the stuff
Starting point is 00:04:11 Adam just listed. Specifically, what's going on in search engine land? It seems like there's lots of interest and hype around open source search engines, Elasticsearch, etc. And I don't know if I just, my thumb's not on the pulse of search, like what's going on these days. TypeSense looks cool. I wonder what else is out there. People are always working on making better wheels, and we've had plenty of them along the years.
Starting point is 00:04:34 Jace, maybe tell us how you got into search, and then give us maybe the lay of the land of what's going on and what's kind of innovative in the search space. Yeah, so we got into TypeSense in 2015 and the lay of the land back then was Elasticsearch was, and maybe it still is,
Starting point is 00:04:52 the dominant player in the search space. So pretty much think about anything related to search and you will eventually land on Elasticsearch because they have so much content out there. And it's a super well-adopted product. So that's why we were in 2015. I was working at Dollar Shave Club. My co-founder, Kishore, he was working at another company in the search space,
Starting point is 00:05:13 or in the space which required search as one of the tools that they needed. And we were just quite frustrated with how complicated it was to get Elasticsearch up and running and scaling it and fine-tuning it. In my personal experience, I've had at least two engineers spend a week or so every two to three months fine-tuning our Elasticsearch clusters as we scaled it. And it seemed like there was just too much machinery that needed to be handled to get search working.
Starting point is 00:05:45 And our use case at Dollar Shave Club was seemingly pretty simple, which was to be able to search for customer names, emails, addresses, phone numbers when they write in or call in in our support for our support agents to look up customers. So it seemed like a pretty simple use case, but then the amount of effort it involved
Starting point is 00:06:02 to get that going seemed out of whack with the feature holds. It seems simple. So anyway, so that's how, just out of that was how we started up with the idea for TypeSense. So it was more like, you know, what would it take to build our own search engine? Like something that's simple to use. So it was a little naive at that point. It's, you know, it's something like, something like, can I build my own huge piece of database or huge piece of software that people have spent decades working on? Can we build our own? But we stuck with it.
Starting point is 00:06:35 We started reading up papers on how search algorithms work and what goes into building a search engine. And now looking back, I see how much work is involved in building a search engine. It's been a long time since 2015. Oh, yeah. It's been seven, eight years now. So now I know how much work is involved, so I'm glad that naivety is what helped us bridge the gap
Starting point is 00:06:57 of, okay, let's just stick with it. And it started as more like an R&D project, as the nights and weekends thing. So there were no time commitments or deadlines we were trying to hit. It was just, you know, chipping away little by little. And so even though we started working on it in 2015, 2018 is when we actually got to a stage where we were like, okay, now it's good enough to be used by maybe someone other than just the two of us. And in 2018 is when we open sourced it. And one of the bets that we took at that point in time
Starting point is 00:07:28 was we wanted to put all the search indices in memory, whereas in 2015, disk-based search indices were the norm. That's what Elasticsearch was doing. And there's another search engine called Solar, which actually predates Elasticsearch. And everyone used disk-based indices because disk was cheap, RAM was expensive.
Starting point is 00:07:48 But then what we figured at that point was RAM was only going to get cheaper as years rolled by. And we said, let's put the entire index in memory. So that, of course, the trade-off there is that you get fast search because you put everything in memory, it's as good as it's going to get in terms of speed.
Starting point is 00:08:05 But the trade-off is if you have petabyte scale of data, then there's no petabyte scale RAM available today unless you fan it out across multiple machines. Of course, AWS, for example, has a 24-terabyte RAM VM that you can spin up, but it's still expensive compared to a 24-terabyte disk. So I think that's the sweet spot where we figured TypeSense would fit is if you have massive volumes of data, like for example, logs, application logs, or tons of analytics
Starting point is 00:08:35 data that, you know, it would be very expensive to put on RAM, then use disk-based search. And that's where, you know, LASIK search and Solr play in. If you want instant, we call it searches you type, that's where something like TypeSense fits in. You can put everything in memory and get fast search results. So that's how we started working on TypeSense. And after that, once we opened sources in 2018, it was just a matter of, not just a matter of,
Starting point is 00:09:00 but we were just listening to what users were telling us, and just adding features one by one. And another interesting thing that happened in parallel is that there's another search player called Algolia, and they have pretty, they're a closed source SaaS product, but then they have very good visibility among developers because many documentation sites use Algolia because they give it out for free for open source documentation sites. If a developer searches for any documentation, it's usually, you'll see a little powered by Algolia logo
Starting point is 00:09:33 and that does work very well. And Algolia is a fantastic product, but something that ended up happening was they kept raising their prices. And then Algolia users started discovering TypeSense and started asking us for features that existed in Algolia. And then we started adding those. And then we eventually got to a stage where we were like, okay, I think now we have a sufficient number of features where we can call ourselves an open source alternative to Algolia.
Starting point is 00:09:57 And I think that resonated with a lot of people because Algolia is a very good product, very well known, and solves for actually many of the pain points that Elasticsearch has from a developer experience point of view. So they essentially simplified search and then spread the word around that, hey, search need not be this complicated. And then once we started saying we're an open source alternative to Algolia, people quickly connected that, okay, this is what we're trying to do with Titans as well, which is good developer experience, fast search, and easy to use and to get up and running with the search experience.
Starting point is 00:10:29 So then we started seeing good traction and then at that point, we had... And then people started asking us for a paid version where we hosted for them because they didn't want to host it. That's when we realized we have a business model in front of us
Starting point is 00:10:45 and people are telling us that they will pay us if we had this. And I just couldn't let the opportunity go by and we quit our full-time jobs and started working on it full-time in 2020. Okay. Exciting. Very exciting. Yeah, yeah. I compressed what I think is five years worth of ups and downs that's a good
Starting point is 00:11:08 compression algorithm you got there yeah on your history it's about one minute per year good job but yeah it was it was a fun journey yeah if your code is anywhere near as good as this then uh you'll be in good hands with typesense so gosh where do we go from there first of all i was just thinking back that you mentioned the naivety of like it couldn't be that hard you know like how many businesses are started with such statements as which reminded me i mean before we started recording you're talking about the hacker news crowd and and how often you see those kind of statements on hacker news News, when somebody releases their product, I could build this in a weekend. And it's like, first of all, no, you couldn't. But second of all, we all understand that sentiment because that core thing, of course you could do in 48 hours, maybe 72 hours or whatever it is, but you're so far from finished at that point. You almost have to have, whether it's arrogance or naivety or a combination or whatever it is to say, I'm actually going to try this and, and get started and get going and have
Starting point is 00:12:11 a different idea. Like, Hey, what if we put everything in memory to even start on a journey that's going to take seven years? And I'm sure it's, it's just getting started, right? Like you guys aren't at your, at your finish line. You're like just barely off the starting line. So it's always cool to see when a story like that comes to fruition, even though it's like so often that is the story. It's like, yeah, it couldn't be that hard. And now seven years later, you're like, actually, it was really hard and it still is.
Starting point is 00:12:38 Yeah, yeah. I think what I realized, like you said, maybe the core of stuff you can get done in a weekend or whatever unit of time, which is smaller, is maybe a little closer to what you have in mind. But the iterations on top of that to actually make it a product that someone other than you can use, that is what takes so much effort. And it's not even just effort on your side. Of course, you have to invest a lot of time, but it's also interacting with people who are using the product other than you, getting that feedback,
Starting point is 00:13:11 and then iterating based on that feedback. I think that is what takes a lot of effort and time. Because, so even if you were to iterate by yourself, you know, for whatever X amount of years, I don't think the product will be as mature as being able to trade with actual people using it and giving you feedback. So case in point, for example, for us, so at one point, we tried an open core model with TypeSense, where there were some features where we held back from the open source version and said, you have to pay for
Starting point is 00:13:41 a premium version. And then eventually, we did with it because what we realized was the features that were in the open source version, more people were using it and giving us feedback. So it was generally more stable and more feature rich than the features that we held back because less number of people were actually paying for it and giving us feedback. So ironically, the features that the closed source features that people are paying for ended up being the ones that had a little less stability and less maturity. And that's when I realized, okay, this is hurting us by keeping some parts closed source because people are just not adopting it as well as we'd like.
Starting point is 00:14:21 And at that point, we just open sourced, you know, 100% of TypeSense. And after that, we uncovered a series of bugs in what used to be the closed source features. And we quickly addressed them and people started asking us for more features in line with what we already had, like improvements on those features. And it suddenly skyrocketed the amount
Starting point is 00:14:42 of how useful those closed source features were because people kept asking for more things on top of that. So I feel like that is actually a good example of where product maturity comes from actually talking to users and iterating based on that rather than just building it yourself and thinking that it's going to be awesome. I think that's needed in the beginning because you need to have a point of view on what it is you're building and define that. But after that point, I think talking to your users and getting feedback and building based on that, I feel like that has been our superpower
Starting point is 00:15:14 as not-so-secret superpower, I guess. Yeah. Since we're on the note of, I guess, licensing to some degree, it's GPLv3 licensed. Yeah. Yeah. Yeah, so we initially started with the GPLv2, and then someone pointed out that GPLv2 was not compatible
Starting point is 00:15:32 with some of the licenses, so we changed it to GPLv3. But still, we stuck with GPL instead of MIT or Apache because, at least in my opinion, GPL is an open source license which encourages other people modifying the source code to contribute that back. And of course, that's a big debate whether what is open source. But my philosophy at least is that
Starting point is 00:15:59 if you're taking advantage of an open source software and if you're modifying that software, then it's only fair to ask you to contribute that back to the community versus taking it closed source versus something like an MIT license or Apache. What I've seen happen is open source projects end up getting modified and then those modified version ends up getting closed source, which kind of goes against, you know, it's almost like a take and not give back
Starting point is 00:16:26 model. So that's why we've kind of stuck with GPL. And of course, there's a more stringent version of it, which is AGPL. And that, it seems like people tend to avoid as much as possible. Like I've heard, for example, at Google, they just don't use AGPL-licensed anything. I've heard that as well. Yeah. And ironically, I was on that side of the table at Dollar Shave Club, for example. I was the one who had to say no AGPL-licensed software. Because just during every round of fundraising, for example, the lawyers would ask us, give us the list of all your open source software using all the licenses.
Starting point is 00:17:07 And if there's anything that's AGPL or anything that's not, you know, a little off than Apache or MIT, we'll get asked questions. Why are you using this? And then more discovery into, are you using it the right way? Did you modify it? Like just a lot of conversations need to be had when you use anything that's AGPL. So that's one reason we haven't gone down the AGPL path. So far, it's worked out well.
Starting point is 00:17:36 And I guess the best model for that is the Linux kernel. It's as popular as an open-source project. It's going to get, and they use GPL for a license, and it's worked out well for them. And that's what I usually tend to point to developers sometimes when they ask, hey, if it's MIT, that'd be more inclined to use it. But then I point out that, hey, you're probably using something Linux-related, and that is GPL. So it's a very similar model.
Starting point is 00:18:04 I think there's a lot of misunderstanding that is GPL. So it's a very similar model. I think there's a lot of misunderstanding about how GPL works in the industry, and that is definitely a friction point. But I mean, I think the benefits outweigh the risks, I guess, for us to change the license. There's kind of like a freedom of ignorance with the MIT license, where it's just like, this one and the Apache 2, and the BSD license. It's like the very permissive ones, where it's just like this one, I know,
Starting point is 00:18:25 and the Apache too, but yeah, and the BSD license is like the very permissive ones where it's like, I don't have to think about it. I'm just good. You know, where it's like, okay,
Starting point is 00:18:32 the GPL and the AGPL, I need to understand what exactly I'm getting myself into. And once you do, it's not that hard to understand and the implications. I mean, it can get hairy, especially if you're trying to build businesses and stuff. But I think the, I'll just MIT it and forget about it
Starting point is 00:18:48 kind of thing is kind of throwing caution to the wind. And it's nice for adoption because you can just green list or whatever. Go ahead and all these MIT licensed projects are just good to go. You don't have to think about it. So I can definitely understand that you have a good example of a GPL project that's massively adopted and popular.
Starting point is 00:19:07 I wonder how often we don't think about Linux in our infrastructure as much as we think about a database or a search engine. You know, even though Linux is the underpinnings most of the time for all that stuff. But for some reason, it's almost like so low level that you don't even consider like the licenses of your operating system, maybe. Right, right. level that you don't even consider like the licenses of your operating system maybe right right yeah and i think that's probably a success of the gpl you know showing itself where once a project is so popular that it seems like it's everywhere but then there are different flavors of it all coming from the core source and it still didn't hurt the adoption of you know of linux of the linux kernel so it kind of shows that GPL can also be a very successful model.
Starting point is 00:19:49 And maybe, I mean, I'd say that maybe also helped the model or helped the core project mature much faster because all these modifications that were done were being contributed back into the open. And that helped evolve the product much more faster versus a bunch of people forking it into
Starting point is 00:20:07 private forks and then making their own modifications without contributing back. Who knows, maybe, you know, that might have hurt
Starting point is 00:20:13 how fast the core Linux project evolved over time. So, yeah. But again, this is just my hypothesis. The hard part is we can't fork history
Starting point is 00:20:22 and run both experiments in parallel you know if we could just do that would that be nice that would be nice we need more some version control systems inside of our timelines this episode is brought to you by Sourcegraph. Sourcegraph is universal code search that lets you move fast, even in big code bases. Here's CTO and co-founder Byung-Loo explaining how Sourcegraph helps you to get into that ideal state of flow in coding. The ideal state of software development is really being in that state of flow. It's that state where all the relevant context and information that you need to build whatever feature or bug that you're focused on building or fixing at the moment, that's all readily available.
Starting point is 00:21:16 Now, the question is, how do you get into that state where you don't know anything about the code necessarily that you're going to modify? That's where Sourcegraph comes in. And so what you do with Sourcegraph is you're going to modify. That's where Sourcegraph comes in. And so what you do with Sourcegraph is you jump into Sourcegraph, it provides a single portal into that universe of code. You search for the string literal, the pattern, whatever it is you're looking for, you dive right into the specific part of code that you want to understand. And then you have all these code navigation capabilities, jump to definition, find references that work across repository boundaries that work without having to clone the code to your local machine and set up and mess around with editor
Starting point is 00:21:49 config and all that. Everything is just designed to be seamless and to aid in that task of code spelunking or source diving. And once you've acquired that understanding, then you can hop back in your editor, dive right back into that flow state of, hey, all the information I need is readily accessible. Let me just focus on writing the code that influence the feature or fixes the bug that I'm working on. All right, learn more at Sourcegraph.com and also check out their bi-monthly virtual series
Starting point is 00:22:14 called DevTool Time, covering all things DevTools at Sourcegraph.com slash DevToolTime. so let's go back to search now we're kind of on the licensing uh beat but if we go back to just thinking about search you know any organization that has interesting data, like if it exists long enough, there's going to be a request for search, right? Otherwise, the data is just not interesting, because everybody wants to poke at what they have and learn things from it. As an indie dev, and as like a small teams, small customers developer, most of my days, I kind of had two strategies for search. It was like strategy one was can I do it inside of Postgres? You know, like I can get some full text search inside there. Is that good enough?
Starting point is 00:23:10 And for a lot of cases, that's just good enough. And then it's like, and then it gets really hard from there. And I was never going to do an Elasticsearch or like add this another appendage to my infrastructure. So from there, I'd go straight over to services. So I'd be like, can I do it in Postgres? Or is it going to be in Algolia? Or there's one called SwiftType. Not sure if they're still around, but you know. They got acquired by Elasticsearch.
Starting point is 00:23:35 Okay, so they're gone now. They were cool for a minute. I liked what they were up to. I think I actually had my blog on SwiftType for a little while. They just provided probably a lot of the stuff that type sense provides, but that was basically it. And I'm wondering like, what are other options?
Starting point is 00:23:50 Like, is that the fork in the road for most people is like, well, elastic search or Apache solar with infrastructure needs. When I looked at it, it's not like I'm just afraid of adding things to the infrastructure. It's like, you know,
Starting point is 00:24:03 I'm not a DBA or a Elasticsearch BA. It seems hard. And one thing I'm liking about TypeSense, just reading about it, is it seems pretty simple. No dependencies, C++, compile it. It seems like it's pretty easy to run. But I'm just wondering how, from your vantage point, working in larger companies that I usually work with. Is it basically that, like Elasticsearch or Solr, or a service, or shove it in your RDBMS of choice, or what does Mongo have built in, etc.? Yeah, yeah. So I think most people just start out with database search, and you understand it like SQL queries on Postgres and MySQL. And it works for relatively small data sets because when you use a like query on this,
Starting point is 00:24:49 you know, for example, if you say it starts with queries, that uses the index if you set an index on the field, at least in MySQL. But anything that, if you're trying to search in the middle of the string in a field, things like that are basically scanning the entire table and you just start seeing performance issues. So once your data set is large enough,
Starting point is 00:25:11 plus you need to do more standard things that typically such industry, like things like, you know, what's called faceting. So in the results, if you want to say these many number of results have a status of active, these many results have a status of inactive
Starting point is 00:25:26 or whatever your field is, if you want to group like that. So you combine, and then there's typically will come a need for doing some sort of fuzzy searching. So you want it to account for typos to make sure that the spellings are still fetch the results that you're expecting.
Starting point is 00:25:40 So as you add each of these, you can still do a lot of this with Postgres, for example, but performance is the key thing that starts taking a hit once you have a sizable amount of data. And so that's the point when a search engine can help, where you do have to then build plumbing to take the data from your Postgres or MySQL or whatever database you have and then sync a copy of that into your search engine. And what a search engine essentially does is it builds indices that are custom or optimized specifically for full-text search with typo tolerance and fast thing
Starting point is 00:26:18 and the standard things that you need with search. So because it's optimized for that, it's going to return fast results, whereas a database is more meant for, you know, that's more concerned about consistency and making sure your data never gets lost and transactions and, you know, make sure parallel rights still end up with a consistent copy of the data and things like that. So which is why we usually, we say search engines are not your primary data store. Instead, it's a secondary data store where you sync a copy of the data from your primary data store.
Starting point is 00:26:50 Now, interestingly, like you said, once you have data, you eventually need to search on it or run some sort of aggregations on it. And I think over time, databases also have realized that, which is why you see something like Postgres add full text support within it. And then I know, for example, MongoDB added full text support within it. And even Dredis added full text support. Really? that full-text support is a thing that databases need, but then the type of indices that you need to build to support both a full-text mode and your standard data storage model, it's like different.
Starting point is 00:27:34 And that's where you have dedicated search engines that do that one thing well versus databases try to offer everything that works reasonably well for the full-text search use case as well. But then again, it's not optimized specifically for fast full-text search. So that's where, once you run into that, that's when you take the hit of, okay, I need to build some machinery to get the data from my primary data store into my search engine. And then you hit your search engine for search results.
Starting point is 00:28:02 Another interesting use case, though, is for, even though we call it a search engine, search engines typically also have filtering capabilities where you can say, get me all records which have this particular value for this field. So I know some users for TypeSense, for example, are using it as essentially like a cache in a JSON store because
Starting point is 00:28:23 you can just push a bunch of JSON. You can search on that JSON and you can also get JSON documents by ID. And since they're any way replicating a copy of the data into a type sense to search on it, some users are actually using it as just another JSON store to in front of their database so that they don't have to hit the database for any heavy queries, which is another interesting use case for type sense. That is interesting. I have felt the pain of like marshaling, I don't think marshaling is the right term here,
Starting point is 00:28:54 syncing data over to a search store. And I'm wondering if there's ever been an effort or other projects that just say, don't send your data over to the search, just point your search at your database and then maybe configure it for what you want. And it can exist in one place and this could be a proxy.
Starting point is 00:29:13 Like you said, you could use it however you want. And it has maybe read-only access or something, so it's safe. It's not going to destroy stuff. Or does that have performance implications that are massive? So in fact, there are projects which do this. Airbyte, for example, is one company that I know is doing it.
Starting point is 00:29:31 They're actually building an open source way to transport data from one source to a different destination. And I think Fivetran does it. There's a bunch of different startups that have attempted to do this. But when it comes to search engines, usually, if you replicate an exact copy of the data into your search engine, you're probably going to be replicating things
Starting point is 00:29:56 that you don't want to search on. Or you might want to change the shape of the data a little bit before putting it into your search engine, so that it's more optimized for the types of search queries you're running, instead of replicating a structure that works more for how your application queries the data. So what I've seen is, even though there are many of these frameworks out there (another one is the Singer framework, I think, and that's another open source product that does this), it seems like you eventually end up
Starting point is 00:30:27 having to transform the data a little bit so that it's more optimized for your search use case. So at that point, you have to customize that logic yourself. And eventually people end up writing, you know, their own transformation layer and, you know, building it themselves, maybe on top of one of these. So there is some customization needed.
Starting point is 00:30:45 So I don't think, given that the access patterns are different, just mirroring your entire data set usually will mean that you're probably storing more in your search engine than is actually needed, which might increase your costs. You have to deal with more data going over the wire, consistency issues, for example. So eventually people end up building their own custom sync scripts.
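The kind of transform those custom sync scripts do most often, flattening a relational join into one search document, can be sketched in Python. The `books`/`authors` shapes and field names here are invented for illustration, not from the episode:

```python
def flatten_for_search(book_row, author_rows):
    """Denormalize a relational join into one flat document, so the
    search engine never has to join at query time. The row shapes
    here are hypothetical."""
    return {
        # string IDs are a common convention in document search engines
        "id": str(book_row["id"]),
        "title": book_row["title"],
        # the joined table becomes a plain array field on the document
        "authors": [a["name"] for a in author_rows],
    }

book = {"id": 7, "title": "Search at Scale"}
authors = [{"name": "Ada"}, {"name": "Lin"}]
doc = flatten_for_search(book, authors)
```

A sync script would then batch documents like `doc` into the search engine's import endpoint on whatever schedule fits the application.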
Starting point is 00:31:15 as you use it anyhow. Right, right. Okay, that's too bad. It'd be great if you could just point it and be like, hey, just index this thing differently and be awesome. Oh, yeah. I wish maybe one of these frameworks allowed you to also set up transformation rules on the fly. Yeah, exactly. Especially if they allow you to join.
Starting point is 00:31:35 That's the most common transformation that I've seen is joining data from two different tables in your relational database and putting it into one flattened structure. Because in a search engine, you typically flatten out your data because if you do runtime joins, it's going to slow down the search. So if they allow you to transform, set up joins at transformation time, I think that'll be an amazing
Starting point is 00:31:57 product. Add it to your roadmap, Jason. Yep, yep. Yeah, I think we have a lot of such core use cases or features. So, yep. So you said 2015 was your begin date. By my math, it's 2022 now, so that's, what, how many years? Seven years. Good math, good math. You've compared to Algolia, you've compared to Elasticsearch. How well do you think you compare to Algolia and to Elasticsearch? Do you think you're a pretty good one-to-one? Do you win most cases? What makes you win? What's your differentiator?
Starting point is 00:32:33 Yeah, so I would say it depends on the use case. So if you're looking at feature parity, because we're closer in spirit to Algolia, I would say we're at 85% feature parity with Algolia. You know, most of the features that we don't have today are things related to AI or any machine-learning-related features that Algolia has out of the box. With TypeSense, you have to bring your own machine learning model and integrate that into the search engine. So with Algolia, we're at 85% feature parity. And even with that, a good number of Algolia users are switching over on a regular basis. Elasticsearch, though,
Starting point is 00:33:09 is a different type of a beast, in that they do app and site search, which is what TypeSense and Algolia do. So a search bar on your website or apps. They also do things like log search. They also do anomaly detection. They do security incident monitoring. They do analytics and visualizations if you're using the Kibana stack. So they have a whole umbrella of search-related use cases that's, of course, built on the core
Starting point is 00:33:36 Lucene engine, but it's still customized very well for a whole plethora of use cases. So I wouldn't say we're feature-parity with Elasticsearch by any stretch because they do a whole bunch of different things. What we've done with TypeSense is essentially just taken the site and app search piece, and we're trying to simplify that and have an opinionated take on what sort of features
Starting point is 00:34:01 or parameters are allowed to be configured, and we'll choose defaults for you. So it is an opinionated take on app and site search. So our goal is not to be at feature parity with Elasticsearch, because even if it's just site and app search, if we become feature parity with Elasticsearch, then we'll also invite the same level of complexity.
Starting point is 00:34:19 So that is not our end goal. Instead, we want to see what use cases people are using TypeSense for and then build an opinionated thing that works out of the box for, say, 80% of the use cases. I'd say we're nowhere close to
Starting point is 00:34:34 feature parity with Elasticsearch, to answer your question, but that's by design, because if we did do that, then we'd end up becoming another Elasticsearch, and that's not what we want to do. Yeah. You also said the frustration you had early on was maintaining the Elasticsearch instance, not just the code behind it, what made the code work
Starting point is 00:34:51 and be able to be a great algorithm to search and transform data and be real-time or whatever the needs are for the engine. You mentioned maintaining the actual Elasticsearch infrastructure. It took hours every couple of months. Can you talk about how you've changed, how you've used that pain to change things with TypeSense? Yeah, so with Elasticsearch, part of the complexity comes with the fact that it runs on top of the JVM. And fine-tuning the JVM itself is such a big task.
Starting point is 00:35:26 And then you have to configure Elasticsearch's parameters on top of that. So I was recently, I actually grepped the Elasticsearch code base for the number of configuration parameters that they have. It's almost 3,000 different configuration parameters to do various things. And you need to figure out which of those parameters apply in your specific use case to fine-tune that on top of, of course, the JVM configuration parameters. So that dependency on the JVM was one big thing that we avoided with TypeSense
Starting point is 00:35:58 because we built it in C++, so there are no other runtime dependencies. It's a single binary, so you just use a package manager to install it or download and install the binary with zero other dependencies. So it's a single binary that you start up and it scales
Starting point is 00:36:13 without any fine-tuning. And that's something we've done in TypeSense: set sane defaults for many of the configuration parameters, so that it scales out of the box without you having to tweak some parameters. So, for example, without any fine-tuning, there was one use case where this one user did almost 2,500 requests per second on their particular
Starting point is 00:36:37 data set. It was only 4,000 records, but still, on a 2-vCPU node with just 512 MB of RAM, they were able to get almost 2,500 requests per second from a TypeSense cluster without fine-tuning anything, just installing it, indexing their records, and running a benchmark against it. So that's what we optimized for, which is, out of the box, no finagling with all the knobs. It just scales out of the box. You throw more CPU at
Starting point is 00:37:07 TypeSense, it just takes advantage of it without you having to do more work to take advantage of all the cores. Use all the resources available that you provide TypeSense. That's the model that we've gone for with TypeSense versus Elasticsearch. In addition to adding resources, you need to
Starting point is 00:37:23 make sure it's configured to take advantage of them in the best way possible. And with Algolia, you don't know. Oh, right. Yep. With Algolia, I don't think they allow you to benchmark their services. Plus, if you benchmark it, because they charge by the number of requests that you send them,
Starting point is 00:37:43 if you benchmark it, you're probably, even if they allow benchmarking, you'd probably have to pay a ton of money just to run the benchmarks. For example, if you're doing 2,500 requests per second, you're paying $2.50 per second for how long you run your benchmark, at least based on their public pricing.
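Jason's numbers imply a rate of roughly $1 per 1,000 requests. Treating that inferred rate as an assumption (it's back-of-the-envelope from the example, not Algolia's actual price sheet), the cost of a sustained benchmark works out like this:

```python
def benchmark_cost(requests_per_second, duration_seconds,
                   price_per_1000_requests=1.00):
    """Cost of a sustained benchmark under per-request pricing.
    The default rate is inferred from the 2,500 req/s -> $2.50/s
    example in the conversation, not from any real price sheet."""
    total_requests = requests_per_second * duration_seconds
    return total_requests / 1000 * price_per_1000_requests

# 2,500 req/s for one second costs $2.50; a one-hour run costs $9,000
```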
Starting point is 00:38:02 So it'll be very expensive to run benchmarks on Algolia there. So let's say you yum install TypeSense, or dpkg install, or whatever it is, Homebrew. Pick your distro of choice and do the standard package management installation. Then what do you do? Is it like, does it provide an API that listens on a port? And like, how do you start to use the thing? Let's say I have a database. Let's just say I have a typical 12-factor web app
Starting point is 00:38:35 with like a database. What do I do from there? I have a type sense now. I'm sure it's registered as a service or something on the operating system. So it's going to start when the OS boots and it's going to turn off and stuff. What do I, how do I use it?
Starting point is 00:38:48 Yeah. So TypeSense will start listening by default on port 8108, which is the standard port that we've chosen. And an API key is auto-generated for you if you use one of the package managers to install TypeSense. So you get the API key from the config file, and then you look at the documentation and just use curl to first create a collection, and then you send JSON data into it
Starting point is 00:39:12 in another curl command. And then that's it. It's indexed. And then you can call the search API endpoint again via curl. Or typically, you at that point start building a search UI and have the search UI make search calls out to TypeSense with an API key that you generate just for search purposes. So roughly, it's just two steps to get the data into TypeSense, create a collection, and then index your JSON data. And then the third step can be as complicated or as simple as you need it to be. But at that point, the data is ready to be searched, either via curl or through a UI that you build. Okay, so it's all just JSON. Let's say the data's in there already,
Starting point is 00:39:49 and I'm doing queries against it. It's just going to send JSON back and forth. Correct, yeah, it's all JSON and a RESTful-ish API. RESTful-ish. Doesn't RESTful already have the ish in it? That's the full part, right? That's a good point, yeah. I know what you mean, because, uh,
Starting point is 00:40:07 because REST is not exactly what we all think of it. When you look at the full thing, there's a lot there. Yeah. Okay, cool. What about administration? Is there any sort of UI for TypeSense itself? Is there an admin? Or, I know it's supposed to be sane defaults, but what if I do decide I want to save some RAM, or, I don't know, whatever? I'm sure you have some configuration. Yep. So on the self-hosted version, it's an API-only thing. We don't publish a UI, but there is a community project where people have built a UI where you can
Starting point is 00:40:39 basically hit all the API endpoints. So it's almost like a Postman, but on top of that, there's a nice UI to look at the collection schema and things like that. And then on TypeSense Cloud, we do have a UI that's built in, that's built by the TypeSense team. And that comes with things like role-based access control, so you don't have to share API keys, and permissions and all the good stuff that, if you're in a team setting, might be useful there. We put that in, at least on the UI front, in TypeSense Cloud.
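The curl flow Jason described, create a collection, index JSON, then search, can be sketched in Python with only the standard library. Hedged: the endpoint paths and API-key header follow TypeSense's documented REST API, but the `books` schema and the key value are invented, and nothing is sent until you call `urllib.request.urlopen` against a running node:

```python
import json
import urllib.request

TYPESENSE_URL = "http://localhost:8108"  # default port mentioned above
API_KEY = "xyz"  # in a real setup, read the auto-generated key from the config file

def ts_request(method, path, body=None):
    """Build an authenticated request for a local TypeSense node."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        TYPESENSE_URL + path,
        data=data,
        method=method,
        headers={
            "X-TYPESENSE-API-KEY": API_KEY,
            "Content-Type": "application/json",
        },
    )

# Step 1: create a collection (this schema is purely illustrative)
create = ts_request("POST", "/collections", {
    "name": "books",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "year", "type": "int32"},
    ],
})

# Step 2: index a JSON document into it
index = ts_request("POST", "/collections/books/documents",
                   {"title": "The Art of Search", "year": 2022})

# Step 3: search it
search = ts_request(
    "GET", "/collections/books/documents/search?q=art&query_by=title")

# each request would be executed with urllib.request.urlopen(create), etc.
```

Responses come back as JSON, which is all the "RESTful-ish" API exchanges in either direction.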
Starting point is 00:41:09 But we actually run the same open source binaries that we publish on TypeSense Cloud as well. So it's exactly the same versions that we publish that we run on TypeSense Cloud. Yeah, that's super cool. I think hosting is an obvious business model. Obviously, it's working so well so far, better than the open core, which was giving you probably indigestion.
Starting point is 00:41:29 That's how I think of it, to decide where to put stuff. And then, as you confessed to earlier, the open source stuff was more solid than the proprietary stuff because of the fact that more people were using it. Have you considered on-prem as another way of going about it? Because a lot of orgs, I would assume, want search, but they don't want hosted search, because their data is precious and they may have regulations and they have security concerns. And you think you could make money with an on-premise
Starting point is 00:41:55 version, even though I could just, you know, yum install and run it myself. But I don't know, maybe there's like the tooling around it that you all are building for the hosted version could be value-add for larger orgs. Yeah, we did consider it. I guess we just didn't go down that path because of the complexity of maintaining on-prem installations.
Starting point is 00:42:18 Because on TypeSense Cloud, we have full visibility into the entire infrastructure, and we've built monitoring infrastructure. Those are've built monitoring infrastructure. Those are not really directly related to TypeSense, but still monitoring tooling that helps us monitor the TypeSense cloud clusters. Installing something like that on an on-prem environment, I mean, it's possible we can probably set up VPCs
Starting point is 00:42:38 and private networks and all that stuff. But it's just added complexity that we didn't want to take on just yet. So I think it's just maybe a matter of time, if enough people ask us for it. And today it seems like, if people say, hey, we need to be HIPAA compliant, for example, and we're not HIPAA compliant on TypeSense Cloud, then the only option is to self-host. I tell them, if you need additional support, we can do a support agreement separately and help you. But then being on call and doing production-level support for
Starting point is 00:43:11 stuff running on someone else's infrastructure, where you don't have complete visibility, that's, I haven't yet come to a point where I can digest doing that, unless we figure out, you know, more ways to make that efficient, I guess. Right. Or the number has to be good enough, right? Like, it's got to be. True. It's got to be worth it. This episode is brought to you by InfluxData, the creators of InfluxDB. InfluxDB is the open source time series platform where developers build IoT, analytics, and cloud applications. And I'm here with Paul Dix, founder and CTO of InfluxData.
Starting point is 00:44:02 Paul, all the open source software that InfluxData creates is either MIT licensed or Apache 2 licensed. These are very permissive licenses. Why are you all for permissive licensing? The thing is, we like permissive licenses because we want people
Starting point is 00:44:17 to do whatever they want. Because of these three reasons, freedom, evolution, and impact. Freedom means being able to create a business and create your livelihood off of this code regardless of what you want to do with it. You can modify it, look at it, do whatever.
Starting point is 00:44:32 Evolution means you can create a derivative project and rename it, put it out there in the world, either as an open source project under a permissive license, or you can relicense it under a copyleft license, or you can create a business off of that. And the last bit is impact. We believe more people benefit from open source when that code is permissively licensed. Despite the changes that the other infrastructure vendors are making, Influx remains permissively licensed open source with commercial software.
Starting point is 00:45:02 Well said. Thank you, Paul. That truly summarizes the spirit of open source. So if you want the option to have freedom, the option to have evolution and impact, use InfluxDB for your time series application needs. Check it out and start for free, of course, at influxdata.com slash changelog. Again, influxdata.com slash changelog. And by our friends at Retool.
Starting point is 00:45:25 Retool helps teams focus on product development and customer value, not building and maintaining internal tools. It's a low-code platform built specifically for developers. No more UI libraries. No more hacking together data sources. And no more worrying about access controls. Start shipping internal apps that move your business forward in minutes with basically zero uptime, reliability,
Starting point is 00:45:47 or maintenance burden on your team. Some of the best teams out there trust Retool, Brex, Coinbase, Plaid, DoorDash, LegalGenius, Amazon, Allbirds, Peloton, and so many more. The developers at these teams trust Retool as their platform to build their internal tools, and that means you can too. It's free to try, so head to retool.com slash changelog.
Starting point is 00:46:08 Again, retool.com slash changelog. So, Jason, you mentioned TypeSense Cloud for the first time in this conversation. Now, I assume, I see a pricing tab, and this is hosted. This is your ability to make money, your ability to resist venture, potentially attract venture. This started as a nights-and-weekends project. How did you get to, did you ever think you'd be here, uh, you know, launch cloud and self-fund? What's the story there? Yeah, I'd say TypeSense Cloud is a product that our users essentially pulled out of us. Because, you know, when we started working on TypeSense, I mean, we didn't think we'd build a company around it, you know, in 2015, if you had asked me. But eventually, you know, once we open sourced it, let's say in 2018, '19, we figured, okay,
Starting point is 00:47:22 we probably need to figure out a business model here to make sure this is a sustainable open source project. And then we tried the open core model and that didn't go too well. And then people eventually told us that they will pay us if we hosted TypeSense for them. So that's essentially people telling us that they're ready to pay if only we had a hosted version. So that is how this came about.
Starting point is 00:47:47 Then we started building TypeSense Cloud just based on people asking us for it, which is, I'd say, a nice place to be in. So my co-founder and I have probably built like 12 or 13 different products in the last 15 years. And some of them did well, some of them didn't get too much traction. But for every product in the past, we would build the product first and then hope it makes money. And, you know, that used to be our operating model. But with TypeSense, we were in a different place where people were telling us that, hey, do this and we will pay you.
Starting point is 00:48:17 And so it was nice that when we launched TypeSense Cloud, you know, that week we had people paying us already. So that's when we realized there is a real problem that people are willing to pay to have solved for them. So, you know, we started just mentioning TypeSense Cloud in different places in the documentation and on our landing page, saying that this exists. And people kind of organically started signing up for it and using it. And we also made sure that, you know, the product is full-featured in the open source and in the hosted version as well. So it was nice to be able to tell people that,
Starting point is 00:48:53 hey, we're doing this only if you don't want to worry about servers and if you don't have an infrastructure team, we'll take care of that for you. And that's what we're charging you for. So it was very easy to explain to users what the benefit we're giving with TypeSense Cloud is, which is we're essentially like an extended infrastructure team for them
Starting point is 00:49:10 so they don't have to worry about servers. So that worked out pretty well. I'd say I, you know, to answer your question, I'm pleasantly surprised with the, you know, how many folks opt to use TypeSense Cloud. You know, especially it seems like serverless is a thing that is getting a lot of adoption these days. So people
Starting point is 00:49:30 generally don't have any other... Where TypeSense Cloud fits in is, if people don't have any other VMs that they run in their infrastructure and they don't want to deal with hosting anything themselves, then TypeSense Cloud is a nice fit there. So that also means that we now have revenue
Starting point is 00:49:45 to sustain ourselves off of while working on TypeSense. So with some of the attention we got on Hacker News, et cetera, we had inbound interest from almost 30 different VCs at this point, asking us if we'd be interested, if we're considering, et cetera. But for me personally, so I've worked at venture-backed companies in the past, and so I kind of know the song and dance
Starting point is 00:50:09 of what it takes to run a venture scale business. And the realization that I had eventually was that in a venture-backed company, you're essentially selling stock to your investors. And stock is, if you think of it, just like another product line that you have and your customers here are your investors. So in addition to selling your core product
Starting point is 00:50:33 to your customer, to your users of the core product, you're also selling a new product line, which is your company stock, to your investor group of customers. So once I started seeing it that way, the value that your investor group of customers get from the product that they're buying, which is the company stock,
Starting point is 00:50:50 is appreciating stock value. So to keep them happy, you have to do things to increase your company's stock value. And sometimes some of the things that you do there might not sit well with the core group of your customers who are buying your core product. And that tension is what, you know, I've seen play out in the past, and I keep seeing it play out in other SaaS companies that are VC-backed, where, you know,
Starting point is 00:51:16 the eventual cycle seems to be that, you know, they price their products super low and subsidize it to gain massive adoption. And then eventually they work their way up to, like, you know, the Fortune 5000, Fortune 1000 companies and start looking at million-dollar deals. And suddenly, once you have a million-dollar deal on your radar, your, you know, $15-a-month paying customers seem like a tiny drop in your revenue bucket, and your priorities as a company completely shift.
Starting point is 00:51:45 So that is what I'd hate to have happen, you know, with a product like TypeSense, because one of my goals with TypeSense is to make sure that it is available to as many people as possible without cost being an issue. And that's why it's open source. It's really accessible. And I felt like, or at least this is my current thinking, I felt like the venture model kind of doesn't sit well with that goal of making sure that as many people have access to TypeSense as possible. Or at least
Starting point is 00:52:16 it doesn't make that goal easy to achieve without conflicts of interest here and there at different decision points as you grow the company. So that's one big reason I've essentially said no to all the VCs who've reached out so far. And who knows? I mean, at least that's my current state of mind.
Starting point is 00:52:36 Yeah. If something changes. And then we've been able to sustain with this model. So it's working out very well for us, I'd say. Jared knows I've been one to say absolutes, what we will and won't do, only years or days later, potentially, changing my mind or having my opinion change and sort of walking back that hard absolute I'd said before. One thing you said was the appreciation, right? The appreciation of the stock to the investor.
Starting point is 00:53:06 Isn't that the name of the game for business anyways? Don't you want your business to appreciate? So how does the tension with an investor involved change the game for you? Yeah, that's a good point. So I would say the value of a business, there's building value into the core product that you're selling and providing that value to the customers who are paying for that core product. That's one way to grow the value of the business. Now, of course, if you're looking at it from the perspective of stock prices to be able to maybe sell the company later on, then building value in the core product is not going to be as
Starting point is 00:53:41 financially rewarding as selling stock to investors. But I'm wondering if maybe, once you have a sufficiently large adoption of your core product, I'm wondering if that will help translate to also, like, you know, not that we're looking to do this, but, you know, if we were to do, like, a crowdfunded fundraising, eventually maybe that core value that the product delivers is what determines the, you know, our stock prices, if we ever were to do a crowdfunded round.
Starting point is 00:54:26 Whereas today, it feels to me like the way stock prices increase in a VC-backed model is only by raising your next round of funding. So once you get on that train, to keep your latest round of investors happy for the valuation that they paid, you have to raise the next round of funding, or go public, or have some sort of a liquidity event, just so that the latest round of investors make good returns on that investment. So that's what leads to increasing valuations. You just keep having to raise additional rounds of funding to keep that group of, you know, quote-unquote customers happy. Good point.
Starting point is 00:54:49 One more question on this front. I mean, 2015 to 2018 isn't a far stretch. Elasticsearch IPO'd in 2018. You had to see the possibilities of this space in terms of a business, right? Algolia was well-funded, Elasticsearch IPO'd. You had to see the possibility of you taking a large portion or even a large small portion of that market share and capitalize on it.
Starting point is 00:55:18 Oh yeah, for sure. I think search, like we were discussing in the beginning, is something that is an evergreen problem, something that didn't start yesterday, is not going to stop being a problem suddenly. So I'd say definitely something that we consciously chose is to choose a market that's big enough so that even if we capture a very tiny portion of that market,
Starting point is 00:55:43 it's still a good investment of our time. So the space was such that there are not actually that many search players in the market. Now there are a bunch of closed source SaaS search providers, which more likely than not, many of them are maybe using Elasticsearch, for example. I'd say Algolia is at least one that I know of that has built their own search algorithms.
Starting point is 00:56:05 But for the most part, people just use Lucene and build on top of that. So the space didn't have too many players. So that was the second thing. So the first thing was a large evergreen problem that's not going to go away. And the second thing was not many players in the market trying to solve this problem. So I think that's why we were like, okay, maybe we'll find our way through to making money, you know, with some business model eventually.
Starting point is 00:56:30 Like, that's the thought we had in mind. We probably wouldn't have, you know, if it was any other SaaS product, I would say, like, a SaaS closed source product, or even an open source product in a different market, we'd have probably done a little bit more research before jumping into building a business around it. But I think this space was, again, and I should say the third thing is, search,
Starting point is 00:56:52 as we've learned, is also a very hard problem to solve, which is why you don't see many search engines around in the market. So if you want to call it like the technical moat, I guess, there's a huge gap to jump, you know, to figure out search as a problem domain, get up to speed with it and see what everyone else is doing and then seeing where you can improve it.
Starting point is 00:57:12 That is a huge chasm to jump before you build a product. And even if you do that, you know, spending a couple of weeks polishing it and then, you know, bringing it to market and then, you know, telling developers about it: that's why such a product is expensive. It's a lot of effort to cross that big gap. So all of this was in our mind, for sure. And we thought this is a good bet worth taking.
Starting point is 00:57:32 And all of the other ideas we've had, our focus was always going after very niche things, like things that no one else would probably have an interest in going after, mainly because, you know, it's so niche and it's not really directly related to, you know, day-to-day technology that you might be using. We basically picked boring old spaces for all the other past products. And this one was, you know, modern, cutting edge, and the target audience happened to be developers, and, you know, both of us are engineers, both my co-founder and I are engineers,
Starting point is 00:58:04 so we were able to speak the same language as our target audience. So I think all of these put together made it seem like this is, like, a once-in-a-lifetime type of an idea that, you know, we just have to execute on. So I really dig your transparent pricing for the cloud and the way that it calculates out. You want to just tell folks how that works? And, you know, you mentioned you want to bring this to as many people as possible, and it seems like being able to pay as you go, get exactly what you need, and scale up as you need to scale up is a great way of doing that. Of course, a lot of the public clouds have this kind of pricing as well, but you had a configurator right there on the pricing page. Do you want to tell us how you came up with this and how it all works?
Starting point is 00:58:46 Yeah, so we came up with it mainly to mirror the cost of running the service with how much we charge users. So that's one core principle that we held on to, because from a business perspective, that probably isn't the best idea, because you are very closely tied to your costs. But that's what we chose, in service of trying to make sure that we offer something that's as affordable as possible.
Starting point is 00:59:14 So if you were to run, for example, TypeSense on your own cloud account, we wanted the cost to be somewhat similar, and where we get savings is from economies of scale, essentially, like, running thousands of clusters ourselves, both the management effort involved
Starting point is 00:59:30 and the savings that you get with high spend. So that's what we capitalize on. And then we pass some of that savings back, if you want to call it that, instead of trying to do value-based pricing, which is what I've seen some other SaaS companies do. Now, that does make the pricing a little bit more complicated
Starting point is 00:59:50 because people have to know how to calculate RAM, how to calculate how much CPU they need. And that's why we added a little calculator which says just plug in the number of records you have and the size of every record, and then we'll roughly give you an estimate of how much RAM you might need. So that works out well for most use cases.
Starting point is 01:00:08 And so if people choose X as the size of their dataset, TypeSense typically takes 2X to 3X RAM, and that's given out as the recommendation in that calculator. And then for CPU, we just tell people, pick the lowest CPU available for that RAM capacity. And then as you start adding traffic, you'll see how much CPU is being used, and we can scale you up from there. Or we say, run benchmarks. If you already have high traffic in production, run benchmarks with similar kinds of traffic in a staging environment, see how much CPU you use, and then pick the CPU. So that does make it a little bit more complicated to calculate CPU.
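That back-of-the-envelope rule is simple enough to sketch in code. A minimal estimator, assuming the 2X-to-3X multiplier Jason describes (the actual cloud calculator may use different constants):

```python
def estimated_ram_bytes(num_records, avg_record_bytes, multiplier=3):
    """Rule-of-thumb RAM estimate for a Typesense dataset.

    Per the conversation above, Typesense typically needs 2x-3x the raw
    dataset size in memory; the conservative 3x end is the default here.
    """
    dataset_bytes = num_records * avg_record_bytes
    return dataset_bytes * multiplier

# e.g. 1 million records at ~2 KB each -> 6 GB at the 3x end
print(estimated_ram_bytes(1_000_000, 2_000) / 1e9, "GB")
```

The same shape of calculation applies to the calculator on the pricing page: records times average record size, times the headroom multiplier.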
Starting point is 01:00:47 And then the other configuration parameters, like you can turn on high availability, meaning that we'll spin up three nodes in three different data centers and automatically replicate the data between those, and then load balance the search traffic that's coming in between all three nodes. So flick of a button, you have an HA service.
Starting point is 01:01:06 And then we have this thing called search delivery network, which we built in TypeSense Cloud, which we essentially replicate the data set to different geographic regions. So you could have one node running in Oregon, one node running in Virginia, one node running in Frankfurt, another one running in Sydney, et cetera.
Starting point is 01:01:25 And anytime a request originates, we will automatically route it to the node that's closest to the user. So it's similar to a CDN, except that in a CDN, they only cache like most frequently used data, whereas here we replicate the entire search index to each of those nodes sitting at the different locations. So it's as good as it's going to get in terms of reducing
Starting point is 01:01:48 latency for users. In fact, this search delivery network is what prompted some users to use TypeSense as a distributed caching JSON store. So instead of having to replicate your primary database, which is
Starting point is 01:02:04 probably sitting in one location out to different regions which is a hard thing to do they instead send a copy of the data into TypeSense and have TypeSense replicate the data to different regions and then hit TypeSense directly as a distributed cache so that's an interesting use case that people
Starting point is 01:02:20 use TypeSense for. So yeah, so these are the different pricing angles, and I think when people realize that, oh, if I were to host this on AWS or GCP, this is how much incremental spend I'd have with TypeSense Cloud. When that delta is tiny,
Starting point is 01:02:36 when people realize that, that's when hopefully that's a convincing case for people to let us deal with the infrastructure stuff, rather than having to spend time on it yourself and spend engineering time and bandwidth. However tiny that might be, we still take care of that on an ongoing basis. So for the true DIYers who are doing it at scale, the clustering stuff: are those things that are in TypeSense and you're implementing it in your
cloud, and they could also go about doing it for themselves? Or is that stuff that's outside of the binary and is only in the cloud? Oh, no. The clustering is also something that's available in the open source version. So it's the same binary that you can run multiple copies of on different machines
Starting point is 01:03:19 and set up a configuration file to point each node to the IP addresses of the other nodes, and it'll automatically start replicating. So we, again, run the same TypeSense binary in TypeSense Cloud as well. In fact, any improvements that we do in TypeSense Cloud, once we've observed people using it at scale, that actually makes its way back into the open source version.
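For reference, the configuration file he mentions is, as best I can tell from the Typesense docs, just a comma-separated list of ip:peering_port:api_port entries that every node shares. The addresses, ports, and flag names below are illustrative assumptions, so check the clustering docs before relying on them:

```python
# Sketch of generating a Typesense cluster "nodes" file. The
# ip:peering_port:api_port ordering and the --nodes flag are assumptions
# based on my reading of the docs; the addresses are made up.

NODES = [
    ("192.168.1.10", 8107, 8108),  # (ip, peering_port, api_port)
    ("192.168.1.11", 8107, 8108),
    ("192.168.1.12", 8107, 8108),
]

def nodes_file_contents(nodes):
    """Render the comma-separated ip:peering_port:api_port list."""
    return ",".join(f"{ip}:{peer}:{api}" for ip, peer, api in nodes)

print(nodes_file_contents(NODES))
# Every server then runs the same binary pointed at this file, e.g.:
#   typesense-server --data-dir=/data --api-key=... --nodes=/etc/typesense/nodes
```

Each node reads the same file, discovers its peers, and replication kicks in automatically, which matches Jason's description above.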
Starting point is 01:03:40 And that actually has helped in a nice little feedback loop, where because we have firsthand visibility into TypeSense running in production at scale with TypeSense Cloud, we're able to then improve the open source product with that experience. Writing software is one thing, but watching the software run in production and observing how it works with different data sets, different types of traffic patterns, query patterns, the shape of the data, you get so much more visibility into how your software performs.
Starting point is 01:04:19 And I'd say that has been a nice side benefit of TypeSense Cloud, besides, of course, the revenue, to keep improving the open source product as well through the hosted version. Is that a commitment of yours to always give back to the open source through cloud, or is this just a natural
Starting point is 01:04:33 byproduct that's happened, but is it a commitment, or is it just sort of an accident, I guess? I don't want to downplay it by any means. So I guess when we started out with TypeSense Cloud, we didn't intend for this side effect that I mentioned to happen, which is us being able to use experience from TypeSense Cloud also benefiting open source. But now that I see it happen and see how that
Starting point is 01:04:57 benefits the open source product, and I shouldn't even say open source product, because it benefits TypeSense, the product, because TypeSense, the core product, like the API, is fully open source. And the fact that we're able to use our experience from TypeSense Cloud to improve TypeSense, the product, is amazing to me. So I don't think we'll ever stop doing that because if the product improves,
Starting point is 01:05:19 whether you're self-hosting it or not, I'd love for TypeSense to be adopted. Like if people say, you know, today if people think about search, you know, most developers, back-end developers, they tend to think about Elasticsearch. I'd love for TypeSense to be that thing
Starting point is 01:05:33 when people think about it, they think search, especially for site and app search. And because that's one big goal that I have, I'd hate to not contribute things back into the product, open source or not, because that does a disservice to what we're trying to do in the long term with TypeSense, which is, you know, good adoption for a product that works well out of the box.
Starting point is 01:05:56 Right. So in light of that, have you considered stealing or borrowing a page out of Algolia's playbook? Because they've become that because they're willing to offer that open source free tier and become kind of the starting place for many people who have the money maybe in the business context, but on their personal site they don't, etc. Usually that's the kind of move that, you know, VC money allows you to do. So I'm wondering where you stand on that, because you get a whole lot of users, now they're not giving you any money, but if you want to be that default that people think about, that's one move. Worked for them. Yeah. So people do ask us regularly for an unlimited free tier in TypeSense
Starting point is 01:06:35 Cloud. So right now we give out a free tier for 30 days, and after that you have to start paying. But I think the difference between Algolia and TypeSense is that TypeSense is open source. So if you wanted it for free, you could definitely run it yourself. And it's fully featured. And there's a community UI. You can basically run this whole thing. If you were willing to put in a little bit of effort, you can get this for free, for an unlimited amount of time. So I'd say that is kind of equivalent to Algolia's unlimited free tier, which does have a lot of restrictions: you can only put in so many records,
Starting point is 01:07:09 so many searches. With TypeSense's quote-unquote free tier, it's unlimited everything except for, of course, the infrastructure costs that you'll have to pay any cloud provider. Or if you want to run it on your machine, it's going to be completely free except for the electricity. So that's how I think about it. So if someone says they absolutely want
Starting point is 01:07:29 a free tier, I just tell them, you know, maybe sign up for one of the cloud providers. They offer a better free tier for at least like a year, or give you free credits if you open new accounts, and then just run TypeSense under your own cloud account and you get it for free.
Starting point is 01:07:45 Yeah. No, I think that's logical, reasonable, and fair. But what it's not is a go-to-market strategy whereby TypeSense can become the default. Like, it's just, I agree with you. And that's a nice answer. And that's probably what I would say as well. But if you did have the VC money,
Starting point is 01:08:01 you could say, but we'll also, for open source, do this. And people would just use it. And then you would become so. Yeah, fair, fair, fair. I definitely understand what you're saying. And that, that step of like host it yourself, you're going to cut off like 80% of the people that would use it and not saying you can't get there. You can totally get there, but it limits you in certain ways. Yeah, for sure. Yeah, I think being able to subsidize some of the, I guess you could call this a form of marketing. Yeah, it is.
Starting point is 01:08:28 That's definitely one downside of not having marketing funds available as large as a VC would have, or a VC-funded company would have, but I guess that's the trade-off. The upside is you don't lose sleep at night while you're burning through somebody else's money. Truth. True, true.
Starting point is 01:08:44 And the fact, I'd say, every morning just waking up and saying that the only thing that I need to focus on today is what a majority of my users are telling me is the next important feature, and I just need to focus on that and keep chugging through that, and everything else is falling into place. Like, that is such a satisfying feeling for me, I should say. At some point, TypeSense is going to become ubiquitous enough to gather the attention of somebody else that gathered the attention of Elasticsearch. And we had a conversation a year ago
Starting point is 01:09:18 when Elasticsearch changed their licensing because of, you know, the Goliath in the room, basically. What happens when AWS decides to offer TypeSense? You know, what will happen then? Are you prepared? Are you ready for that day? Have you business planned enough?
Starting point is 01:09:38 Have you license planned enough? What are you going to do? Yeah, I would say that if that happens, that would be a good thing, because AWS has already spent a ton of time and money getting into, for example, working with the government agencies and things like that. So they've done a lot of this legwork. And if they were to offer TypeSense under that umbrella, it only works well for TypeSense's adoption at that point. From a revenue perspective, I think the mindset that maybe Elasticsearch has is that they need to capture all the value that they're creating, which is understandable, I guess.
Starting point is 01:10:40 I mean, I can see that point of view as well. But my point of view on that is that we're creating value, but then we're also creating this value together with the community. Even if it's just people, you know, asking us questions and giving us feedback and making feature requests and telling us that here's how we're using TypeSense, you know, how best can we use it? Like all this is feedback that has collectively gone into building TypeSense, the product. So my opinion is that, and that's the nature of open source. And so my opinion is that when you've built a product like that, standing on the shoulders of your community and on the other dependencies that you're probably using,
Starting point is 01:11:18 we've already built this together with shared value. Let's spread this value around rather than trying to capture it all within one commercial entity. So I would actually love it if additional cloud providers start offering TypeSense as a service, because that's how you get to be a Linux
Starting point is 01:11:36 and not an Elasticsearch, I should say. Where there's so many flavors of Linux, so many people use Linux and it's become the foundation. And I'd rather become a Linux than an Elasticsearch, at least from a licensing adoption perspective. We didn't ask if you were a listener of this show before we brought you on the show, but if you are not a listener of this show, I would suggest you go back and listen to Adam Jacob talk about this because
Starting point is 01:12:00 I asked you that question thinking, what is he going to say? Because I kind of know what your answer might be, but I'm kind of hoping that it is in light of what Adam Jacob said, which is essentially they're your marketing funnel, right? Why get upset when AWS offers your thing as a service? Because they've just blessed you, essentially, as worthy. Worthy to use, worthy to try, and let the tech, the usefulness of the tech and the community behind it and the people behind it
Starting point is 01:12:25 be the support mechanism to say this is worth keeping in use. Versus, what is TypeSense? Who are they when AWS hasn't chosen you yet? You're just in a sea of obscurity, essentially, of search land. And if they blessed you in that way, then it's like, wow, that's a better
Starting point is 01:12:41 go-to-market strategy potentially than the free tier of Algolia. True. That's true. Maybe. Yeah. Yep, yep. Yeah, for sure.
Starting point is 01:12:49 I think AWS's breadth of adoption, you know, you're just riding on its coattails if they end up offering you as a service. 100%. Like you said. So, yeah, that's exactly how I look at the world as well. Let me bring a question over that I ask on Founders Talk often, which I think I'll ask here as a closer for the show,
Starting point is 01:13:10 which is, what's on the horizon? We've talked a lot about TypeSense Cloud, your commitment to open source, your commitment to the community, the unintended consequence of being so faithful to the sturdiness and stability of the open source to give back from
Starting point is 01:13:26 the advances you've made in cloud to bring them back to the binary that everybody else gets. What's on the horizon? What do we not know about today that you could share here on the show? Yeah, so I think this is the first time I'm going to mention this publicly, but we've been working on vector search in TypeSense. So essentially you can bring your embeddings from any ML model that you have into TypeSense and have that be mixed with text relevance as well. And you could do things like in the context of e-commerce, for example,
Starting point is 01:13:55 you can say, get me all products that are similar to this product or get me products that I'd recommend based on this user behavior in the past or whatever you construct your model on, you can bring your embeddings into TypeSense and have TypeSense do a nearest neighbor search. And this is actually another example of something
Starting point is 01:14:13 that users asked us for and essentially said, you know, we'd have to start looking at using two different services if it's not built into TypeSense. And we started looking into it. And we're essentially right now building it actively with users. So I'm super excited about that. And I think it's going to open up a whole... So far, I've always had to tell people that,
Starting point is 01:14:36 hey, we don't have any AI or ML-related features, and that is going to change very shortly. So I'm super excited about that. Awesome. Sounds cool. When does it drop, Jason? When does it drop? Oh, it's actually already available in an RC build. We just selectively give it out to folks. So if anyone's listening and wants to try out vector search in TypeSense, I'd love to get some feedback before we publicly launch it. But we don't have
Starting point is 01:15:01 like a fixed timeline for releases. That's another maybe unique thing we do. We just essentially collect sufficient volume of features and then once we think, okay, this is a good chunk of volume to put out as the next GA release, we promote the latest RC build as the GA release. So, you know, it varies between like two
Starting point is 01:15:19 months to sometimes four months before we do GA releases. What's the best way to get in touch with you if they want to try that out? I'd say just sending an email to support at typesense.org. That'd be good. Or just DM me on Twitter. I have my DMs open.
Starting point is 01:15:35 My Twitter handle is Jason Bosco. I'd be happy to, or join our Slack community, of course, and mention it there. What's left? What have we not asked you? Is there anything we haven't asked you yet that you want to share before we let it go? I think we've covered good ground here. Yeah, I think we've covered everything.
Starting point is 01:15:55 I can't think of anything; we covered everything to talk about. We spoke about a good... We did our job, Adam. Nice. A good breadth of topics. No stones unturned, all the crevices examined. Jason, thank you so much for your time. Thank you for your commitment to open source.
Starting point is 01:16:10 Thank you for coming on the show and sharing your wisdom. We appreciate you. Of course. Yeah. Thank you for having me, Adam. This is a great conversation, Adam. Thanks, Jared. Thank you.
Starting point is 01:16:21 That's it. This show's done. Thank you for tuning in. What do you think about this truly open source search alternative to Algolia, to Elasticsearch, to rolling your own with Postgres or MySQL? Let us know in the comments. Links are in the show notes. And during the show, I mentioned our conversation with Adam Jacob back on episode 353 here on The Changelog. Here's that clip. Disruptive products, though, are not necessarily better.
Starting point is 01:16:46 They're usually actually worse, but they're good enough. And the cost is disruptive. And so in the case of an AWS version of Mongo, yeah, it's not going to be as good or as maybe well-supported or have as many features as Mongo's version of Mongo, but it's satisfactory and it's way cheaper. So it's disruptively cheap. And then you add to the fact that there's no R&D, there's no development costs from Amazon's side.
Starting point is 01:17:07 So you're not competing with them on features. They're just free-riding all the features that you're building. Well, but here's the thing. This is where we come back to the funnel. So now we're back to the business. So sure, maybe Amazon, but this is why it's good business for Amazon to launch your stuff as a service instead of just compete with you directly. So you've brilliantly elucidated why they would want to launch a Mongo service in the first place, right? Brilliant. Good job, Jared. Yeah, it's good. But as soon as they do that, if the top of the funnel was fixed, if that created no more interest in your product than it did before, then you'd be right. But it doesn't. Instead, it turns out that the single largest pool of software developers on the planet are the ones that use Amazon and AWS, or Azure, or Google. How many of those developers
Starting point is 01:17:50 are using one of those platforms? And if your stuff is on all three of those platforms, and it's not on the others, how many eyeballs do you get that Cockroach doesn't? The answer is a ton of eyeballs, so many eyeballs. And so the size of that
Starting point is 01:18:35 funnel, your possible monetization, gets bigger, hugely bigger than it was before. And in that moment, your ability to capture that revenue: every single one of those cut-rate DocumentDB users is a potential lead that's already using your product. So all you have to do is go find them and be like, yo, did you see how much better our console is? How much better our operations stuff is? How you can get on a Zoom with the dude that wrote that indexing feature when it's broken? I dare you to get that out of Amazon. And next thing you know, Citibank is like,
Starting point is 01:18:54 you know, Atlas looks pretty good. You know? What you're describing, Adam, though, is a very well-known business tactic, which is turn your liabilities into your assets. Yes. To your advantage, you know? So use liabilities to your advantage, a tactic known by many, let's say. Yeah, it's not news. Right.
Starting point is 01:19:16 All right. Find that episode at changelog.fm slash 353. That is episode 353 with Adam Jacob on the war for the soul of open source. A big thank you to our friends at Fly and Fastly for having our back. And of course to Breakmaster Cylinder for those awesome beats. And of course to you. Thank you so much for tuning in. We will see you on Monday. Outro Music
