Software Huddle - Architecting Real-time Analytics with Dhruba Borthakur of Rockset

Episode Date: October 10, 2023

In this episode, we spoke with Dhruba Borthakur. Dhruba is the CTO and Co-founder at Rockset. Rockset is a search and analytics database hosted on the cloud. Dhruba was the founding engineer of the RocksDB project at Facebook and the Principal Architect for HDFS for a while. In this episode, we discuss RocksDB and compare it with LevelDB. We also discuss in detail the Aggregator Leaf Tailer architecture, which started at Facebook and is now powering Rockset. Follow Dhruba: https://twitter.com/dhruba_rocks Follow Alex: https://twitter.com/alexbdebrie

Transcript
Starting point is 00:00:00 We don't want to build something if we don't have a customer. So even new features that we're building, we're never building a new feature by deciding between product and engineering and marketing whether this is the best feature to build. We always try to build only if there are a few customers lining up saying we will use this feature when it's available. So RocksDB is essentially a C++ library
Starting point is 00:00:22 that is used to store data in a data storage system efficiently, especially when the data storage system is on flash or an in-memory system. One of the first use cases at Facebook was replacing like a 500-node HBase cluster with like a 10-node or something like that, or 20-node RocksDB cluster. That's the
Starting point is 00:00:47 time when people kind of woke up and thought, oh, this is good technology. Then we tried to understand what this stuff is. Hey folks, this is Alex Debris and I love today's episode. I had Druba Bertheker on here, who is the co-founder and CTO at Rockset, which is just one of my favorite products. It pairs really nicely with DynamoDB and I think a lot of sort of real-time data problems that people have. So I recommend it to a lot of folks. The one thing I like about Druba is he's super technical and has been, you know, doing a lot of things. He was principal architect for HGFS for a while. He was the creator of RocksDB at Facebook. You know, he's been co-founder at Rockset, but also he's just like a really good teacher, right? He has these videos on Rockset's YouTube channel that explains all these different architectural concepts and really
Starting point is 00:01:29 great diagrams and teaching. So we talk a lot here. We talk about RocksDB and Rockset and also the aggregator leaf tailor architecture, which is a really interesting one that, you know, sort of started at Facebook and he brought to Rockset as well. So I hope you like the show. You know, if you like it, please subscribe to the podcast, to the YouTube channel, and feel free to reach out with questions with future guests you want to see and be sure to follow both me and my co-host Sean Falconer on Twitter. And with that, let's get to the show. Druba, welcome to the show. Hey, Alex, how are you? I'm doing well. I'm really excited to have you on the show for a couple couple different reasons. Two of them are like, number one, you're the CTO at Rockset. And I just I love Rockset. I recommend it to so many people because it's just so useful and unique in what it does. So number one, love Rockset and love what you've done there. But number two, you're deeply technical and you're also really good at explaining this stuff. So you've done like a bunch of these videos on the Rockset YouTube about different architectural concepts or things about how Rockset works. And I just love what you've done there. So I'm excited to learn a lot today.
Starting point is 00:02:33 But maybe, Drew, if you could give us a little bit about your background and what you've been up to. Sure. Yeah. Thank you. Thanks for inviting me to your show. I've heard some of your previous episodes. They were fantastic. Because I really love the technical deep dives that you do on technology. Good, good. We're going deep today. Good. So, good to hear. So, yeah, my name is Dhruva. I'm the CTO and co-founder at Rockset now. Rockset is a search and analytics database hosted in the cloud. We've been around for around six to seven years now.
Starting point is 00:03:06 And prior to Rockset, I was at Facebook building a lot of data backends at Facebook, including RocksDB and some photo storage and Hadoop and HBase and these kind of backend systems. And prior to that, I was actually working on Andrew file system from CMU. It was a spin-off and it was kind of the first distributed file system out there. This is like 20 plus years ago.
Starting point is 00:03:30 So I mostly spent a lot of time building servers and storage and backend systems and things like these. Yeah, I'm excited to talk to you today about a lot of database technologies. I think this is what you had in mind, I guess. Yeah, absolutely. And yeah, I love your background. You know, a lot of things I've used either directly like HDFS or RocksDB and a lot of, you know, indirect things as well. So first of all, let's get started. Just tell me about Rockset, your current company that you've been working at and co-founded. What is Rockset?
Starting point is 00:04:00 What would I use Rockset for? Sure, yeah. So Rockset is a search and analytics database. This is the first, I think, of its kind in the sense you have search technologies earlier or your database of technologies earlier, but Rockset kind of builds the backend for combining these two type of technologies that have been there for a while. So search databases essentially are when you want your queries to be optimized for natancies and you have an online application
Starting point is 00:04:29 that is making queries 24 by 7 on your data sets and you want the queries to come back in milliseconds, you have to use a combination of search and database technologies. So this is what Rockset is. It's hosted for you in the cloud. It's a managed service. You can connect connect your data sources
Starting point is 00:04:46 and immediately start making SQL queries on your data. The API is very standard SQL, so kind of everybody knows how to use it from day zero. And then it's very cloud native. Like it's built natively for the cloud, unlike other systems. So you can get all the cloud-friendliness of serverless, auto-scale up, scale down, and these kind of things.
Starting point is 00:05:10 It usually powers applications that are running all the time and kind of making automatic decisions on your behalf. Recommendation systems, personalization systems, fraud detection systems. Those are the kind of applications that can kind of get the maximum value or the money when they use Rockset. That's the short spiel. I hope I was able to explain to you what it is and what it is.
Starting point is 00:05:41 Yep, absolutely. So I first got started with Rockset because it works so well with DynamoDB, right? I do a ton with Dynamo, which is very good at sort of like known point queries or range queries. But then if you have like long tail complex filtering or aggregations or search, things like that, not so good for Dynamo. And Rockset just fills that gap really, really well. It's like, what would you,
Starting point is 00:06:09 what do you compare Rockset to, like related systems or things like that? If people are trying to frame, like where does Rockset fit in my architecture? Yeah, that's a good point. I think in my mind, databases essentially are kind of usually two types, right? One are transactional databases and one are analytics databases.
Starting point is 00:06:25 And then there is this odd man out earlier times, or is, oh, I have a search system where I can do some text search or log search. So the difference between analytical databases and transactional is that transactional are mostly used for kind of say credit card transactions, like very easy for me to explain to somebody. You want to deduct money from a transaction here and put it in a different account, all in one transaction. So there's less of analytics.
Starting point is 00:06:54 There's less of automatic decision-making. It's more like a slave to what the application is doing. And recording, it's like a ledger that you are recording things, right? You need consistency, you need atomicity, you need many other things. But for analytics, it's all about ability to look at large portions of data, right? And then able to extract insights from it or able to take some decisions on this data. For example, let's say you have a data set where you are recording, let's say you are like a fleet
Starting point is 00:07:27 management system, right? You have thousands of trucks as part of your business and the trucks on the road. And you are monitoring where each truck is, what is it picking up, where does it need to deliver things. And then you need to, let's say, you need to read out the truck based on the pickup of the things that it has to pick up and then drop off, right? Dynamically. So this is kind of a decision-making process. It's not like taking out money from one account
Starting point is 00:07:53 and putting it in another account, right? So these kinds of complex transaction decision-making are best done in an analytics database. So when you talk about what category of other data systems that people might use that compete with Rockset, these are very much analytical in nature. Things like, let's say you have maybe a MongoDB application
Starting point is 00:08:14 that you are running to do these kind of fleet management and then trying to do automatic decision-making. Again, for analytics purposes, not for transaction. Or you could have, take for example, an Elasticsearch system where you have stored, let's say, your catalog. You are an online retailer and you have your catalog stored in Elastic.
Starting point is 00:08:37 And you are trying to find out which item you should reorder based on past transaction history on your catalogs. And what is the lead time you need to reorder before you should or the lead time you need before you should place the reordering transaction? These kind of decisions
Starting point is 00:08:55 are what Broadset is best used for. So typically we compete a lot of Elasticsearch because people have kind of been using Elasticsearch or I would say using Elasticsearch because people have kind of been using Elasticsearch or I would say using Elasticsearch for a long time. I mean, it's worked well. It's a good system.
Starting point is 00:09:12 But it's kind of not built for the cloud as you look at Elastic. So we see a lot of people migrating from Elastic to Rockset. And then when you talk about what other data systems we compete with, we are SQL versus many other systems that are not SQL, like say MongoDB Elasticsearch.
Starting point is 00:09:30 These systems are trying to build some SQL APIs in the recent past. But when I talk about SQL, I'm not really talking about select statement. I'm talking about joins. This is what in my mind is a SQL API. Everybody can implement select star, right? That's kind of easy to do whatever it is. You can call SQL or any of select star.
Starting point is 00:09:49 So our differentiator again is that we have a SQL API to our backend, which means that you can join tables as part of your queries. You can do search queries and then you can join them with other queries and then do another round of search. So search and aggregations are kind of combined together in the standard SQL API. So from that perspective, sometimes we also see people migrating from Snowflake because you are using Snowflake for your reporting,
Starting point is 00:10:17 but then you have a real-time use case you put on Snowflake, and then you quickly find out that costs on Snowflake are very high because it's not built for real-time apps. So then those guys, those applications also move to Rockset. But the majority of our use cases are very search analytics centered, where latency of queries is key to serving these applications. I like to tell people like Rockset is like Elasticsearch, but without the pain, basically. And the big things there being like, you know, Rockset is going to automatically ingest from your data source, whether that's Kafka or whether it like I got it with DynamoDB and DynamoDB streams, it's just like automatically indexing and I don't have to do anything. You can do the same with like Mongo's Oplog or, you know, relational database streams, things like that, or Kafka. So like that managed ingestion, really good.
Starting point is 00:11:10 The management of the compute and sort of scaling that up and down and managing better rather than Elasticsearch, which is going to follow her for me. And then, and then like you're saying, in contrast to something like Snowflake or Redshift or Athena, these like other OLAP heavy type things. If you want to do much sort of faster, more interactive queries, whether that's your internal team or what, like that could be like user facing queries data you're showing, you know, dashboards and things like that to users in your application. Just going to work a lot better that way. Can you tell me about the converged index and what that is in Rockset? Yeah, no, absolutely.
Starting point is 00:11:45 So, yeah, we did.org SQL. I can explain how converged indexing works, and then I can kind of join the two components together. But, yeah, just to wrap up on the previous point, I really like DynamoDB. I mean, I know you had a lot of experience with DynamoDB. The simplicity of the system is really great, and the thing is really great.
Starting point is 00:12:06 And the thing that it scales. So you can think about Rockset more like DynamoDB with a SQL API where you can do search, aggregation, joins, everything else. And you don't have to think about managing servers or systems anymore. But yeah, coming back to the uniqueness of Rockset is that we have a SQL API. That's one of our differentiators compared to say Elasticsearch or other systems and the other big difference is that we have something called a converged index. So converged index is a piece of technology which lets us build indices
Starting point is 00:12:37 on different fields of your record in multiple ways so that your queries are fast. For example, for some type, for some of these columns, let's say you're making a lot of search indices, right? So you need an inverted index on those. Or other columns, let's say you're doing more of aggregations and draw like average min or max or some aggregates that you're trying to compute, for them we would create the columnar index. And then there are some certain times you just need to look up all the fields in the record. Then you, it's more like Postgres or Mongo, then you would, for those fields,
Starting point is 00:13:16 you would build kind of the row index. We call it the row index. It's like more projections in your SQL query. But also, we also see sometimes that there are numbers on which you're doing a lot of range queries. So we would build some range indexes for those fields. And then Proxit is also kind of a database where you can put NoSQL data on one side,
Starting point is 00:13:42 but then make SQL queries on the other side. So we also optionally can index all the types of your data. Like it's basically a multi-type database, like multi-type column in the database. If you have one column, which is integers and strings, we can also optionally index those so that you can find a query time, like tell me all all the records where
Starting point is 00:14:05 zip code is an integer and zip code is greater than 48 let's say i'm just giving an example so you can also essentially build indices on types of these objects because it's multi-type and again all is in so this is what we mean by converged index uh you can um by default you can build all indices but you also have optionally can switch on, switch off indices on some of these fields. Because at query time, the query system will automatically leverage this converged index to figure out what is the best access pattern or access path.
Starting point is 00:14:39 So we have a cost-based optimizer that will do some of the hard work for you. To think like, which indices to use? Shall I use the inverted index or shall I use the column index for the query? Sometimes, because we also support joins, there are things like, there are like four or five types of joins that we do, like hash join, lookup join, whatever, broadcast join. And so the cost-based optimizer, based on the converged index we built
Starting point is 00:15:07 and the statistics we maintain, we can figure out which join would be the best thing for that query. But again, all these things are built out of the box, but you can always override it and give more hints to the system so that, I mean, the human brain is sometimes always more intelligent than writing code
Starting point is 00:15:24 and building heuristics. You know this. So there's always the option of specifying hints so you can override what the system is doing default for you. So, yeah, so that's the converge index part. We have good performance measurements on how the converge index behaves when you compare to the column scans on other systems. Like sometimes people might try benchmarks on Grid or ClickHouse or these kind of systems where there's only columnar storage. But a converged index storage for us actually gives us far better price performance for most of those queries compared to just pure column scan that you have in other databases.
Starting point is 00:16:04 So that's the Converse Index story. Yep. And one thing that really drove it home to me is, again, I'm like a Dynamo guy, and I'm like known access patterns is what I think about and stuff. And so like these long tail queries on like arbitrary attributes or columns,
Starting point is 00:16:18 things like that are really hard in Dynamo, right? And I think I was talking to Venkat, your co-founder, as he was sort of explaining that to me. And like, if you could have like a, you know, like a JSON column that has like arbitrary user inputted data, and that's if you query on some sort of like arbitrary column attribute there, that's actually really efficient for Rockset. Like this long tail query, probably not that many records have that exact key and value if it's like user input. So it's using an inverted index, which is super specific,
Starting point is 00:16:48 and it's like really narrowing it down. Whereas Dynamo, you probably have to like look at a bunch of records, sort of filter them out yourself manually and handle that. So like those long tail, hyper specific queries are just like actually very efficient in Rockset, which was counterintuitive to me. I mean, Dynamo, maybe you can build this global secondary index on some fields to do this. But for Rockset, it is
Starting point is 00:17:09 why you get a magnitude better price performance because we use something called RocksDB internally that does this for us. So essentially, you can think about DynamoDB with global secondary index versus using Rockset and letting the Rockset technology do the indexing.
Starting point is 00:17:27 So Rockset, essentially what we have done is that we have made the cost of indexing low compared to all other systems that are out there. That's kind of our biggest differentiator. So people usually think indexing is costly and, oh, it's going to cost me a lot of money to index all my data. But no, that's not the case.
Starting point is 00:17:44 We have good technology based on RocksDB and based on some of our converged indexing processes that actually make you run this at price performance on large, like tens of terabytes and hundreds of terabytes of data and be competitive.
Starting point is 00:17:59 It's amazing. Yeah, that segues into sort of this technical deep dive. I want to start, I want to investigate from like the bottom up and sort of start low level and move on and starting with rocks tv which you just mentioned so while you were at facebook you created rocks tv this means billions of people are using code that you wrote um every single day you know but but rocks tv is like one of those hidden layers you know that we don't even know we're using. So what is RocksDB?
Starting point is 00:18:25 Yeah, so RocksDB is something that we built when I was part of the engineering team at Facebook, right? Again, I started the project, but like maybe tens of developers have contributed to the success of the project, right? So RocksDB is essentially a C++ library that is used to store data in a data storage system efficiently, especially when the data storage system is on flash or on memory system. So it's optimized for kind of running a database on SSDs or on flash memory or some other kind of random access storage systems versus spinning disks, right? So this is optimized for query latencies. At Facebook, we build this. I mean, before that, actually, at Facebook, we were using a lot of HBase, which was the Hadoop-based
Starting point is 00:19:17 pipe system. And I remember when we were building RocksDB, one of the first use cases at Facebook was replacing like a 500-node HBase cluster with like a 10-node or something like that, or 20-node RocksDB cluster. That's the time when people kind of woke up and said, oh, this is good technology. Let me try to understand what this stuff is. But yeah, RocksDB is essentially a C++ library that lets you efficiently index your data and optimize your data so you can do high write rates as well as optimize for query latencies. So these are sometimes conflicting in nature
Starting point is 00:19:55 but when you have fast storage like SSDs it was built essentially for managing large data sets on flash drives and flash became very popular in 2010s, 2012s, 2013s or managing large datasets on Flash drives. And Flash became very popular in 2010s, 2012, 2013, this kind of timeframe. And that's when we started the RocksDB project. And so RocksDB is sort of built on top of LevelDB,
Starting point is 00:20:22 which was created by Google by Jeff Dean and Sanjay, like sort of legendary programmers. I guess, like, how did you improve on their work? Was it this change in hardware for stuff moving to Flash and SSD? Or like, what was sort of the insight that you all had with RocksDB that was improving on LevelDB? Got it, yeah.
Starting point is 00:20:41 No, that's a good point. So at that time, I was mostly, like prior to Rockstable, I was mostly a developer with HBase. So I wrote a lot of HBase fun internals. HBase was also trying to do something like log-structured merge trees. And I know that implementation wasn't as good as it could be when you're running on Flash. So I was trying to look around for other technologies
Starting point is 00:21:05 that can do LSM engines on Flash. And this is when LevelDB came in the picture. At that time, LevelDB was mostly built for, I think the Chromium browser or something like that. Basically it was built only for kind of in-memory data stores, but it was an LSM engine. And the code was very well-written, so I could kind of read all the LevelDB code in maybe like three or four days.
Starting point is 00:21:29 It was like extremely well-written code. Definitely, I kind of thought, oh, this code is something I can definitely take over, take up and make it better or make it useful for server applications. The relationship between LevelDB and RocksDB is more like, how should I say, it's like parent-child relationship. They have their own personalities and characteristics and whatever the ways you interact with them. But there is one common gene between these two is that both of them are log-structured merge trees, which is why our new writes come
Starting point is 00:22:05 in, they get stored in one place, and then when the overwrites happen, they get stored in a different place. And then over time, these things get merged and compacted. I'm kind of making it sound very simple, but essentially this is what, compared to B3, LSM3 kind of does this, where there is called action that's happening in the background. Are there times today when level DB might work better for you than rocks to be or as rocks to be sort of superseded? No, so I mean, there's like, I can ruffle off maybe a huge number of things. Take, for example, compactions, right? Compactions is critical to a database because only if you're able to compact and reduce your size can you take in more data. So it's highly dependent on your write rate.
Starting point is 00:22:49 If you cannot compact, then you have an unstable system. So level DB compaction is again single threaded and you can write in one thread. ROS DB compaction is multi-threaded. We have different compaction strategies. We have level compaction. We have universal compaction, which reduces the write amplification that you do on the storage. And what else?
Starting point is 00:23:12 Oh, yeah, again, things like say the basic table format, levelDB had kind of block based table format, right? But when we are using rocksdb for Facebook news feed, so Facebook news feed is the news feed that when you fire up the Facebook app, you see all your posts and comments by your friends.
Starting point is 00:23:32 That's powered by RocksDB. And one of the changes we did that was that we created a plain table format, instead of a block-based format.
Starting point is 00:23:39 Because we needed, it's running on like RAM systems, the news feed. So you need it, your storage is essentially random access storage versus an SSD or disk-based, which is kind of very block-based. What else? Then also at Facebook, RocksDB is also used for memcache,
Starting point is 00:24:00 storing memcache blobs. So RocksDB had like a blob interface now where you can store larger size datasets or larger size blobs, whereas in LevelDB, they're usually like much smaller size. Things are ideal for it. Caching system.
Starting point is 00:24:17 Oh yeah, memtables. LevelDB has one kind of memtable, which is a skip list. But for RocksDB, we have like eight different types of memtables because sometimes a skip list. But for RocksDB, we have like eight different types of memtables because sometimes a skip list memtable is great,
Starting point is 00:24:28 but sometimes maybe your vectors memtable is great because you're not reading your recent writes. Sometimes a different data structure. So we have like
Starting point is 00:24:36 all these pluggable components. So yeah, I think, I mean, this is like 12 or 10 years back. Oh yeah, RocksDB, this is like 12 or 10 years back. Oh, yeah, RocksDB, we open sourced almost 10 years ago to the dot, right?
Starting point is 00:24:50 Yeah, back in like 2013. And it's come a long way. Facebook had put in a lot of effort building RocksDB over the last 10 years. There are like probably 15 people in the team continuously working. So there's been a lot of change. So we can move back to level DB. None of these workloads will work
Starting point is 00:25:06 if you migrate RocksDB to level DB. Okay. So in terms of like where RocksDB is used, obviously Rocksend, a lot of like, you know,
Starting point is 00:25:14 very high scale application workloads like Facebook might use. The first time I think I came into contact with it was like Apache Flink, you know,
Starting point is 00:25:22 if you want to do like local state storage, stuff like that. But also also just a bunch of databases have RocksDB-based storage in it. It's like Cassandra, Mongo, MySQL have the option to use those. What, for those databases, you know, if you're looking at MySQL using MyRocks
Starting point is 00:25:40 or MongoRocks for MongoDB, is that going to be the better choice for most people? Or are there like certain cases where that engine might work, but certain cases where sort of the traditional, you know, like ICM or whatever, MySQL engine would work better? Yeah, I think that's a good point. I think in general, LSM engines work well for most use cases now, right?
Starting point is 00:26:06 But B-trees, again, the difference essentially is irrespective of whether you are using RocksDB or some other LSM engine. Take, for example, MongoDB. I think they have
Starting point is 00:26:13 a new LSM engine called WiredTiger. Not new, but same time, like 10-year-old. Mostly, I think, by default, they use the WiredTiger
Starting point is 00:26:20 LSM engine. So, yeah, I think LSM engines are definitely something that has become very popular over the last few years, especially because you're running on Flash. And Flash, you can avoid
Starting point is 00:26:32 the wear and tear of Flash when you don't do kind of point writing into the same Flash page over and over again. So that was one thing where the hardware kind of decided how the software would migrate to. But yeah, RocksDB is also used for things like you mentioned about Slink, you mentioned Kafka.
Starting point is 00:26:54 It uses RocksDB as a state store. Databricks, I think the streaming SQL, they use RocksDB again as a straight store. Because at some point, RocksDB is very much like a, what should I say? This is the Swiss army knife, right? You have, it's very sharp. As long as you know how to use it, you could get the best deal out of it. If you don't know how to use it,
Starting point is 00:27:16 you could be in a lot of trouble because it's a complex piece of software that's out there. It's high-performance, but it's quite complex to tune and manage. All right, cool. Let's move a layer up the stack now. And I want to talk about ALT architecture, aggregator, leaf, tailor. This is something you've done a video on.
Starting point is 00:27:33 I'll link that in the show notes. But just for the listeners, what is the ALT architecture? Okay, yeah. So ALT architecture, the full form is aggregator, leaf, tailor architecture. This is an architecture that we use at Rockset for building our analytics database. But it's not something that we invented ourselves. We did this also when we were at Facebook, building, say, the Facebook News Feed app. Again, this architecture is important so that we can scale a real-time analytical system,
Starting point is 00:28:04 which a real-time analytical system in our alliance is somebody who is doing a lot of writes and somebody who's doing a lot of queries at the same time. That's the real-time part, right? You can't do the writes in one place and then upload all your data to be queried after half an hour. So the real-time needs both of these two working in tandem. And the NK architecture supports this well,
Starting point is 00:28:25 where you need to do real-time analytics. So the three components is that it's a disaggregated. One of the basic ways to explain this is that it's completely disaggregated. And there's a three-way
Starting point is 00:28:39 disaggregation between the storage needed to store your data and the compute needed to index new data that is coming into your system and the storage needed to store your data and the compute needed to index new data that is coming into your system and the compute needed to query all the data that you have stored.
Starting point is 00:28:53 So this is a three-way desegregation. It's not just a desegregation between compute and storage, but it's also a desegregation between storage and the compute needed for queries and the compute needed for indexing. So the leaf part is the storage nodes. In the ALT, the L part is the leaf. Leaf is where your data is stored.
Starting point is 00:29:17 So it has its own tier of machines or conceptualizes your own set of servers. If you have more data and your volume of data grows, you grow more leaf nodes and then you scale up. But then if your amount of new data coming into the system grows, let's say today you are sending 10 megabytes a second, but tomorrow you want to send new data at 10 gigabytes a second, you need more compute to index and ingest the data. So that's the tailor part of things in the
Starting point is 00:29:45 ALT architecture. And the aggregator are a set of nodes which are used for queries. So when a query comes in, there's let's say SQL query or whatever query it is, it needs to recompile and needs to be made in a query plan and executed. That's done in aggregators.
Starting point is 00:30:01 So if there are more queries, today you have one query a second, tomorrow you have thousand queries a second, you grow your aggregators. But you there are more queries, today you have one query a second, tomorrow you have thousand queries a second, you grow your aggregators. But you don't have to grow your leaf nodes and you don't have to grow your Taylor nodes, which is why you get the best price performance for these kind of systems because you can spin up and down
Starting point is 00:30:18 each of these layers by itself. And that's the key part of the ALT architecture. So we use it at Rockset extensively, again, to power our backend database. You can have your own set of nodes for ingest and your own set of nodes for queries. And it gives you the best price performance. Okay. Okay. So just to make sure I'm understanding, ALT, the tailor is reading from a source, maybe that's a DynamoDB stream, maybe a Kafka stream, some sort of source. It's reading, it's doing the indexing itself. A data then lands on a leaf node,
Starting point is 00:30:49 which is holding it for storage. And the aggregator, that's going to take a query and sort of maybe fan out, do scatter gather to multiple different leaf nodes, which will sort of read, handle their part of the query, send it back to the aggregator for sort of final assembly and sending that query back. Is that right?
Starting point is 00:31:05 Yes, yes. Okay. Absolutely. So in terms of, I imagine those are like bound by different things. Like maybe the leaf node, is that going to be more IO bound? And then sort of the index aggregator
Starting point is 00:31:17 going to be more CPU and maybe a little bit memory bound on some of those? Yeah, that's a good point. So the leaf nodes typically are bound by storage capacity. But yes, you're right, depending on your workload,
Starting point is 00:31:30 it could also be bound by just the IOPS that is out there. But then all the aggregators, they're essentially bound by the compute that you have,
Starting point is 00:31:41 right, like the queries. And also the aggregators typically have some on-demand cache from the storage tier. So usually CPU and cache would be put together so that you get the best price performance. So sometimes you're also bound by the amount of cache that is there.
Starting point is 00:31:56 Let's say your working size is very big, then you might need more aggregators. Or if you need more compute, then also you might need more aggregators. Same thing with the ingest. Ingest is, no also you might need more aggregators. Same thing with ingest. Ingest is typically CPU bound. It's not memory bound or anything because it's basically ingesting and indexing data. So yeah, this is the part is that each of these tiers you can scale based on only one thing.
Starting point is 00:32:20 And it's easy to scale up when that thing is under high resource contention. Gotcha. How big are those leaf nodes? And are those multi-tenant? Are they single-tenant leaf nodes? So I think for ALV, it's a general architecture, right? We use that Facebook, we use it at Rockset. I know LinkedIn also uses it for some of their feed systems. So now coming back to Rockset, when you ask about how big are the leaf nodes, typically these leaf nodes are what we call the hot storage in Rockset. These are nodes which have locally attached SSD devices. Those nodes could have maybe 2 terabytes to 60 terabytes of storage for some of these storage nodes.
Starting point is 00:33:09 And most of these nodes, also what has happened is that the networking speeds have also improved a lot in the last 10 years. So you can get 10 gigabit, 40 gigabit network speeds. So there is sometimes a fine balance between your IOPS and your network speed so that you can kind of get the best performance again for the workload. But yeah, these are storage heavy nodes. And are those nodes multi-tenant or are they going to be dedicated to just my, what do those leaf nodes look like there? Yeah, so for Rockset, again, there are two different modes. It depends on the deployment option right uh one deployment option is multi-tenant
Starting point is 00:33:53 option where you can some of your data would be on the same maybe cluster nodes but maybe not on the same device or on the same node uh but again for rockset what happens is that the data so for rockset we separate the durability from the performance. So the durability we get by putting all the data in S3 or AWS backends, right? So that you never lose the data. And so the leaf nodes are mostly kind of, you can think about it more like an accelerator for all the data that is in S3. You see what I'm saying? Because accesses from S3 takes like 400 milliseconds, whereas accesses from SSD will take whatever five micro seconds or something like that, right? So yeah, so these leaf nodes are essentially big machines and we have some consistent hashing
Starting point is 00:34:39 between them so that if a leaf node dies, we can refill its data from S3 in panel on many other leaf nodes because these have, let's say, this leaf node has 20 terabytes of disk. If it's a single machine that will refill from S3, it could take a long time. We have some good partitioning scheme and hashing scheme
Starting point is 00:34:58 so that the load is balanced on failures. But yeah, every system is very unique. I mean, every system has a lot of challenges to fix so it's fun to talk about each of these systems we also write
Starting point is 00:35:08 openly about all of these all of the backends that we have it's like typically I mean we try to write
Starting point is 00:35:17 blogs and some more detail analysis about how we are implementing the backend yeah
Starting point is 00:35:24 it's true I love your blog and your videos, walking through this stuff. I've learned so much from these. So keep doing those great work on those. I want to talk a little bit about the aggregator now. Is the query like parser and planar happening at that aggregator node? Got it. Yeah. So I think your query, your question is about like how does Rockset
Starting point is 00:35:46 maybe execute the SQL query, right? So Rockset comes all SQL. I mean, the standard API from a customer is a SQL over REST, right? So you can make a SQL over REST. So it goes to an aggregator node that will do some query parsing
Starting point is 00:36:01 and then compilation. And then it will look at some statistics to do a query plan. So these statistics, again, you don't look at the statistics of the entire collection, but based on some samplings of data. And then figure out saying that, okay, I should use the index filter
Starting point is 00:36:21 for this query versus column scan because this is going to be a highly selective query. And then we have an execution engine that will give this the plan to execute. The execution engine are essentially again a C++ written backend where
Starting point is 00:36:37 the goal for the execution engine is not to like spin up JVMs or spin up threads or there's none of those. It's, again, optimized for low latency. So it's very much like a data flow or the Volcano-style model, where there's data flow,
Starting point is 00:36:51 there is message passing versus more RPC-style, less of RPC-style, more message passing kind of style. So the query flows to all the leaf nodes. The leaf nodes might do some work, send them back to the aggregators and aggregators are going to
Starting point is 00:37:08 do more processing sometimes it needs to go back to the storage to go do other kind of again because the SQL is a very complex language so depending on how complex they could need multiple round trips sometimes to get stuff
Starting point is 00:37:23 but yeah so the aggregators essentially do this SQL parsing planning and then submitting the query for execution by the execution engine. Gotcha. And if I can recall, I remember doing like some analyzed query stuff in the Rockset dashboard before. And I feel like a lot of queries ended up being sort of like an initial step, you know, finding the right records based on some filter conditions, you know, hitting either the inverted index or the columnar index to find those.
Starting point is 00:37:53 And then once I found those, once I found those sort of target records doing like the hydrate step where it's hitting the row index to get any other attributes I want to have with it. Is that kind of like a common? That's true. I think so.
Starting point is 00:38:07 I think SQL, so SQL will have select some columns. Those are the projections, right? And then there's a where, there's some filter conditions. So typically we run the filter conditions to reduce the size of data that this query needs to test. And then we do fetch all the projected fields
Starting point is 00:38:30 and then do the aggregation because aggregations are usually on the projected fields. And then some of those aggregations kind of... Yeah, so some of those aggregations... So actually, good point that you mentioned this. So we have SQL, but you can run the SQL not just at query time, but you can also run some of the SQL at ingest time. So for example, you are aggregating.
Starting point is 00:38:52 So let's say you want to find the number of unique users visiting your website every hour, right? So we know what the query would look like. So we can kind of transfer some of the compute needed for queries that we actually run at in just time and kind of keep semi-rollups. We call it rollups or ingest transformations, where it will do some partial counts and aggregations
Starting point is 00:39:11 when new data is coming in. Again, you don't have to write that code, but it happens automatically for you. And then when you run the SQL query, let's say you want to find the number of unique users every hour, you might not need to look at everyone. You can just look at, say, pre-aggregated data that has happened every minute
Starting point is 00:39:30 and then just look at 60 of those and give you per hour aggregate. So yeah, we have SQL on both sides of the time. Data when coming in and when you're making the queries. Okay. When you talk about when it's making that plan, it's going to look statistics, table statistics and things like that. What is the process for getting those statistics? You mentioned some
Starting point is 00:39:53 sampling, but is it, you know, the the Taylor node is doing some indexing and generating statistics and that is it is it sending it to the aggregator somehow? Does does each aggregator have like the statistics locally or like, how do those stats get communicated? Great part.
Starting point is 00:40:08 So the tailors actually don't communicate with aggregators at all. Okay. This is the good part. This is why we call it ALT architecture, completely segregated. So that if a tailor is getting stuck for some reason, it cannot cause the aggregators to get stuck because of some RPC deadlocks or whatever else. So what happens is
Starting point is 00:40:25 that the tailors read all the new data coming in and then it will automatically generate some of the statistics that we need for some rows. Now this could be like some kind of very elementary histograms of say the ranges of a typical let's say you have an integer field kind of what ranges of integers are out there in some kind of sampling mechanism I'm giving you. If there are strings we have some idea about the lengths of strings and other things as a sample. These things happen when data is coming in so it's part of the indexing process. This is what basically also part of the converged index that we have,
Starting point is 00:41:06 which not only builds all the indices, but also builds higher level summaries, you could say, for some of these so that the query engine knows. So now when the aggregator needs to find this,
Starting point is 00:41:17 it just looks at the leaf and it knows which fields, which are the kind of the hidden, I shouldn't say hidden, but more like the attributes of the data. It knows how to fetch the attributes of the hidden, I shouldn't say hidden, but more like the attributes of the data. It knows how to fetch the attributes of the data from the leaves and it leaves those to plan the query.
Starting point is 00:41:31 So there's no communication between the tailors and the aggregators. And the only thing is the shared storage, which is what we call as the leaf nodes. This gives us something good, isolation, compute, compute, and other things. We can talk about it again later if you're interested, but that...
Starting point is 00:41:49 Absolutely. A couple more things just on this ALT. Do you have four queries? Do you have rough target response times? I know it's going to vary based on if you're two terabytes of data, but if I'm doing a more search-type query where I'm filtering on some specific conditions and I'm trying to just return some records, what sort of target response time do you aim for there?
Starting point is 00:42:12 So, great question. So, most of our users or most of our applications, when they start off using Rockset, they're using a system where, let's say, their data latencies are 15 minutes or 10 minutes or 5 minutes. And they come to us saying that, can I get sub-second query latencies? They don't actually give us a number saying that I need like 800 milliseconds or something like this. But they also tell us that as part of their application query, it might make, let's say, one application query from a user's perspective might result in eight database queries. You know what I'm saying?
Starting point is 00:42:52 So if they need sub-second response times for their users and you need to do eight database queries as part of that application, then your database query cannot take more than say 50 to 100 milliseconds on the average, right? So this is kind of, how should I say it? Giving you the typical example, not everybody falls in this range.
Starting point is 00:43:13 But the lowest latency queries could be less than 10 milliseconds, and the highest ones could be many minutes for us. Again, Rockstrap also can do large, long, big queries if you like to. The largest queries probably take 30 minutes or so, but those are mostly reporting queries that people do on the site, not really our sweet spot and not our differentiator.
Starting point is 00:43:35 Is there a data size where you tell people, hey, you'd probably be better off using something else for that? Yes. What sort of threshold point is that if I'm doing huge large-scale aggregations? When do you say maybe something, a warehouse or something?
Starting point is 00:43:52 Yeah, no, absolutely. I think there is a... So we don't replace a warehouse. We definitely coexist with all warehouses that's out there. Typically, if it is 500 terabytes, petabyte- size data, then definitely this doesn't make sense
Starting point is 00:44:07 for you to make it real time. Usually people don't want to make their warehouses respond to like 50 millisecond queries, you know what I'm saying? So it's a different set of applications. So when you're talking about hundreds, high hundreds of terabytes of data, let's say 500 terabytes, 600 terabytes, petabytes, we definitely don't.
Starting point is 00:44:26 People actually don't come to us with those. When we explain them, they very clearly understand that yeah, I have gigantic amounts of data, but I want to operationalize my last one month of data, last six months of data, and then things automatically fall into place.
Starting point is 00:44:42 Yeah, absolutely. So you talk a lot about real-time data, and sometimes that means very quick response times on my queries, but it also means high-velocity updates and not sort of out-of-date data. First of all, what's your
Starting point is 00:44:58 target latency in terms of freshness of data from an upstream system? How far behind is RockSight usually going to be? Yeah, yeah, that's a good question. So real-time means different things to different people, right? If you ask what is real-time, some people will say one thing, some people will say another
Starting point is 00:45:15 thing. But at the end of the day, it feels like the common theme among all the answers is that you want to get something done very quickly as soon as some event happens in your system, right? So we measure two things. We measure query latencies and we also measure something called data latencies. So these are the two things. The data latency is a measure of your freshness and query latency is a measure of your response times, right? So for data latency, let's say you have, let's say you're collecting megabytes per second of data in Kafka
Starting point is 00:45:49 right very popular system to transfer data so our guarantees are that if you put data in Kafka it'll show up in your octet queries
Starting point is 00:45:57 within say 50 milliseconds 100 milliseconds and this kind of range because we have ways where we can continuously pull Kafka.
Starting point is 00:46:06 So you have managed connectors at Kafka, very nicely integrated. They are going to keep reading and scale up when there's more data in Kafka and index it. And those things become available in your queries in like 100 milliseconds or so. But that's the data latency. This is important when you're doing fraud detection, for example, right? Like if you can shorten the time it takes to detect fraud,
Starting point is 00:46:30 you can save millions of dollars per second or something like that. So there, and then the query latencies, I think people are mostly familiar with, right? Because databases have been around for so long. So we want query latencies to be fast. Again, because fraud detection use cases, whatever these ones are, need to respond quickly in real time. Such a mix of both of these two.
Starting point is 00:46:54 Yeah. And also in terms of that data freshness, I know a lot of data systems, QuickHouse, Druid, whatever, a lot of them have trouble with updates. It's like, what's the sort of core architectural
Starting point is 00:47:09 difference that allows you to do updates in real time where some of these other ones struggle with that?
Starting point is 00:47:16 Yeah, that's a good point. So, for RockSERC's underlying storage engine is RocksDB.
Starting point is 00:47:22 So, RocksDB is this key value store. And the benefit of using a system like RocksDB, right? So RocksDB is this key value store. And the benefit of using a system like RocksDB is that every key and value is immutable. This is by definition is a key value store. You haven't seen a read-only key value store,
Starting point is 00:47:35 I think, ever in your life, right? Nobody uses a read-only key value store. So it's optimized for updates, key value stores, and we leverage that. So it's very easy for us to be able to efficiently update one field in an existing document or add a new field to an existing document or delete also, for example, if you want to delete and change some fields. Whereas for other systems, let's say ClickHouse or you're trying Druid, what happens is that
Starting point is 00:48:05 the technology is built to compress data as much as possible because they're, if you have gigantic amounts of data, they're optimized to reduce your storage cost. So you compress the data so much that it's tied you back into a small part of
Starting point is 00:48:22 your storage system. Now when you want to update, it's a very difficult update because you can't really update a ZZ block, for example. Just, for example, you have to extract everything out, make some changes and recompress it and recompact it. So that's the trade-off that we have where because we use RocksDB, we do use compression like ZFTD compression,
Starting point is 00:48:42 ZZ compression. But because we can do it at block level, we can do it at key level, we can actually give it much better price performance when you are having updates. Let's say you have a DynamoDB table, right? And you join it with and you create a Rockset collection
Starting point is 00:48:58 widget. You update DynamoDB, we look at the CDC stream and update their Rockset collection in real time within less than 200 milliseconds. It really is a superpower compared to some of the other ones. Okay, let's talk about separation. We hear a lot about compute storage separation. You also have more on compute separation, which is really interesting.
Starting point is 00:49:18 So compute storage separation, I feel like that sort of broke into the mainstream with like Snowflake is when people started to get fired up about it. But like what is compute storage separation? What are the benefits there? Yeah, compute storage separation has been now prevalent for the last probably 10 years, right? When the cloud became very popular. Again, the benefit there is that you can have your storage system on one set of machines and your computes on another set of machines. Because then you can, if you have more storage, like you said, spin up more of your storage nodes.
Starting point is 00:49:47 And if you have more compute needed, you spin up more of your compute nodes. That works well for Snowflake or other warehouses, because most of those things are not real time. Like in Snowflake, when you are depositing data, they will have a staging area where all the new data is being written to. And then every 10 or 15 minutes, you can say that, okay, I will load this data
Starting point is 00:50:07 into a table for query collections. Uh, and that's the time when you compact it, make it column and compressed, tidy it back and then make it available for queries. Um, so just the compute stores. So this is, so in Snowflake, it's linked here. If you want to do real time using SnowBipe and things like those, your costs are like shooting through the roof because you'd have to scale up both sides of your thing.
Starting point is 00:50:32 Whereas in a real-time system, it's not enough to just separate the compute and the storage, like we talked about the LTE architecture. It's also needed to separate the compute meter for ingest and the compute meter for queries and the storage. So Rockset, we have something called compute separation, which is not just compute storage, but compute compute essentially means that you can separate out the ingest compute from the query compute. And you can also have additional compute for other workloads. Take, for example, you are serving a fraud detection
Starting point is 00:51:05 use case, right? So you have a set of compute running on that. But then you also need to run some reports every night on the same data to tell you some metrics or insights into the data system. So you can spin up another set of compute nodes and run those queries. And when you make the update to the data, all these compute instances see the update immediately. So that, this is what I mean by real time, and you can't get this in Snowflake or other warehouses because their latencies are typically like many minutes before you can actually make it visible.
Starting point is 00:51:37 Yep. That compute-compute, man, I'm just thinking back to a prior place I worked where we had Elasticsearch, right? we're indexing, we're like, show where we're showing it to end users, but then someone had also built, like an internal dashboard on top of Elastic Search. And seriously, every time like, our executive would go to that internal dashboard, like all these alarms would go off, because now indexing is getting backed up. And like the user queries are getting way slower, because it's like churning through all this data. And now just being able to segregate that to where you have like a different
Starting point is 00:52:09 set of compute nodes crunching those OLAP queries for that and not affecting indexing or sort of real-time queries. Yeah, that would have been very beneficial. Yeah. I mean, this is a common problem. Yeah, a common problem for most of our customers. Like they use Elasticsearch and one user comes in and affects everybody else.
Starting point is 00:52:29 For Rockset, this is one of the beauty of building something cloud-native where from day one, we know that we can spin up compute nodes as an API
Starting point is 00:52:38 versus buying machines and installing on your data center. So it's easy for us to do these kind of tricks. And so this is why the price performance is so much different
Starting point is 00:52:47 when you use Rockset versus Elastic. Yep, yep, absolutely. That compute-compute separation video of yours is really good. I'll link that in the show notes,
Starting point is 00:52:55 but check that one out. I want to move on to something. Let's go even higher level now. Let's talk about a new feature. So you now have vector search, vector indexing.
Starting point is 00:53:04 You know, generative AI has been huge this year, seen a lot around this. Maybe just tell me, like, what's hard about vector search, especially compared to like other indexing patterns, like the sort of inverted index or columnar index? What's hard about vector search compared to those? Or different? Yeah, I mean, search typically means like there's some kind of indexing system technology needed to serve the search query, right? So when it comes to vector search, I think as a database person, the question that comes to my mind is what kind of index can we build to serve that
Starting point is 00:53:39 search query? So as far as the indexing is concerned, Facebook has actually done a good amount of work here by open-sourcing the FAISS (F-A-I-S-S) library, right? And one of the modes of operation supported by FAISS is the inverted file index, called IVF.
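As a rough illustration of what an IVF index looks like in practice, here is a minimal sketch using the open-source FAISS library on made-up random vectors; this is generic FAISS usage, not Rockset's implementation:

```python
# Minimal FAISS IVF sketch on toy data (assumes `pip install faiss-cpu numpy`).
import numpy as np
import faiss

d = 128        # vector dimensionality
nlist = 100    # number of IVF partitions (coarse cells)

# Stand-in corpus; a real application would store embeddings here.
xb = np.random.random((10_000, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)                 # assigns vectors to cells
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                                  # learns the cell centroids
index.add(xb)

index.nprobe = 8                                 # cells scanned per query: recall/speed knob
xq = np.random.random((1, d)).astype("float32")
distances, ids = index.search(xq, 5)             # approximate 5-nearest-neighbor search
```

So the challenge for most vector databases out there, and you might have heard about a lot of new vector databases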
Starting point is 00:54:11 in the last year, the challenge is that they do mostly vector operations, so you can do vector search. But if you look at a real-life application that we see from our current customers, like let's say insurance companies, financial companies, they all want to do vector search. And they don't want to do vector search just by itself.
Starting point is 00:54:30 They want to do vector search with existing data, where they also need to join with other data sets, look at access patterns of some of those records, and filter out some of the vector search results. So in my mind, indexing vectors is just like indexing geo locations, or just like indexing, what is it, IP addresses. You know what I'm saying? So
Starting point is 00:54:57 people have built different kinds of indices on some of these well-known data types over time. We also have a geo index, and a vector is just a floating-point array of numbers. In a data system, I'd call it an array of numbers. It's nothing new, but the challenge is that sometimes the data sizes could be big, right? Because every vector has a thousand elements, and each one of them is a floating-point number.
Starting point is 00:55:23 So the challenge for most of these real-life applications is: how can we scale this vector search to large sizes? How can we update these vectors? How can we manage these vectors, in the sense of, how can we get quality search results from these vectors? So there are two types of challenges, again, coming back to your question. One is related to the algorithm that you are using to do the recall. Now let me backtrack. Some of them are exact-match vector search algorithms, and some of them are approximate vector search algorithms. For exact match, it's a question of implementation and
Starting point is 00:56:02 optimizations that you do to figure out how you can find all the vectors you are searching for. But for approximate search, there are probably four or five widely used algorithms that almost every vector search database is using. So there are fewer differences there. And that affects the recall that you get from your vector search. But based on all the current technologies out there, I feel like the differences are not so much about which algorithm you pick. For example, you could pick the IVF algorithm, or you could pick the HNSW (Hierarchical Navigable Small World) algorithm. There are variations of each one of these, but at the end of the day, the recalls are only slightly different between some of these things.
Starting point is 00:56:52 But from what I'm seeing, not a lot of people are picking vector databases based on how much recall that algorithm supports. They're mostly picking vector databases based on: can I use it now? Can I update these databases or data fields when I need to? Can I maintain them? Can I associate roles or security policies with these things? This is what is happening in actual production systems.
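To make that recall trade-off concrete, here is a small sketch that measures recall@k of an approximate IVF index against exact brute-force search, again on made-up data:

```python
# Sketch: recall@k of approximate IVF search versus exact search (toy data).
import numpy as np
import faiss

d, k = 64, 10
xb = np.random.random((50_000, d)).astype("float32")   # corpus
xq = np.random.random((100, d)).astype("float32")      # queries

exact = faiss.IndexFlatL2(d)                           # brute force = ground truth
exact.add(xb)
_, true_ids = exact.search(xq, k)

quantizer = faiss.IndexFlatL2(d)
approx = faiss.IndexIVFFlat(quantizer, d, 256)
approx.train(xb)
approx.add(xb)
approx.nprobe = 4                                      # more probes: higher recall, slower
_, approx_ids = approx.search(xq, k)

recall = np.mean([len(set(t) & set(a)) / k
                  for t, a in zip(true_ids, approx_ids)])
print(f"recall@{k}: {recall:.2f}")
```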
Starting point is 00:57:20 But as far as the challenges are concerned, it's a challenge of indexing a large array of numbers. I have a question on
Starting point is 00:57:29 that, because, okay, so if you go to OpenAI and use their embeddings model, they're going to give
Starting point is 00:57:34 you, you know, a vector with like 1,500 dimensions, right? And some other models might return with
Starting point is 00:57:41 you know, 300 dimensions or much fewer. Is the number of dimensions going to vastly affect my accuracy? I mean, I imagine the models themselves matter. Or is there also, like, a way to reduce those? If OpenAI gives me 1,500 back,
Starting point is 00:57:58 is there a way I can squeeze that into a smaller amount so it's easier to index, maybe a faster query? What does that look like? You can do some types of quantization, right? This is widely researched now. Take, for example, a 1,500-size vector. If you reduce it to, say, 700 elements, it will halve the size.
Starting point is 00:58:19 But the impact on the recall or the accuracy of your queries could be only 5% worse. Again, I'm hand-waving here; it's not 50% worse. So depending on how you compress those, it is super useful for the app, especially if you are storage bound. Oh, and there's another point, which
Starting point is 00:58:40 is that a lot of these open-source vector databases, people are mostly running them in memory, which is why they're very much size bound. But when you run it on SSDs, like we do in Rockset, we're not really storage bound, because we have the ability to
Starting point is 00:58:58 store all the vectors on SSDs, so it just opens up a huge dimension. I mean, you can afford to store 1,500-element vectors, because we store everything on SSDs. But yeah, quantization is definitely a hot research area right now. Some of the databases do it automatically under the covers. Some of them don't, and they leave it to the application to do it.
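As a sketch of the application-side route he describes, here is one way to shrink vectors before indexing, using FAISS's PCA transform; the sizes are made up to echo the numbers in the conversation, not a recommendation:

```python
# Sketch: halving embedding size with PCA before indexing (toy data).
import numpy as np
import faiss

d_in, d_out = 1536, 768                                  # roughly the 1,500 -> ~700 cut
xb = np.random.random((20_000, d_in)).astype("float32")  # stand-in for real embeddings

pca = faiss.PCAMatrix(d_in, d_out)
pca.train(xb)                                            # learn the projection
xb_small = pca.apply_py(xb)                              # (20000, 768): half the storage

index = faiss.IndexFlatL2(d_out)                         # index the reduced vectors
index.add(xb_small)

# Queries must go through the same projection before searching.
xq = pca.apply_py(np.random.random((1, d_in)).astype("float32"))
D, I = index.search(xq, 10)
```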
Starting point is 00:59:21 So an application, let's say, is using OpenAI; then you use another library to compress the embeddings into 300-element vectors and then store them in the database. So I don't know where the final verdict will be, but right now I don't see all the databases doing this automatically on the fly. Gotcha. And what do response times look like for vector search
Starting point is 00:59:43 as compared to, like, a normal inverted index or columnar search? If I have millions and millions of records or terabytes of data, is it going to be significantly slower to do vector search as compared to just normal WHERE filters on columns? I think it's actually faster, based on what I'm seeing from our customers. For example, we have some customers who are looking at recommendations. So they have a homepage where they're selling auctions, online auctions.
Starting point is 01:00:13 And when a person logs into the website, they need to show the auctions that are most relevant to that person. They used to do it based on some keyword searches, matching patterns, and, like, machine-learned models. But as far as the vector database is concerned, they will use the vector database to reduce the amount of data that they need to process, let's say, reducing the candidates from 15 billion records to 500,000 records
Starting point is 01:00:47 and then do more personalization among those 500,000 records to decide which recommendations to show. So the total
Starting point is 01:00:55 overall time is actually much less than before, because these models are kind of
Starting point is 01:01:03 becoming very powerful. I mean, that's the whole point of this. It's actually reducing the latency so you can do more during those 500 milliseconds or whatever you have to respond to a user-facing query.
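Here is an illustrative sketch of that two-stage pattern: an approximate vector index narrows the candidate pool, then a more expensive scoring pass runs only on the survivors. The corpus, the sizes, and the dot-product "personalization" score are all stand-ins for whatever a real recommender would use:

```python
# Sketch: ANN candidate generation followed by reranking (all names invented).
import numpy as np
import faiss

d = 256
corpus = np.random.random((100_000, d)).astype("float32")  # stand-in for billions of items
index = faiss.IndexHNSWFlat(d, 32)        # HNSW graph index, no training step needed
index.add(corpus)

def recommend(user_vec: np.ndarray, candidates: int = 500, top_n: int = 10):
    # Stage 1: vector search shrinks the pool from the whole corpus to `candidates`.
    _, ids = index.search(user_vec[None, :].astype("float32"), candidates)
    pool = ids[0]
    # Stage 2: run the expensive per-user scoring only on the small pool.
    scores = corpus[pool] @ user_vec      # placeholder for a real personalization model
    return pool[np.argsort(-scores)[:top_n]]

print(recommend(np.random.random(d).astype("float32")))
```

Yep. Wow, that's amazing. Okay. I want to shift from technical stuff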
Starting point is 01:01:19 and just talk about company stuff. You're one of the co-founders of Rockset, founded in 2016, is that right? Seven years now. Okay. What did that look like in terms of, you know, building a database? You and your co-founders have a very strong track record of doing great things at Facebook and elsewhere, but in terms of convincing people to trust you with their data in a new database, what was that process like?
Starting point is 01:01:47 How long did it take to, I guess, get to your initial build, where you felt comfortable sharing it with users? Yeah, absolutely. I think building a database is, how should I say, a longer journey compared to many other startups that are out there. Because it's kind of like building, say, a skyscraper. You have to build the foundations before you can actually start building the floors. Building the foundation is definitely something that needs time and effort.
Starting point is 01:02:21 What we have seen, as far as Rockset is concerned, is that we hardly see anybody leave Rockset once they start using it,
Starting point is 01:02:35 because they really feel that this is a great piece of technology. They just stick with it. But that doesn't mean that our work is done, because a lot of the challenges that we currently see day to day are: oh, I have data in SQS or something. Can I integrate it into Rockset? Can I bring it into Rockset quickly?
Starting point is 01:02:55 So we still have to build a lot of integration pipelines so that it's easy to get data into the system. And we still have to build a lot of things to be able to interact with other tools that are out there in the ecosystem, so that people can use it quickly and efficiently. There are so many data tools out there. We have, again, standard SQL over REST, we have JDBC, and other APIs. But still, there are a lot of other tools
Starting point is 01:03:22 that people use that might not fit into some of the APIs that we have. So there's still a lot more work to do. And the whole
Starting point is 01:03:31 data ecosystem is changing rapidly, because people are accumulating lots more data this year versus last year,
Starting point is 01:03:38 so the rate of innovation has increased and we are trying to make sure that we can innovate fast enough on all fronts, not just on the database, so we can keep ahead. So our focus is always speed, scale, and simplicity.
Starting point is 01:03:53 These are the three S's we keep talking about: how can we innovate on three different dimensions so that we can build a great, valuable company for our users? Yep, yep. I remember talking with Venkat at re:Invent last year about some of the super low-level stuff that you guys do. I don't even know enough about it, but the new chipsets and SIMD and different things like that, the work you're doing there. How much of the work you're doing is just staying on top of new hardware updates and optimizing for that?
Starting point is 01:04:27 Like, does that change very often? Or is that like, you know, hey, Flash came around, you know, SSDs and things like that. And then it's like, how often does the physical infrastructure change to where you need to make big changes to Rockset as well? Actually, the physical infrastructure is also changing much more rapidly now, right? Like, so we shipped an entirely integrated version of Rockset on Intel hardware.
Starting point is 01:04:53 I think that's the re:Invent thing that you were talking about. Yep, yep. So when new hardware comes along, we have to make changes to our code so that we can leverage the new features in the hardware. It's not just porting,
Starting point is 01:05:09 but actually rewriting some basic primitives. So what we have done is try to localize those in certain areas, so that we know that when we need to port to a different platform, we can port them quickly and efficiently. But the beauty of the solution is that, irrespective of the hardware, the Rockset service remains the same for all our customers. So it's
Starting point is 01:05:30 basically, they can leverage the changes in hardware that are coming in and get more work done. Do you remember, so you started Rockset in 2016, do you remember how long it was before you had your first customer, like, hey, this is good enough, reliable enough to use? How long did that process take? Yeah, yeah, absolutely. 2016, and I think December, actually September 2016, is when it started.
Starting point is 01:05:54 So, yeah, kind of seven years now. The first three or four months were all about writing some code to show a demo. And then in 2017, I think we had our first customer, who was, I'd call them more like a believer rather than a customer, right? It's like a new religion. They'll say, oh, I believe in you.
Starting point is 01:06:15 Let me try to see what you guys are doing. So we used to camp in their office, probably in 2017, for like three days at a time over a few months, to show what our software would look like, right? And in the very early days, we actually had a very custom API that exposed the raw database internals, and it wasn't very SQL-focused. That was the time when we realized, from our customer in the first nine months, that without SQL, this is a longer journey.
Starting point is 01:06:46 So the first customer was before we were, like, nine months old or something like that. But we launched Rockset in 2019, which is when we openly wrote about it, our website came alive, and things like that. So we have been in the public now for the last four years or so. And every year, I see a lot of changes in the volume of our customers and the size and shape of our customers. And it's becoming more exciting and interesting.
Starting point is 01:07:22 Yeah, very cool. Okay, I was going to ask if you ever considered an API other than SQL, or if SQL was always the plan. So it sounds like you started with something else and realized early on. I think that just makes so much sense. You see a lot of new databases,
Starting point is 01:07:33 and sometimes they want to write their own query language and it just like increases that barrier so much more because you're learning all the mechanics of this database and then having to learn just the basic interactions with it
Starting point is 01:07:43 also slows the adoption. Yeah, one of the philosophies we have is that we don't want to build something if we don't have a customer. So even new features that we're building, we're never building a new feature by deciding between product and engineering and marketing
Starting point is 01:07:59 whether this is the best feature to build. We always try to build only if there are a few customers lining up saying, we will use this feature when it's available. It's very much the iterative development model, is how I see it. We got this when we were at Facebook,
Starting point is 01:08:15 building Facebook infrastructure, where you never build something that is just somebody's intuition, or theoretical, or something like that. Make it actually practical. Yeah, absolutely. That's great. Well, Dhruba, I love this conversation.
Starting point is 01:08:36 I loved all the videos you've done and the blog posts your team does on this stuff. Y'all raised a round recently, so congrats on that and just the great work you've been doing. If people want to find out more about you, about Rockset,
Starting point is 01:08:49 where can they find you? Yeah, I think the best place would be to go to rockset.com. That's where we have a good
Starting point is 01:08:56 description about our product. And also, there's a free trial. If you have some data, or one of your listeners has
Starting point is 01:09:02 any data and they want to try, the best would be to just create a free account, try it, see how it works, and give us feedback. Yep, absolutely. I highly recommend that. I highly recommend the white papers and videos and all that stuff. So, Dhruba, thanks for coming on the show, and best of luck to you and the team at Rockset going forward.
Starting point is 01:09:20 Hey, thank you. Thanks a lot. Great talking to you, Alex. And I hope to hear a lot more of these videos from you in the near future. Cool. Thank you. I appreciate it, Dhruba. Thank you. Bye.
