The Data Stack Show - 136: System Evolution from Hadoop to RocksDB with Dhruba Borthakur of Rockset

Episode Date: May 3, 2023

Highlights from this week’s conversation include:

- Dhruba’s journey into the data space (2:02)
- The impact of Hadoop on the industry (3:37)
- Dhruba’s work in the early days of the Facebook team (7:54)
- ...
- Building and implementing RocksDB (14:33)
- Stories with Mark Zuckerberg at Facebook (24:25)
- The next evolution in storage hardware (26:14)
- How Rockset is different from other real-time platforms (33:13)
- Going from a key value store to an index (37:15)
- Where does Rockset go from here? (44:59)
- The success of RocksDB as an open source project (49:11)
- How do we properly steward real-time technology for impact (51:17)
- Final thoughts and takeaways (56:18)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Discussion (0)
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Kostas, it's amazing to me the people that we get to have on the show. We're going to talk with Dhruba of Rockset, which is built on RocksDB,
Starting point is 00:00:31 which is, you know, sort of a legendary piece of technology used at some gigantic companies, you know, to drive things, you know, like your Facebook or LinkedIn newsfeed. Just incredible. And he has started
Starting point is 00:00:46 something new even, which is amazing after how much he's already built. I want to ask him about early days at Facebook. And this may sound funny, but we haven't talked in depth about Hadoop, but he's sort of a Hadoop master. Of course, built a lot of things to solve a lot of the problems that Hadoop had. But we haven't dug into it a ton on the show, so I'm going to actually ask him about that history and especially that transition at Facebook, because I think
Starting point is 00:01:15 that would be very informative. How about you? I mean, okay, but first of all, I think we definitely like a lot to talk about around RocksDB and Rock rock sets so we will definitely spend quite some time like talking about that understanding better what rocks db is and what made it like so that's important to the industry out there and also talk about like Rookset, like what Rookset is doing different, like compared to other vendors out there and how RooksDB is part of that solution.
Starting point is 00:01:55 Right. So I think we will talk about like the whole, I don't know, like probably 20 years of history and from like the Hadoop and Yahoo were out to today. So let's go on top with him. Let's do it. Amazing. Let's dig in. Dhruba, welcome to the Data Stack Show. We are so excited to learn from all of your experience and, of course, hear about what you're doing at Rockset.
Starting point is 00:02:25 So give us your background, how you got into data, and what you're doing today with Rockset. Cool. Hey, thanks, Eric and Kostas, for inviting me here. I really appreciate your time and delighted to be chatting with you today. So, yeah, my name is Dhruva. I am the co-founder and CTO at Rockset. So Rockset is a real-time analytical database that we have. It's a cloud service, and I've been working at Rockset now for worked at Facebook for around nine or ten years, building a lot of data platforms at Facebook. I started off with building a lot of Hadoop backend data platforms at Facebook, Hadoop, Hive, and HBase. And then I moved over to working on another open source project called RocksDB, which was a key value store that we built from scratch or kind of productionized it from scratch at Facebook. And before Facebook I was at Yahoo building a lot of Hadoop file systems. I was the project lead of the Apache HDFS project back in like 2006 or so. So I've been with this data system for probably like
Starting point is 00:03:39 20 plus years now. So it has been a very interesting journey and I've seen a lot of different software being used for processing data. So I'm excited to be here and just have like a try to answer some of your questions and maybe share some of my thoughts and opinions, if you'd like. Oh, we definitely would like that. Let's start with Hadoop. I mean, I think, you know, a lot of people who are, you know, sort of maybe newer to the data industry in the last, you know, sort of, say, several years may not have had direct experience with the impact that Hadoop had early on, you know, sort of in the world of data. I mean, so many things, right? Data processing, big data is, of course, the buzzword.
Starting point is 00:04:30 But you worked on Hadoop at Yahoo and then at Facebook, right? And, you know, sort of early on when Facebook's employee number was still in the hundreds, I think that you mentioned when we were talking before the show. Can you give us a sense of maybe paint a picture of where Facebook was at when you joined the types of data and the data challenges that they were facing and why Hadoop made so much sense as a system in 2008, you know, for you to build the things that you wanted to build? Yeah, that's a good question because it because not just the software, but also the hardware
Starting point is 00:05:08 had something to do with this, right? Or why Hadoop became so popular. So back around 2006, 2007, and 2008, storage became cheaper and cheaper. Like hard disks, like a gigabyte size hard disk were cheaper and cheaper. And nanobytes size disks came around. Prior to that, storage was costly, right? So you have SATA disks. There's a technology called SATA, S-A-T-A disks, SATA disks.
Starting point is 00:05:32 So they became very cheap in price per gigabyte. And so people could buy these disks, or companies like Facebook could buy these disks. Now the question is that what software shall I use to store data in these disks, right? So this is the reason why I think Hadoop became so popular. I mean, I wrote a lot of Hadoop file system code, but I'm not saying that is like the world's best code I have written, right? It's good code. It does its job. But the real challenge for or like the promise of this system is that I
Starting point is 00:06:04 have maybe at that time 100 terabytes of storage. What software shall I use to store data? Can I use Oracle? No, I cannot because I'll run out of money because it's very costly, right? So that's the reason why Hadoop became really popular. And I really love working there because the data sets at that time were a few hundred, maybe tens of terabytes to a few hundred terabytes this is a very big data set but within a year or so it became like a petabyte or so and then multiple petabytes very easily the challenge again was like fault tolerance recovery
Starting point is 00:06:39 can these automatically be handled like a software? Because these disks were cheap. They could fail any time. So it's not like a high-quality hardware that you have. You have low-quality hardware, but so much of them, but you need software to manage these things, right? So fault tolerance was super important. But again, Facebook invested a lot of resources here because very early on, I think the focus was that if we ever want to monetize
Starting point is 00:07:06 a social platform, then you'd have to deal with the data sets, how we're behaving, reacting, those kind of things, right? So 2008, it was mostly about growth for Facebook. how can I use the data to make things more engaging for customers? But back around 2011 or so is when, how can I use Hadoop kind of platforms to monetize my platform? Like, how can I show better advertisements? How can I show 11 advertisements? So, again, the same technology, but used for very different use cases in the company's lifetime. So, but those systems are very bad systems, right? Hadoop is a very bad system.
Starting point is 00:07:51 So, you could just deposit, put a lot of data, and then you could look for intelligence. You could look for mining some nuggets of information that you need to make your, let's say, your advertisements better, right? But, yeah, I mean, that is kind of, Hadoop became kind of, like I said, the granddaddy of all these data systems. It made it very easy for people to store data. It didn't make it easy to query and make sense out of it, but storing all data is the easiest platform there. So can you explain how, in your time at Facebook,
Starting point is 00:08:24 because you were there for some time, how did the teams change around the technology? And when did, was there a huge migration from Hadoop? And actually, I'm interested to know, are they still running some of that old infrastructure or do you know? No, I know. I think they continue to run some similar software, like homegrown software, right? Because they wanted to improve their backend as well. But I did see a very clear kind of, what should I say, path of how the data systems were evolving, right?
Starting point is 00:08:58 So back in the days, it was Hadoop back in 2008, store a lot of data. And then some few analysts will come make some queries. Queries will give some answers in three hours, and then they will rerun some queries. The cycles, iteration cycles, were probably a week or two before you actually can get the intelligence out and seed it back into your product land, right? But then, after I think the company became public around 2011 or 12, that time there was a lot of focus on monetization.
Starting point is 00:09:33 And so monetization, the focus was how can you make these systems more and more real time? Because if a user logs into Facebook, you need to show him the right advertisement at the right time based on what he looked at or where is he located currently, what is his geo position, and you have to do complex... Mobile had skyrocketed because in 2008 it was still pretty early, the iPhone was very young, and so mobile's skyrocketing. Yeah, I mean, surprisingly, Facebook didn't have a good mobile product until 2012. So it came late into the mobile product. Fascinating. But far later in the game. But yeah, real-time became super important and that's the time when
Starting point is 00:10:11 I actually started to work on another project called Rocksteam. So this was a natural progression of events. So Hadoop was good, but we could not make it real-time. It is a very bad system. So the two main use cases that started to use Hadoop but needed more real time was, one is obviously ad placement and ad showing. The other one is about spam
Starting point is 00:10:33 detection. If somebody posts a bad URL to Facebook, we need to ask as quickly as possible. Otherwise, there is all kinds of problems, right? Legal problems, non-legal, some financial issues, everything. So, we need to ask quickly as possible. Otherwise, there is like all kinds of problems, right? Like legal problems, non-legal, some financial issues, everything. So we need to quarantine these bad posts immediately.
Starting point is 00:10:52 So Hadoop could not really keep up with these kind of workloads where you need to react quickly to new things that are happening in your data sets. So this one I got chartered into writing something called RocksDB. It's a database again. It's a key value store. But it is basically low latency queries on large data sets. And there was a hardware change that happened again at this time. 2012 is when SSDs became really cheap. Right?
Starting point is 00:11:18 Before that, flash drives were costly things. I mean, you think twice before you buy flash drives. People mostly store data in hard disks. In 2012, 2013, SSD prices just started to fall through the roof. And so the way we built RocksDB is that can we leverage the power of the SSD devices and build a database from scratch so you can get low latency queries on these large datasets. So that was kind of the nature of progression there.
Starting point is 00:11:46 And then much later after that, it was all about building more kind of reactive systems. Before that, all the systems that I told you are very much built passively. But then we had more reactive systems where a change here kind of, how should I say it? It went more to like a data flow kind of model where you make a change here, it produces events and then goes and affects other systems on the side. So the data platforms evolved over time, which are more proprietary to Facebook, not like open source software,
Starting point is 00:12:25 but similar to, say, Flink or some other open source software that we are familiar with. We built some of those things similar over there where things became more and more reactive. So I can see a real change. Like every five years, I think platforms kind of evolve and take a completely different stint, although the core probably might be still similar in nature. Yeah. So I'm interested to know a couple questions. But the first question is, how far did you push Hadoop?
Starting point is 00:12:56 And how did you know when it was time to explore a new solution? I think it's all driven by the market, right? Like what are our developers demanding? So Facebook, I mean, back-end data engineer writing data infrastructure code, but then there are a lot of people who are writing applications. Yep.
Starting point is 00:13:15 Take, for example, the Facebook app, right? The Facebook app, like when you fire up your Facebook app in your phone and you see all your feeds and posts, that is a data app. You see what I'm saying? And that's one of the world's first data app that needed to process so much data
Starting point is 00:13:31 and give you results very quickly. Yep. So we also saw that if your real timeliness of the Facebook feed is important, like if you see your friends' posts immediately, you have a better engagement versus if you see posts after 15 minutes. So this is a market-driven thing.
Starting point is 00:13:47 So now we said that, okay, we need to build more better data systems which are more real-time. Like if somebody comments on photos everywhere, then that photo should somehow be highly ranked in your feed. So we built a Facebook. We built something called a news feed, which is essentially the backend which powers the Facebook app. And that uses RocksDB now.
Starting point is 00:14:09 Again, Hadoop, we just cannot use those for those kind of latency, low latency that our users are facing. So most of these are, I think, application driven. When applications start using data sets, then the demands are different versus business analysts using data sets. Like Hadoop, mostly business analysts use data sets or data scientists use data, those systems, right, to answer like what-if questions, those kind of questions. But when the applications started to use data, one first application is the Facebook app. Then the demands were very different. It cannot be batch. It cannot be stale data versus live data. So all those things are all driven by applications requiring
Starting point is 00:14:51 to do more intelligent things with data versus just doing offline analytics. Sure. And so, okay, so when you started to work on Hadoop stuff early in 2008, you know, a couple hundred people at the company, you were trying to drive growth and understand, you know, sort of the dynamics of the social graph, et cetera. Fast forward a couple of years, you now have an app that has real time demands. How did you decide to build RocksDB and what was available to you at the time? Like, and I know Facebook builds a lot of stuff,
Starting point is 00:15:27 and it's a very sort of engineering-forward culture, especially historically, but can you describe that process and who ultimately said, okay, we're going to build sort of a low-latency key-value-store-type query engine? Oh, yeah. So I remember the certain step functions that happen in this process, right? So earlier times, in the earlier part of that real-time journey, we at Facebook used something called HBase.
Starting point is 00:15:59 I think you guys might have heard about it. It's basically a database built on top of Hadoop, right? So we tried to use Hadoop. It was a data system that was powering some of our ads backend. Facebook also had a very cutting edge engineering culture at the time of trying maybe 50 experiments and letting 45 fail. If they fail, but at least five would be successful. I remember that well, reading all the blog posts about this.
Starting point is 00:16:27 Yeah. Yeah. So I try, I mean, as I work closely with some of the upper management to say, hey, shall we build something which is actually better than HBase? So I got maybe around eight or nine months of time. I looked at other solutions that are out there. Facebook also had some existing solutions, but I figured that those don't really leverage the capabilities of a flash storage system, right? So I got
Starting point is 00:16:55 kind of a charter saying that, hey, can I build something where you can give low latency queries on flash drives? And I built something and I had one more people, one more engineer with me first. And then after around eight or nine months, we could replace the 500 node HBase cluster with a 32 node RocksDB cluster. So when we could do this,
Starting point is 00:17:21 then other teams at Facebook figured out, oh, this is technology that is disruptive. This is not like a 30% better technology. You see what I'm saying? 500 node HBase cluster getting replaced by 32 node ROX to be classed. Orders of magnitude. Orders of magnitude.
Starting point is 00:17:38 This is what I mean by step function. When a step function is like this, we can show to developers. Everybody's very up to date and understand technology well. They say, oh, yeah, this is very different from previous generation software. So then immediately there are like five other use cases and then more people in the team. But it's the first six, nine months of, or maybe close to a year, is when you are mostly working on a belief on saying that, okay, if I build this by leveraging these things, it will be far better.
Starting point is 00:18:12 And then I think after that, it was like a kind of a self-drive. Good engineers joined the team because they think, oh, this is great technology. They're excited. Yeah, it's like a self-fulfilling prophecy after a while. Going from zero to one is the hardest, I think. Can you describe in building RockCB, you know, there's sort of two dynamics driving it, like the limitations of HBase and then sort of the ability to dream about what you could do without limitations. How did you balance those two drivers?
Starting point is 00:18:52 Were you more focused on what was possible or were you more focused on overcoming limitations? I think, how should I say it? So it's like when you look at HBase, I think it's a great product at the time, but it feels like, but when your surrounding changes, then I think you need a different kind of product to evolve along with your surroundings. So, this is what I
Starting point is 00:19:16 mean by when the hardware changes, it's like, so, HBase was built for hard disks. So, what the optimum is seek times on the disk. 10 milliseconds is your seek time on a disk. Whereas in SODs, there's no seek time. Everything is random. You can get like microsecond latencies from flash drives.
Starting point is 00:19:34 So this is what I mean. I think I tried to understand the limitation of the older system. And I tried to look at what does the new hardware offers me and how can I leverage this to build higher layer applications on the stack. I think it's both sides of your question essentially. Sure. It's about overcoming limitations as well as kind of dreaming saying that if I can overcome these limitations because the hardware is helping me, I can actually enable all these type of applications that were not previously possible.
Starting point is 00:20:09 I think both of them. Hadoop really made big data storage possible. Before that, it was not possible at all. And then RocksDB and more low latency query engines let you really store fast or let you really access data fast from SSD based devices and enable all these data applications that are out there nowadays. Yeah, I can give it also data applications also if you like, but yeah. That'd be great. I mean,
Starting point is 00:20:40 who's running RocksDB or running sort of their applications on RocksDB or running sort of their, you know, their applications on RocksDB? So RocksDB is a key value store. It's a very, like, high performance, low latency C++ backend. So Hadoop, HBase, I wrote a lot of Java code because those are good systems as well. But then when I try to focus on performance and low latency, move to like C++ and build RocksDB. So RocksDB, so Facebook News Feed, which is your Facebook app that you use, every time you update, that's served from a
Starting point is 00:21:15 RocksDB-based backend. Similarly, a lot of data platforms inside Facebook, which also deals with a lot of analytics, is also RocksDB-based. Open source-wise, I think Kafka uses, Kafka Streams uses RocksDB internally. Flink uses RocksDB internally. LinkedIn Feed, I think, also uses RocksDB internally. Again, some of
Starting point is 00:21:36 the blog posts, this is where I have learned these things. It's not like I have proprietary information there, but there's a whole bunch of companies now who uses RocksDB inside their own software. And of course, at Rockset, which is where I currently work, we use RocksDB a lot because we do data analytics, which is all focused on real time analytics. And RocksDB is kind of our building block. So we have something called rocks db cloud
Starting point is 00:22:06 it's an open source project as well it's a sub project of rock set and it lets you run rocks db well on the cloud platform so the reason we do that is because rock set is a purely a cloud-based service and all our data we store using rocks db because RocksDB is essentially like a very powerful indexing engine. So use RocksDB as a indexing engine for all these analytical data sets. And that's the reason why we could serve low latency queries. I mean, that's one of the reasons, not everything. But one of the reasons is that, yes, we can fetch data from large data systems quickly enough using the RocksDB indexing engine. Yep. Makes total sense.
Starting point is 00:22:50 Two more questions because I've been monopolizing the microphone and I know Kostas has a bunch of questions. One is technical and then one is about your time at Facebook. The technical question is, what was the most difficult challenge you faced in building RocksDB?
Starting point is 00:23:09 Good question. So I think when you're building infrastructure, right? Like data infrastructure or any kind of infrastructure, right? The question is always about price performance. It's not about performance, right? It's like, think about it. It's like the pipes in your building, right? Like the water
Starting point is 00:23:28 pipes in your building. They have to do, have to sustain some pressure. They have to be cost efficient because you don't want to spend too much money on a lot of these pipes. Without water, it's not functioning in your building, right? Same thing with a lot of this infrastructure.
Starting point is 00:23:44 I think it is price performance which matters. It's not about just functionality and features. So the focus, again, I was at Facebook and there we are talking about scale, right, at scale. We can't build something. Building infrastructure the first time is easy.
Starting point is 00:24:00 It's to build it at scale is the challenging part in my mind because it's it. So the biggest challenge was how can you make it efficient and cost effective and kind of leverage or extract everything we can to make sure that you can get low latency queries, huge number of QPS, and make sure that the hardware you're running on is the cheapest hardware so that you don't have price performance challenges. So I would think that measuring, performance measuring, benchmarking, iterating, making sure that it does power real-time analytics, those kind of were the challenges. It's not one thing.
Starting point is 00:24:41 It's a series of things, but it's all focused on performance. So, yeah, performance is kind of the key differentiator for OxDB compared to every other key value stores or databases that are out there. Yep. Love it. Okay. Last question. This is about your time at Facebook. Do you have any fun stories about interactions with Mark Zuckerberg?
Starting point is 00:24:58 You know, because you were there when it was 200 or so in play. So you had to be in a meeting with him at some point if you were working on sort of core data infrastructure. Yeah, I mean, obviously the first time was the interview session. So because he was a person who is very handsome, right? Yeah. He knows everything. I mean, at that time, at least he knows everything. So but then over time, I think I really liked it. The fun part in my mind was that there are very few people that I know who have a great sense of technology and product.
Starting point is 00:25:31 So I think Steve Jobs is obviously one I read about him as well. But this is another person that I've seen from close. And I think there are very few people like that who have great technology interest and understanding, but also understand products so well. It's amazing. Yeah. I mean, there are a lot of fun stories. Otherwise, yeah.
Starting point is 00:25:52 I mean, like that time, Facebook used to be like a really small company. So, yeah. So we used to be Palo Alto downtown. There were like seven buildings. I was below a Quiznos. So I'll be like, dungeon. And then because you get all the Quiznos was like a sandwich place.
Starting point is 00:26:11 I mean, you know. Yeah, you could like smell the food cooking next door. Exactly, yeah. So it was a lot of fun. Yeah, very cool. I'm sure it was just so inspiring to work with someone like mark zuckerberg okay costas please jump in here i could keep going but please jump in yeah it's like you can't like it's such an interesting conversation like i really enjoy like listening you're too like talking
Starting point is 00:26:40 okay i have a question though and like i don't know it's I found like super interesting that there's like a pattern in what you were saying like we had one storage revolution or evolution and a new
Starting point is 00:27:01 software technology came out of this then we had the next one. We went from SATA to SSDs. We ended up having Rogues DB and this whole family of database systems that take advantage of this new storage. what do you expect? Do you have a prediction about what's going to be the next evolution in hardware and storage that might trigger another evolution like that? It's awesome that you're asking me this. Because I think this is the reason why I started Rockset. So what happened at least in 2015 and 2016,
Starting point is 00:27:42 I really saw that the cloud is becoming really popular. The cloud, I think, is a different piece of software. I mean, hardware, right? The reason cloud is a different piece of hardware is because you can provision new hardware by using a software API. You know what I'm saying? So give me a thousand machines.
Starting point is 00:28:02 There's a software API to get a thousand machines. In the old times, you'd have to provision, you'd have to get set up racks and put a data center, right? So the reason I'm really excited about this new phase that I'm working on is because the cloud has become really popular. And the cloud is the third type of hardware change that I have seen now in my lifetime, right? Like first SATA disks, then SSDs, now it's the cloud.
Starting point is 00:28:27 And what is different in the cloud is that you could provision hardware by using a software API. You could get machines, you could get CPU, you could get storage, whatever else, right? So this is kind of the vision for Rockset, is that how can we build a cloud database? The primary reason why Rockset is price performance. Again, price performance is my key for every software that I'm trying to build. The reason Rockset is best price performance is because it's built natively for a cloud. It's not something that you download and install on your data centers, on your machines. Take, for example, this database
Starting point is 00:29:07 compared to all other databases. We have complete segregation of storage and compute. So it's a database where storage and compute is together. And the fact that Dropset can separate these two out is great for applications because that's, if you have a lot of data, you need more storage. if you have a lot of data, you need more storage. If you need a lot of queries, you need more compute. But it gives you, without giving, without being slow is the problem, right? Like if you disaggregate, many other systems are there where if you disaggregate, your queries are slow.
Starting point is 00:29:38 But the key for us is that how can you build a disaggregated system, but the queries are faster than existing systems that are out there. So that's one. And the second one that I see, the change is that almost everybody is moving from stale analytics to real-time analytics. Like if you look at, say, EMR, right, AWS EMR, or even Snowflake. They're all about state analytics. Can I get data 15 minutes ago and run some analytics on it? Whereas for Rockset, it's all about real-time analytics. How can I look at data that just got produced a few seconds earlier or a few minutes earlier and take an action on it? It's not people who is taking an action.
Starting point is 00:30:21 It's other data software that's taking action on the data. So, yeah, I mean, I think those are the two trends that I see, the hardware change about the cloud and the market is just right for evolving data systems to produce new features or new facilities for applications. So would you say that like Rockset is, let's say, like a piece of infrastructure that someone would use to build other software or it's closer, let's say, to like data warehouse where people will go and like use it to do like even in real time. Right. But like still like ad hoc analytics or like reporting how did you position like the product itself like this landscape to be honest is like pretty crazy right like there's so many things happening so that's why i'm asking no great question yeah so what happens is that this is
Starting point is 00:31:17 again the trend that i see right had and other systems, they made data analytics or analysts or quants look at data, more analog analysis. But I think what is happening now is that it is software who is using this data. I'll give you examples, right? Take, for example, we have a use case where, like I said, the largest payment system is using, largest payment system, microfinancing system in europe they're using rock set they're getting events from all the transactions that they're doing right but then they want to quickly figure out which events or which payment systems are fraud
Starting point is 00:31:56 or scam so they need to take action if the quarantine that action within a few seconds versus a minute or so this this saves a lot of money. You see what I'm saying? That's one example. And again, this is an application that is running. No analyst are sitting and making a lot of queries on Rockset. Another one, we have a good,
Starting point is 00:32:16 a big airline that is using Rockset. So when you buy an airline ticket, the price of the ticket is different on different days. And so they take feedback on demand and supply to figure out the real-time ticket price. Again, people are querying this data, like travel agents or whatever, to buy tickets at the end of the day. But the back end is the one that is querying systems like Rockset to figure out the current price of this ticket when somebody is moving,
Starting point is 00:32:50 flying from one place to another. It's all about automatic systems making queries on datasets. It's not about people manually doing ad hoc queries and figuring things out. Those are also there for Rockset, because Rockset has a SQL interface. So what Rockset is, it's a RocksDB-based database, but we have a SQL interface on it. So you can do standard SQL: joins, aggregations, group by, sort, everything else. So people find it easy to use, because RocksDB is a C++ backend, right? Not every data person or developer can write C++ code, or should. So we have very standard SQL over a REST API, easy to use, but you get the power of RocksDB on the back end. So you kind of get a marriage of both sides. Does that answer your question
Starting point is 00:33:39 about how and where it's used? It does. Some follow-up questions on that, to be honest. Okay, consider, let's say, some other real-time data processing systems like Druid or ClickHouse. What's the other one? Pinot. These are the ones that come to my mind right now. How is Rockset different from these systems? Good question. Yeah. So what happens is that Druid, I think the Druid project started probably in 2008 or
Starting point is 00:34:16 9. I mean, it's been around for a while. Yeah. So what happened is that for both Druid and Pinot, or even Snowflake for that matter, right, they all leverage the columnar organization of your data. This means that if your record has 50 fields, they store every field in its own column. And then the processing is: can I scan this column as quickly as possible? Right. So all of those systems are column-based systems. And querying is all scan-based, which means: how can I parallelize my query and scan every column as quickly as possible? So that works when there are ad hoc queries. Right.
Starting point is 00:34:57 But now your QPS increases. Let's say an analyst is making a query; he can probably make a query once every 10 seconds or whatever. But when software is making queries, the QPS of the system is high. So let's say there are 5 QPS, 10 QPS, hundreds of QPS. Just imagine the amount of compute you need to keep scanning this dataset again and again for every query. You see what I'm saying? If you're looking for a word in a book, and you scan the whole 500-page book,
Starting point is 00:35:28 it takes you so much energy and time to find the string you're looking for. It's much faster to go to the end of the book, look up the index, and say, oh, this is the string I'm looking for. So Rockset is built with indexing technology, not scan-based technology. Okay. It means that
Starting point is 00:35:43 when a query comes in, we don't need to go scan all the data over and over again for every query. We leverage the index very efficiently to figure out where the data that matches the query exists, and return it. Okay.
Starting point is 00:36:02 Basically, the difference between Rockset and all the other systems, which includes Druid, Snowflake, Pinot, everybody else: some of those systems are now trying to think about building an index, or letting you make an index manually. But in Rockset, every field is indexed. We call it the converged index. So our converged index is the differentiator; it's why our queries are fast versus the other scan-based approaches that are out there. So that's one.
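The scan-versus-index contrast Dhruba describes can be sketched with a toy inverted index in Python. This is a simplified illustration of the access-pattern difference only, not Rockset's actual converged index, which lives inside RocksDB and indexes every field automatically:

```python
# Toy illustration of index-based vs. scan-based querying.
# Assumption: a handful of in-memory "documents"; a real system
# stores these (and the index entries) in an LSM store like RocksDB.

from collections import defaultdict

docs = {
    1: {"user": "alice", "city": "SF", "amount": 40},
    2: {"user": "bob", "city": "NYC", "amount": 75},
    3: {"user": "alice", "city": "NYC", "amount": 12},
}

# "Converged index" sketch: every field of every document gets an entry.
index = defaultdict(set)
for doc_id, doc in docs.items():
    for field, value in doc.items():
        index[(field, value)].add(doc_id)

def query_scan(field, value):
    # Scan-based: touch every document on every query (cost grows with data size).
    return [d for d, doc in docs.items() if doc.get(field) == value]

def query_index(field, value):
    # Index-based: one lookup, roughly independent of total data size.
    return sorted(index[(field, value)])

# Both return the same answer; only the work done differs.
assert query_scan("city", "NYC") == query_index("city", "NYC") == [2, 3]
```

At hundreds of QPS, the scan version repeats the full pass for every query, which is exactly the compute blow-up described above, while the index version pays the cost once, at write time.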
Starting point is 00:36:28 And the second one is that we work only on the cloud, versus, you know, Druid and ClickHouse and everything else. These are all pre-cloud software, which basically means that storage and compute are together; you can't separate the storage and compute if you need to. Just like Hadoop. Nothing wrong with it, but it works well when you have your own data centers with your own machines. In a cloud system, I think it's super important to be able to separate compute and storage.
Starting point is 00:36:56 That's the only way for you to scale up and be cost-effective. So none of the other systems can give you disaggregation. Rockset can separate query compute and storage, and because it's a real-time system, it also disaggregates ingest compute. So the three-way disaggregation is: compute needed for writes, compute needed for queries, and storage needed to store your data. So Rockset is essentially a three-way disaggregated architecture, which is why you get the best price performance if you use Rockset.
Starting point is 00:37:28 Again, you could do similar things on ClickHouse or Snowflake, but the price performance would be much worse compared to Rockset. That's basically the difference. Was I able to explain it? Yes, 100%. Okay, and when you started talking about RocksDB, the first thing you said is that it's a key value store, right? How do we go from a key value store to that index that you're talking about?
Starting point is 00:37:55 Because you don't necessarily need to have an index to have a key value store, right? A key value store is exactly not that; it's like a hashmap, right? I have a key, and I want to go and pull the information based on the key that I have. So tell us a little bit about that. First of all, is this part of RocksDB, or is this more part of Rockset? Ah, great question. So Rockset uses RocksDB to build an index, right?
Starting point is 00:38:32 But there is precedent for doing it, and I saw it being done at Facebook as well. So I'll give you an example, right? At Facebook, we used to use RocksDB for storing the social graph. Like, let's say this is the username and these are the post IDs, right? The social graph. But then very quickly, at Facebook I'm talking about, the way we used RocksDB is that we also wanted an index based on geolocation. You see what I'm saying? You want to go to a
Starting point is 00:38:53 geolocation, say Golden Gate Bridge is the key: who all has posted photos at the Golden Gate Bridge? Now that's a secondary index on the social graph. An application used RocksDB to build a secondary index on the entire social graph. I'm talking about 20 petabytes of social graph data. We used RocksDB to build a secondary index to be able
Starting point is 00:39:15 to serve queries like: show me all my friends who visited this location between these two dates. So this is what kind of inspired us to build Rockset as well. So in Rockset, people actually have data that is stored in RocksDB, but we also use RocksDB to build a secondary index on every field, so that when your query comes in, you can ask arbitrary questions of the data, and all queries are fast. Other systems, take ClickHouse, for example, right? If you use ClickHouse, that's good if you're making a query,
Starting point is 00:39:47 but once you want to make a change to a query and you say, I need to add these filters, now you have to go talk to the ClickHouse database administrator, or whoever is managing the database, saying, I'm going to make this query now. Can you create a secondary index on this column? Or can you denormalize this data so that my queries can have this additional filter? In Rockset,
Starting point is 00:40:07 everything is pre-built for you. So you can actually make queries on any of these datasets, on any fields, without having to re-ingest the data. We make the cost of indexing so cheap that people don't think twice about
Starting point is 00:40:23 indexing cost. Prior generation databases think that indexing is costly, right? But because of cloud-friendliness and the separation of compute and storage, we can build indices really cost-effectively. Using Rockset, you don't have to think: shall I build an index, or is it going to be prohibitively costly? That is kind of the change in thinking in a developer's mind of when and how to use
Starting point is 00:40:50 Rockset. That's super interesting. And, okay, I have, I don't know, maybe a bit of a dumb question, but in my mind there's always a trade-off when you are indexing, in terms of how fast you can write
Starting point is 00:41:08 your data, right? As you start adding more processing during the ingestion process, the slower it's going to be, right? You have systems that are extremely optimized for writing, and I think RocksDB is an example of that. You can literally build a data system based on RocksDB that's going to be extremely fast at writing data. But if you start adding indexing there, and you want to keep latencies low, have really fast ingestion, and at the same time be able to serve the indexes to your users, how do you do that? How do you balance and make the right trade-offs there? Because at the end, that's what engineering is, right? Figuring out the right
Starting point is 00:42:06 trade-offs. So, how do you do that? Exactly, yeah. I think you're absolutely right. Indexing basically means that you need more compute when you write data, because now every byte that you write needs to be indexed, right? So, the
Starting point is 00:42:21 fact that, so let me explain to you why it is easier or cost-effective to do it in RocksDB. So, RocksDB is a LSM engine. It's a load-structured merge tree. So, unlike a B-tree or a binary tree or whatever else that other databases use like Postgres or MySQL, right? So, for prior generation systems, if you do a write, it needs to do a, the database will go read a page from the storage and then update it and then write it back. So, there's a read-modify write for every write that's happening. Whereas, for an LSM engine like RocksDB, when a
Starting point is 00:42:56 new write happens, it all goes to a new place on the disk. It doesn't go and overwrite stuff. So the write rates are similar to what you see on a disk device or an SSD device. If an SSD device is able to do 500 megabytes per second of writes, RocksDB can keep up with it, as long as you have some compute associated with that storage device. You see what I'm saying? So it's a very different write rate compared to the B-trees that most databases used earlier. That's one thing that Rockset uses a lot, which basically means that we can write at RocksDB speeds. And the other thing is that in Rockset,
Starting point is 00:43:31 the way we shard is different from most databases. If you use HBase or Cassandra or other database systems, when an update happens and you need to build indices, the update will be here and the indices will be on different machines, so you need Paxos, Raft, or some other protocol to be able to keep all the machines in sync.
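A toy sketch of the document-sharded alternative, where a document and all of its secondary index entries land on the same shard, so a write is local to one node and never needs a cross-machine consensus round. This is a simplified illustration with in-memory dicts standing in for RocksDB instances, not Rockset's actual code:

```python
# Document sharding with co-located secondary indexes (sketch).
# Assumption: each shard would be a RocksDB instance on one node;
# here a shard is just a dict holding documents plus index entries.

NUM_SHARDS = 4
shards = [{"docs": {}, "index": {}} for _ in range(NUM_SHARDS)]

def shard_of(doc_id):
    return hash(doc_id) % NUM_SHARDS

def write(doc_id, doc):
    shard = shards[shard_of(doc_id)]          # one node owns the whole write
    shard["docs"][doc_id] = doc
    for field, value in doc.items():          # index entries stay on that node,
        shard["index"].setdefault((field, value), set()).add(doc_id)
    # No Paxos/Raft needed: nothing on any other machine has to change.

def query(field, value):
    # Queries fan out to every shard; each answers from its local index.
    results = set()
    for shard in shards:
        results |= shard["index"].get((field, value), set())
    return sorted(results)

write("a", {"city": "SF", "amount": 40})
write("b", {"city": "NYC", "amount": 75})
write("c", {"city": "NYC", "amount": 12})
assert query("city", "NYC") == ["b", "c"]
```

Contrast this with keeping secondary indexes on separate machines, where every write would need a consensus round to keep the document and its index entries consistent.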
Starting point is 00:43:51 So Rockset doesn't do that. What Rockset does is basically document sharding. A document goes to one machine on the cluster, and all the secondary indices are built on that node. Secondary indices are not spread out among other machines, so writes don't need Raft or Paxos or any other consensus algorithms. It all goes to one machine. These two reasons are why we can sustain high write rates. I'm talking about, say, 500 megabytes per second write rates on
Starting point is 00:44:19 data systems that we have, which are constantly indexing and storing data in RocksDB. And you don't have to provision for it, because again, in the cloud you don't have to provision for peak capacity, right? We can get machines when needed. So this is why this is now economically feasible for users. You see what I'm saying? In the old times, it was not economically feasible, because I had to provision for peak capacity, buy all my machines, and keep them there for when my highest write rate happens. But in the cloud, that's not true. Maybe your highest write rate happens from 9 to 4 o'clock in the daytime, and all the other times, let's say you were looking at market data or something like that, right?
Starting point is 00:45:01 Half of the time your market is not live, like the stock market, whatever. So many other reasons are why these kinds of indexing technologies are becoming cost-effective at scale now. Yeah, makes a lot of sense. That was super informative. All right. So what's next? And when I say what's next, it's two questions, actually. What do you see as next for the industry overall? And what is next for Rockset? What's the next thing that you really anticipate, you know, going live on the product? Yeah.
Starting point is 00:45:46 For Rockset, again, so let me answer your second question first, right? Because then I can explain where I think the industry is going. So Rockset, we are a cloud service, so we're constantly making improvements to our backend and shipping new products. The thing that excites me the most is that
Starting point is 00:46:06 most data systems, I feel, are currently not great at giving isolation to different applications running on the same data sets. You see, I have hands-on experience with Hadoop and Kafka and all these other open source technologies. They're good technologies, no doubt, but when you want to use them for five different applications on the same topics or the same database, it is very difficult to do. So this is something that Rockset is innovating on a lot, where you could have one storage, one database, but five different applications. One is, say, a real-time ticketing application. One is a fraud detection application. One is a marketing application running on the same database, with completely separate
Starting point is 00:46:54 compute engines, but they all see live data that is changing in the database, without the customer having to copy data from one place to another and make sense out of it. So multi-tenancy, and the ability for different apps to leverage these large data sets with the least amount of complexity, is what we are innovating on on the Rockset side. As far as the industry is concerned, how should I say it? I would say that real-time is very addictive for most applications. It's like a real addiction, in my mind. What we have seen is that when customers use a real-time system, they cannot go back to a stale analytics system.
Starting point is 00:47:39 They can feel the difference. It's like tasting sugar for the first time, right? There's some restaurant I went to the other day. I won't name the country because I don't want to say bad things about the country, but that country did not have sugar until 1876, or until the English people went there. So then suddenly everybody got addicted to sugar. But yeah, real-time analytics is like that. I think most people are used to stale analytics, saying, okay, I've got to wait for one hour to figure out what to do next, how to make my business better. But
Starting point is 00:48:12 once you taste this, it just sticks with you. And I think a lot of applications, data applications, starting from your food delivery to your booking, shipping, or whatever else, everything is more and more real time. And I feel like data is just transforming everything that we have in the world. Data is powering everything. I mean, I don't want to be cliche, but data is the new oil and all this stuff I keep hearing, this is coming true in my mind. A lot of automation is being built on these data sets, and it's not people making decisions anymore.
Starting point is 00:48:45 It's some other piece of software that's making decisions on the data. And this is what Rockset and real-time kind of applications are driving more and more into this area. Yeah, 100%. All right, one last question from me, and then I'll give the microphone back to Eric. So RocksDB is like I don't know, like one of the most successful open source projects out there, right? Like it's phenomenal, like not just the
Starting point is 00:49:14 use of it, but how much has been used to build other technologies on top of that. Like you mentioned a few. I mean, I think like a testament to this is like if someone goes to GitHub and see like RocksDB there, yeah, you can see like
Starting point is 00:49:28 the thousands, the tens of thousands of like the stars. But what is like so impressive is like how many clones exist, like how much has been cloned, right? Which means like people are like working on it.
Starting point is 00:49:40 What in your opinion, outside of, okay, obviously like there's something revolutionary about the technology itself, right? But what else do you think that contributed to the success of RocksDB
Starting point is 00:49:54 as a technology? I mean, as a project and, like, the adoption and all that stuff. Outside of, like, okay, the technology itself. I think it's the people and the funding. I think no software becomes useful unless there are good people working on it. And nothing happens in the world nowadays
Starting point is 00:50:11 by one or two people. You need a good set of great people to band together to build this software. That's the first one, I think. And the second one is that you need some kind of support so that the community and software grows along with it. And I think Facebook provided that a lot, especially testing frameworks,
Starting point is 00:50:31 especially leveraging many other systems that were there at Facebook to make RocksDB better, and tooling to be able to find bugs quickly. Basically, again, I think there are two things that make a lot of these open source projects successful, right? One is the people. If you can assemble 20 great people to build a project, I think that'll be a fantastic project. And then the next one is: is there a force behind this community so that it can move forward? I can see that happening with many other open source communities as well.
Starting point is 00:51:03 I mean, the Hadoop community, I still participate in it, but I don't write Hadoop code much anymore. But I see that the community is very big there as well. So, yeah. Open source is interesting. I know you were at Starburst, so there's a lot of open source development there as well. I think open source communities kind of feed one another. So it's a good
Starting point is 00:51:27 cycle to kind of participate and make things better. So yeah, that's the answer about Rocks TV. Yeah, 100%. Eric, all yours. Yes. Okay, one philosophical question and one practical question to close out the show. The philosophical question, so you mentioned, you know, more and more real time is having an impact and there are machines making decisions. You've seen this on a much closer level from sort of a, you know, deep in the stack data perspective. Is there anything that worries you?
Starting point is 00:52:08 Or maybe a better way to phrase the question is, do you have thoughts on how we steward this technology as we implement it? Because people have lots of opinions on machines making decisions, and the technology is obviously enabling that. So, thoughts? I mean, take, for example, all the drone systems that are out there, right? There is a good feedback system of real-time analytics of what the effect of the drone stuff is when you, like, put bombs somewhere
Starting point is 00:52:35 and things like those, right? Yeah. So real-time is, I think, changing the world, like we discussed. I think there is a fine balance in how to channel that for the greater good. There's always 10%, or some small percentage, of usage which is probably not the most ideal for humanity. It's just like the atomic bomb: atomic energy produces a lot of energy now,
Starting point is 00:53:05 and I think we will be able to leverage it well, I feel like. We have lots of bells and whistles. A lot of the automatic processes that we build always have fallback mechanisms so that they work within a band. They don't just go
Starting point is 00:53:21 haywire and ruin everything. So I think the human mind is still at the top of the food chain, and I think there are 100 years before all the automatic decision-making becomes life-threatening or anything. I think it goes back to what you said about people, right? The right people behind the technology, I think, is the most important thing. So thank you for entertaining a somewhat ethical question. Okay, last question, which is more practical.
Starting point is 00:53:52 You have seen, and really built on, such a wide swath of data technologies, you know, even reaching back into the days of making architectural decisions based on hardware, decisions which I think a decent proportion of our listeners won't ever have to make, because, you know, Rockset has made those decisions for them in the cloud, right? And so they can just sort of scale without thinking about it. And now you're building for the cloud. When you mentor younger people in the data industry, what do you tell them, you know, about how to think about their career and how to think about data? Because you bring such an immense amount of perspective on the history. And so how do you package that into advice? One of my core philosophies, again, is to add value to somebody's life. I really don't care whether it is monetizable or something you can make money from. But I feel like if you can add value to somebody else's life, then automatically, as a side effect, you make the ecosystem better.
Starting point is 00:55:03 You probably make things better. But if you're starting off in a data career, in the early parts of the career, I think the focus is always: how can I build something, or do something, that adds additional value to somebody else? Somebody else could be peers in your team, right? That's a great thing as well. It could be customers if you're selling stuff, or it could be just plain users, like with open source software, right? There are no customers,
Starting point is 00:55:32 they're mostly users. So as long as you are focusing on building value, I think you get into this cycle of becoming more impactful yourself and enjoying the work at the same time. So work doesn't become work. Work becomes more enjoyment, because you're adding
Starting point is 00:55:52 value. You see people liking your stuff, and you build more of it. I'm probably giving you a meta answer. I mean, this is applicable to whatever industry you are in. It doesn't have to be software or anything else. Whatever you are doing, I feel like if you're adding value to somebody else's work or somebody else's product or life, I think that's a great thing, as long as you're also enjoying the work that you yourself are doing. Yep. No, I think that's not only wonderful career advice for people working in data, but just wonderful life advice. So, Dhruba, thank you so much for that.
Starting point is 00:56:31 And thank you for being so generous with your time. And amazing to talk with you, the builder of some of the highest impact technologies that we see driving a lot of the things we use every day. So thank you so much for giving us more of your time. Hey, thank you. Thanks a lot, Eric and Costas. It was really good chatting with both of you. Well, I don't think many people know that, you know, RocksDB was originally fueled by the smell coming from a Quiznos that was baking subs next to the Facebook office.
Starting point is 00:57:05 But now you have the backstory, brought to you exclusively on the Data Stack Show. Now, I think it was great to hear him talk about, you know, Facebook's sort of first office and being in the basement. So many interesting things that we covered. I was just so impressed that Dhruba has maintained, I would say, just a high level of interest and joy in the space and in building things after he's seen so much. I mean, it's cool to hear the stories, but you have to imagine that the day-to-day of trying to build that stuff and scale that stuff inside of an organization that's growing like, you know,
Starting point is 00:57:48 Facebook, you know, being a founder. I mean, those are really intense experiences, and he still seems just, you know, full of joy and energy. And that brought me a lot of joy. So I think that's the main thing I'm going to take from the episode. How about you? There are many takeaways, but I'll keep one for sure, which is how much of the innovation that we have seen in software is actually triggered by innovation in hardware.
Starting point is 00:58:24 This is one of the insights that I don't think you can get from someone unless that person has been in this world a while, doing the stuff that we were talking about today. Discussing storage, and how storage actually dictates the things that we can do, and how this worked with Hadoop, and how the SSDs then brought RocksDB, you know. I think I'll be looking into what new storage technologies are going to be coming in the next months and years with much more interest now than before. So I'll keep that. And let's arrange to have him back soon. We will.
Starting point is 00:59:23 What to talk about. Well, thanks for listening. Many more great shows coming up. Subscribe if you haven't, and we will catch you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, ericdodds, at eric at datastackshow.com.
Starting point is 00:59:46 That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com. Thank you.
