Screaming in the Cloud - Data Analytics in Real Time with Venkat Venkataramani
Episode Date: April 27, 2022

About Venkat
Venkat Venkataramani is CEO and co-founder of Rockset. In his role, Venkat helps organizations build, grow, and compete with data by making real-time analytics accessible to developers and data teams everywhere. Prior to founding Rockset in 2016, he was an Engineering Director for the Facebook infrastructure team that managed online data services for 1.5 billion users. These systems scaled 1,000x during Venkat's eight years at Facebook, serving five billion queries per second at single-digit millisecond latency and five 9's of reliability. Venkat and his team also created and contributed to many noted data technologies and open-source projects, including Facebook's TAO distributed data store, RocksDB, Memcached, MySQL, MongoRocks, and others. Prior to Facebook, Venkat worked on tools to make the Oracle database easier to manage. He has a master's in computer science from the University of Wisconsin-Madison and a bachelor's in computer science from the National Institute of Technology, Tiruchirappalli.

Links Referenced:
Company website: https://rockset.com
Company blog: https://rockset.com/blog
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored by our friends at Revelo.
Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O.
It means I reveal.
Now, have you tried to hire an engineer lately?
I assure you it is significantly harder
than it sounds. One of the things that Revelo has recognized is something I've been talking
about for a while, specifically that while talent is evenly distributed, opportunity is absolutely
not. They're exposing a new talent pool to basically those of us without a presence in Latin America via their platform.
It's the largest tech talent marketplace in Latin America with over a million engineers in their network,
which includes but isn't limited to talent in Mexico, Costa Rica, Brazil, and Argentina.
Now, not only do they wind up vetting all of their talent on English ability as well as, you know, their engineering skills, but they go significantly beyond that.
Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to.
Let's also not forget that Latin America has high time zone overlap with what we have here in the United States.
So you can hire full time remote engineers who share most of the workday as your team. It's an end-to-end talent service. So you can find and
hire engineers in Central and South America without having to worry about, frankly, the
colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io
slash screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.
This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into
production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process.
What if launching new features
didn't require you to do a full-on code
and possibly infrastructure deploy?
What if you could test on a small subset of users
and then roll it back immediately
if results aren't what you expect?
LaunchDarkly does exactly this.
To learn more, visit launchdarkly.com
and tell them Corey sent you and watch for the
wince. Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's promoted guest episode is
one of those questions I really like to ask because it can often come across as incredibly,
well, direct, which is one of the things I love doing.
In this case, the question that I am asking is, when you look around at the list of colossal
blunders that people make in the course of careers in technology and the rest, one of
the most common is, oh, yeah, I don't like the way that this thing works, so I'm going
to build my own database. That is the siren call to engineers, and it is often the prelude to horrifying disasters.
Today, my guest is Venkat Venkataramani, co-founder and CEO at Rockset.
Venkat, thank you for joining me.
Thanks for having me, Corey. It's a pleasure to be here. So it is easy for me to sit here in
my beautiful ivory tower that is crumbling down around me and use my favorite slash the best
database imaginable, which is TXT records shoved into Route 53. Now, there are certainly better
databases than that for most use cases. Almost anything, really, to be honest with you,
because that is a terrifying pattern. Good joke, terrible practice. What is Rockset as we look at
the broad landscape of things that store data? Rockset is a real-time analytics platform built
for the cloud. Let me break that down a little bit. I think it's a very good question when you say, does the world really need another database? Don't we have enough already? SQL databases, NoSQL databases, warehouses, lakehouses now. If you go back, the 80s were when people actually retired pen-and-paper records and started using a relational
database to actually manage their business records instead of ledgers and books and
what have you. That was the first digital transformation. That is why Oracle called
the rows in a table records for a reason; they're called records to this date. And then, you know, 20
years later when all businesses
were doing system of record and transactions and transactional databases, then analytics was born.
This was the whole reason: people wanted to make better data-driven business decisions.
And BI was born, warehouses and data lakes started becoming more and more mainstream.
And there was really a second category of database management
systems, because the first category was very good at being a system of record, but not really good
at the complex analytics that businesses were asking for to guide their decisions. Fast forward
20 years from then, the nature of applications is changing. The world is going from batch to
real time. Your data never stops coming. With the advent of Apache Kafka
and technologies like that,
5G, IoT,
data is coming from all sorts of nooks and corners
within an enterprise.
And now customers and enterprises
are acquiring that data in real-time
at a scale that the world has never seen before.
Now, how do you get analytics out of that?
And then if you look at the database market,
the entire market,
there's still only two large categories of databases,
OLTP databases for transaction processing
and warehouses and data lakes for batch analytics.
Now, suddenly you need the speed of OLTP
at the scale of batch, right?
In terms of like complexity of compute,
complexity of storage.
So that is really why we thought the data management space needs a third leg.
And we call it real-time analytics platform or real-time analytics processing.
And this is where the data never stops coming.
The queries never stop coming.
You need the speed and the scale.
And it's about time we innovate and solve the problem well.
Because in 2015, 2016, when I was researching this,
every company that was looking to build
real-time applications
was building a custom
Rube Goldberg machine of sorts.
And it was insanely complex.
It was insanely expensive.
Fast forward now,
you can build a real-time application
in a matter of hours
with the simplicity of the cloud
using Rockset.
There's a lot to be said about the way we used to do things
after that first transformation, when we got into the world of batch processing.
In the days of punch cards, which was a bit before my time,
and I believe yours as well, people would drop them off
and come back a day or two later, after the run,
to get the results, only to find a syntax error
because you put the wrong card first or something like that. And it was maddening. In time, that got
better, but still, nightly runs have become a thing, to the point where even now, by default,
if you wind up looking at the typical timing of a default Linux install, for example, you see that
in the middle of the night is when a bunch of things will rotate, when various cleanup jobs get done, et cetera, et cetera. And that seemed like a weird direction
to go in. One of the most famous Google April Fool's Day jokes was when they put out their
white paper on MapReduce, and then Yahoo fell for it hook, line, and sinker and built out Hadoop,
and we've been stuck with this idea of performing these big query jobs on top of existing giant piles of data, where ideally you can measure it with
a wall clock.
In practice, you often measure it with a calendar in some cases.
And as the world continues to evolve, being able to do streaming processing and understand
in real time what is going on is unlocking different approaches, at least by all accounts.
Do you have an example you can give me of a problem that real-time analytics solves for a
customer? Because I can sit here and talk all day about how things might theoretically work,
but I have to get out of my Route 53-based ivory tower over here. What are customers saying?
That's a great question, and I 100% agree. I think
Google did build MapReduce. And I think it's a very nice continuation of what happened there
and what is happening in the world now. They built MapReduce, and they quickly realized
that re-indexing the whole web every night, as the size of the internet exploded, was a bad idea.
And you know how Google indexes now? They
do real-time indexing. That is how they index the web. And they look for the changes that are
happening on the internet, and they only index the changes. And that is exactly
one of the core principles behind Rockset's real-time analytics platform.
So what is a customer story? So let me give you one of my favorite ones. So
the world's number one or number two buy-now-pay-later company, they have hundreds of millions of users.
They have 300,000 plus merchants.
They operate in like maybe 100 plus countries.
So many different payment methods.
You can imagine the complexity.
At any given point in time, some part of their product is broken.
Oh, Apple Pay stopped working in Switzerland for this e-commerce merchant.
Oh God, like we got to first detect that.
Forget even debugging and figuring out what happened
and having an incident response team.
So what do they do as they scale
the number of payments processed in the system
across the world?
It's like in millions.
First it was millions in a day,
and then it was millions in an hour.
So like everybody else,
they built a batch-based system.
So they would accumulate
all these payment records
every six hours.
So initially it was a day
and then afterwards,
you know, you try to see
how far I can push it
and they couldn't push it
beyond every six hours.
Every six hours,
some batch job would come
and process through
all the payments that happened,
have some statistical models to detect: hey, here are some of the things that you might want to double-click on and follow up on. And as they were scaling, the batch job that they would kick off
every six hours was starting to take more than six hours, so you can see how the story goes.
Now, fast forward, they came to us and said—it's almost like Rockset has
a big red button that says "real-time this"—and they're kind of like, can you make this
real time? Because not only are we losing millions of potential revenue dollars in a year
because something stops working and we're not processing payments, and we don't find out about
that until, like, three hours later, five hours later, six hours later. But our merchants are
also very unhappy. We're also not able to protect our customers' business because that is all we
are about. And so, fast forward, they use Rockset, and simply using SQL, all the
metrics and statistical computation that they want now happen in real time and are accurate up to
the second. All of their anomaly detectors run every minute,
and the anomaly detectors take hundreds of milliseconds to run.
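To make that concrete, here's a rough sketch of the kind of per-window failure-rate metric such a detector might compute each minute. The table and column names (payments, status, merchant_id, and so on) are hypothetical, not the company's actual schema, and the time-window predicate syntax varies by SQL dialect.

```python
# Hypothetical SQL a per-minute anomaly detector might issue.
# Table and column names are illustrative only.
FAILURE_RATE_SQL = """
SELECT
    merchant_id,
    country,
    payment_method,
    SUM(CASE WHEN status = 'FAILED' THEN 1 ELSE 0 END) * 1.0
        / COUNT(*) AS failure_rate
FROM payments
WHERE event_time > CURRENT_TIMESTAMP - INTERVAL '5' MINUTE
GROUP BY merchant_id, country, payment_method
HAVING COUNT(*) > 100  -- skip slices with too little traffic to judge
"""
```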
And so now they've cut down the business observability lag, I would say. It's not
metrics and machine observability; they actually now have business observability
in real time. And that not only actually saves them a lot of potential revenue
loss from downtimes, that's also allowing them to build a better product and give their customers
a better experience. Because they are now calling their merchants and their customers that something
is not working in some part of your e-commerce footprint before even the customers notice that
something is wrong. And that allows them to build a better product and a better customer experience than their competitors.
So this is a very real-world example
of why companies and enterprises
are moving from batch to real-time.
The stories that you and, frankly,
a lot of other data analytics companies
tend to fall back on all the time
are stories like the one you're telling,
where you're talking about the largest buy-now-pay-later lender, for example.
These are companies operating at massive scale who have tremendous existing transaction volume,
and they're built out already. That's great, but then I wind up trying to cut to the truth of some of these things. And when I visit your pricing page at Rockset, it doesn't have what I would expect if that were the
only use case. And what that would be is, great, call here to open up a sales quote and we'll talk
to you, et cetera, et cetera, et cetera. And the answer then is, okay, I know it's going to have
at least two commas in it, ideally not three, but okay,
great. Instead, you have a free tier where it's, hey, we'll give you a pile of credits. Here's
some limits on our free account, et cetera, et cetera. Great. That is awesome. So it tells me
that there is a use case here for folks who have not already on some level made a good show of
starting the process of conquering the world. Rather, someone with an idea some evening at two in the morning
can wind up diving in and getting started. What is the Twitter for pets in my garage,
spare time side project story for using something like Rockset? What problem will I have as I wind
up building those things out when I don't have any user traffic or data yet, but I want to,
you know, for once in my life, do the smart thing in advance rather than building an impressive tower of technical debt.
That is the first thing we built, by the way.
When we finished our product, the first thing we built was self-service.
The first thing we built was a free-forever tier, which has certain limits because somebody has to pay the bill, right?
And then we also have compute instances that are very, very affordable
that cost you like approximately $1 a day.
And so we built all of that
because real-time analytics is not a need
that only like the large scale companies have.
And I'll give you a very, very simple example.
Let's say you're building a game.
It's a mobile game.
You can use Amazon DynamoDB and use AWS Lambdas and have a serverless stack.
And you're really only paying for what you use.
You're kind of keeping your footprint very, very small.
And you're able to build a very lively game and see if it gets viral and is growing.
And once it grows, you can have all that big company scaling problems.
But in the early days, you're just getting started.
Now, if you think about DynamoDB and Lambdas and whatnot,
you can build almost every part of the game,
except probably the leaderboard.
So how do I build a leaderboard
when thousands of people are playing
and all of their individual game plays and scores
and everything is just another simple record in DynamoDB?
It's all serverless.
But DynamoDB doesn't give me SQL: select star,
order by score, limit 100, distinct by the same player.
No, this is an analytical question.
And it has to be updated in real time.
Otherwise, you really don't have this thing
where I just finished playing and I go to the leaderboard
and within a second or two, if it doesn't update, you kind of lose people along the way.
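In SQL terms, the leaderboard question being described looks roughly like the sketch below; the table and column names (game_scores, player_id, score) are hypothetical, just to make the shape of the query concrete.

```python
# A sketch of the analytical leaderboard query described above.
# Table and column names are hypothetical.
LEADERBOARD_SQL = """
SELECT
    player_id,
    MAX(score) AS best_score  -- "distinct by the same player": one row each
FROM game_scores
GROUP BY player_id
ORDER BY best_score DESC      -- highest scores first
LIMIT 100                     -- top 100 players
"""
```

It is exactly this kind of aggregate-sort-limit query that a key-value store has no native answer for.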
So this is actually a very popular use case when the scale is much smaller: Rockset augments a NoSQL database like a Dynamo or a Mongo, or even a Postgres or a MySQL for that matter, where you can use that as your system of record
and keep it small,
but power all of your compute-heavy
and analytical parts of your application with Rockset.
So it's almost like a kind of a CQRS pattern
where you use your OLTP database
as your system of record,
you connect Rockset to it.
And so Rockset comes in with built-in connectors,
by the way,
so you don't have to write a single line of code
for your inserts and updates and deletes
in your transactional database
to get reflected in Rockset within one to two seconds.
And so now, all of a sudden,
you have a fully indexed, fast SQL replica
of your transactional database
on which you can do all sorts of analytical queries,
and that's fully isolated from your transactional database. So this is the pattern that I'm talking about. The mobile leaderboard
is an example of that pattern where it comes in very handy. But you can imagine almost
everybody building some kind of an application has certain parts of it that is very analytical
in nature. And by augmenting your transactional database with Rockset, you can have your cake and eat it too.
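As a rough illustration of that CQRS split, here is a sketch assuming boto3 for the DynamoDB write path. The query_analytics() helper, its endpoint URL, and the auth header are hypothetical stand-ins for whatever SQL API the analytics replica exposes, not a real API contract.

```python
import boto3
import requests

dynamodb = boto3.resource("dynamodb")
scores = dynamodb.Table("game_scores")  # hypothetical table name

def record_game(player_id: str, score: int, ts: str) -> None:
    """Write path (the 'command' side): the OLTP system of record."""
    scores.put_item(Item={"player_id": player_id, "ts": ts, "score": score})

def query_analytics(sql: str) -> list:
    """Read path (the 'query' side): a placeholder for the analytics
    replica's SQL endpoint. URL, auth, and response shape are assumptions."""
    resp = requests.post(
        "https://analytics.example.com/v1/query",  # hypothetical endpoint
        json={"sql": sql},
        headers={"Authorization": "ApiKey <elided>"},  # elided credential
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["results"]
```

The design point is the isolation: the OLTP table never serves analytical scans, and the analytics side never takes writes directly.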
One of the challenges I think that at least I've run into
when it comes to working with data, and let's be clear,
I tend to deal with data in relatively small volumes mostly.
The stuff that's significant and large,
like, oh, I don't know, AWS bills from large organizations,
the format of those is mostly
predefined. When I'm building something out using, I don't know, DynamoDB or being dangerous with
SQLite or whatnot, invariably, I find that even at small scale, I paint myself into corners by
data model design or how I wind up structuring access or the rest. And the thing that I'm doing
that makes perfect sense today winds up being incredibly challenging to change later. I mean,
I still, in production, have a DynamoDB table that has the word test in its name, because of course
I do. It's not a great place to find yourself in some cases. And I'm curious as to what you've seen
as you've been building this out and watching
customers, especially ones who already have significant data sets as they move to you,
do you have any guidance around how to avoid falling down that particular well?
I will say a lot of the complexity in this world is by solving the right problems using the wrong
tool or by solving the right problem on the wrong part of the stack.
I'll unpack this a little bit, right?
So when your patterns change,
your application is getting more complex.
It is demanding more things.
That doesn't necessarily mean
the first part of the application you build,
and let's say DynamoDB was your solution for that,
was the wrong choice.
That is the right choice,
but now you've expanded the scope of your application
and the demand that you have
on your backend transactional database,
and now you have to ask the question,
now in the expanded scope,
which ones are still more of the same category of things
on why I chose Dynamo,
and which ones are actually not at all.
And so instead of going and abusing the GSIs and other really complex and expensive indexing
options and whatnot that Dynamo has built, which have all sorts of limitations, ask:
what do I really need? And what is the best tool for the job? What is the best
system for that? And how do I augment? And how do I manage these things? And this goes to the first thing I said,
which is like this tremendous complexity when you start to build a Rube Goldberg machine of sorts.
Okay, now I'm going to start making changes to Dynamo. Oh God, like how do I pick up all of
those things and not miss a single record? Now replicate that to another second system that is
going to be search-centric or reporting-centric?
And do I have to re-sync this once in a while? Do I have to build and manage these pipelines? And
suddenly, instead of going from one system to two systems, you actually end up going from one system
to like four different things, with all the pipes and tubes going in the middle. And so this
is what we really observed. And so when you come into Rockset and you point us at your DynamoDB
table, you don't write a single line of code and Rockset will automatically scan your Dynamo tables,
move that into Rockset. And in real time, your changes—inserts, updates, deletes—to Dynamo will
be reflected in Rockset. And this is all using the DynamoDB Streams API, the DynamoDB Scan API, and whatnot behind the scenes.
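For intuition about what a connector like that does behind the scenes, here is a minimal sketch of tailing a DynamoDB stream with boto3. The table name is hypothetical, and a real connector also handles the initial scan, shard splits, checkpointing, and retries, all omitted here.

```python
import boto3

dynamo = boto3.client("dynamodb")
streams = boto3.client("dynamodbstreams")

# Find the change stream attached to the table (streams must be enabled).
table = dynamo.describe_table(TableName="game_scores")  # hypothetical name
stream_arn = table["Table"]["LatestStreamArn"]

# Walk each shard from the oldest record still retained.
description = streams.describe_stream(StreamArn=stream_arn)
for shard in description["StreamDescription"]["Shards"]:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    for record in streams.get_records(ShardIterator=iterator)["Records"]:
        # eventName is INSERT, MODIFY, or REMOVE; NewImage is the new row.
        print(record["eventName"], record["dynamodb"].get("NewImage"))
```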
This just gives you an example: if you use the right tool for the job here, when suddenly your application is demanding analytical queries on Dynamo, you keep using Dynamo for what it is very, very good at, while augmenting it with a system built for analytics, with full-featured SQL and other
capabilities that I can talk about, for the parts of your application for which Dynamo
is not a good fit.
And so if you use the right tool for the job, you should be in a very good place.
The other thing is part about this wrong part of the stack.
I'll give a very kind of naive example,
and then maybe you can extrapolate that
to other patterns
of how people end up there, because, you know,
accidental complexity is the worst.
So let's just say you need to implement
access control on your data.
Let's say the best place
to implement access control
is at the database level.
It just happens to be that that is the right thing.
But this database that I picked
doesn't
really have role-based access control or what have you. It doesn't really give me all the
security features to be able to protect the data that way I want it. So then what I'm going to do
is I'm going to go look at all the places that is actually having business logic and querying
the database. And I'm going to put a whole bunch of permission management and roles and
privileges. And you can just see how that will be so error prone, so hard to maintain,
and it will be impossible to scale. And this is what is the worst form of accidental complexity.
Because if you had just looked at it in that one week or two weeks—how do I get something out
when the database I picked doesn't have it?—in those two weeks, you feel like you made some progress by kind of
putting some duct-tape if-conditions on all the access paths. But now you've just painted yourself
into a really, really bad corner. And so this is another variation of the same problem where you
end up solving the right problems in the wrong part of the stack.
And that just introduces tremendous amount of accidental complexity.
And so I think, yeah, both of these are the common pitfalls that I think people make.
I think it's easy to avoid them.
I would say there's so much research, there's so much content.
And if you know how to search for these things, they're available in the internet.
It's a beautiful place.
But I guess you have to know how to search for these things.
But in my experience, these are the two common pitfalls that a lot of people fall into when painting themselves into a corner.
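To put that naive example in code: the duct-tape version repeats a permission check at every query path, while the database-level version states the policy once. This is a sketch with hypothetical role and table names, with the grant written in Postgres-style SQL.

```python
# Anti-pattern: permission checks duct-taped onto every access path.
# Every new query site must remember to repeat this; it is error-prone,
# hard to maintain, and impossible to audit.
def get_payments(user, db):
    if user.role not in ("analyst", "admin"):  # repeated at every call site
        raise PermissionError("not allowed to read payments")
    return db.execute("SELECT * FROM payments")

# Right part of the stack: state the policy once, at the database level,
# and every access path inherits it (role/table names hypothetical).
DB_LEVEL_POLICY = """
CREATE ROLE analyst;
GRANT SELECT ON payments TO analyst;  -- analysts may read, never write
"""
```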
Couchbase Capella.
Database as a service is flexible, full-featured, and fully managed with built-in access via
key value, SQL, and full-text search.
Flexible JSON documents align to your applications and workloads.
Build faster with blazing fast in-memory performance
and automated replication and scaling while reducing cost.
Capella has the best price performance of any fully managed document database.
Visit couchbase.com slash screaming in the cloud to try Capella today for free
and be up and running in three minutes with no credit card required.
Couchbase Capella.
Make your data sing.
A question that I have, though, that is an extension of this,
and I want to give some flavor to it,
is: why is
there a market for real-time analytics? And what I mean by that is, early on in my tenure of fixing
horrifying AWS bills, I saw a giant pile of money being hurled over at, effectively, a MapReduce
cluster for Elastic MapReduce. Great, okay.
Well, stream processing is kind of a thing.
What about migrating to that?
Well, that was a complete non-starter
because it wasn't just the job running on those things.
There were downstream jobs with their own downstream jobs.
There were thousands of business processes tied to that thing.
And similarly, the idea of real-time analytics, we don't have any use for
that because, oh, I don't know, I only wind up pulling these reports on a once-a-week basis,
and that's fine. So what do I need that updated for in real time if I'm looking at them once a week?
In practice, the answer is often something aligned with the, well, yeah, but if you had a real-time
updating dashboard, you would find that more useful than those reports. But people's expectations and business processes
have shaped themselves around constraints that now can be removed. But how do you get them to
see that? How do you get them to buy in on that? And then how do you untangle that enormous pile
of previous constraint into something that leverages the technology that's now available
for a brighter future?
I think it's a really good question.
Who are the people moving to real-time analytics?
What do they see, and why can't they do it with other tech?
Like, you know, as you say, EMR, you know, it's just MapReduce. Can I just run it, sort of, every 24 hours, every six hours, every hour?
How about every five minutes? It doesn't work that way.
How about I spin up a whole bunch of parallel clusters on different timescales so I constantly
have a new report coming in. It's real time, except you're constantly spitting out new ones,
but they're just six hours delayed every time. Exactly. So you don't really want to do this.
And so let me unpack it one at a time, right? I mean, we talked about a very good example of
a business team, which is building business observability at a buy-now-pay-later company.
It's a very clear value prop
on why they want to go from batch to real-time
because it saves their company
tremendous losses, potential losses,
and also allows them to build a better product.
So it could be a marketing operations team
looking to get more real-time observability
to see what campaigns are working well today
and how do I double down
and make sure my ad budget for the day is put to good use. I don't have to mention security
operations needing real-time. Don't tell me I got owned three days ago. Tell me somebody is
breaking glass and might be entering the house right now, and tell me then, not three
days later. What alert system do you have for security intrusions? Oh, I read the front page of the New York Times every morning.
Yeah, and waiting to see my company's name.
There probably are better ways to reduce that cycle time.
Exactly right.
And so that is really the need, right?
Like I think more and more business teams are saying,
I need operational intelligence and not business intelligence.
Don't make me play Monday morning quarterback.
My favorite analogy is it's the middle of the third quarter.
I'm six points down.
A couple of people, star players in my team and my opponent's team are injured,
but there are some in offense, some in defense.
What plays do I do and how do I play the game slightly differently
to change the outcome of the game and win this game
as opposed to losing by six points?
So that I think is kind of really what is driving businesses.
I want to be more agile.
I want to be more nimble and take kind of being data-driven decision-making to another level.
So that, I think, is the real force in play.
So now the real question is, why can't they do it already?
Because if you go ask 100 people, do you want fast analytics on real-time data
or slow analytics on stale data, how many people are going to say,
give me slow and stale? Zero, right? Exactly zero people. But then why hasn't it happened yet?
I think it goes back to the world only has seen two kinds of databases, transaction processing
systems, built for system of record, don't lose my data kind of systems, and then batch analytics,
all these warehouses and data lakes. And so in real-time analytics use cases, the data never stops coming.
So you actually need a system that is running 24/7.
And then what happens is, as soon as you build a real-time dashboard, like this example that
you gave, which is like, I just want all of these dashboards to automatically update all
the time, immediately people's response is: but I'm not going to be like
Clockwork Orange, you know, toothpicks in my eyelids, staring at this 24/7.
Can you do something to alert or detect some anomalies and tap on my shoulder when something
off is going on?
And so now what happens is somebody—actually a program, more than a person—is actively monitoring all of these metrics and graphs and doing some analysis and only bringing
this to your attention when you really need to because something is off, right?
So then suddenly what happens is you went from accumulate all the data and run a batch
report to, God, the data never stops coming.
The queries never stop coming.
I never stop asking
questions. It's just a programmatic way of asking those things. And at that point, you have a data
app. This is not an analytics dashboard report anymore. You have a full-fledged application.
In fact, that application is harder to build and scale than any application you've ever built
before. Because in those situations, again, you didn't have these torrents of data coming in all the time
and complex analytical questions being asked of the data 24/7.
And so that, I think, is really why
a real-time analytics platform has to be built
as almost a third leg.
So this is what we call data apps,
which is when your data never stops coming
and your queries never stop coming.
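As a toy version of the "program watching the dashboard for you" idea from a moment ago, here is a minimal rolling z-score detector. The window size, warm-up length, and threshold are arbitrary assumptions; production detectors use far more robust statistics.

```python
from collections import deque
import statistics

class ZScoreDetector:
    """Flags a metric value that sits far outside its recent history."""

    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # e.g., last 60 per-minute values
        self.threshold = threshold           # std-devs that count as "off"

    def observe(self, value: float) -> bool:
        anomalous = False
        if len(self.history) >= 10:  # warm up before judging anything
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9  # avoid 0-division
            anomalous = abs(value - mean) / stdev > self.threshold
        self.history.append(value)
        return anomalous
```

Each minute, a scheduler feeds the freshest metric value into observe(), and a human only gets paged when it returns True.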
So this is really, I think, what is pushing all the expensive EMR clusters, or misusing
your warehouse, misusing your data lakes.
At the end of the day, this is, I think, what is blowing up your Snowflake bills, what is blowing
up your warehouse bills: you somehow accidentally used the wrong tool for the job.
Going back to the one that we just talked about,
you accidentally say,
oh God, I just need some real-time here. With enough thrust,
pigs can fly.
Is that a good idea?
Probably not, right?
And so I don't want to be building a data app
on my warehouse just because I can.
You should probably use the best tool for the job
and really use something that was built,
ground up for it.
And I'll give you one technical insight about how real-time analytics platforms are different
from warehouses.
Please, I am here for this.
Yes.
So really, if you think about warehouses and data lakes, I call them storage-optimized
systems.
I mean, I've been building databases all my life.
So if I have to really build a database that is for batch analytics, you just break down all of your expenses in terms of, let's say, compute and storage.
What I'm burning 24/7 is storage; compute comes and goes. So I want to compress the heck out of the data and store it in
very cheap media. I want to make the storage as cheap as possible. So I want
to optimize the heck out of the storage use. And I want to make computation on that possible,
but not efficient. I can shuttle things around and make the analysis possible,
but I'm not trying to be compute efficient. And we just talked about how as soon as you get into real-time analytics, you very quickly get into the data app business.
You're not building a real-time dashboard anymore. You're actually building a data application.
And so as soon as you get into that, what happens is you start burning both storage and compute 24/7. And we all know, relatively, compute and RAM are about 100 to 1,000 times more expensive than storage in the grand scheme of things. And so if you actually go and look at your Snowflake bill, if you go look at your warehouse bill, BigQuery, no matter what, I bet the computational part of it is about 90 to 95% of the bill, and not the storage. And then if you again break down, okay, who's spending all the compute?
And you very quickly narrow down
all these real-time-y and data-app-y use cases
where you can never turn off the compute
on your warehouse or your BigQuery.
And those are the ones that are
blowing up your costs and complexity.
And on the Rockset side,
we are actually not storage-optimized,
we're compute-optimized.
So we index all the data as it comes in. And so the storage actually goes slightly higher, because we store the data and also the indexes on that data automatically.
But we usually cut the computational cost to a quarter of what a typical warehouse needs.
So the TCO for our customers goes down two- to four-fold.
It goes down by half or even to a quarter of what they used to spend,
even though their storage cost goes up on net.
That is a very, very small fraction of their spend.
And so really, I think good real-time analytics platforms
are all compute-optimized and not storage-optimized.
And that is what allows them to be a lot more efficient at being the backend for these data applications.
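To see how that math can play out, here is a toy calculation with made-up numbers plugged into those two claims (compute at ninety percent of the bill; indexing doubling storage while quartering compute). None of these figures are actual pricing.

```python
# Toy numbers only, to make the claims concrete; not actual pricing.
compute, storage = 9_000, 1_000      # warehouse bill: 90% compute, 10% storage

new_storage = storage * 2            # indexing roughly doubles storage...
new_compute = compute / 4            # ...but cuts compute to a quarter

old_total = compute + storage          # 10,000
new_total = new_compute + new_storage  # 2,250 + 2,000 = 4,250
print(old_total / new_total)           # ~2.35x lower TCO, inside the 2-4x range
```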
As someone who spends a lot of time staring into the depths of AWS bills, I think that people also lose sight of the reality that it doesn't matter what you're spending on AWS.
It invariably pales in comparison to what you're
spending on people to work with these things. The reason to go to cloud is not because it is
the cheapest possible way to get computers to do things. It's because it's a capability story.
It's about unlocking capacity and capabilities you do not have otherwise. And it dramatically
increases your feature velocity, and it lets you achieve things faster, sooner, with better results.
And unlocking a capability is always going to be more interesting to a company than saving money on it.
When a company cares first, last, and always about just saving money, making the bill lower at the end, it's usually a company in decline.
Or ultimately, something very strange is going on over there.
I agree with that.
One of our favorite customers told us that Rockset took their six-month roadmap and shrunk it to a single afternoon.
And they're a supply chain SaaS backend for heavy construction.
80% of the concrete being delivered and tracked in North America flows through their platform.
And Rockset powers all of their real-time analytics and reporting.
And before Rockset, what did they have?
They had built a beautiful serverless stack using DynamoDB,
Event Hub, AWS Lambdas, and what have you.
And why did they have to do all serverless?
Because the entire team was two people.
And maybe a third person once in a while
they'll get. So 2.5 brilliant people—like, you know, really pioneers of building an entire data stack
on AWS in a serverless fashion, no pipes, no ETL. And then they were like, oh God, finally I have to
do something, because my business demands it and my customers are demanding real-time reporting on
all of these concrete trucks and aggregate trucks delivering stuff. And real-time reporting
is the name of the game for them. And so how do I power this? So I have to build a whole bunch of
pipes, deliver it to some Elasticsearch or some kind of a cluster that I have to keep up in real
time. And this will take me a couple of months. Instead, they came into Rockset on a Thursday, built their MVP over the weekend, and they had the first working
version of their product the following Tuesday. And then, you know, there was no turning back at
that point. Not a single line of code was written; you know, you just go and create an account with
Rockset, point us at your Dynamo, and then off you go. You can start using SQL and start building your real-time application. So again, I think the tremendous value,
I think a lot of customers like us and a lot of customers love us. And if you really ask them,
what is one thing about Rockset that you really like? I think it'll come back to the same thing,
which is you gave me a lot of time back. What I thought would take six months is now a week.
What I thought would be three weeks
we got there in a day. And that allows me to focus on my business: I want to spend more time
with my stakeholders, you know, my CPO, my sales teams, and see what they need to grow our business
and succeed, and not build yet another data pipeline and have data pipelines and other
things coming out of my nose,
you know. So at the end of the day, the simplicity aspect of it is very, very important for real-time
analytics. Because, you know, we can't really realize our vision of real time being the new
default in every enterprise wherever analytics is concerned without making it very, very simple and
accessible to everybody.
And so that continues to be one of our core things.
And I think you're absolutely right when you say the biggest expense is actually the people and the time and the energy they have to spend.
And not having to stand up a huge data ops team that is building and managing all of
these things is probably the number one reason why our customers really, really like working with our product.
I want to thank you for taking so much time
to talk me through what you're working on these days.
If people want to learn more,
where's the best place to find you?
We are Rockset.
I'll spell it out for your listeners.
R-O-C-K-S-E-T, Rockset.
Rockset.com, you can go there.
You can start a free trial.
There's a blog:
rockset.com slash blog
is a prolific blog
that is very active.
We have all sorts of stories there
and engineers talking about
how they implemented certain things
to customer case studies.
So if you're really interested
in this space,
that's one space to follow and watch.
If you're interested
in giving this a spin,
you can go to rockset.com and start a free trial.
If you want to talk to someone,
there is a request demo button there.
You click it and one of our solutions people
or somebody that is more familiar with Rockset
would get in touch with you
and you can have a conversation with them.
Excellent.
And links to that will, of course, go in the show notes.
Thank you so much for your time today.
I appreciate it.
Thanks, Corey.
It was great.
Venkat Venkataramani,
co-founder and CEO at Rockset.
I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud.
If you've enjoyed this podcast,
please leave a five-star review
on your podcast platform of choice.
Whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting, crappy comment that I will immediately see show up on my real-time dashboard.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill
Group works for you, not AWS. We tailor recommendations to your business and we get
to the point. Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.