Screaming in the Cloud - Data Analytics in Real Time with Venkat Venkataramani
Episode Date: April 27, 2022

About Venkat
Venkat Venkataramani is CEO and co-founder of Rockset. In his role, Venkat helps organizations build, grow, and compete with data by making real-time analytics accessible to developers and data teams everywhere. Prior to founding Rockset in 2016, he was an Engineering Director for the Facebook infrastructure team that managed online data services for 1.5 billion users. These systems scaled 1,000x during Venkat's eight years at Facebook, serving five billion queries per second at single-digit millisecond latency and five 9's of reliability. Venkat and his team also created and contributed to many noted data technologies and open-source projects, including Facebook's TAO distributed data store, RocksDB, Memcached, MySQL, MongoRocks, and others. Prior to Facebook, Venkat worked on tools to make the Oracle database easier to manage. He has a master's in computer science from the University of Wisconsin-Madison and a bachelor's in computer science from the National Institute of Technology, Tiruchirappalli.

Links Referenced:
Company website: https://rockset.com
Company blog: https://rockset.com/blog
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored by our friends at Revelo.
Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O.
It means I reveal.
Now, have you tried to hire an engineer lately?
I assure you it is significantly harder
than it sounds. One of the things that Revelo has recognized is something I've been talking
about for a while, specifically that while talent is evenly distributed, opportunity is absolutely
not. They're exposing a new talent pool to basically those of us without a presence in Latin America via their platform.
It's the largest tech talent marketplace in Latin America with over a million engineers in their network,
which includes but isn't limited to talent in Mexico, Costa Rica, Brazil, and Argentina.
Now, not only do they wind up vetting all of their talent on English ability as well as, you know, their engineering skills, but they go significantly beyond that.
Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to.
Let's also not forget that Latin America has high time zone overlap with what we have here in the United States.
So you can hire full time remote engineers who share most of the workday as your team. It's an end-to-end talent service. So you can find and
hire engineers in Central and South America without having to worry about, frankly, the
colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io
slash screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.
This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into
production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process.
What if launching new features
didn't require you to do a full-on code
and possibly infrastructure deploy?
What if you could test on a small subset of users
and then roll it back immediately
if results aren't what you expect?
LaunchDarkly does exactly this.
To learn more, visit launchdarkly.com
and tell them Corey sent you and watch for the
wince. Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's promoted guest episode is
one of those questions I really like to ask because it can often come across as incredibly,
well, direct, which is one of the things I love doing.
In this case, the question that I am asking is, when you look around at the list of colossal
blunders that people make in the course of careers in technology and the rest, one of
the most common is, oh, yeah, I don't like the way that this thing works, so I'm going
to build my own database. That is the siren call to engineers, and it is often the prelude to horrifying disasters.
Today, my guest is Venkat Venkataramani, co-founder and CEO at Rockset.
Venkat, thank you for joining me.
Thanks for having me, Corey. It's a pleasure to be here. So it is easy for me to sit here in
my beautiful ivory tower that is crumbling down around me and use my favorite slash the best
database imaginable, which is TXT records shoved into Route 53. Now, there are certainly better
databases than that for most use cases. Almost anything, really, to be honest with you,
because that is a terrifying pattern. Good joke, terrible practice. What is Rockset as we look at
the broad landscape of things that store data? Rockset is a real-time analytics platform built
for the cloud. Let me break that down a little bit. I think it's a very good question when you say, does the world really need another database? Don't we have enough already? SQL databases, NoSQL databases, warehouses, lakehouses now. If you go back, the 80s were when people actually retired pen-and-paper records and started using a relational
database to actually manage their business records instead of ledgers and books and
what have you. That was the first digital transformation. That is why Oracle called
the rows in a table records for a reason; they're called records to this date. And then, you know, 20
years later when all businesses
were doing system of record and transactions and transactional databases, then analytics was born.
This was the whole reason: people wanted to make better data-driven business decisions.
And BI was born, warehouses and data lakes started becoming more and more mainstream.
And there was really a second category of database management
systems, because the first category was very good at being a system of record, but not really good
at the complex analytics that businesses were asking for to guide their decisions. Fast forward
20 years from then, the nature of applications is changing. The world is going from batch to
real time. Your data never stops coming. With the advent of Apache Kafka
and technologies like that,
5G, IoT,
data is coming from all sorts of nooks and corners
within an enterprise.
And now customers and enterprises
are acquiring that data in real-time
at a scale that the world has never seen before.
Now, how do you get analytics out of that?
And then if you look at the database market,
the entire market,
there's still only two large categories of databases,
OLTP databases for transaction processing
and warehouses and data lakes for batch analytics.
Now, suddenly you need the speed of OLTP
at the scale of batch, right?
In terms of like complexity of compute,
complexity of storage.
So that is really why we thought the data management space needs a third leg.
And we call it real-time analytics platform or real-time analytics processing.
And this is where the data never stops coming.
The queries never stop coming.
You need the speed and the scale.
And it's about time we innovate and solve the problem well.
Because in 2015, 2016, when I was researching this,
every company that was looking to build
real-time applications
was building a custom
Rube Goldberg machine of sorts.
And it was insanely complex.
It was insanely expensive.
Fast forward now,
you can build a real-time application
in a matter of hours
with the simplicity of the cloud
using Rockset.
There's a lot to be said about the way we used to do things
after that first transformation, when we got into the world of batch processing.
In the days of punch cards, which was a bit before my time,
and I believe yours as well, people would drop them off
and come back a day or two later, after the run,
to get the results, only to find a syntax error
because you put the wrong card first or something like that. And it was maddening. In time, that got
better, but still, nightly runs have become a thing, to the point where even now, by default,
if you wind up looking at the typical timing of a default Linux install, for example, you see that
in the middle of the night is when a bunch of things will rotate, when various cleanup jobs get done, et cetera, et cetera. And that seemed like a weird direction
to go in. One of the most famous Google April Fool's Day jokes was when they put out their
white paper on MapReduce, and then Yahoo fell for it hook, line, and sinker and built out Hadoop,
and we've been stuck with this idea of performing these big query jobs on top of existing giant piles of data, where ideally you can measure it with
a wall clock.
In practice, you often measure it with a calendar in some cases.
And as the world continues to evolve, being able to do streaming processing and understand
in real time what is going on is unlocking different approaches, at least by all accounts.
Do you have an example you can give me of a problem that real-time analytics solves for a
customer? Because I can sit here and talk all day about how things might theoretically work,
but I have to get out of my Route 53-based ivory tower over here. What are customers saying?
That's a great question, and I 100% agree. I think
Google did build MapReduce. And I think it's a very nice continuation of what happened there
and what is happening in the world now. They built MapReduce, and they quickly realized
that re-indexing the whole web every night, as the size of the internet exploded, was a bad idea.
And you know how Google indexes now? They
do real-time indexing. That is how they index the web. And they look for the changes that are
happening on the internet, and they only index the changes. And that is exactly
one of the core principles behind Rockset's real-time analytics platform.
So what is a customer story? So let me give you one of my favorite ones. So
the world's number one or number two buy-now-pay-later company, they have hundreds of millions of users.
They have 300,000 plus merchants.
They operate in like maybe 100 plus countries.
So many different payment methods.
You can imagine the complexity.
At any given point in time, some part of their product is broken.
Oh, Apple Pay stopped working in Switzerland for this e-commerce merchant.
Oh God, like we got to first detect that.
Forget even debugging and figuring out what happened
and having an incident response team.
So what do they do as they scale
the number of payments processed in the system
across the world?
It's like in millions.
First it was millions in a day,
and then it was millions in an hour.
So like everybody else,
they built a batch-based system.
So they would accumulate
all these payment records
every six hours.
So initially it was a day
and then afterwards,
you know, you try to see
how far I can push it
and they couldn't push it
beyond every six hours.
Every six hours,
some batch job would come
and process through
all the payments that happened,
have some statistical models to detect: hey, here are some of the things that you might want to double-click on and follow up on. And as they were scaling, the batch job that they would kick off
every six hours was starting to take more than six hours, so you can see how the story goes.
Now, fast forward, they came to us and said—it's almost like Rockset has
a big red button that says "real-time this"—and they're kind of like, can you make this
real time? Because not only are we losing millions of potential revenue dollars in a year
because something stops working and we're not processing payments, and we don't find out about
that until, like, three hours later, five hours later, six hours later. But our merchants are
also very unhappy. We're also not able to protect our customers' business because that is all we
are about. And so, fast forward, they use Rockset, and simply using SQL, all the
metrics and statistical computation that they want now happen in real time and are accurate up to
the second. All of their anomaly detectors run every minute,
and the anomaly detectors take hundreds of milliseconds to run.
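To make that concrete, here's a rough sketch of the kind of per-window failure-rate metric such a detector might compute each minute. The table and column names (payments, status, merchant_id, and so on) are hypothetical, not the company's actual schema, and the time-window predicate syntax varies by SQL dialect.

```python
# Hypothetical SQL a per-minute anomaly detector might issue.
# Table and column names are illustrative only.
FAILURE_RATE_SQL = """
SELECT
    merchant_id,
    country,
    payment_method,
    SUM(CASE WHEN status = 'FAILED' THEN 1 ELSE 0 END) * 1.0
        / COUNT(*) AS failure_rate
FROM payments
WHERE event_time > CURRENT_TIMESTAMP - INTERVAL '5' MINUTE
GROUP BY merchant_id, country, payment_method
HAVING COUNT(*) > 100  -- skip slices with too little traffic to judge
"""
```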
And so now they've cut down the business observability lag, I would say. It's not
metrics and machine observability; they actually now have business observability
in real time. And that not only actually saves them a lot of potential revenue
loss from downtimes, that's also allowing them to build a better product and give their customers
a better experience. Because they are now calling their merchants and their customers that something
is not working in some part of your e-commerce footprint before even the customers notice that
something is wrong. And that allows them to build a better product and a better customer experience than their competitors.
So this is a very real-world example
of why companies and enterprises
are moving from batch to real-time.
The stories that you and, frankly,
a lot of other data analytics companies
tend to fall back on all the time
are stories like the one you're telling,
where you're talking about the largest buy-now-pay-later lender, for example.
These are companies operating at massive scale who have tremendous existing transaction volume,
and they're built out already. That's great, but then I wind up trying to cut to the truth of some of these things. And when I visit your pricing page at Rockset, it doesn't have what I would expect if that were the
only use case. And what that would be is, great, call here to open up a sales quote and we'll talk
to you, et cetera, et cetera, et cetera. And the answer then is, okay, I know it's going to have
at least two commas in it, ideally not three, but okay,
great. Instead, you have a free tier where it's, hey, we'll give you a pile of credits. Here's
some limits on our free account, et cetera, et cetera. Great. That is awesome. So it tells me
that there is a use case here for folks who have not already on some level made a good show of
starting the process of conquering the world. Rather, someone with an idea some evening at two in the morning
can wind up diving in and getting started. What is the Twitter for pets in my garage,
spare time side project story for using something like Rockset? What problem will I have as I wind
up building those things out when I don't have any user traffic or data yet, but I want to,
you know, for once in my life, do the smart thing in advance rather than building an impressive tower of technical debt.
That is the first thing we built, by the way.
When we finished our product, the first thing we built was self-service.
The first thing we built was a free-forever tier, which has certain limits because somebody has to pay the bill, right?
And then we also have compute instances that are very, very affordable
that cost you like approximately $1 a day.
And so we built all of that
because real-time analytics is not a need
that only like the large scale companies have.
And I'll give you a very, very simple example.
Let's say you're building a game.
It's a mobile game.
You can use Amazon DynamoDB and use AWS Lambdas and have a serverless stack.
And you're really only paying for what you use.
You're kind of keeping your footprint very, very small.
And you're able to build a very lively game and see if it gets viral and is growing.
And once it grows, you can have all that big company scaling problems.
But in the early days, you're just getting started.
Now, if you think about DynamoDB and Lambdas and whatnot,
you can build almost every part of the game,
except probably the leaderboard.
So how do I build a leaderboard
when thousands of people are playing
and all of their individual game plays and scores
and everything is just another simple record in DynamoDB?
It's all serverless.
But DynamoDB doesn't give me SQL: select star,
order by score, limit 100, distinct by the same player.
No, this is an analytical question.
And it has to be updated in real time.
Otherwise, you really don't have this thing
where I just finished playing and I go to the leaderboard
and within a second or two, if it doesn't update, you kind of lose people along the way.
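In SQL terms, the leaderboard question being described looks roughly like the sketch below; the table and column names (game_scores, player_id, score) are hypothetical, just to make the shape of the query concrete.

```python
# A sketch of the analytical leaderboard query described above.
# Table and column names are hypothetical.
LEADERBOARD_SQL = """
SELECT
    player_id,
    MAX(score) AS best_score  -- "distinct by the same player": one row each
FROM game_scores
GROUP BY player_id
ORDER BY best_score DESC      -- highest scores first
LIMIT 100                     -- top 100 players
"""
```

It is exactly this kind of aggregate-sort-limit query that a key-value store has no native answer for.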
So this is actually a very popular use case when the scale is much smaller: Rockset augments a NoSQL database like a Dynamo or a Mongo, or even a Postgres or a MySQL for that matter, where you can use that as your system of record
and keep it small,
but power all of your compute-heavy
and analytical parts of your application with Rockset.
So it's almost like a kind of a CQRS pattern
where you use your OLTP database
as your system of record,
you connect Rockset to it.
And so Rockset comes in with built-in connectors,
by the way,
so you don't have to write a single line of code
for your inserts and updates and deletes
in your transactional database
to get reflected in Rockset within one to two seconds.
And so now, all of a sudden,
you have a fully indexed, fast SQL replica
of your transactional database
on which you can do all sorts of analytical queries,
and that's fully isolated from your transactional database. So this is the pattern that I'm talking about. The mobile leaderboard
is an example of that pattern where it comes in very handy. But you can imagine almost
everybody building some kind of an application has certain parts of it that is very analytical
in nature. And by augmenting your transactional database with Rockset, you can have your cake and eat it too.
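As a rough illustration of that CQRS split, here is a sketch assuming boto3 for the DynamoDB write path. The query_analytics() helper, its endpoint URL, and the auth header are hypothetical stand-ins for whatever SQL API the analytics replica exposes, not a real API contract.

```python
import boto3
import requests

dynamodb = boto3.resource("dynamodb")
scores = dynamodb.Table("game_scores")  # hypothetical table name

def record_game(player_id: str, score: int, ts: str) -> None:
    """Write path (the 'command' side): the OLTP system of record."""
    scores.put_item(Item={"player_id": player_id, "ts": ts, "score": score})

def query_analytics(sql: str) -> list:
    """Read path (the 'query' side): a placeholder for the analytics
    replica's SQL endpoint. URL, auth, and response shape are assumptions."""
    resp = requests.post(
        "https://analytics.example.com/v1/query",  # hypothetical endpoint
        json={"sql": sql},
        headers={"Authorization": "ApiKey <elided>"},  # elided credential
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["results"]
```

The design point is the isolation: the OLTP table never serves analytical scans, and the analytics side never takes writes directly.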
One of the challenges I think that at least I've run into
when it comes to working with data, and let's be clear,
I tend to deal with data in relatively small volumes mostly.
The stuff that's significant and large,
like, oh, I don't know, AWS bills from large organizations,
the format of those is mostly
predefined. When I'm building something out using, I don't know, DynamoDB or being dangerous with
SQLite or whatnot, invariably, I find that even at small scale, I paint myself into corners by
data model design or how I wind up structuring access or the rest. And the thing that I'm doing
that makes perfect sense today winds up being incredibly challenging to change later. I mean,
I still, in production, have a DynamoDB table that has the word test in its name, because of course
I do. It's not a great place to find yourself in some cases. And I'm curious as to what you've seen
as you've been building this out and watching
customers, especially ones who already have significant data sets as they move to you,
do you have any guidance around how to avoid falling down that particular well?
I will say a lot of the complexity in this world is by solving the right problems using the wrong
tool or by solving the right problem on the wrong part of the stack.
I'll unpack this a little bit, right?
So when your patterns change,
your application is getting more complex.
It is demanding more things.
That doesn't necessarily mean
the first part of the application you build,
and let's say DynamoDB was your solution for that,
was the wrong choice.
That is the right choice,
but now you've expanded the scope of your application
and the demand that you have
on your backend transactional database,
and now you have to ask the question,
now in the expanded scope,
which ones are still more of the same category of things
on why I chose Dynamo,
and which ones are actually not at all.
And so instead of going and abusing the GSIs and other really complex and expensive indexing
options and whatnot that Dynamo has built, which have all sorts of limitations, ask:
what do I really need? And what is the best tool for the job? What is the best
system for that? And how do I augment? And how do I manage these things? And this goes to the first thing I said,
which is like this tremendous complexity when you start to build a Rube Goldberg machine of sorts.
Okay, now I'm going to start making changes to Dynamo. Oh God, like how do I pick up all of
those things and not miss a single record? Now replicate that to another second system that is
going to be search-centric or reporting-centric?
And do I have to re-sync this once in a while? Do I have to build and manage these pipelines? And
suddenly, instead of going from one system to two systems, you actually end up going from one system
to like four different things, with all the pipes and tubes going in the middle. And so this
is what we really observed. And so when you come into Rockset and you point us at your DynamoDB
table, you don't write a single line of code and Rockset will automatically scan your Dynamo tables,
move that into Rockset. And in real time, your changes—inserts, updates, deletes—to Dynamo will
be reflected in Rockset. And this is all using the DynamoDB Streams API, the DynamoDB Scan API, and whatnot behind the scenes.
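For intuition about what a connector like that does behind the scenes, here is a minimal sketch of tailing a DynamoDB stream with boto3. The table name is hypothetical, and a real connector also handles the initial scan, shard splits, checkpointing, and retries, all omitted here.

```python
import boto3

dynamo = boto3.client("dynamodb")
streams = boto3.client("dynamodbstreams")

# Find the change stream attached to the table (streams must be enabled).
table = dynamo.describe_table(TableName="game_scores")  # hypothetical name
stream_arn = table["Table"]["LatestStreamArn"]

# Walk each shard from the oldest record still retained.
description = streams.describe_stream(StreamArn=stream_arn)
for shard in description["StreamDescription"]["Shards"]:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    for record in streams.get_records(ShardIterator=iterator)["Records"]:
        # eventName is INSERT, MODIFY, or REMOVE; NewImage is the new row.
        print(record["eventName"], record["dynamodb"].get("NewImage"))
```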
This just gives you an example: if you use the right tool for the job here, when suddenly your application is demanding analytical queries on Dynamo, you keep using Dynamo for what it is very, very good at, while augmenting it with a system built for analytics, with full-featured SQL and other
capabilities that I can talk about, for the parts of your application for which Dynamo
is not a good fit.
And so if you use the right tool for the job, you should be in a very good place.
The other thing is part about this wrong part of the stack.
I'll give a very kind of naive example,
and then maybe you can extrapolate that
to other patterns
of how people end up there, because, you know,
accidental complexity is the worst.
So let's just say you need to implement
access control on your data.
Let's say the best place
to implement access control
is at the database level.
It just happens to be that that is the right thing.
But this database that I picked
doesn't
really have role-based access control or what have you. It doesn't really give me all the
security features to be able to protect the data that way I want it. So then what I'm going to do
is I'm going to go look at all the places that is actually having business logic and querying
the database. And I'm going to put a whole bunch of permission management and roles and
privileges. And you can just see how that will be so error prone, so hard to maintain,
and it will be impossible to scale. And this is what is the worst form of accidental complexity.
Because if you had just looked at it in that one week or two weeks—how do I get something out
when the database I picked doesn't have it?—in those two weeks, you feel like you made some progress by kind of
putting some duct-tape if-conditions on all the access paths. But now you've just painted yourself
into a really, really bad corner. And so this is another variation of the same problem where you
end up solving the right problems in the wrong part of the stack.
And that just introduces tremendous amount of accidental complexity.
And so I think, yeah, both of these are the common pitfalls that I think people make.
I think it's easy to avoid them.
I would say there's so much research, there's so much content.
And if you know how to search for these things, they're available in the internet.
It's a beautiful place.
But I guess you have to know how to search for these things.
But in my experience, these are the two common pitfalls that a lot of people fall into when painting themselves into a corner.
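To put that naive example in code: the duct-tape version repeats a permission check at every query path, while the database-level version states the policy once. This is a sketch with hypothetical role and table names, with the grant written in Postgres-style SQL.

```python
# Anti-pattern: permission checks duct-taped onto every access path.
# Every new query site must remember to repeat this; it is error-prone,
# hard to maintain, and impossible to audit.
def get_payments(user, db):
    if user.role not in ("analyst", "admin"):  # repeated at every call site
        raise PermissionError("not allowed to read payments")
    return db.execute("SELECT * FROM payments")

# Right part of the stack: state the policy once, at the database level,
# and every access path inherits it (role/table names hypothetical).
DB_LEVEL_POLICY = """
CREATE ROLE analyst;
GRANT SELECT ON payments TO analyst;  -- analysts may read, never write
"""
```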
Couchbase Capella.
Database as a service is flexible, full-featured, and fully managed with built-in access via
key value, SQL, and full-text search.
Flexible JSON documents align to your applications and workloads.
Build faster with blazing fast in-memory performance
and automated replication and scaling while reducing cost.
Capella has the best price performance of any fully managed document database.
Visit couchbase.com slash screaming in the cloud to try Capella today for free
and be up and running in three minutes with no credit card required.
Couchbase Capella.
Make your data sing.
A question that I have, though, that is an extension of this,
and I want to give some flavor to it,
is: why is
there a market for real-time analytics? And what I mean by that is, early on in my tenure of fixing
horrifying AWS bills, I saw a giant pile of money being hurled over at, effectively, a MapReduce
cluster for Elastic MapReduce. Great, okay.
Well, stream processing is kind of a thing.
What about migrating to that?
Well, that was a complete non-starter
because it wasn't just the job running on those things.
There were downstream jobs with their own downstream jobs.
There were thousands of business processes tied to that thing.
And similarly, the idea of real-time analytics, we don't have any use for
that because, oh, I don't know, I only wind up pulling these reports on a once-a-week basis,
and that's fine. So what do I need that updated for in real time if I'm looking at them once a week?
In practice, the answer is often something aligned with the, well, yeah, but if you had a real-time
updating dashboard, you would find that more useful than those reports. But people's expectations and business processes
have shaped themselves around constraints that now can be removed. But how do you get them to
see that? How do you get them to buy in on that? And then how do you untangle that enormous pile
of previous constraint into something that leverages the technology that's now available
for a brighter future?
I think it's a really good question.
Who are the people moving to real-time analytics?
What do they see, and why can't they do it with other tech?
Like, you know, as you say, EMR, you know, it's just MapReduce. Can I just run it, sort of, every 24 hours, every six hours, every hour?
How about every five minutes? It doesn't work that way.
How about I spin up a whole bunch of parallel clusters on different timescales so I constantly
have a new report coming in. It's real time, except you're constantly spitting out new ones,
but they're just six hours delayed every time. Exactly. So you don't really want to do this.
And so let me unpack it one at a time, right? I mean, we talked about a very good example of
a business team, which is building business observability at a buy-now-pay-later company.
It's a very clear value prop
on why they want to go from batch to real-time
because it saves their company
tremendous losses, potential losses,
and also allows them to build a better product.
So it could be a marketing operations team
looking to get more real-time observability
to see what campaigns are working well today
and how do I double down
and make sure my ad budget for the day is put to good use. I don't have to mention security
operations needing real-time. Don't tell me I got owned three days ago. Tell me somebody is
breaking glass and might be entering the house right now, and tell me then, not three
days later. What alert system do you have for security intrusions? Oh, I read the front page of the New York Times every morning.
Yeah, and waiting to see my company's name.
There probably are better ways to reduce that cycle time.
Exactly right.
And so that is really the need, right?
Like I think more and more business teams are saying,
I need operational intelligence and not business intelligence.
Don't make me play Monday morning quarterback.
My favorite analogy is it's the middle of the third quarter.
I'm six points down.
A couple of people, star players in my team and my opponent's team are injured,
but there are some in offense, some in defense.
What plays do I do and how do I play the game slightly differently
to change the outcome of the game and win this game
as opposed to losing by six points?
So that I think is kind of really what is driving businesses.
I want to be more agile.
I want to be more nimble and take kind of being data-driven decision-making to another level.
So that, I think, is the real force in play.
So now the real question is, why can't they do it already?
Because if you go ask 100 people, do you want fast analytics on real-time data
or slow analytics on stale data, how many people are going to say,
give me slow and stale? Zero, right? Exactly zero people. But then why hasn't it happened yet?
I think it goes back to the world only has seen two kinds of databases, transaction processing
systems, built for system of record, don't lose my data kind of systems, and then batch analytics,
all these warehouses and data lakes. And so in real-time analytics use cases, the data never stops coming.
So you actually need a system that is running 24/7.
And then what happens is, as soon as you build a real-time dashboard, like this example that
you gave, which is like, I just want all of these dashboards to automatically update all
the time, immediately people's response is: but I'm not going to be like
Clockwork Orange, you know, toothpicks in my eyelids, staring at this 24/7.
Can you do something to alert or detect some anomalies and tap on my shoulder when something
off is going on?
And so now what happens is somebody—actually a program, more than a person—is actively monitoring all of these metrics and graphs and doing some analysis and only bringing
this to your attention when you really need to because something is off, right?
So then suddenly what happens is you went from accumulate all the data and run a batch
report to, God, the data never stops coming.
The queries never stop coming.
I never stop asking
questions. It's just a programmatic way of asking those things. And at that point, you have a data
app. This is not an analytics dashboard report anymore. You have a full-fledged application.
In fact, that application is harder to build and scale than any application you've ever built
before. Because in those situations, again, you didn't have these torrents of data coming in all the time
and complex analytical questions being asked of the data 24/7.
And so that, I think, is really why
a real-time analytics platform has to be built
as almost a third leg.
So this is what we call data apps,
which is when your data never stops coming
and your queries never stop coming.
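As a toy version of the "program watching the dashboard for you" idea from a moment ago, here is a minimal rolling z-score detector. The window size, warm-up length, and threshold are arbitrary assumptions; production detectors use far more robust statistics.

```python
from collections import deque
import statistics

class ZScoreDetector:
    """Flags a metric value that sits far outside its recent history."""

    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # e.g., last 60 per-minute values
        self.threshold = threshold           # std-devs that count as "off"

    def observe(self, value: float) -> bool:
        anomalous = False
        if len(self.history) >= 10:  # warm up before judging anything
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9  # avoid 0-division
            anomalous = abs(value - mean) / stdev > self.threshold
        self.history.append(value)
        return anomalous
```

Each minute, a scheduler feeds the freshest metric value into observe(), and a human only gets paged when it returns True.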
So this is really, I think, what is pushing all the expensive EMR clusters, or misusing
your warehouse, misusing your data lakes.
At the end of the day, this is, I think, what is blowing up your Snowflake bills, what is blowing
up your warehouse bills: you somehow accidentally used the wrong tool for the job.
Going back to the one that we just talked about,
you accidentally say,
oh God, I just need some real-time here. With enough thrust,
pigs can fly.
Is that a good idea?
Probably not, right?
And so I don't want to be building a data app
on my warehouse just because I can.
You should probably use the best tool for the job
and really use something that was built,
ground up for it.
And I'll give you one technical insight about how real-time analytics platforms are different
from warehouses.
Please, I am here for this.
Yes.
So really, if you think about warehouses and data lakes, I call them storage-optimized
systems.
I mean, I've been building databases all my life.
So if I have to really build a database that is for batch analytics, you just break down all of your expenses in terms of, let's say, compute and storage.
What I'm burning 24/7 is storage; compute comes and goes. So I want to compress the heck out of the data and store it in
very cheap media. I want to make the storage as cheap as possible. So I want
to optimize the heck out of the storage use. And I want to make computation on that possible,
but not efficient. I can shuttle things around and make the analysis possible,
but I'm not trying to be compute efficient. And we just talked about how as soon as you get into real-time analytics, you very quickly get into the data app business.
You're not building a real-time dashboard anymore. You're actually building a data application.
And so as soon as you get into that, what happens is you start burning both storage and compute 24/7. And we all know, relatively, compute and RAM are about 100 to 1,000 times more expensive than storage in the grand scheme of things. And so if you actually go and look at your Snowflake bill, if you go look at your warehouse bill, BigQuery, no matter what, I bet the computational part of it is about 90 to 95% of the bill, and not the storage. And then if you again break down, okay, who's spending all the compute?
And you very quickly narrow down
all these real-time-y and data-app-y use cases
where you can never turn off the compute
on your warehouse or your BigQuery.
And those are the ones that are
blowing up your costs and complexity.
And on the Rockset side,
we are actually not storage-optimized,
we're compute-optimized.
So we index all the data as it comes in. And so the storage actually goes slightly higher, because we store the data and also the indexes on that data automatically.
But we usually cut the computational cost to a quarter of what a typical warehouse needs.
So the TCO for our customers goes down two- to four-fold.
It goes down by half or even to a quarter of what they used to spend,
even though their storage cost goes up on net.
That is a very, very small fraction of their spend.
And so really, I think good real-time analytics platforms
are all compute-optimized and not storage-optimized.
And that is what allows them to be a lot more efficient at being the backend for these data applications.
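To see how that math can play out, here is a toy calculation with made-up numbers plugged into those two claims (compute at ninety percent of the bill; indexing doubling storage while quartering compute). None of these figures are actual pricing.

```python
# Toy numbers only, to make the claims concrete; not actual pricing.
compute, storage = 9_000, 1_000      # warehouse bill: 90% compute, 10% storage

new_storage = storage * 2            # indexing roughly doubles storage...
new_compute = compute / 4            # ...but cuts compute to a quarter

old_total = compute + storage          # 10,000
new_total = new_compute + new_storage  # 2,250 + 2,000 = 4,250
print(old_total / new_total)           # ~2.35x lower TCO, inside the 2-4x range
```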
As someone who spends a lot of time staring into the depths of AWS bills, I think that people also lose sight of the reality that it doesn't matter what you're spending on AWS.
It invariably pales in comparison to what you're
spending on people to work with these things. The reason to go to cloud is not because it is
the cheapest possible way to get computers to do things. It's because it's a capability story.
It's about unlocking capacity and capabilities you do not have otherwise. And it dramatically
increases your feature velocity, and it lets you achieve things faster, sooner, with better results.
And unlocking a capability is always going to be more interesting to a company than saving money on it.
When a company cares first, last, and always about just saving money, making the bill lower at the end, it's usually a company in decline.
Or ultimately, something very strange is going on over there.
I agree with that.
One of our favorite customers told us that Rockset took their six-month roadmap and shrunk it to a single afternoon.
And they're a supply chain SaaS backend for heavy construction.
80% of the concrete being delivered and tracked in North America flows through their platform.
And Rockset powers all of their real-time analytics and reporting.
And before Rockset, what did they have?
They had built a beautiful serverless stack using DynamoDB,
Event Hub, AWS Lambdas, and what have you.
And why did they have to do all serverless?
Because the entire team was two people.
And maybe a third person once in a while
they'll get. So 2.5 brilliant people—like, you know, really pioneers of building an entire data stack
on AWS in a serverless fashion, no pipes, no ETL. And then they were like, oh God, finally I have to
do something, because my business demands it and my customers are demanding real-time reporting on
all of these concrete trucks and aggregate trucks delivering stuff. And real-time reporting
is the name of the game for them. And so how do I power this? So I have to build a whole bunch of
pipes, deliver it to some Elasticsearch or some kind of a cluster that I have to keep up in real
time. And this will take me a couple of months. Instead, they came into Rockset on a Thursday, built their MVP over the weekend, and they had the first working
version of their product the following Tuesday. And then, you know, there was no turning back at
that point. Not a single line of code was written; you know, you just go and create an account with
Rockset, point us at your Dynamo, and then off you go. You can start using SQL and start building your real-time application. So again, I think the tremendous value,
I think a lot of customers like us and a lot of customers love us. And if you really ask them,
what is one thing about Rockset that you really like? I think it'll come back to the same thing,
which is you gave me a lot of time back. What I thought would take six months is now a week.
What I thought would be three weeks
we got there in a day. And that allows me to focus on my business: I want to spend more time
with my stakeholders, you know, my CPO, my sales teams, and see what they need to grow our business
and succeed, and not build yet another data pipeline and have data pipelines and other
things coming out of my nose,
you know. So at the end of the day, the simplicity aspect of it is very, very important for real-time
analytics. Because, you know, we can't really realize our vision of real time being the new
default in every enterprise wherever analytics is concerned without making it very, very simple and
accessible to everybody.
And so that continues to be one of our core things.
And I think you're absolutely right when you say the biggest expense is actually the people and the time and the energy they have to spend.
And not having to stand up a huge data ops team that is building and managing all of
these things is probably the number one reason why our customers really, really like working with our product.
I want to thank you for taking so much time
to talk me through what you're working on these days.
If people want to learn more,
where's the best place to find you?
We are Rockset.
I'll spell it out for your listeners.
R-O-C-K-S-E-T, Rockset.
Rockset.com, you can go there.
You can start a free trial.
There's a blog:
rockset.com slash blog
is a prolific blog
that is very active.
We have all sorts of stories there
and engineers talking about
how they implemented certain things
to customer case studies.
So if you're really interested
in this space,
that's one space to follow and watch.
If you're interested
in giving this a spin,
you can go to rockset.com and start a free trial.
If you want to talk to someone,
there is a request demo button there.
You click it and one of our solutions people
or somebody that is more familiar with Rockset
would get in touch with you
and you can have a conversation with them.
Excellent.
And links to that will, of course, go in the show notes.
Thank you so much for your time today.
I appreciate it.
Thanks, Corey.
It was great.
Venkat Venkataramani,
co-founder and CEO at Rockset.
I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud.
If you've enjoyed this podcast,
please leave a five-star review
on your podcast platform of choice.
Whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting, crappy comment that I will immediately see show up on my real-time dashboard.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill
Group works for you, not AWS. We tailor recommendations to your business and we get
to the point. Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.