Screaming in the Cloud - It’s like a HeatWave, Burning in my Heart with Nipun Agarwal
Episode Date: March 29, 2022
About Nipun: Nipun Agarwal is a Senior Vice President, MySQL HeatWave and Advanced Development, Oracle. His interests include distributed data processing, machine learning, cloud technologies and security. Nipun was part of the Oracle Database team where he introduced a number of new features. He has been awarded over 170 patents.
Links:
Oracle: https://www.oracle.com
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at Vulture, spelled V-U-L-T-R,
because they're all about helping save money, including on things like, you know, vowels.
So what they do is they are a cloud provider that provides surprisingly high
performance cloud compute at a price that, well, sure, they claim it is better than AWS's pricing.
And when they say that, they mean that it's less money. Sure, I don't dispute that. But what I find
interesting is that it's predictable. They tell you in advance on a monthly basis what it's going
to cost. They have a bunch of advanced networking features.
They have 19 global locations and scale things elastically, not to be confused with openly,
which is apparently elastic and open.
They can mean the same thing sometimes.
They have had over a million users.
Deployments take less than 60 seconds across 12 pre-selected operating systems,
or if you're one of those nutters like me,
you can bring your own ISO and install basically any operating system you want.
Starting with pricing as low as $2.50 a month
for Vulture Cloud Compute,
they have plans for developers and businesses of all sizes,
except maybe Amazon,
who stubbornly insists on having something of that scale of their
own. Try Vulture today for free by visiting vulture.com slash screaming, and you'll receive
$100 in credit. That's v-u-l-t-r dot com slash screaming.
Couchbase Capella: Database as a service is flexible, full-featured, and fully managed,
with built-in access via key-value, SQL, and full-text search. Flexible JSON documents align
to your applications and workloads. Build faster with blazing fast in-memory performance and
automated replication and scaling while reducing cost.
Capella has the best price performance of any fully managed document database.
Visit couchbase.com slash screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required.
Couchbase Capella.
Make your data sing.
Welcome to Screaming in the Cloud. I'm Corey Quinn.
Today's promoted episode is a returning guest with a slight difference. When last we spoke,
Nipun Agarwal was a VP over at Oracle, but now, that's right, when people stay at a company long enough and perform well, they wind up getting additional adjectives in lieu of other things.
Nipun, you're now a senior VP over at Oracle. Congratulations, I think,
unless that just means you've gotten older. Welcome back.
Thank you, Corey.
So now that you're at SVP level, I can ask some of the harder questions that it didn't necessarily seem fair to get into the last time we spoke, such as: what is an oracle and what might they do these days, for folks who have, I don't know, been living in a cave for 40 years?
Corey, glad to be back on your show. And
since the last time we spoke, we have had like, you know, a lot of enhancements and innovations,
and I'll be happy to describe those in detail whenever is a good time.
Absolutely. So you've been focused on MySQL for a very long time. I mean, you've been using it so
long, I really should be calling it your SQL, but that's neither here nor there. And you've also
been focusing on HeatWave, which is effectively MySQL with some, and I'm just going to cheat and call it magic, that is layered on top
of it. That is probably a terrible descriptor of what it actually does, but understand I'm coming
from a perspective where I firmly believe the best database in the world is, you know, Amazon Route
53, which is a DNS server. So people look at that and say, well, that's not really what it's designed
to do, which really sounds like a them problem. And fair enough, we're going to invert it here. So
why is HeatWave a terrible DNS server? What is it exactly?
So MySQL is the most popular database in the world. It's the most popular open source database
in the world. Lots of people use it. All the major cloud vendors, they take the MySQL database
and either as is or with some enhancements, they offer a managed service, whether it's Amazon,
Azure, Google, pretty much all the major cloud vendors. Now, MySQL has been designed and optimized
for transaction processing. So it does a great job for transaction processing.
But when customers need to run complex queries, or when they need to run analytics,
customers would have to take the data out of the MySQL database into some other database for
running analytics. Let me make sure I understand your terms properly. When you say transactional,
you're talking about I'm shopping for underpants on a website. I go ahead and make a purchase
that's considered a transaction,
and a database change reflecting my purchase makes sense.
From an analytics perspective, you're like,
all right, let's see who bought underpants during this time period.
It's effectively usually a small individual record
versus now we're going to start doing deep dives
into effectively a lot of those records in aggregate.
Is that directionally correct, or is my understanding more than a little flawed about things beyond DNS?
Right. What you described is very accurate, that transactional processing is about point queries
making frequent changes, whereas when we talk about analytics, it typically involves scanning
a much larger amount of data to get the results, and aggregations are a very good example of that.
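To make that distinction concrete, here is a minimal sketch in MySQL-flavored SQL; the orders table and its columns are hypothetical, not anything from the episode:

  -- Transactional (OLTP) point query: touches one row, usually via an index.
  SELECT status
  FROM orders
  WHERE order_id = 42;

  -- Analytic query: scans and aggregates a much larger set of rows.
  SELECT customer_id, SUM(total_amount) AS total_spend
  FROM orders
  WHERE order_date BETWEEN '2022-01-01' AND '2022-03-31'
  GROUP BY customer_id
  ORDER BY total_spend DESC
  LIMIT 10;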
So historically, it seems that people have used very different tooling for different sides of those.
I admit, back in the bad old days when I was a systems administrator,
we were running MySQL a fair bit and we had the primary database,
which was the thing that handled all of the live transactions and the rest.
And whenever we ran business reporting queries on it, it's like, huh, why is the website super slow?
And it didn't seem to work very well. Now, back then, at the scale we were operating at,
the solution was, ah, we're going to use a replica, and then we're going to basically
beat the crap out of the replica for our reporting queries. And if that gets a little slow and bogged
down, who cares? Well, just other people running reporting queries, people can still buy underpants.
So that was the way that we handled it back then.
This was a decade ago.
Datasets have gotten significantly larger since then.
And apparently my way of viewing it is, as they say, quaint when they're trying not to be actively insulting.
The right way to do it these days is to have completely separate systems that
wind up handling those queries with different user interfaces by and large. That is, to my
understanding, the rise of big data. And you can hear the initial caps in big data when people
talk about it like that. Correct. So what you describe is absolutely correct, that people would
extract the data out of databases, take it to specialized databases which are apt
for running decision-making and reporting.
But the downside is that A,
people need to express the logic and write code to extract
this data and then customers end up with these two different databases.
They got to keep the data in sync,
they got to move the data periodically.
There are a lot of issues
in terms of having to manage two different databases,
one for transaction processing, one for analytics.
What we have done with HeatWave
is to enhance the MySQL database service
in the Oracle Cloud
so that now the single MySQL database
is optimized both for transaction processing
as well as analytics. So now you have a single database, and whether you want to run point
queries or these aggregate queries, you can do it on the same data. So the data remains as is.
You're bringing richness of computation, richness in query processing to the customers.
One of the truisms of cloud is that it forces a re-evaluation, in many cases,
of things that people historically hadn't had to think about.
A classic example, when I was consulting on cloud migrations,
was building up costing models, as you might imagine.
And my customers would ask me questions such as,
great, so what's this going to cost us?
And I would come back with, well, okay, how many gigabytes in a given month does transfer between this database and that other database, you know, in the machine sitting right
next to it? And their response started off with a, why on earth do you think we would know that?
Followed by, wait, why do we need to know that? Followed by, oh God, it costs us to do what?
And very quickly, an architectural pattern has emerged within cloud of, you know, people experience this once, and the second time they plan for it. And as a result, whatever database is the most cost effective is the one the data's already in.
Because moving data from point to point is inherently an expensive proposition.
Depending on where the second point is,
it can be an extortionately expensive proposition,
which means that very often we'll start to see patterns
that are, I guess, sacrificing one side
of the database interaction model or the other,
that transactions are going to be a little slower
because you need to have it in the same place
you're going to be running large-scale analytics on,
or alternately, analytics are going to be super crappy just because you have to wind
up querying systems during downtimes and low periods. It just becomes a giant mess. Regardless
of whether it's bad in one way, bad in another, or just expensive, it hasn't worked for people.
And my sense is that that is what HeatWave is directly aimed at. Yes, indeed.
So there are multiple reasons why HeatWave is being so successful.
One is the case that customers need a single database instead of having multiple.
The second thing is there is absolutely no change required to MySQL applications.
So the MySQL applications or MySQL-compatible applications work as is with this query accelerator HeatWave without any change.
But the third reason why this is so popular is that HeatWave has been designed from the ground up for scalability, performance, and optimized for the underlying gear, which is the underlying cloud platform. As a result, it offers a very good price performance compared to any of
the services we have run against. So not only is it providing the benefits of having a single database,
no change to the application, but also it is extremely fast and low priced. And that's because
a lot of technology innovations we did, over almost a decade, to build this scale-out system for analytic processing, which has been optimized for the underlying cloud commodity gear.
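For a sense of what "no change to the application" means in practice, here is a rough sketch of the documented HeatWave flow: tables are loaded into the HeatWave cluster through MySQL's secondary engine mechanism, and the optimizer offloads eligible queries automatically. The table name is hypothetical, and the exact statements should be checked against Oracle's current documentation:

  -- Define HeatWave (the RAPID engine) as the table's secondary engine...
  ALTER TABLE orders SECONDARY_ENGINE = RAPID;

  -- ...and load the table into HeatWave cluster memory.
  ALTER TABLE orders SECONDARY_LOAD;

  -- Existing SELECT statements then run unchanged; analytic queries that
  -- qualify are routed to HeatWave, everything else stays on InnoDB.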
So help me understand: is HeatWave effectively a re-engineering of MySQL? Is it a completely separate layer that exists distinct from an existing MySQL database, or is it something else entirely?
So we started off designing HeatWave separately as something ground up,
which came out of many years of research
and advanced development.
And once we knew that we could scale up HeatWave
for analytic processing,
and it is very well optimized
for the underlying hardware and such,
then we did the work of enhancing the MySQL database
so that it could be integrated, right?
So yes, it started off as a standalone effort
from the ground up so that we didn't have
to live with any constraints of any existing code base
so we could design it and optimize it, right,
from the ground up to be the best possible.
But then we integrated this thing
with the MySQL database so that the customers
can use it without requiring any change to the application in terms of the semantics or any
new syntax, right? So there's absolutely no new syntax and no change to the semantics for existing
MySQL applications. So it gives you the best of both worlds.
So this has frequently been described in the context of a competitor to, again, forgive the Amazonian focus, that's where I spend most of my time, usually complaining about things, but it's been positioned in some ways as a competitor to things such as RDS or Aurora, as well as Redshift or Snowflake, if we're stepping slightly outside that ecosystem. The challenge that I keep running into very often is that when I talk to customers using
those systems, and yes, those systems invariably show up on the bill as one of the big numbers,
regardless of how you slice it, it feels like their use case for each of those is very different.
It feels very much like half of those are aimed at purely transactional and half of them are aimed at
the data warehousing story, the large amounts of data for analytics queries.
And my default knee-jerk reaction whenever someone says, ah, we built a thing that does
both of those super well, it's, yeah, I've heard this before.
It was the HP multifunction printer where it does three things, none of them well. And no one has a multifunction printer that they liked for the longest time because it's moving parts and
computers and the devil in equal measure. And it's, okay, so you're trying to build something
that stands between two worlds, but it's easy to come away with the conclusion as a result that
it's not the best of breed for either use case, but rather a series of trade-offs or compromises
that are made to enable both use cases.
I get the sense that that is not your impression of what you've built.
Correct.
And I'll give you a data point for that.
And the data point is...
Yay, data.
I love that, as opposed to your opinion's bad, because my opinion's good.
No, no, coming with data is a great approach.
Please continue.
In terms of the customers who are using or adopting MySQL HeatWave,
one of the largest segments of the customers who are migrating their production workloads
from other databases or other services and coming to HeatWave
are AWS customers who are migrating their production workloads from RDS or Aurora
and are going production with MySQL HeatWave.
The fact that the customers are doing that is
evidence that there is some value to it.
The reasons they are doing it: there is
absolutely no change to their application.
It is faster, it is cheaper.
Now, in addition, what they find is that many of
these customers were moving their data from
Aurora or RDS into
Redshift or Snowflake for analytics. They don't need to do that. And that's an additional savings they get. But we have a lot of evidence that existing customers of MySQL-based services, definitely AWS RDS and Aurora, but even on other clouds, are migrating. And that's very encouraging
for us that, hey, we should be doing something right
for customers to want to migrate
their workloads to MySQL HeatWave.
You had a couple of announcements coming out
about what's new and what's coming to HeatWave.
And one of the ones that we're talking about today
is the idea of elasticity.
Something you just said reminds me
of a couple of years ago
when Amazon
had relatively recently brought out Aurora and they said much the same thing of, oh, it's super
elastic. You don't have to take it down to make it bigger. And it's great. Well, you just talked
about people removing data as they migrate somewhere else. And the question I had at the
time was, okay, great. So that's how the database embiggens. That's great. How does it ensmallen,
does that wind up having that same elastic property? And the response was a very defensive,
well, why would someone ever do that? Data only gets bigger. And it's, yeah, well, you haven't
worked with me in production where I accidentally drop a table now and again, and data does get
smaller. And the answer for the longest time there was that elasticity and auto scaling were basically unidirectional, because that's what customers were asking for. Right. So I have to ask, when you say elasticity around HeatWave, is that unidirectional or does it mean that, oh, now there's less data,
so we're going to go back down again?
It is bi-directional, so customers can upsize or they can downsize. Now, I have to say
that HeatWave is a highly scalable system. And what that means is that as customers add more
nodes to the cluster, the performance of the system improves almost linearly with the number
of nodes which have been added. So as a result, we have a lot of customers who start with a
cluster size of certain number, and based on the workloads,
they either add nodes or they reduce the number of nodes.
It's a very common operation.
People want to scale up and scale down.
With the real-time elasticity feature we have introduced,
customers can do either operation and with absolutely no downtime.
There's absolutely no time when the cluster is not available
for queries or for DML, right?
So while the resize operation is going on,
the cluster is fully available
and customers can upsize
to any number of nodes
and downsize to any number of nodes.
As it scales in or scales out,
is that effectively doing
its own internal sharding
and rebalancing of data under the hood, invisible to customers?
Is there something else going on?
How does this work?
Right.
So take the example that a customer has, say, four nodes, and they want to add two more nodes.
There are a couple of interesting properties over here.
We have a technique called super partitioning, by which we know exactly which are the blocks of data which have to be populated to the new nodes which have been added.
However, one of the key design points of
our elasticity is that there is no data movement between the nodes.
All the data which has to be populated in the new nodes which are being added
is fetched from the object store, the OCI Object Store.
As a result, the existing cluster of four nodes
is working as is, queries are working as is
without any degradation in performance.
When the data has been populated to these additional nodes,
the system then starts having the queries execute
on the larger cluster.
So the smaller cluster is available all the time,
and then the larger cluster becomes available.
So from a user's perspective, they see absolutely no downtime. And since there is no data movement
happening from the initial four nodes, there is no degradation of the existing queries,
which will be running on the older cluster.
It's 2022, and you're announcing enhancements
to a technology. So of course, it is a given that you are now talking as well about machine learning.
Now, in a general sense, whenever someone says that, my immediate instinctive reaction
is to check my wallet in case someone is in the middle of picking my pocket because it
seems like it winds up in some very weird places.
What is machine learning and its applicability to HeatWave?
Because generally speaking, when I look at things you can use machine learning for,
the answer is often finding signal from noise in large datasets
and, of course, the ever-popular bias laundering. But I get the sense that
neither one of those is quite what you're talking about here. What monstrosity have you built?
With MySQL HeatWave, customers are bringing in more data, from either consolidating multiple MySQL databases into one, or bringing workloads from other databases into MySQL. But the volume of data which customers are now putting into MySQL HeatWave is growing, because they want to run transaction processing and analytics all together in one database. Now, as the size of the data is growing, we are finding that many customers want to extract the data, or
currently need to extract the data out of
the MySQL database to run machine learning processing.
Some of the very large customers of MySQL HeatWave have been
using HeatWave very successfully
for transaction processing and analytics,
but they had to extract the data out
to some other ecosystem, to some other service
for machine learning processing.
With the announcement we have made,
which is HeatWave ML,
we are now providing in database support
for machine learning,
meaning that customers of MySQL HeatWave
can do training,
inference, as well as explanations,
all inside MySQL HeatWave without the data
or the model ever having to leave MySQL.
This is something which is fairly unique.
Apart from the Oracle database,
I'm not aware of any other database which provides
in-database machine learning capabilities,
and certainly not as rich, right? Meaning very efficient training, inference, and explanations.
And all models which are created by HeatWave ML inside MySQL HeatWave can be explained,
which is a pretty important capability which enterprise customers like to have.
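As a sketch of what that in-database flow looks like, here are the HeatWave ML stored procedures as they appear in Oracle's documentation; the bank schema, tables, and the approved target column are hypothetical, and exact signatures vary by version, so treat this as illustrative:

  -- Train a classification model on data already inside MySQL; a handle
  -- to the resulting model is returned in the @model session variable.
  CALL sys.ML_TRAIN('bank.applicants_train', 'approved',
                    JSON_OBJECT('task', 'classification'), @model);

  -- Load the trained model into HeatWave memory before using it.
  CALL sys.ML_MODEL_LOAD(@model, NULL);

  -- Batch inference: write a prediction for every row of a new table.
  CALL sys.ML_PREDICT_TABLE('bank.applicants_new', @model,
                            'bank.applicant_predictions');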
This episode is sponsored in part by our friends at Sysdig.
Sysdig is the solution for securing DevOps.
They have a blog post that went up recently about how an insecure AWS Lambda function
could be used as a pivot point to get access into your environment.
They've also gone in depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's s-y-s-d-i-g dot com. My thanks to them for their continued support of this ridiculous nonsense.
What does this wind up empowering customers to do? Do you have an example or two? Just because it's easy to talk about this stuff
in the abstract as far as,
oh, it would theoretically let someone do X, Y, or Z.
But the problem I found, generally speaking,
in the world of machine learning
is that it is challenging to articulate it
in a way that people hear the story and think,
hey, that looks like something I might want to do,
as opposed to the common stories of, well, if you have a world-spanning data set and want to do this, this, and this. Like, well, I don't, and I don't, and I don't, and I don't. So what value is it to me? What capabilities does it unlock?
Right. So with the
introduction of HeatWave, what we had said is that customers don't need multiple databases,
one for transactional processing, one for analytics. They can do both transactional
processing and analytics with one database, right? That's what we started off with.
Now the same thing holds true for machine learning.
Current customers of most databases need to extract data
out of the database for doing machine learning.
And we are saying, hey,
whether it's now OLTP, analytics, mixed workloads,
or machine learning,
your data can all be inside MySQL HeatWave, and you can do all the processing with that service.
Now, the kind of capabilities customers like to have for machine learning, training is the most important one.
And training is a very time-consuming operation.
And typically, when customers do training and they're using some other service, it's time-consuming and it is very expensive as well.
One of the very interesting properties here is that when you're
running machine learning inside HeatWave,
you don't need to provision any additional cluster,
or you don't need to have any custom gear.
This machine learning training is happening on
the same cluster which the user has provisioned for analytics or for transaction processing.
So on the same hardware, on the same cluster,
now they can run machine learning processing.
So the kind of use case which you are asking about is when customers have this data, and I'll walk you through an example. Take the case of credit cards.
If a bank wants to determine whether they want to deny someone a credit card or approve
it, it's based on some characteristics.
Many times, people use a rule-based mechanism, but now, with data-driven approaches, people want to look at a lot of data, and the system makes a recommendation that, yes, this person is appropriate for granting the loan, or not.
And this is something for which customers or the enterprises want to have rich models,
which accurately provide a characterization of the data so that they can make the right predictions.
So training is very important: you want the training to be done right on the data, because it influences the quality of the predictions which are being made.
And once a prediction is made, there may be reasons, like regulatory compliance reasons, because of which the enterprise may need to offer an explanation of why the credit card was denied, just to kind of make sure that there wasn't any bias or unfairness.
And that's where machine learning explanation capabilities are also very helpful.
So this is an example where someone goes for applying for a credit card, whether it's
rejected or approved.
Another example is that when someone is making a call, like a marketing team is making a
call, and the system wants to predict whether a call will lead to a successful outcome or not, right?
That's another example.
So machine learning is being used pretty extensively now,
and one of the advantages of a database is a database is where there's a lot of data.
So it's a very, very good opportunity to harness this data using machine learning
because machine learning is really tied to the richness of data and to the amount of data someone has.
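Tying that back to the credit card scenario, a per-decision explanation might look like the following; ML_EXPLAIN_ROW is one of the documented HeatWave ML routines, but the input columns and the exact argument list here are assumptions for illustration:

  -- Explain a single prediction: which features pushed the decision,
  -- and in which direction, for one hypothetical applicant.
  CALL sys.ML_EXPLAIN_ROW(
      JSON_OBJECT('income', 42000,
                  'credit_history_months', 18,
                  'existing_debt', 9000),
      @model,
      @explanation);
  SELECT @explanation;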
That makes a lot of sense.
It definitely shines a light on, if not the easy answer to a lot of those questions,
directions that people are going to have a better time mapping to their specific use cases.
One that I think is easier for everyone to map to a specific use case is another component of what you folks are announcing, which is cost reduction, which is, to be direct, not something people generally think
of Oracle as the first example of a company that's like, ah, that's the thing that's going to cost me
less money. And to be clear, I have no problem with that. I pride myself on absolutely not being
the least expensive answer to basically anything. But it is an interesting direction to go in. There are a few
ways you can wind up saving folks money. Which path have you folks taken? Now, there are multiple
ways in which we can reduce the cost for the customer. So one thing to realize is MySQL
customers are very cost sensitive. And in the previous benchmarks and results we have shown,
we have shown that, compared to other vendors, HeatWave is significantly faster and significantly cheaper.
We had a class of customers come to us saying,
hey, you know what, can you trade off
some performance for even lower cost?
The way we have done this is the following.
We have doubled the amount of
data which can be processed on a HeatWave node.
HeatWave is an in-memory system.
The size of the cluster depends upon
the amount of data which is being processed,
and it depends upon the amount of
data which can be processed per node.
If you double the amount of data that can be processed per node,
it means that now customers need a cluster half the size
compared to what they were doing in the past, which reduces their cost by half. Now, please note,
when they're running on a cluster half the size, the amount of time it takes to run the same query
will double. So what it means is the system is providing the same price performance because
half the cost at double the time. But it's a choice the customers
have. If they still want to get the same performance slice earlier, they can continue to run on the
larger cluster, but now they have a choice. So in a way, we are providing an even lower entry point
for customers. That's the first part of cost savings. And it makes sense because with a lot
of the workloads you see where it's nice to be able to run analytics on the same type of data, you don't need the same level of responsiveness on a lot of those queries either.
So we're trying to get an answer to this giant analytics query.
Okay, so great.
How quickly do you need it?
Whereas transactions are measured in fractions of a second,
the answer to analytics queries is, well, Tuesday would be nice.
We'd like it by Tuesday if you can find a way to pull that off. So there's no reason to pay for near-line rate speeds
if you don't need it for a lot of those queries,
which is absolutely going to be an interesting option for folks.
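To put hypothetical numbers on that trade-off, using nothing but what Nipun described: say a node costs p dollars per hour and a query takes t hours on a 16-node cluster. Halve the cluster with the doubled per-node capacity, and the query takes roughly 2t, but the cost per query is unchanged:

  cost on 16 nodes = 16 * p * t
  cost on 8 nodes  = 8 * p * (2t) = 16 * p * t

Same price-performance, lower entry point, and the customer picks the point on the curve.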
Now, you said there was a second aspect as well.
Yes.
And the second aspect is, again, for analytics, right?
Customers want to run the queries.
They want to run it occasionally.
They don't want to run it all the time.
So what we are now introducing is a feature called Pause and Resume.
What it does is that if you're not using the cluster, you can pause it, and the system makes a copy of the data and all the metadata associated with the data in a backup. And when the user wants, they can resume and fetch the data, which is still in the in-memory representation, and all the metadata associated with Autopilot, and just start resuming.
So this is another way by which customers, when they're
not using the cluster for some duration
of time, they can pause it,
and for the duration they pause it, they're not
being charged.
I am a big believer that step number one of cloud economics is not, oh, should I buy some reservations or lock in a long-term contract? No, you should turn things off when you're not using them. And people look at you
strange as in, what? You can turn things off? And yes, you absolutely can, which makes people feel
better about generally not doing it. But again, customer behaviors are usually ones that make
sense in their context. I just look at it from a billing perspective, and it seems a little weird.
I like the option, particularly for things that are either non-production or only going to be relevant
to production during certain time windows. There are a number of areas where that begins
to make an awful lot of sense, and people would do it if it didn't require backing up the database,
destroying the cluster, then reprovisioning the database, restoring the cluster, and yet people
don't generally have weeks to spend on spin-up and spin-down.
Yes. In fact, that's a very, very good observation, Corey. I want to say that many of our
customers who are running the production workloads on HeatWave, they also have a test environment.
And exactly on the lines of what you said, that they want to have a copy of this data in the test
environment should something bad happen, but they don't want the cluster on all the time. They just
want it for some duration of time. And for them, this pause and resume would be a very good idea and also save them money. So something which we have seen with
many of our customers. The last component of your announcement is one that I approach with a
significant amount of skepticism. Because every time I start drifting in this direction, one thing
is for certain: it's that I'm going to get yelled at on the internet. I'm referring, of course, to benchmarking. Now, Oracle historically has been a company that prefers people not benchmark and publish results of those benchmarks, and that goes back into the mists of history.
database workloads appropriately due to a series of misunderstandings. And let's be clear, this stuff is complicated.
And a number of companies in the space love to talk about their benchmarks are great.
And when you look into it, it's okay, those numbers are great.
And you sort of know that the benchmarks that didn't perform so well are not the ones that
they're talking about.
And then their competitor immediately winds up chiming in, where it's, ah, they're doing it wrong, because when you do these other benchmarks,
our solution winds up being better. And it winds up in a nerd slap fight that no one, not even the participants, particularly enjoys. What makes your benchmarks interesting is that
you talk through not just what the benchmark results are, because, of course, that's the entire point.
You're also putting the benchmark methodology and tooling up on GitHub
where people can grab it and run it themselves
and see for themselves is the entire approach.
That is, how do I put this politely,
that is atypical of large companies in general
and Oracle in particular.
What changed?
Right.
So there are three things over here, Corey.
The first thing is, as we talked about, MySQL is the most popular open source database in
the world.
Pretty much all cloud vendors, they have some version of MySQL which they're offering as
a managed service.
And in many cases, they're enhancing MySQL and then offering their service.
In the context of MySQL,
it becomes very important for us to give
the opportunity to our customers for them to
compare which service is better for their needs.
It is more important in the context of MySQL,
since everyone is offering it and some of them have derivatives, that we provide some mechanism for people to compare.
That's the thrust of having a benchmark.
That's the first point.
Second thing is, when you want to compare the performance or the cost of these various
flavors, instead of us coming up with our own, say, workloads, which we see from customers,
it's good to have a well-published benchmark or well-understood benchmark so that people
can say, okay, you know what?
Based on TPC-H, what is the performance? Or on TPC-DS, what is the performance? In some cases, when a benchmark
isn't available, what we have done is for machine learning, we have used a bunch of open datasets,
and based on those open datasets, we are publishing the benchmarks to say, hey,
we are so much faster or so much cheaper. And then the third aspect is in terms of why we are making them all available in
GitHub or open source, that these benchmarks are a starting point, but customers will have workloads
which are different from these benchmarks. So we want to provide the opportunity for the customers
to first look at what is our methodology, what have we used to come up with these numbers,
so, A, they can reproduce them; but B, if their workloads are different,
they can enhance or augment these benchmarks
in the way they would like
and then run them to see how do they compare, right?
So we want to be fully transparent
about what we have done,
how we have done,
and let customers decide on their own
which is going to be the best platform
from a cost perspective,
from a performance perspective.
So this is the reason why we have chosen to benchmark, and to make all of our scripts available in open source.
One of the things I think I admire the most about that is
I've always viewed benchmarks as being borderline worthless because I do not care
in the slightest how your system performs on
hand-selected workloads on sample data that you
provide, whereas I care everything for how the system performs with my workloads and my data
sets. So unless I am talking to someone who is effectively a neutral third-party benchmark
source, in which case they are immediately attacked for being shills for one company or
another and sometimes both or neither at the same time because people are terrible.
But seeing how it runs on my workloads and with my constraints is the important and valuable thing.
And this is the easiest I can ever see it being for getting a good representative feel for exactly
how different offerings are going to perform under the specific conditions that my production environment lives within. Because it's me we're talking about, the specific conditions of my production environment are, of course, terrifying.
So I want to point out, yes, one is the fact that we have made the benchmark methodology, like, you know, very transparent. But the second aspect of that is what we talked about last time, which is MySQL Autopilot. This is machine learning
based automation, data-driven automation. So we are very actively working on making it easy
for customers to not have to do any configuration changes or optimizations; the system determines, based on the queries, based on the workloads, how to best tune the system, right? So we are working on both angles. One is to make the system more intelligent, so that based on the workloads, the system can optimize for the user's workload,
and then B, making our approach very transparent
so that customers can compare for themselves.
So we are very, very aware of this.
And again, for MySQL customers,
for many of these open source customers,
simplicity is very important.
And we are working hard to make it simpler and transparent to our users.
I really want to thank you for taking me on a tour of what you're announcing today.
Now, let me ask one of the forbidden questions.
What's on the roadmap?
What's coming that customers can look forward to?
So one of the things which we are working on is that there has been a very good reception of the HeatWave capabilities we've introduced. So MySQL HeatWave is one of the fastest growing services
in the Oracle Cloud.
But there has been a lot of interest in customers
who have been asking us to provide similar capabilities
on AWS.
So this is something which we are working on.
It's in the roadmap.
And please stay tuned for more news around this.
You can bet that I will. I really want to thank
you for taking the time out of your day to basically suffer my slings and arrows and also
spend time teaching what amounts to a remedial database course to a moron. But thank you once
again for being as generous with your time as you always are.
Well, thank you, Corey. It's always a pleasure to come and talk at your show. Thank you again for the opportunity.
Always. Nipun Agarwal, SVP at Oracle in charge of MySQL, U-SQL, and HeatWave. I'm cloud economist
Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice.
Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice
and explain how databases always fail your personal benchmark of doing a select star on a terabyte of data at once.
If your AWS bill keeps rising
and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill
by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business
and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.