Screaming in the Cloud - The Complexities of AWS Cost Optimization with Rick Ochs
Episode Date: December 1, 2022

About Rick
Rick is the Product Leader of the AWS Optimization team. He previously led the cloud optimization product organization at Turbonomic, and before that was the Microsoft Azure Resource... Optimization program owner.

Links Referenced:
AWS: https://console.aws.amazon.com
LinkedIn: https://www.linkedin.com/in/rick-ochs-06469833/
Twitter: https://twitter.com/rickyo1138
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.

...vendor due to proprietary data collection, querying, and visualization. Modern-day containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open-source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight to ECS, EKS, and your microservices, wherever they may be, at snark.cloud slash chronosphere. That's snark.cloud slash chronosphere.

This episode is brought to you in part by our friends at Veeam.
Do you care about backups?
Of course you don't. Nobody cares about backups. Stop lying to yourselves. You care about restores,
usually right after you didn't care enough about backups. If you're tired of the vulnerabilities,
costs, and slow recoveries when using snapshots to restore your data, assuming that you even have them at all, living in AWS
land, there's an alternative for you. Check out Veeam. That's V-E-E-A-M for secure, zero-fuss
AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances
with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous
podcast. Welcome to Screaming in the Cloud. I'm Corey Quinn. For those of you who've been
listening to this show for a while, a theme has probably emerged, and that is that one of the
key values of this show is to give the guest a chance to tell their story. It doesn't beat the guest up about how they
approach things. It doesn't call them out for being completely wrong on things. Because honestly,
I'm pretty good at choosing guests and I don't bring people on that are, you know,
walking trash fires. And that is certainly not a concern for this episode. But this might devolve
into a screaming loud argument despite my best effort. Today,
I'm joined by Rick Ochs, Principal Product Manager at AWS. Rick, thank you for coming back on the
show. The last time we spoke, you were not here. You were at, I believe it was Turbonomic.
Yeah, that's right. Thanks for having me on the show, Corey. I'm really excited to talk to you
about optimization and my current role and what we're doing.
Well, let's start at the beginning. Principal Product Manager. It sounds like one of those
corporate titles that can mean a different thing in every company or every team that you're talking
to. What is your area of responsibility? Where do you start and where do you stop?
Awesome. So I am the product manager lead for the AWS optimization team.
So I lead the product team that includes several other product managers that focus in on Compute
Optimizer, Cost Explorer, right-sizing recommendations, as well as reservation and savings plan purchase
recommendations. In other words, you are the person who effectively oversees all of the
AWS cost optimization tooling and approaches to same? Yeah. Give or take. I mean, you could argue
that, oh, every team winds up focusing on helping customers save money. I could fight that argument
just as effectively, but you effectively start and stop with respect to helping customers save money or
understand where the money is going on their AWS bill. I think that's a fair statement. And I also
agree with your comment that I think a lot of service teams do think through those use cases
and provide capabilities. You know, there's like S3 Storage Lens, you know, there's all sorts of
other products that do offer optimization capabilities as well. But as far as the unified
purpose of my team, it is unilaterally focused on how do we help customers safely reduce their spend
and not hurt their business at the same time. Safely being the key word. For those who are
unaware of my day job, I am a partial owner of the Duckbill Group, a consultancy where we fix
exactly one problem,
the horrifying AWS bill. This is all that I've been doing for the last six years.
So I have some opinions on AWS bill reduction as well. So this is going to be a fun episode for the two of us to wind up more or less smacking each other around, but politely,
because we are both professionals. So let's start at the very high level. How does
AWS think about AWS bills from a customer perspective? You talk about optimizing it,
but what does that mean to you? Yeah. So, I mean, there's a lot of ways to think about it,
especially depending on who I'm talking to, where they sit in an organization.
I would say I think about optimization in four major themes. The first is how do you scale correctly, whether that's
right sizing or architecting things to scale in and out. The second thing I would say is
how do you do pricing and discounting, whether that's reservation management,
savings plan management, coverage, how do you handle the
expenditures of prepayments and things like that. Then I would say suspension. What that means is
turn the lights off when you leave the room. We have a lot of customers that do this, and I think
there's a lot of opportunity for more. Turning EC2 instances off when they're not needed, if
they're non-production workloads or other sort of stateful services that charge by the hour. I think there's a lot of opportunity there. And then the last of
the four methods is cleanup. And I think it's maybe one of the lowest hanging fruit, but essentially,
are you done using this thing? Delete it. And there's a whole opportunity of cleaning up,
you know, IP addresses, unattached EBS volumes, sort of these resources that hang around in AWS
accounts that sort of get lost and forgotten as well. So those are the four kind of major thematic strategies
for how to optimize a cloud environment that we think about and spend a lot of time working on.
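As a rough sketch of that cleanup pillar, the following boto3 snippet lists unattached EBS volumes and unassociated Elastic IPs in one region; the region choice and the report-only behavior are assumptions for illustration, not anything prescribed in the episode.

```python
import boto3

# Minimal sketch: list likely cleanup candidates in one region (report only).
# The region and the decision to print rather than delete are assumptions.
ec2 = boto3.client("ec2", region_name="us-east-1")

# EBS volumes with status "available" are not attached to any instance.
volumes = ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]
for vol in volumes:
    print(f"Unattached EBS volume: {vol['VolumeId']} ({vol['Size']} GiB)")

# Elastic IPs without an AssociationId are allocated but unused (and billed).
for addr in ec2.describe_addresses()["Addresses"]:
    if "AssociationId" not in addr:
        print(f"Unassociated Elastic IP: {addr['PublicIp']}")
```

Whether anything on that list is actually safe to delete is exactly the trust problem the two get into later in the conversation.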
I feel like there's, at least the way that I approach these things, that there are a number
of different levels you can look at AWS billing constructs on.
The way that I tend to structure most of my engagements when I'm working with clients is
we come in and step one, cool. Why do you care about the AWS bill? It's a weird question to ask
because most of the engineering folks look at me like I've just grown a second head. Like,
so why do you care about your AWS bill? They're like, what, why do you?
You run a company doing this.
It's, no, no, no.
It's not that I'm being rhetorical
and I don't, or I'm trying to be clever somehow
and pretend that I don't understand
all of the nuances around this,
but why does your business care
about lowering the AWS bill?
Because very often the answer is they kind of don't.
What they care about from a business perspective is being able to accurately attribute costs for the service or good that they provide, being able to predict what that spend is going to be. And also, yes, a sense of being good stewards of the money that has been entrusted to them by investors, public markets, or the budget allocation process of their companies,
and make sure that they're not doing foolish things with it. And that makes an awful lot of
sense. It is rare at the corporate level that the stated number one concern is make the bill lower.
Because at that point, well, easy enough. Let's just turn off everything you're running in
production. You'll save a lot of money on your AWS bill. You won't be in business anymore,
but you'll be saving a lot of money on the AWS bill. The answer is always deceptively nuanced and complicated.
At least, that's how I see it. Let's also be clear that I talk with a relatively narrow subset of the
AWS customer totality. The things that I do are very much intentionally things that do not scale.
Definitionally, everything that you do has to scale. How do you wind up approaching this in ways that will work for customers spending billions versus independent learners who are
paying for this out of their own personal pocket? It's not easy. Let me just preface that. The team
we have is incredible. And we spend so much time thinking about scale and the different personas that engage with our products and what their experience is when they interact with a bill or AWS platform at large.
There's also a couple of different personas here, right? There's the finance side that owns the cloud bill, whether that's an organization that has created a FinOps team or a cloud center of excellence, versus an engineering team that maybe has started to go
towards decentralized IT and has some accountability for the spend that they attribute to their AWS
bill. And so these different personas interact with us in really different ways, whether that's Cost Explorer
or, you know, downloading the CUR and taking a look at
the bill. And one thing that I always kind of imagine is somebody putting a headlamp on and
going into the caves in the depths of their AWS bill and kind of, like, spelunking through their
bill sometimes. Right. And so you have these FinOps folks and billing people that are
deeply interested in making sure that the spend they do have meets their business goals.
Meaning this is providing high value to our company. It's providing high value to our customers.
We're spending on the right things. We're spending the right amount on the right things.
Versus the engineering organization that's like, hey, how do we configure these resources? What types of instances should we be focused on using? What services should we be building on top of that maybe are more flexible
for our business needs? And so there's really like two major personas that I spend a lot of time,
our organization spends a lot of time wrapping our heads around because they're really different.
We have very different approaches to how we think about cost because you're right. If you just
wanted to lower your AWS bill, it's really easy. Just size everything to a T2 nano and you're done. Move on, right?
T3 or T4g nano, depending upon whether regional availability is going to save you less. I'm still better at this. Let's not kid ourselves. I kid.
For sure. So T4g nano, absolutely. T4g, remember, now the way forward is everything has this
explicit letter designator to define which processor company made the CPU that underpins the instance itself.
Because that's a level of abstraction we certainly wouldn't want the cloud provider to take away from us, honestly.
Absolutely. And actually, the performance differences of those different processor models can be pretty incredible.
So there's huge decisions behind all of that as well.
Oh, yeah. There's so many factors
that factor into all these things. It's gotten to a point of, you see this usually with lawyers and
very senior engineers, but the answer to almost everything is it depends. There are always going
to be edge cases. Easy example of if you check a box and enable an S3 gateway endpoint inside of
a private subnet.
Suddenly, you're not passing traffic through a 4.5 cent per gigabyte managed NAT gateway.
It's being sent over that endpoint for no additional cost whatsoever.
Check the box, save a bunch of money.
But there are scenarios where you don't want to do it. Always double-checking and talking to customers about this is critically important.
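For the S3 gateway endpoint example above, "checking the box" can be sketched roughly as follows; the VPC ID, route table ID, and region are hypothetical placeholders, and, as noted, there are constraints where you would not want to do this.

```python
import boto3

# Minimal sketch: add an S3 gateway endpoint to a VPC so S3-bound traffic from
# private subnets no longer traverses the managed NAT gateway's per-GB processing.
# The VPC ID, route table ID, and region are hypothetical placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],  # the private subnets' route table(s)
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```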
Just because the first time you make a recommendation that does not work for their constraints, you lose trust.
Make a few of those, and it looks like you're more or less just making naive recommendations
that don't add any value, and they learn to ignore you. So down the road, when you make a really high
value, great recommendation for them, they stop paying attention. Absolutely. And we have that really high bar for recommendation accuracy, especially with
right sizing. That's such a key one. Although I guess savings plan purchase recommendations can
be critical as well. If a customer over commits on the amount of savings plan purchase they need
to make, right, that's a really big problem for them. So recommendation accuracy must be above
reproach. Essentially, if a customer takes
a recommendation and it breaks an application, they're probably never going to take another
right-sizing recommendation again. And so this bar of trust must be exceptionally high.
That's also why out of the box, the compute optimizer recommendations can be a little bit
mild. They're a little tame. Because the first order of business is do no harm, focus on the performance requirement of the application first, because we have to make sure that the reason you
build these workloads in AWS is served. Now, ideally, we do that without overspending and
without over provisioning the capacity of these workloads, right? And so, for example, like if we
make these right sizing recommendations from Compute Optimizer, we're taking a look at the utilization of CPU, memory, disk, network throughput, and IOPS, and we're vending these recommendations to customers.
And when you take that recommendation, you must still have great application performance for your business to be served.
It's such a crucial part of how we optimize and run long term, because optimization
is not a one-time band-aid; it's an ongoing behavior. So it's really critical for
that accuracy to be exceptionally high, so we can build business process on top of it as well.
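As a hedged illustration of pulling those Compute Optimizer right-sizing recommendations programmatically, a minimal sketch might look like the following, assuming Compute Optimizer is already opted in for the account.

```python
import boto3

# Minimal sketch: fetch EC2 right-sizing recommendations from Compute Optimizer.
# Assumes Compute Optimizer is already opted in for this account and region.
co = boto3.client("compute-optimizer", region_name="us-east-1")

resp = co.get_ec2_instance_recommendations(maxResults=25)
for rec in resp["instanceRecommendations"]:
    current = rec["currentInstanceType"]
    finding = rec["finding"]  # e.g. OVER_PROVISIONED, UNDER_PROVISIONED, OPTIMIZED
    # Options are ranked; rank 1 is the top recommendation.
    top = sorted(rec["recommendationOptions"], key=lambda o: o["rank"])[0]
    print(f"{rec['instanceArn']}: {finding}, {current} -> {top['instanceType']}")
```

Taking the top-ranked option blindly is precisely what the trust discussion warns against; the output is a starting point for a human review, not an instruction.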
Let me ask you this: how do you contextualize what the right approach to optimization is? What is your entire...
There are certain tools that you have,
you, I mean, of course, as an organization,
have repeatedly gone back to in different approaches
that don't seem to deviate all that much
from year to year and customer to customer.
How do you think about the general things
that apply universally?
So we know that EC2 is a very popular service for us. We know
that sizing EC2 is difficult. We think about that optimization pillar of scaling. It's an obvious
area for us to help customers. We run into this sort of industry-wide experience where whenever
somebody picks the size of a resource, they're going to pick one generally larger than they need. It's almost like asking a new employee to your company, hey, pick your laptop. We have a
16 gig model or a 32 gig model. Which one do you want? That person making the decision on capacity,
hardware capacity, they're always going to pick the 32 gig model laptop, right? And so we have
this sort of human nature in IT of, we don't want to get
called at two in the morning for performance issues. We don't want our apps to fall over.
We want them to run really well. So we're going to size things very conservatively,
and we're going to oversize things. So we can help customers by providing those recommendations to
say, you can size things in a different way using math and analytics based on the utilization
patterns.
And we can provide and pick different instance types.
There's hundreds and hundreds of instance types in all of these regions across the globe.
How do you know which is the right one for every single resource you have?
It's a very, very hard problem to solve.
And it's not something that is lucrative to solve one by one. If you have 100
EC2 instances, trying to pick the correct size for each and every one can take hours and hours of
IT engineering resources to look at utilization graphs, look at all of the instance types
available, look at what is the performance difference between processor models and
providers of those processors? Are there application compatibility constraints that I have to consider?
The complexity is astronomical. And then not only that, as soon as you make that sizing decision,
one week later, it's out of date and you need a different size. So you didn't really solve the
problem. So we have to programmatically use data science and math to say, based on these
utilization values, these are the sizes that would
make sense for your business that would have the lowest cost and the highest performance together
at the same time. And it's super important that we provide this capability from a technology
standpoint, because it would cost so much money to try to solve that problem that the savings you
would achieve might not be meaningful. Then at the same time, you know, that's really
from an engineering perspective. But when we talk to the FinOps, the finance folks,
the conversations are more about reservations and savings plans. How do we correctly apply
savings plans and reservations across a high percentage of our portfolio to reduce the costs
on those workloads, but not so much that dynamic capacity levels in our organization mean we
all of a sudden have a bunch of unused reservations or savings plans.
And so a lot of organizations that engage with us and we have conversations with, we
start with the reservation and savings plan conversation because it's much easier to click
a few buttons and buy a savings plan than to go institute an entire right-sizing campaign
across multiple engineering teams.
That can be very difficult, a much higher bar.
So some companies are ready to dive
into the engineering task of sizing.
Some are not there yet.
And they're maybe a little earlier
in their FinOps journey
or the building optimization technology stacks
or achieving higher value out of their cloud environment.
So starting with kind of the low-hanging fruit, it can vary depending on the company,
size of company, technical aptitude, skill sets, all sorts of things like that. And so those finance
focused teams are definitely spending more time looking at and studying what are the best practices
for purchasing savings plans, covering my environment, getting the most out of my dollar that way. Then they don't have to engage the engineering teams. They can kind of
take a nice chunk off the top of their bill and sort of have something to show for that amount
of effort. So there's a lot of different approaches to start in on optimization.
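The savings plan purchase recommendations described here are also available through the Cost Explorer API; a minimal sketch, with the term, payment option, and lookback window as purely illustrative choices, could look like this.

```python
import boto3

# Minimal sketch: pull a Compute Savings Plans purchase recommendation.
# The one-year / no-upfront / 30-day-lookback parameters are illustrative choices.
ce = boto3.client("ce")  # Cost Explorer API

resp = ce.get_savings_plans_purchase_recommendation(
    SavingsPlansType="COMPUTE_SP",
    TermInYears="ONE_YEAR",
    PaymentOption="NO_UPFRONT",
    LookbackPeriodInDays="THIRTY_DAYS",
)
summary = resp["SavingsPlansPurchaseRecommendation"][
    "SavingsPlansPurchaseRecommendationSummary"
]
print("Hourly commitment to purchase:", summary["HourlyCommitmentToPurchase"])
print("Estimated monthly savings:", summary["EstimatedMonthlySavingsAmount"])
```

Note the output is an hourly commitment, which connects directly to the staggered, incremental purchase strategy discussed later in the episode.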
My philosophy runs somewhat counter to this because everything you're saying does work globally. It's
safe. It's non-threatening. And it also really, on some level, feels like it is
an approach that can be driven forward by finance or business. Whereas my worldview is that cost and
architecture in cloud are one and the same. And there are architectural consequences of cost
decisions and vice versa that can be adjusted and addressed.
Like one of my favorite party tricks, although I admit it's a weird party,
is I can look at the exploded PDF view of a customer's AWS bill and describe their architecture
to them. And people have questioned that a few times. And now I have a testimonial on my client
website that mentions it was weird how he was able to do this. Yeah, it's real. I can do it. And it's not a skill I would recommend cultivating for most people.
But it does also mean that I think I'm onto something here
where there's always context that needs to be applied.
It feels like there's an entire ecosystem of product companies out there
trying to build what amounts to a better Cost Explorer
that also is not free the way that
Cost Explorer is. So the challenge I see there is they all tend to look more or less the same.
There is very little differentiation in that space. And in the fullness of time,
Cost Explorer does ideally get better. How do you think about it?
Absolutely. And if you're looking at ways to understand your bill, there's obviously Cost Explorer,
the CUR.
A very common approach is to take the CUR and put a BI front end on top of it.
That's a common experience.
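As a sketch of that CUR-plus-BI pattern, assuming the Cost and Usage Report is already delivered to S3 and registered as an Athena table (the database, table, and results bucket names below are hypothetical), a cost-by-service breakdown is one query away.

```python
import boto3

# Minimal sketch: a cost-by-service breakdown straight from the CUR via Athena.
# Database "cur", table "cost_and_usage", and the results bucket are hypothetical;
# the column names follow the standard CUR-in-Athena naming convention.
athena = boto3.client("athena", region_name="us-east-1")

query = """
SELECT line_item_product_code,
       ROUND(SUM(line_item_unblended_cost), 2) AS unblended_cost
FROM cost_and_usage
WHERE line_item_usage_start_date >= current_timestamp - interval '30' day
GROUP BY line_item_product_code
ORDER BY unblended_cost DESC
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "cur"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
```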
A lot of companies that have chops in that space will do that themselves instead of purchasing
a third party product that does do bill breakdown and dissemination. There's also
the cross-charge, showback, organizational breakdown and boundaries because you have
these super large organizations that have fiefdoms. You have HR IT and sales IT and product IT. You
have all these different IT departments that are fiefdoms within your AWS bill and construct,
whether they have different AWS accounts or
say different AWS organizations sometimes, right?
It can get extremely complicated.
And some organizations require the ability to break down their bill based on those organizational
boundaries.
Maybe tagging works, maybe it doesn't.
Maybe they do that by using a third-party product that lets them set custom scopes on
their resources based on organizational boundaries.
That's a common approach as well.
We do also have our first-party solutions that can do that, like the CUDOS dashboard
as well.
It's something that's really popular and highly used across our customer base.
It allows you to have a dashboard and customizable view of your AWS costs and kind of split it up based on tag,
organizational value, account name, things like that as well. So you mentioned you feel like the
architectural and cost problem is the same problem. I really don't disagree with that at all. I think
what it comes down to is some organizations are prepared to tackle the architectural element of cost and some are not.
And it really comes down to how does the customer view their bill?
Is it somebody in the finance organization looking at the bill?
Is it somebody in the engineering organization looking at the bill?
Ideally, it would be both. Ideally, you would have some of those skill sets that overlap, or you would have an organization that does focus in on FinOps or cloud operations
as it relates to cost. But then at the same time, there are organizations that are like,
hey, we need to go to cloud. Our CIO told us, go to cloud. We don't want to
pay the lease renewal on this building. There's a lot of reasons why customers move to cloud.
A lot of great reasons, right?
Three major reasons you move to cloud.
Several terrible ones.
Yeah, and some not so great ones too.
So there's so many different dynamics
that get exposed when customers engage with us
that they might or might not be ready to engage
on the architectural element
of how to build hyperscale systems.
So many of these customers
are bringing legacy workloads and applications to the cloud and something like a re-architecture to
use stateless resources or something like spot that's just not possible for them. So how can
they take 20% off the top of their bill? Savings plans or reservations are kind of that easy,
low-hanging fruit answer to just say,
we know these are fairly static environments that don't change a whole lot, that are going to exist
for some amount of time. They're legacy; you know, we can't turn them off, and it doesn't make sense to
rewrite these applications because they just don't change, they don't have high business value, or
something like that. And so the architecture part of that conversation doesn't always come into play.
Should it? Yes. The long-term maturity and approach for cloud optimization does absolutely
account for architecture, thinking strategically about how you do scaling, what services you're
using. Are you going down the Kubernetes path, which I know you're going to laugh about, but
how do you take these applications and componentize them?
What services are you using to do that?
How do you get that long-term scale and manageability out of those environments?
Like you said at the beginning, the complexity is staggering and there's no one unified answer.
That's why there's so many different entrance paths into how do I optimize my AWS bill?
There's no one answer.
And every customer I talk to has a different comfort level and appetite.
And some of them have tried suspension.
Some of them have gone heavy down savings plans.
Some of them want to dabble in rightsizing.
So every customer is different.
We want to provide those capabilities for all of those different customers that have
different appetites or comfort levels with each
of these approaches. This episode is sponsored in part by our friends at Redis, the company behind
the incredibly popular open source database. If you're tired of managing open source Redis on your
own, or if you're looking to go beyond just caching and unlocking your data's full potential,
these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process,
deliver, and store data. To learn more from the experts in Redis how to be real-time,
right now, from anywhere, visit snark.cloud slash redis. That's snark.cloud slash R-E-D-I-S. And I think that's very fair.
I think that it is not necessarily a bad thing
that you wind up presenting a lot of these options to customers.
But there are some rough edges.
An example of this is something I encountered myself somewhat recently
and put on Twitter because I have those kinds of problems.
Where originally I remember this,
that you were able to buy hourly savings plans,
which again,
savings plans are great.
No knock there.
I would wish that they applied to more services there rather than,
Oh,
SageMaker is going to do its own savings plan.
No,
stop keeping me from going from something where I have to manage myself on
EC2 to something you manage for me, and making that cost money.
You've nailed it with Fargate.
You've nailed it with Lambda.
Please just have one unified savings plan thing.
I digress.
But you had a limit once upon a time
of $1,000 per hour.
Now it's $5,000 per hour,
which I believe in a three-year all up front
means you will cheerfully add a $130 million purchase
to your shopping cart.
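A quick back-of-the-envelope check of that figure (the rounding is mine):

```python
# $5,000 per hour of commitment, over a three-year all-upfront term:
hourly_commitment = 5_000
hours_in_three_years = 24 * 365 * 3
total = hourly_commitment * hours_in_three_years
print(total)  # 131_400_000, i.e., roughly $130 million per cart item
```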
And I kept adding a bunch of them and then had a little over a billion dollars, a single button click away
from being charged to my account. Let me begin with what's up with that. Thank you for the tweet,
by the way, Corey. I always sort of ruin your month, Rick. You know that. Yeah, fantastic. We took that
tweet, you know, it was tongue in cheek, but also it was a serious opportunity for us to ask the question of what does happen. And it's something we did ask internally
and have some fun conversations about. I can tell you that if you click purchase, it would have been
declined. So you would have not been... American Express would have had a problem with that. But
the question is, would you have attempted to charge American Express or would something
internally have gone, this has a few too many commas for us to wind up presenting it to the card issuer with a straight face.
Right. So it wouldn't have gone through. And I can tell you that if your account was on a PO-based
configuration, it would have gone to the account team and it would have gone through our standard
process for having a conversation with our customer there. That being said, it's an awesome
opportunity for us to examine what is that shopping cart experience.
We did increase the limit. You're right.
And we increased the limit for a lot of reasons that we sat down and worked through.
But at the same time, there's always an opportunity for improvement of our product and experience.
We want to make sure that it's really easy and lightweight to use our products, especially purchasing savings plans.
Savings plans are already kind of fraught with mental concern and risk of purchasing something so expensive and large that has a big impact on your AWS bill. So we don't really want
to add any more friction necessarily to the process, but we do want to build an awareness
and make sure customers understand, hey, you're purchasing this. This has a pretty big impact.
And so we're also looking at other ways we can kind of improve the savings plans shopping cart experience to ensure customers don't put themselves in a
position where they have to unwind or make phone calls and say oops, right? We
want to avoid those sorts of situations for our customers. So we are looking at
quite a few additional improvements to that experience as well that I'm really
excited about, that I probably can't share here, but stay tuned. I am looking forward to it. I will say the counterpoint to that is having
worked with customers who do make large eight-figure purchases at once, there's a psychology
element that plays into it. Everyone is very scared to click the button on the buy it now thing or the approve it. So what I've often found is at
that scale, one, you can reduce what you're buying, buy half of it, and then see how that treats you,
and then continue to iterate forward rather than doing it all at once. Or reach out to your account
team and have them orchestrate the buy. In previous engagements, I had a customer do this
religiously, and at one point, the concierge team bought the wrong thing in the wrong region. And from my perspective,
I would much rather have AWS apologize for that and fix it on their end than for us having to go
over the customer side of, oh crap, oh crap, please be nice to us. Not that I doubt you would
do it, but that's not the nervous conversation I want to have in quite the same way. It just
seems odd to me that someone would want to make that scale of purchase without
ever talking to a human.
I mean, I get it.
I'm as antisocial as they come some days.
But for that kind of money, I kind of just want another human being to validate that
I'm not making a giant mistake.
We love that.
That's such a tremendous opportunity for us to engage and discuss with an organization
that's going to make a large commitment that here's the impact, here's how we can help,
how does it align to our strategy? We also do recommend from a strategic perspective,
those more incremental purchases. I think it creates a better experience long-term when
you don't have a single savings plan that's going to expire on a specific day that all of a sudden increases your entire bill by a significant percentage.
So making staggered monthly purchases makes a lot of sense.
And it also works better for incremental growth.
Right. If your organization is growing five percent month over month or year over year or something like that, you can purchase those incremental savings plans that sort of stack up on top of each other. And then you don't have that risk of a cliff one day where one super large SP expires and boom,
you have to scramble and repurchase within minutes because every minute that goes by is an additional
expense, right? That's not a great experience. And so that's really a large part of why those
staggered purchase experiences make a lot of sense. That being said, a lot of companies do
their math and their finance in different ways. And single large purchases make sense to go
through their process and their rigor as well. So we try to support both types of purchasing patterns.
I think that that is an underappreciated aspect of cloud cost savings and cloud
cost optimization, where it is much more about humans than it is about math. I see this most notably when I'm helping customers negotiate
their AWS contracts with AWS, where there are often perspectives such as, well, we feel like we really
got screwed over last time, so we want to stick it to them and make them give us a bigger percentage
discount on something. And it's like, look, you can do that,
but I would much rather, if it were me,
go for something that moves the needle
on your actual business
and empowers you to move faster, more effectively,
and lead to an outcome that is a positive for everyone
versus the, well, we're just going to be difficult
in this one point
because they were difficult on something last time.
But ego's a thing. Human psychology is never going to have an API for it. And again,
customers get to decide their own destiny in some cases. I completely agree. I've actually
experienced that. So this is the third company I've been working at on cloud optimization. I
spent several years at Microsoft running the optimization program. I went to Turbonomic for several years, building out the right sizing and savings plan reservation purchase capabilities there.
And now here at AWS, through all of these journeys and experiences working with companies to help optimize their cloud spend, moving the needle is significantly harder than the technology stack of sizing something
correctly or deleting something that's unused. We can solve the technology part. We can build
great products that identify opportunities to save money. There's still this psychological
component: IT for the last several decades has gone through this maturity curve of, if it's not broken,
don't touch it.
Five-nines, Six Sigma, all of these methods of IT sort of rationalizing do no harm, don't
touch anything, everything must be up.
And it even kind of goes back several decades back when if you rebooted a physical server,
the motherboard capacitors would pop, right?
So there's even this stigma against even rebooting servers sometimes.
And the cloud really does away with a lot of that stuff because we have live migration
and we have all of these sort of stateless designs and capabilities, but we still carry
along with us this mentality of don't touch it.
It might fall over and we have to really get past that. And
that goes back to the trust conversation, where we talk about how the
recommendations must be incredibly accurate. You're risking your job in some cases. If you
are a DevOps engineer and your commitments on your yearly goals are uptime, latency, response time,
load time, these sorts of things, these operational metrics,
KPIs that you use, you don't want to take a downsized recommendation.
It has a severe risk of harming your job and your bonus.
These instances are idle.
Turn them off.
It's like, yeah, these instances are the backup site or the DR environment or something that
takes very bursty but occasional traffic.
And yeah, I know it costs
us some money, but here's the revenue figures for having that thing available. Like, oh yeah,
maybe we should shut up and not make dumb recommendations around things is the human
response. But computers don't have that context. Absolutely. And so the accuracy and trust
component has to be the highest bar we meet for any optimization activity or behavior.
We have to circumvent or supersede the human aversion, the risk aversion that IT is built on,
right? Oh, absolutely. And let's be clear, we see this all the time where I'm talking to customers
and they have been burned before because we tried to save money. And then we took a production outage as a side effect of a change that we made.
And now we're not allowed to try to save money anymore.
And there's a hidden truth in there, which is auto-scaling is something that a lot of
customers talk about, but very few have instrumented true auto-scaling.
Because they interpret it as, we can scale up to meet demand.
Because yeah, if you don't do that, you're dropping customers on the floor.
Well, what about scaling back down again?
And the answer there is like,
yeah, that's not really a priority
because it's just money.
We're not disappointing customers,
causing brand reputation,
and we're still able to take people's money
when that happens.
It's only money, we can fix it later.
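Scaling back down is usually just a matter of leaving scale-in enabled on the policy; a minimal target-tracking sketch, where the Auto Scaling group name and the 50% CPU target are hypothetical, looks like this.

```python
import boto3

# Minimal sketch: a target-tracking policy that scales out AND back in.
# The Auto Scaling group name and the 50% CPU target are hypothetical values.
autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-tier-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
        # The part many teams quietly skip: leaving scale-in enabled.
        "DisableScaleIn": False,
    },
)
```

Target tracking adds and removes capacity around the target; the failure mode described here is effectively setting DisableScaleIn to true, or never wiring up scale-in at all.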
COVID was shining a real light on a lot of this stuff,
just because there are customers that we've spoken to
whose user traffic dropped off a cliff while
infrastructure spend remained constant day over day. And yeah, they genuinely believed they were
auto-scaling. The most interesting lies are the ones that customers tell to themselves, but
the bill speaks. So getting a lot of modernization traction from things like that was really neat to
watch. But customers, I don't think necessarily intuitively understand
most aspects of their bill because it is a multidisciplinary problem. It's engineering,
it's finance, it's accounting, which is not the same thing as finance. And you need all three of
those constituencies to be able to communicate effectively using a shared and common language.
It feels like we're marriage counseling between engineering and finance most weeks.
Absolutely, we are. And it's important we get it right, that the data is
accurate, that the recommendations we provide are trustworthy. If the finance team gets their hands
on the savings potential they see out of right-sizing, takes it to engineering, and then
engineering comes back and says, no, no, no, we can't actually do that. We can't actually size
those, right? We have problems.
And they're cultural, they're transformational.
Organizations' appetite for these things varies greatly.
And so it's important we address that problem from all of those angles.
And it's not easy to do.
How big do you find the optimization problem is when you talk to customers?
How focused are they on it?
I have my answers, but that's the scale of anecdata.
I want to hear your actual answer.
Yeah.
So we talk with a lot of customers that are very interested in optimization, and we're
very interested in helping them on the journey towards having an optimal estate.
There are so many nuances and barriers, most of them psychological,
like we already talked about. I think there's this opportunity for us to go do better exposing
the potential of what an optimal AWS estate would look like from a dollar and savings perspective.
And so I think it's kind of not well understood. I think one of the biggest barriers to companies really attacking the optimization problem with more vigor is this: if they knew that the potential savings they could achieve out of their AWS environment would really align their spend much more closely with the business value they get,
I think everybody would go bonkers.
And so I'm really excited about us making progress on exposing that capability; surfacing the total savings potential and amount is something we're looking into doing in a much more obvious way.
And we're really excited about customers doing that on AWS where they know they can trust AWS to get the best value for their cloud spend, that it's a long-term good bet because their resources
that they're using on AWS are all focused on giving business value. And that's the whole key.
How can we align the dollars to the business value, right? And I think optimization is that
connection between those two concepts. Companies are generally not going to greenlight a project whose sole job is
to save money unless there's something very urgent going on. What will happen is as they iterate
forward in the next generation of services or migration of a service from one thing to another,
they will make design decisions that benefit those optimizations. There's low-hanging fruit we can
find, usually
of the form, turn that thing off or configure this thing slightly differently. That doesn't
take a lot of engineering effort in place, but on some level, it is not worth the engineering
effort it takes to do an optimization project. We've all met those engineers, speaking as one
of them myself, who, left to our own devices, will spend two months just knocking a few hundred bucks a month
off of our AWS developer environment.
We steal more than that in office supplies.
I'm not entirely sure what the business value of doing that is in most cases.
For me, yes, okay: things that work in small environments
work very well in large environments, generally speaking.
So I learn how to save 80 cents here,
and that's a few million bucks a month somewhere else. Most folks don't have that benefit happening. So it's
a question of meeting them where they are. Absolutely. And I think the skill component
is huge, which you just touched on. When you're talking about 100 EC2 instances versus 1000,
optimization becomes kind of a different component of how you
manage that AWS environment. And while for single recommendations to scale an individual
server the dollar amount might be different, the percentages are just about the same. When you look
at what is it to be sized correctly, what is it to be configured correctly? And so it really does come
down to priority. And so it's really important to really support all of those companies of all
different sizes and industries, because they will have different experiences on AWS. And some will
have more sensitivity to cost than others,
but all of them want to get great business value out of their AWS spend.
And so as long as we're meeting that need and we're supporting our customers to
make sure they understand the commitment we have to ensuring that their AWS
spend is valuable, it is meaningful, right?
They're not spending money on things that are not adding value.
That's really important to us.
I do want to have as a last topic of discussion here,
how AWS views optimization,
where there have been a number of repeated statements
where helping customers optimize their cloud spend
is extremely important to us.
And I'm trying to figure out where that falls
on the spectrum from,
it's a thing we say because they make us say it, but no, we're here to milk them like cows
all the way on over to, no, no, we passionately believe in this at every level, top to bottom in
every company. We're just bad at it. So I'm trying to understand how that winds up being expressed
from your lived experience, having solved this problem first outside and then
inside? Yeah. So it's kind of like part of my personal story. It's the main reason I joined
AWS. And when you go through the interview loops and you talk to the leaders of an organization
you're thinking about joining, they always stop at the end of the interview and ask,
do you have any questions for us? And I asked that question to pretty much every single person I interviewed with, like, what is AWS's appetite
for helping customers save money? Because like, from a business perspective, it kind of is a
little bit wonky, right? But the answers were varied, and all of them were customer obsessed
and passionate. And I got this sense that my personal passion for helping companies have better efficiency of their IT
resources was an absolute primary goal of AWS and a big element of Amazon's leadership principle,
be customer obsessed. Now, I'm not a spokesperson, so we'll see. But we are deeply interested in
making sure our customers have a great long-term experience and a high trust relationship.
And so when I ask these questions in these interviews, the answers were all about we have to do the right thing for the customer.
It's imperative.
It's also in our DNA.
It's one of the most important leadership principles we have to be customer obsessed. And it is the primary reason why I joined, because of that answer to that question.
Because it's so important that we achieve a better efficiency for our IT resources,
not just for like AWS, but for our planet.
If we can reduce consumption patterns and usage across the planet for how we use data
centers and all the power
that goes into them, we can talk about meaningful reductions of greenhouse gas emissions, the cost
and energy needed to run IT business applications. And not only that, but most all new technology
that's developed in the world seems to come out of a data center these days. We have a real
opportunity to make a material impact on how much resource we use to build and use these things. And I think we owe it to the
planet, to humanity. And I think Amazon takes that really seriously. And I'm really excited
to be here because of that. As I recall, and feel free to make sure that this comment never
sees the light of day, you asked me before interviewing for the role and then deciding to accept it,
what I thought about you working there and whether I would recommend it, whether I wouldn't.
And I think my answer was fairly nuanced.
And you're working there now and we still are on speaking terms.
So people can probably guess what my comments took the shape of, generally speaking.
But I have to ask now,
it's been what, a year since you joined? Almost. I think it's been about eight months.
Time during a pandemic is always strange, but I have to ask, did I steer you wrong?
No, definitely not. I'm very happy to be here. The opportunity to help such a broad
range of companies get more value out of technology.
And it's not just cost, right?
Like we talked about, it's actually not about the dollar number going down on a bill.
It's about getting more value and moving the needle on
how do we efficiently use technology to solve business needs.
And that's been my career goal for a really long time. I've been working on optimization for like seven or eight, I don't know, maybe even
nine years now. And it's like this strange passion for me, this combination of my dad taught me how
to be a really good steward of money and a great budget manager, and then my passion for technology.
So it's this really cool combination of like childhood life skills that really came together for me to create a career that I'm really passionate about.
And this move to AWS has been such a tremendous way to supercharge my ability to scale my personal mission and really align it to AWS's broader mission of helping companies achieve more with cloud platforms.
Right.
And so it's been a really nice eight months.
It's been wild.
Learning the AWS culture has been wild.
It's a sharp, divergent culture from where I have been in the past.
But it's also really cool to experience the leadership principles in action.
They're not just things we put on a website.
They're actually things people talk about every day.
And so that journey has been
humbling and a great learning opportunity as well. If people want to learn more,
where's the best place to find you? Oh yeah. Contact me on LinkedIn or Twitter.
My Twitter account is @rickyo1138. Let me know if you get the 1138 reference. That's a fun one.
THX1138, who doesn't?
Yeah, there you go.
And it's hidden in almost every single
George Lucas movie as well.
You can contact me on any of those
social media platforms
and I'd be happy to engage
with anybody that's interested in optimization,
cloud technology,
bill, anything like that.
Or even not, even anything else either.
Thank you so much for being so generous with your time. I really appreciate it.
My pleasure, Corey. It was wonderful talking to you.
Rick Ochs, Principal Product Manager at AWS. I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star
review on your podcast platform of choice. Whereas if you hated this podcast, please leave a five-star review on your podcast
platform of choice, along with an angry comment rightly pointing out that while AWS is great and
all, Azure is far more cost-effective for your workloads because given their lax security,
it is trivially easy to just run your workloads in someone else's account.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production. Stay humble.