Screaming in the Cloud - Data Protection the AWS Way with Wayne Duso and Nancy Wang
Episode Date: February 28, 2023
Wayne Duso, VP of Storage, Edge and Data Governance Services at AWS and Nancy Wang, GM of AWS Data Protection, both join Corey on Screaming in the Cloud to discuss data protection and analysis at AWS. Wayne and Nancy describe how AWS Backup has scaled to protect over 90% of the data stored on AWS today. Nancy explains how her team specializes in helping AWS customers develop custom solutions for their specific data needs, and the way that AWS has built out new tools and services to accommodate that customization. Wayne also reveals how important data analysis is to the AWS team when it comes to improving services and developing ground-breaking new innovations.
About Wayne
Professionally, Wayne is a Vice President at Amazon Web Services (AWS) where he leads a set of businesses delivering cloud infrastructure services. In 2013, he founded and continues to lead the AWS Boston regional development center. Wayne is an always-curious entrepreneur who is passionate about building innovative teams and businesses that deliver highly disruptive value to customers. He loves engaging people who build and deliver customer-obsessed solutions, as well as customers wanting to realize value from those solutions. Wayne also holds over 40 patents in distributed and highly-available computer systems, digital video processing, and file systems. Personally, Wayne is a proud dad to great people, and loves to cook and grow things, it relaxes and grounds him, and he cherishes finding adventure in the ordinary as well as the extraordinary.
About Nancy
Nancy is a product & engineering executive, advisor, and investor with significant experience in cloud computing, cybersecurity, and SaaS. Nancy advises Fortune 10 companies on accelerating revenue growth, and she advises startups on attracting their first 100K enterprise customers.
Currently, Nancy is the Director of Product & Engineering and General Manager at Amazon Web Services, where she leads P&L, product, engineering, and design for its data protection and data security businesses. Prior to Amazon, she led SaaS product development at Rubrik, the fastest-growing enterprise software unicorn, and built healthdata.gov for the U.S. Department of Health and Human Services. Passionate about growing early-stage startups, Nancy is a Venture Partner for Felicis Ventures, where she works with early-stage data infra and security companies on their product-market fit, market segmentation, and product scaling. Excited to advance more women into technical roles, Nancy is the founder & board chair of Advancing Women in Tech, a global 501(c)(3) nonprofit that has already informed and educated 30,000 Coursera learners worldwide on how to get their first, or next, tech leadership role. She earned a Bachelors of Applied Science from the University of Pennsylvania, where she serves on the Board of Directors for the UPenn School of Engineering Online.
Links Referenced:
re:Invent talk with Nancy and Neha: https://www.youtube.com/watch?v=ELSm3WgR8RE
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
Tailscale SSH is a new and arguably better way to SSH.
Once you've enabled Tailscale SSH on your server and user devices,
Tailscale takes care of the rest,
so you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves.
Pretty cool, right?
Tailscale gives each device in your network a node key to connect to your VPN, then uses that same key for SSH authorization and encryption.
So basically, you're SSHing the same way you're already managing your network.
So what's the benefit? Well, built-in key rotation,
the ability to manage permissions as code, connectivity between any two devices, and
reduced latency. You can even ask users to re-authenticate SSH connections for that extra
bit of security to keep the compliance folks happy. Try Tailscale now. It's free forever
for personal use.
Welcome to a very special in-person edition of
Screaming in the Cloud. I'm Corey Quinn. And while we're breaking norms, we're going to take it one
step further. I have two guests instead of one today, both of whom have been on the show
independently. Wayne Duso is a VP of Technology at AWS, and Nancy Wang is the GM of Data Protection at AWS.
First, welcome back and congratulations on surviving to re:Invent.
Thank you.
So what's been going on in your world since the last time I dealt with both
of you individually, throwing slings and arrows in your direction?
Well, Corey, as you know, we are very active in everything we do.
And just as an example, Nancy's team and the data protection team have had seven amazing
launches that she'll cover in some detail today and some amazing talks that cover the
importance and why those launches have happened.
And the rest of my team, which covers a lot of storage and data protection, data migration, we've had another seven launches to support customers' need for faster, stronger data storage and, more importantly, data that can be shared and utilized more easily.
Would you agree with that assessment or would you wind up basically just contradicting the entirety of everything he just said, which is honestly one of my favorite things to do whenever I talk to Wayne?
Well, you know, it's hard because he tells me what to do, right?
But aside from that, I mean, coming from a data protection angle,
I think all of these launches really just increase the surface area of what my team can do.
So I find it really exciting from that perspective because, I mean, as of this re:Invent,
we now protect 17 and counting AWS services,
although my team might debate me on 18 or 17. The point is, right, that the greater surface area we
can cover, the more storage or more data that comes onto AWS, it increases the array of possibilities
that my team has for delivering more functionality and helping customers really derive insights on
top of their data estate.
I think Nancy's being a little modest because in the four years that she's owned this service,
it's gone from a very sort of modest service that protected three resources within AWS to today
protecting over 90% of the data in AWS. And for many customers, it covers 100% of their data
estate. And in some of the launches
that she and her team have executed on this week, which I'll leave for her to tell you about,
they're taking data protection in a whole new direction for our customers. That's really
exciting. I do want to call out a couple of things. First and foremost, that when you talk
about data protection, and I remember this from our last conversation, it's not for the same reasons that back when I was living in my data center best life that I
cared about a lot of those things where, well, what if we wind up losing power and the drive
heads crash? That effectively does not happen in any realistic sense within AWS anymore.
So the idea of, oh, if it's always online and highly available, why do you need to protect it?
It's you need to protect it from you, or more specifically, from me, where I wind up inadvertently removing the wrong file or corrupting the wrong bit of data or whatnot.
It's often application issues.
It is stuff on the customer side.
Because when I've talked about this stuff with customers, they wind up talking as if an AWS underlying storage level failure was the thing that they were guarding against.
And it's, well, yeah, on some level from a best practice perspective, but 99 times out of 100 when there's a data loss event that has people scrambling for backups, it's not because you folks have dropped the ball.
It's because, oh, is that how you spell production?
There are three general categories. The first one,
which you've been talking about, or saying it doesn't happen that often, are technical failures.
Technical failures do not happen that often at AWS, at our scale, given our operational practices.
That's our job. But human error comes into play from time to time. And human error can be
somebody fat fingering a keyboard, or it could be an application that somebody has written that makes an error. It just
does something we didn't expect. The third category, and unfortunately a growing category,
is malicious actors. And so no matter how good a job you do with durability and protection of data
using technology, you have to go beyond basic technology
because malicious actors are very smart, clever people, right? We have to make sure we have the
tools to be more clever. It feels like it's an impossible challenge because effectively,
if I can dramatically oversimplify, because I'm good at that, well, we solved the technical
problem, so now we're going to extend this to solve people problems with technology. And it
feels like that's an impossible Herculean task that you never wind up quite being able
to get to.
Well, I think that's the interesting part of our roles, right, is especially, I mean,
best use cases this week at re:Invent, all of the briefings and customer meetings that
I've been a part of, it's really around thought partnership, right, which is, A, as we move
our data from on-premises data centers where, let's say, protecting the network and protecting the compute was sufficient, right?
In the cloud where you have, for example, massive data lakes, and we see that with S3, with Redshift, where people are moving massive amounts of data around, right?
They're doing experimentation.
You end up with shadow copies.
You end up with data sprawl.
And so we really have to start from first principles,
which is where is my data?
Where are my crown jewels and sensitive data that I need to protect?
And so helping customers really find all of that sort of PII
or maybe even credit card information within all of their data
that they have on AWS is really step one.
And then from there, we go into how do you combine all of the industry-leading primitives
around maybe guarding your perimeter around zero trust,
which is what I talked about this week,
of how do you, for example, right-size roles
to least privilege using IAM Access Analyzer, for example,
also the verifiable permissions
that we just launched at re:Invent.
All of those combined, right, data protection, as well as all the elements of traditional data
security, that's what we are recommending to customers as best practices for securing their
data on AWS. Yeah, in many ways, Corey, it's no longer one tool to solve one problem, because
there isn't one problem, when human beings are introduced
and when malicious actors are introduced, the problem set just expands exponentially.
Now, the tool set does not need to expand exponentially, but the capabilities, the tools
that we recommend, build, offer, or partner on need to cover that exponential increase in threat or risk.
And we hear that from partner companies as well.
In fact, I'm going to quote a good friend of mine, Lena, who's a CISO for MongoDB.
She's a fan of trying out different security solutions because, to Wayne's point, it's
not a one-size-fits-all.
Companies these days, they, for example, process their data in different ways. They leverage their data to solve different business challenges. So there are forever new
innovations that are also coming out from our ISV partners that are super exciting. And it's our,
I think, role and responsibility as we architect new products to think about how do we make our
platforms extensible so that ISV partners can also be part of the solution.
I should probably disclaim my own bias on a lot of this. And although I will say you're absolutely right, Lena is delightful. She is probably one of the most impressive CISO types that I've had
the privilege of speaking with. Who also accordingly has good disco moves, but I'll leave that to another
conversation. I would not know that yet. But again, I think that I'm going to start changing up how
the podcast is done. So we're going to learn that real soon now. I started my entire career as a grumpy
sysadmin on Unix-style systems managing servers, and I was exactly the reckless kind of admin that
you would expect. This teaches you pretty early on to do things like backup and the rest,
but it also winds up keeping me away from the truly sensitive stuff, such as interacting
directly with databases. Other than setting them up and setting up
replication and making sure they're backed up, I wasn't running SQL queries
against anything. I've sort of always had a bit of a blind spot around that. So
when I think about data, I always think of it in terms of economics these days,
obviously, protecting it in the sense of is it backed up? And as far as access
controls, well, it should really just come down to who has access to the environment.
That is an increasingly outmoded way of thinking. And I also have found over the past few years
that a number of our more sophisticated clients have significant billing opportunities in the
data space. It's why we wound up making some of the
hires that we have. It's why we have focused rather deeply on data engineering as a company.
But I am not the person to look into those things. I still feel like I'm coming from behind on
learning how a lot of these things work. The world really has changed. If you have an opportunity
to review the talk that Nancy and Neha gave this week, I strongly
recommend it because a lot of what you're talking about is traditional thought. It's about putting
perimeter around something and hoping that perimeter helps. And perimeters are good. Perimeter
keeps out 98% of the problem. However, you know, 2% of that problem is going to jump over the fence,
right? And once it does, it's going to run around and it's going to create some problems for you, right?
So the whole conversation around zero trust and in their talk, what you'll hear them talk about is the universal statement.
You know, it's basically a mathematical proof that critical resources are accessed by only those who should access them for very specific purposes. You know,
it's closing down, if you would, the notion of public access to an S3 bucket, but it's that
mental model of closing it down for every resource all the time. In zero trust, you know, authZ,
authN, all the time. Like, you simply do not trust anyone, whether they're inside
your perimeter or outside your perimeter. And by not thinking about the problem only as a perimeter
problem, but as a point to point problem and verifying that point to point is necessary,
that that connection even needs to exist is part of the talk that they gave. Strongly recommend
that folks take a look at it. And I'm sure Nancy
can expand on that. Perfect. In fact, let me give you a way in, and a question I specifically would like
you to expand on. I tend to spend more of my time these days on the building something out,
getting it up and running side of the world, as opposed to my previous life running systems that
other people who are good at things had built. And my job was to keep the lights on, keep it secure, et cetera.
When I'm building something or improving something
and iterating on it,
when you talk about the idea of absolute least permission,
that means that every failure mode
that you have while developing
is you're smacking into a wall of nope, can't do it,
nope, can't do it, nope, can't do it.
And at some point,
the very human element in development is, "Great,
I'm just going to allow everything here just so I can stop hitting those walls, and I'll go back and fix it later."
Later never comes.
And now that thing's in production, probably at a bank or something, because that's the way the world works out.
How do you avoid that painful friction of feeling like you're being blocked at every turn while trying to either create or expand
something that already exists.
Yeah, so one of the solutions, actually,
we talked about in the talk that Wayne's referencing
is IAM Access Analyzer,
which can actually help you generate least-privilege roles
based on your CloudTrail logs, right?
Because that's actually one of the main, I would say,
vectors where actors can get inside your environment
is, for example, stale
roles, right? Or things to your point where, you know what, I give up. I'm just going to open public
access to everything because it's, you know, I'm tired of granting access to specific individuals,
right? And so that's what we really mean by right-sizing. Look at the data, look at who
needs to access that within your environment and who has legitimate reason to access that.
In fact, we're already seeing innovations such as, for example, being able to have access to specific data sets in order to do experiments upon them, process them.
And then once you're finished with that experiment, you actually no longer have access to that data set.
So what we're going to see is the trend of more configurable or more granular policies that are not just role-based,
right? But it's really action-based. And we're starting to see that shift.
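To make the right-sizing idea concrete, here is a minimal sketch of the general principle of deriving an allow-list from observed activity. It is not how IAM Access Analyzer works internally, and the sample records, the `EVENTS` list, and the `policy_from_events` helper are all hypothetical; real CloudTrail events carry many more fields, and an event name does not always map one-to-one onto an IAM action.

```python
import json
from collections import defaultdict

# Hypothetical, heavily simplified CloudTrail-style records (assumption:
# only the two fields used below matter for this illustration).
EVENTS = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "dynamodb.amazonaws.com", "eventName": "Query"},
]


def policy_from_events(events):
    """Build an IAM-style policy allowing only the actions seen in the log."""
    actions_by_service = defaultdict(set)
    for event in events:
        # "s3.amazonaws.com" -> "s3" (holds for these samples, not universally)
        service = event["eventSource"].split(".")[0]
        actions_by_service[service].add(event["eventName"])

    statements = [
        {
            "Effect": "Allow",
            "Action": [f"{service}:{name}" for name in sorted(names)],
            # Resource scoping is left broad here; narrowing it is a further step.
            "Resource": "*",
        }
        for service, names in sorted(actions_by_service.items())
    ]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(policy_from_events(EVENTS), indent=2))
```

The point of the sketch is the direction of travel: start from what was actually used, rather than from a wildcard that someone promises to tighten later.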
And in addition, all of that is true. And if you think about using tools like VPC
Access Analyzer and IAM Access Analyzer, and building those tools into your pipelines,
if you then shift left as a developer,
because that's the point you were making,
as a builder, I'm frustrated, right?
So go back, shift left.
There are tools that you can use today
from partners, from others,
where as you're writing code,
it will tell you that there is a security need here.
Like you're not checking auth,
you're not checking access where you should.
And there should at least be a call in there.
Now you can make that call, say,
open it up to the world
so that you can at least write your code.
But then in the pipeline,
using these analyzers,
you can later say,
you know, you need to change this.
So you're right.
It can be frustrating to be stopped at every line of code. But if you shift left, use the right
capabilities to insert the right auth and access calls to make sure that you
are secure. And then sure, open a few things up in your dev environment, right?
But then as it goes through the pipeline, shut it down using these tools. And this
comes all the way back to the universal statement concept. You're talking about gating and higher environments as things progress
through. I like the approach. I tend to not have as many lower environments. Again, everyone has
a test environment. Some people just also have a separate one to run production in. And I find that
when I'm building something out early on in a relatively dual or single environment type of
setup, and you have all of the stuff that comes in that does security alerting and the rest,
it is page after page after page of nonsense that, oh, this is a slightly outdated thing that when
used with a different library has a security problem. Things that do not ever apply to what
I'm doing. And at page 17,
it's like, oh, also you left the oven on. And then it goes on back into other nonsense. It's
finding the signal from the noise there and avoiding the friction and just tuning it out.
It feels like that's almost a delicate art. The world has become a little more complex than it
was back in 1996. And so it's not that we want to make the life of the developer or operator more complex.
The life of the developer and operator became more complex.
We want to actually make it simpler.
And there is a process that it takes to get there.
Coming up with an IAM analyzer, coming up with a VPC analyzer, coming up with other
such tools, being able to shift left and be able to catch these
things at development time. These are trends that are happening. These are tools that are available.
This is where we're going to take the world and it'll become the norm. No longer does anybody
write code to build a linked list. That's a library. Only in job interviews. Only in job interviews,
exactly. So these things, too, will become like libraries.
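The "open it up in dev, shut it down in the pipeline" flow described in this exchange could be sketched roughly as follows. This is an illustrative stand-in, not one of the AWS analyzers: the `find_wildcards` and `gate` functions and the stage names are assumptions made up for the example, and a real check would cover far more than wildcards.

```python
def find_wildcards(policy):
    """Return findings for overly broad Allow statements in an IAM-style policy."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {index}: wildcard action")
        principal = stmt.get("Principal")
        if principal == "*" or (
            isinstance(principal, dict) and "*" in principal.values()
        ):
            findings.append(f"statement {index}: open to any principal")
    return findings


def gate(policy, stage):
    """Let anything through in 'dev'; fail later stages that still have findings."""
    findings = find_wildcards(policy)
    passed = stage == "dev" or not findings
    return passed, findings


if __name__ == "__main__":
    wide_open = {
        "Statement": [{"Effect": "Allow", "Action": "s3:*", "Principal": "*"}]
    }
    for stage in ("dev", "staging", "prod"):
        passed, findings = gate(wide_open, stage)
        print(stage, "PASS" if passed else "FAIL", findings)
```

The findings are still reported in dev, so the developer sees them while writing code, but only later stages refuse to promote the change.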
Yeah, and I mean, this year we saw data protection really highlighted as a main stage keynote item.
And this is really kind of the evolution that even just in my career I've seen, kind of working in the data protection industry, both at AWS and outside of AWS, is that before, it used to be the storage admin's problem, right?
Is, okay, well, they were in charge of, you know, issuing backup policies or restoring data, right?
Now it's become a CISO problem.
It's become a board-level problem.
In fact, according to the NACD, which is a board regulatory association,
it's requiring organizations and companies to prove that they have a data resiliency strategy in place.
And in fact, actually, one of the groups
that my team works with publicly
is the Cross-Market Organizational Resiliency Group.
And it's a group of financial institutions in the UK,
similar to Sheltered Harbor in the US,
where they need to prove to regulators
that their data is immutable, right?
And today, the only way to do that
across all of these 17, 18 plus AWS services like
S3 and EC2 and RDS is really by applying a lock onto your vault. And that's proven to be SEC 17a-4,
FINRA, and CFTC compliant as of this re:Invent.
Which is a big step. The idea of, you can make
anything so that you personally cannot get access to it, but people try to write their own cryptography that way. It doesn't go super well either.
Having a trusted third party who also, let's be clear, is the one that evaluates whether you are in compliance or not and will fine you otherwise, state that yes, this qualifies is no small step.
Yeah, and we see now this being implemented across the Fortune 500 and
majority of enterprise environments as their protection against, let's just call the 800-pound
gorilla in the room, ransomware. As ransomware threats become more evident and also as ransomware
also evolves in the cloud, we need to really ensure that we have proper protection,
proactive protection in place by locking your vault,
but also by making it easy to do that.
And actually this week, we're also excited to announce
the application-aware sort of protection that you can do,
which is define your stateful resources in a CloudFormation template,
point to that template and stack as an entity,
and be able to protect that
as an entity and restore that as an entity.
This episode is sponsored in part by Honeycomb.
I'm not going to dance around the problem. Your engineers are burned out. They're tired
from pagers waking them up at 2am for something that could have waited until after their morning
coffee. Ring ring. Who's there? It's Nagios, the original Call of Duty.
They're fed up with relying on two or three different monitoring tools that still require
them to manually trudge through logs to decipher what might be wrong.
Simply put, there is a better way.
Observability tools like Honeycomb, and very little else because they do admittedly set
the bar, show you the
patterns and outliers of how users experience your code in complex and unpredictable environments
so you can spend less time firefighting and more time innovating. It's great for your business,
great for your engineers, and most importantly, great for your customers.
Try free today at honeycomb.io/screaminginthecloud. That's honeycomb.io/screaminginthecloud.
I am very excited about that. You have excellent reasons that you
have built this, and I am not in any way, shape, or form saying that they are not good and valid,
but this is one of those scenarios I worry that you've built a beautiful iPad or something and, oh great, a new hammer is what I'm about to do with it. Because it feels like this is
closer than a lot of things have come to being an easy way for me to wind up migrating an application
from my old omnibus AWS account that has everything in the kitchen sink in it into a dedicated member
account in the AWS organization? Because yes,
you could always apply the CloudFormation template somewhere else, but what about the data contained
within those resources? It feels like you have built a cloud migration story for me from AWS
account to AWS account. Is that a known thing and an intended and designed use case, or have I just
ultimately come up with a solution where you're about to tell me about a different service entirely that does that for me already and I've been asleep at the wheel again?
Well, you can just poll me and Wayne about our philosophy around services. And our philosophy
is to build a centralized platform where it has these capabilities and really customers have the
flexibility of choosing which capabilities they need for their specific use case.
So sure, if you find that being able to define your AWS, a collection of AWS resources as an application stack,
is your way to, for example, failover, migrate your applications from account to account,
maybe you have a dedicated account to your point where it's your, you know, maybe Fort Knox account, right?
Sure, Those are all
use cases that now can be met with this capability. And better yet, right, talking about sort of
central governance, using AWS Organizations, you can now delegate that administrator capability
from the central management account in case, hey, you don't want anyone going into that account,
into member accounts, whereby these member accounts, in case they are, let's say, managing separate organizational units or OUs, can continue
delegating those policies. So you can be sure, hey, across, let's say, you know, my tens of
thousands of accounts, in fact, our largest customer has 50,000 accounts, right? And across
50,000 accounts at that scale, how can I be sure that each account is protecting its
resources the same frequency and way as the 49,000 plus other accounts that I have?
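The question of whether every account protects its resources with the same frequency can be pictured as a drift check against a baseline. The sketch below is purely illustrative: `BackupRule`, `find_drift`, the account IDs, and the interval and retention numbers are all hypothetical, and it does not call AWS Backup or AWS Organizations.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BackupRule:
    """Hypothetical summary of one account's backup policy."""
    interval_hours: int   # how often a backup is taken
    retention_days: int   # how long recovery points are kept


def find_drift(
    baseline: BackupRule,
    rules_by_account: Dict[str, Optional[BackupRule]],
) -> Dict[str, str]:
    """Map each non-conforming account ID to a short reason."""
    drift = {}
    for account_id, rule in sorted(rules_by_account.items()):
        if rule is None:
            drift[account_id] = "no backup rule"
        elif rule != baseline:
            drift[account_id] = f"differs from baseline: {rule}"
    return drift


if __name__ == "__main__":
    baseline = BackupRule(interval_hours=24, retention_days=35)
    accounts = {
        "111111111111": BackupRule(24, 35),
        "222222222222": BackupRule(168, 35),  # weekly instead of daily
        "333333333333": None,                 # scratch account, nothing set
    }
    for account_id, reason in find_drift(baseline, accounts).items():
        print(account_id, reason)
```

With centrally delegated policies the aim is that this check never finds anything, because member accounts inherit the rule instead of defining their own.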
Right. It's always a question of, is this just a scratch account with test data in the end,
or is that the one that has the credit card numbers in it? It's always a tricky balance.
I want to give you peace of mind.
Please do.
I'm going to give it my best shot.
Wait, do you want to give me peace of mind or a piece of your mind? Either one of those. The former, not the latter. Excellent.
And I will give you the latter anyway. I look forward to it. What you described is not a misuse
of a capability. Because for years, since the dawn of time, since spinning tape, people have
taken backups to do restores on another system.
That has been a way people have migrated data from point A to point B.
So I got out of EC2-Classic into VPCs. Yeah. You take a snap, you stop,
you quiesce the application, snapshot the RDS instance, restore that snapshot inside of a VPC,
hope it doesn't take super long back in those days, and turn it back on, and
yeah, hopefully you've tested it first. So in this case, if in fact somebody decides to take
this capability and do a backup of their entire application state and restore that application
state somewhere else, it's not an unreasonable way of doing things. However, if that's a live
application, doing a backup and restore is not the same as doing a live migration or a disaster recovery scenario where you're doing block-level, change-level migration of your data from point A to point B.
Those are different problems.
So if your RPO and your RTO are such that an hour is fine, six hours is fine, 12 hours is fine. Okay, this is probably
a reasonable thing. But if your needs are that a second is okay, a minute is pushing it, this is
not that solution. So, and then people start to think of replication as a backup strategy, which
it is not. Which it is not. Yeah. Which it is not. So, people, and Nancy can speak to this very eloquently,
backup, disaster recovery, these are different problems.
They are related.
They're sister problems.
They're not the same problem.
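The RPO/RTO reasoning above can be written down as a small decision rule. This is a sketch under simplifying assumptions, not AWS guidance: `suggest_approach` is a hypothetical helper, it treats worst-case data loss as equal to the backup interval, and the sample durations are invented.

```python
from datetime import timedelta


def suggest_approach(rpo, rto, backup_interval, restore_time):
    """Suggest a recovery approach from recovery objectives.

    rpo: maximum tolerable data loss, as a duration
    rto: maximum tolerable downtime
    backup_interval: time between backups (assumed worst-case data loss)
    restore_time: measured or estimated time to restore from a backup
    """
    if backup_interval <= rpo and restore_time <= rto:
        return "backup and restore"
    return "continuous replication"


if __name__ == "__main__":
    hourly = timedelta(hours=1)
    four_hours = timedelta(hours=4)

    # "An hour is fine, six hours is fine, 12 hours is fine."
    print(suggest_approach(timedelta(hours=6), timedelta(hours=12), hourly, four_hours))

    # "A second is okay, a minute is pushing it."
    print(suggest_approach(timedelta(seconds=1), timedelta(minutes=1), hourly, four_hours))
```

Note that the second outcome does not remove the need for backups: replication will faithfully copy a bad delete or a corrupted write, which is why the two are described as sister problems rather than the same problem.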
When we first started talking,
I believe you were the general manager of AWS Backup,
and your portfolio has expanded,
and I'm not quite sure where one service starts
and the other one stops.
I mean, these days it seems like the way forward is the elastic disaster recovery offering.
And I don't mean to toot my own horn too loudly, but I'm something of an elastic disaster myself.
Where is the recovery?
We are waiting on that.
Probably after re:Invent, maybe, if we're lucky.
But at this point, what is the entry point for all of this?
AWS Backup,
I feel I understand it. It speaks the language that I think of data in. Elastic disaster recovery seems much more encompassing than that, isn't it? So to what Wayne described, if you think about
solutions from a continuum of RPO requirements, right? So there's a reason why, for example,
Gartner and other analysts
grouped together business continuity and resiliency
with backup plus DR, right?
And that's really how we should think about
data protection on AWS
is depending on what your RPO needs are, right?
You should be able to set a central policy
and be able to determine what RPO you need.
So today with Elastic Disaster Recovery, we support, for example, data that's written to disk. And if you're applying an agent onto your EC2 instances and filtering each write to another replica, that's a great solution for you.
With that said, again, if you want to be able to apply that vault lock across all of your 17-plus, 18-plus and counting
AWS services, I'm losing count here, right? AWS Backup might be the better solution for you. But
again, it goes back to that flexibility and choice. So where my team comes in is if you're a customer
and you're thinking, hey, I have exabytes of data, petabytes of data on AWS that I want to be able to protect,
what should I do? And that's really our sweet spot, is we like being the thought partner to
customers of what are your requirements? What are your use cases that you want to solve for?
Specifically, the compliance frameworks that you are subject to, and let us recommend the
best solutions to you. It's one of those areas that I'm a big fan of
the cloud for, where it's easy to sit here and be overwhelmed by the sheer number of services that
you folks have, but your customers are incredibly and wildly diverse when it comes to what their
applications are, what their businesses do, whether they turn on backups or not, all kinds of other
stuff. But because it's all built on the
same underlying platform, it feels like there are remarkably few disaster recovery approaches
that are truly unique to a specific customer. It feels like all the heavy lifting, all the work
has been done countless times. An early experience I had with this is: stop trying to
answer auditor questions about AWS
like it's a data center. Just hand them the AWS documentation that hangs out in AWS Artifact and
call it good because it already is written in the language that they're expecting rather than
reinventing a wheel badly. Well, I'd love to make your day by saying we actually have a
functionality within AWS Data Protection called AWS Backup Audit Manager that is designed to generate reports that you can hand over to your auditors.
I love that.
It also winds up speaking to the truism, which is this idea that, honestly, no one cares about backup.
Absolutely no one.
They don't.
They care a lot about restores directly after they really should have cared slightly more about backups.
I mean, the people who are the most fanatical about backups are the ones who've lost data.
I count myself on that list.
But again, the entire theme of this year's re:Invent has been about data, both in terms
of accessing it and using it and making sure the right parties have access to it and leveraging
it to add value rather than previous years and previous stories,
which felt a lot more like, you've got a lot of data. You probably don't know what it is,
but that's right. You should sit your butt down on top of it like a sad, greedy dragon and never
let anything happen to it. Now it's, okay, what are you actually going to do? You go to that sad
dragon and you have a pile of gold. Why don't you try investing it somewhere? And it feels like that is really how it is starting to play out narratively.
Well, data, we've talked about this before, Corey, but second only to people in a business,
data is their most important asset. Nancy refers to it as the crown jewels. Find out within your
data sets, what are your crown jewels, and make sure that you
understand where they are, what they are, are they properly being protected. For those who need to
share them, going to our auth and access piece, make sure those folks can share in that data.
Make sure they can get the value out of that data. The balance is protection and use.
You want to make sure that you protect your resources, but not
constrain their appropriate use. And everything that Nancy and Neha talked about really refers
to that. Do not constrain the use of the data inappropriately by being scared. Constrain the
use of the data to only those who need it, so you don't have to be scared.
Yeah, if you can't access the
data, you effectively don't have the data. Effectively, you have a valuable resource
that's locked up and not doing any good for your business. One of the challenges that we do see,
though, is that no matter what it is that we're talking about here, the world is getting bigger.
Scale is increasing for almost everything. Long gone are the days where
you have three to maybe four web servers. Now you have so many things that are emitting telemetry
at stupendous volume. And so much of the data that is being collected is fundamentally useless
on some level. But there's still not a lot of awareness of what the data that people have is.
It becomes a data swamp where it's, well, we have a whole bunch of load balancer logs
that are operationally useful for a little while,
but you don't need them from six years ago,
combined with stuff that, oh yeah, we need to keep those
in case the auditor comes knocking for seven years.
And people just wind up thinking about it
as one discrete entity
and don't even know where to start unpacking it all.
Let me tell you a story,
because I'm going to agree with you
and disagree with you at the same time.
You used the example of operational logs. AWS collects a tremendous number of data points
every second across all of our services. And especially given the scale we have,
that really is expansive. It may seem like a lot of things to keep around for no reason.
However, those logs end up in an operational data lake.
Those logs end up being analyzed. Whether predictive analytics or ML models, it depends
on the group, it depends on the size of the logs, it depends on the need. We're able to innovate,
invent and innovate brand new features and capabilities, often at no cost to the customer, because of those operational
logs, because a human can't possibly understand the usage patterns for all of these customers.
But when you do the analytics, patterns become clear. Value becomes clear. Opportunity
becomes glaring. And when we look at things like 10x improvement in this and 12x improvement in
that, where do you think these come from? That we're brilliant, we just figured out magically
how to remove three lines of code? Well, maybe, sometimes. But often it's from this analysis that
says, we actually have more capability in our current product that we could hand to the customer
or reduce our cost. This is the value of data.
This is the value of these operational logs.
And often it's not because you keep them for six months.
It sometimes is because you have them for seven years because that's where you find the patterns.
Yeah, I wish I could, until this place,
I've never been anywhere even near that long
at the same place.
Like, oh, seven years from now,
it's the best kind of problem.
Someone else's.
Yeah, I don't have that luxury anymore.
I have to start thinking a bit more long-term on that.
I really want to thank both of you
for being so generous with your time during re:Invent,
where I'm sure you have an infinite number of things
you could be doing that are better than this.
Thank you so much for being so generous.
Of course.
Thanks for having us, Corey.
Corey, it's always a pleasure.
It really is.
Wayne Duso, VP at AWS.
Nancy Wang, General Manager of Data Protection at AWS.
And I'm cloud economist, Corey Quinn.
If you've enjoyed this episode,
please leave a five-star review
on your podcast platform of choice.
Whereas if you hated this podcast,
please leave a five-star review
on your podcast platform of choice,
along with a comment telling me
exactly what you think about me,
and that's okay because I'm going to lose the data by mistake.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group
works for you, not AWS. We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.