Screaming in the Cloud - Data Protection the AWS Way with Wayne Duso and Nancy Wang

Episode Date: February 28, 2023

Wayne Duso, VP of Storage, Edge and Data Governance Services at AWS and Nancy Wang, GM of AWS Data Protection, both join Corey on Screaming in the Cloud to discuss data protection and analysis at AWS. Wayne and Nancy describe how AWS Backup has scaled to protect over 90% of the data stored on AWS today. Nancy explains how her team specializes in helping AWS customers develop custom solutions for their specific data needs, and the way that AWS has built out new tools and services to accommodate that customization. Wayne also reveals how important data analysis is to the AWS team when it comes to improving services and developing ground-breaking new innovations.

About Wayne
Professionally, Wayne is a Vice President at Amazon Web Services (AWS) where he leads a set of businesses delivering cloud infrastructure services. In 2013, he founded and continues to lead the AWS Boston regional development center. Wayne is an always-curious entrepreneur who is passionate about building innovative teams and businesses that deliver highly disruptive value to customers. He loves engaging people who build and deliver customer-obsessed solutions, as well as customers wanting to realize value from those solutions. Wayne also holds over 40 patents in distributed and highly-available computer systems, digital video processing, and file systems. Personally, Wayne is a proud dad to great people, and loves to cook and grow things, it relaxes and grounds him, and he cherishes finding adventure in the ordinary as well as the extraordinary.

About Nancy
Nancy is a product & engineering executive, advisor, and investor with significant experience in cloud computing, cybersecurity, and SaaS. Nancy advises Fortune 10 companies on accelerating revenue growth, and she advises startups on attracting their first 100K enterprise customers. Currently, Nancy is the Director of Product & Engineering and General Manager at Amazon Web Services, where she leads P&L, product, engineering, and design for its data protection and data security businesses. Prior to Amazon, she led SaaS product development at Rubrik, the fastest-growing enterprise software unicorn, and built healthdata.gov for the U.S. Department of Health and Human Services. Passionate about growing early-stage startups, Nancy is a Venture Partner for Felicis Ventures, where she works with early-stage data infra and security companies on their product-market fit, market segmentation, and product scaling. Excited to advance more women into technical roles, Nancy is the founder & board chair of Advancing Women in Tech, a global 501(c)(3) nonprofit that has already informed and educated 30,000 Coursera learners worldwide on how to get their first, or next, tech leadership role. She earned a Bachelors of Applied Science from the University of Pennsylvania, where she serves on the Board of Directors for the UPenn School of Engineering Online.

Links Referenced:
re:Invent talk with Nancy and Neha: https://www.youtube.com/watch?v=ELSm3WgR8RE

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Tailscale SSH is a new and arguably better way to SSH. Once you've enabled Tailscale SSH on your server and user devices,
Starting point is 00:00:40 Tailscale takes care of the rest, so you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, then uses that same key for SSH authorization and encryption. So basically, you're SSHing the same way you're already managing your network. So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra
Starting point is 00:01:16 bit of security to keep the compliance folks happy. Try Tailscale now. It's free forever for personal use. Welcome to a very special in-person edition of Screaming in the Cloud. I'm Corey Quinn. And while we're breaking norms, we're going to take it one step further. I have two guests instead of one today, both of whom have been on the show independently. Wayne Duso is a VP of Technology at AWS, and Nancy Wang is the GM of Data Protection at AWS. First, welcome back and congratulations on surviving to reInvent. Thank you. So what's been going on in your world since the last time I both
Starting point is 00:01:59 dealt individually with throwing slings and arrows in your direction? Well, Corey, as you know, we are very active in everything we do. And just as an example, Nancy's team and the data protection team have had seven amazing launches that she'll cover in some detail today and some amazing talks that cover the importance and why those launches have happened. And the rest of my team, which covers a lot of storage and data protection, data migration, we've had another seven launches to support customers' need for faster, stronger data storage and, more importantly, data that can be shared and utilized more easily. Would you agree with that assessment or would you wind up basically just contradicting the entirety of everything he just said, which is honestly one of my favorite things to do whenever I talk to Wayne? Well, you know, it's hard because he tells me what to do, right?
Starting point is 00:02:50 But aside from that, I mean, coming from a data protection angle, I think all of these launches really just increase the surface area of what my team can do. So I find it really exciting from that perspective because, I mean, as of this reInvent, we now protect 17 and counting AWS services, although my team might debate me on 18 or 17. The point is, right, that the greater surface area we can cover, the more storage or more data that comes onto AWS, it increases the array of possibilities that my team has for delivering more functionality and helping customers really derive insights on top of their data estate. I think Nancy's being a little modest because in the four years that she's owned this service,
Starting point is 00:03:30 it's gone from a very sort of modest service that protected three resources within AWS to today protecting over 90% of the data in AWS. And for many customers, it covers 100% of their data estate. And in some of the launches that she and her team have executed on this week, which I'll leave for her to tell you about, they're taking data protection in a whole new direction for our customers. That's really exciting. I do want to call out a couple of things. First and foremost, that when you talk about data protection, and I remember this from our last conversation, it's not for the same reasons that back when I was living in my data center best life that I cared about a lot of those things where, well, what if we wind up losing power and the drive
Starting point is 00:04:13 heads crash? That effectively does not happen in any realistic sense within AWS anymore. So the idea of, oh, if it's always online and highly available, why do you need to protect it? It's you need to protect it from you, or more specifically, from me, where I wind up inadvertently removing the wrong file or corrupting the wrong bit of data or whatnot. It's often application issues. It is stuff on the customer side. Because when I've talked to this, about the stuff with customers, they wind up talking as if an AWS underlying storage level failure was the thing that they were guarding against. And it's, well, yeah, on some level from a best practice perspective, but 99 times out of 100 when there's a data loss event that has people scrambling for backups, it's not because you folks have dropped the ball. It's because, oh, is that how you spell production?
Starting point is 00:05:04 There are three general categories. The first one, which you've been talking about, or saying it doesn't happen that often, are technical failures. Technical failures do not happen that often at AWS, at our scale, given our operational practices. That's our job. But human error comes into play from time to time. And human error can be somebody fat fingering a keyboard, or it could be an application that somebody has written that makes an error. It just does something we didn't expect. The third category, and unfortunately a growing category, is malicious actors. And so no matter how good a job you do with durability and protection of data using technology, you have to go beyond basic technology
Starting point is 00:05:45 because malicious actors are very smart, clever people, right? We have to make sure we have the tools to be more clever. It feels like it's an impossible challenge because effectively, if I can dramatically oversimplify, because I'm good at that, well, we solved the technical problem, so now we're going to extend this to solve people problems with technology. And it feels like that's an impossible Herculean task that you never wind up quite being able to get to. Well, I think that's the interesting part of our roles, right, is especially, I mean, best use cases this week at reInvent, all of the briefings and customer meetings that
Starting point is 00:06:19 I've been a part of, it's really around thought partnership, right, which is, A, as we move our data from on-premises data centers where, let's say, protecting the network and protecting the compute was sufficient, right? In the cloud where you have, for example, massive data lakes, and we see that with S3, with Redshift, where people are moving massive amounts of data around, right? They're doing experimentation. You end up with shadow copies. You end up with data sprawl. And so we really have to start from first principles, which is where is my data?
Starting point is 00:06:48 Where are my crown jewels and sensitive data that I need to protect? And so helping customers really find all of those sort of PI information or maybe even credit card information within all of their data that they have on AWS is really step one. And then from there, we go into how do you combine all of the industry-leading primitives around maybe guarding your perimeter around zero trust, which is what I talked about this week, of how do you, for example, right-size roles
Starting point is 00:07:17 to least privilege using IAM access analyzer, for example, also the verifiable permissions that we just launched at reInvent. All of those combined, for example, also the verifiable permissions that we just launched at reInvent. All of those combined, right, data protection, as well as all the elements of traditional data security, that's what we are recommending to customers as best practices for securing their data on AWS. Yeah, in many ways, Corey, it's no longer one tool to solve one problem, because there isn't one problem, when human beings are introduced and when malicious actors are introduced, the problem set just expands exponentially.
Starting point is 00:07:52 Now, the tool set does not need to expand exponentially, but the capabilities, the tools that we recommend, build, offer a partner need to cover that exponential increase in threat or risk. And we hear that from partner companies as well. In fact, I'm going to quote a good friend of mine, Lena, who's a CISO for MongoDB. She's a fan of trying out different security solutions because, to Wayne's point, it's not a one-size-fits-all. Companies these days, they, for example, process their data in different ways. They leverage their data to solve different business challenges. So there are forever new innovations that are also coming out from our ISV partners that are super exciting. And it's our,
Starting point is 00:08:35 I think, role and responsibility as we architect new products to think about how do we make our platforms extensible so that ISV partners can also be part of the solution. I should probably disclaim my own bias on a lot of this. And although I will say you're absolutely right, Lina is delightful. She is probably one of the most impressive CISO types that I've had the privilege of speaking with. Who also accordingly has good disco moves, but I'll leave that to another conversation. I would not know that yet. But again, I think that I'm going to start changing up how the podcast is done. So we're going to learn that real soon now. What I started my entire career as a grumpy sysadmin on Unix-style systems managing servers, and I was exactly the reckless kind of admin that you would expect. This teaches you pretty early on to do things like backup and the rest,
Starting point is 00:09:19 but it also winds up keeping me away from the truly sensitive stuff, such as interacting directly with databases. Other than setting them up and setting up replication and making sure they're backed up, I wasn't running SQL queries against anything. I've sort of always had a bit of a blind spot around that. So when I think about data, I always think of it in terms of economics these days, obviously, protecting it in the sense of is it backed up? And as far as access controls, well, it should really just come down to who has access to the environment. That is an increasingly outmoded way of thinking. And I also have found over the past few years
Starting point is 00:09:55 that a number of our more sophisticated clients have significant billing opportunities in the data space. It's why we wound up making some of the hires that we have. It's why we have focused rather deeply on data engineering as a company. But I am not the person to look into those things. I still feel like I'm coming from behind on learning how a lot of these things work. The world really has changed. If you have an opportunity to review the talk that Nancy and Niha gave this week, I strongly recommend it because a lot of what you're talking about is traditional thought. It's about putting perimeter around something and hoping that perimeter helps. And perimeters are good. Perimeter
Starting point is 00:10:36 keeps out 98% of the problem. However, you know, 2% of that problem is going to jump over the fence, right? And once it does, it's going to run around and it's going to create some problems for you, right? So the whole conversation around zero trust and in their talk, what you'll hear them talk about is the universal statement. You know, it's basically a mathematical proof that critical resources are accessed by only those who should access them for very specific purposes. You know, it's closing down, if you would, the notion of public access to an S3 bucket, but it's that mental model of closing it down for every resource all the time. In zero trust, you know, auth Z, auth N, all the time. Like, you simply do not trust anyone, whether they're inside your perimeter or outside your perimeter. And by not thinking about the problem only as a perimeter
Starting point is 00:11:30 problem, but as a point to point problem and verifying that point to point is necessary, that that connection even needs to exist is part of the talk that they gave. Strongly recommend that folks take a look at it. And I'm sure Nancy can expand on that. Perfect. In fact, let me give you a way and a question I specifically would like you to expand on on this. I tend to spend more of my time these days on the building something out, getting it up and running side of the world, as opposed to my previous life running systems that other people who are good at things had built. And my job was to keep the lights on, keep it secure, et cetera. When I'm building something or improving something
Starting point is 00:12:08 and iterating on it, when you talk about the idea of absolute least permission, that means that every failure mode that you have while developing is you're smacking into a wall of nope, can't do it, nope, can't do it, nope, can't do it. And at some point, the very human element in development is great.
Starting point is 00:12:24 I'm just going to allow everything for here just so I can stop hitting those walls and I'll go back and fix it later. Later never comes. And now that thing's in production, probably at a bank or something, because that's the way the world works out. How do you avoid that painful friction of feeling like you're being blocked at every turn while trying to either create or expand something that already exists. Yeah, so one of the solutions, actually, we talked about in the talk that Wayne's referencing is IAM Access Analyzer,
Starting point is 00:12:53 which can actually help you generate least privileged roles based on your CloudTrail logs, right? Because that's actually one of the main, I would say, vectors where actors can get inside your environment is, for example, stale, stale roles, right? Or things to your point where, you know what, I give up. I'm just going to open public access to everything because it's, you know, I'm tired of granting access to specific individuals, right? And so that's what we really mean by right-sizing. Look at the data, look at who
Starting point is 00:13:20 needs to access that within your environment and who has legitimate reason to access that. In fact, we're already seeing innovations such as, for example, being able to have access to specific data sets in order to do experiments upon them, process them. And then once you're finished with that experiment, you actually no longer have access to that data set. So what we're going to see is the trend of more configurable or more granular policies that are not just role-based, right? But it's really action-based. And we're starting to see that shift. And in addition, all of that is true. And if you think about using tools like VPC, Access Analyzer, IAM Analyzer, and building those tools into your pipelines, if you then shift left as a developer,
Starting point is 00:14:07 because that's the point you were making, as a builder, I'm frustrated, right? So go back, shift left. There are tools that you can use today from partners, from others, where as you're writing code, it will tell you that there is a security need here. Like you're not checking auth,
Starting point is 00:14:25 you're not checking access where you should. And there should at least be a call in there. Now you can make that call, say, open it up to the world so that you can at least write your code. But then in the pipeline, using these analyzers, you can later say,
Starting point is 00:14:40 you know, you need to change this. So you're right. It can be frustrating to be stopped at every line of code. But if you shift left, use the right capabilities to insert the right auth and access calls to make sure that you are secure. And then sure, open a few things up in your dev environment, right? But then as it goes through the pipeline, shut it down using these tools. And this comes all the way back to the universal statement concept. You're talking about gating and higher environments as things progress through. I like the approach. I tend to not have as many lower environments. Again, everyone has
Starting point is 00:15:14 a test environment. Some people just also have a separate one to run production in. And I find that when I'm building something out early on in a relatively dual or single environment type of setup, and you have all of the stuff that comes in that does security alerting and the rest, it is page after page after page of nonsense that, oh, this is a slightly outdated thing that when used with a different library has a security problem. Things that do not ever apply to what I'm doing. And at page 17, it's like, oh, also you left the oven on. And then it goes on back into other nonsense. It's finding the signal from the noise there and avoiding the friction and just tuning it out.
Starting point is 00:15:54 It feels like that's almost a delicate art. The world has become a little more complex than it was back in 1996. And so it's not that we want to make the life of the developer or operator more complex. The life of the developer and operator became more complex. We want to actually make it simpler. And there is a process that it takes to get there. Coming up with an IAM analyzer, coming up with a VPC analyzer, coming up with other such tools, being able to shift left and be able to catch these things at development time. These are trends that are happening. These are tools that are available.
Starting point is 00:16:30 This is where we're going to take the world and it'll become the norm. No longer does anybody write code to build a link list. That's a library. Only in job interviews. Only in job interviews, exactly. So these things too too, will become like libraries. Yeah, and I mean, this year we saw data protection really highlighted as a main stage keynote item. And this is really kind of the evolution that even just my career I've seen kind of working in the data protection industry, both at AWS and outside of AWS, is before I used to be the storage admins problem, right? Is, okay, well, they were in charge of, you know, issuing backup policies or restoring data, right? Now it's become a CISO problem. It's become a board-level problem.
Starting point is 00:17:12 In fact, according to the NACD, which is a board regulatory association, it's requiring organizations and companies to prove that they have a data resiliency strategy in place. And in fact, actually, one of the groups that my team works with publicly is the Cross-Market Organizational Resiliency Group. And it's a group of financial institutions in the UK, similar to Sheltered Harbor in the US, where they need to prove to regulators
Starting point is 00:17:39 that their data is immutable, right? And today, the only way to do that across all of these 17, 18 plus AWS services like S3 and EC2 and RDS is really by applying a lock onto your vault. And that's proven to be SEC 17A, FINRA, and CFTC compliant as of this reinvent. Which is a big step. The idea of, you can make anything so that you personally cannot get access to it, but people try to write their own cryptography that way. It doesn't go super well either. Having a trusted third party who also, let's be clear, is the one that evaluates whether you are in compliance or not and will fine you otherwise, state that yes, this qualifies is no small step. Yeah, and we see now this being implemented across the Fortune 500 and majority of enterprise environments as their protection against, let's just call the 800-pound
Starting point is 00:18:34 gorilla in the room, ransomware. As ransomware threats become more evident and also as ransomware also evolves in the cloud, we need to really ensure that we have proper protection, proactive protection in place by locking your vault, but also by making it easy to do that. And actually this week, we're also excited to announce the application-aware sort of protection that you can do, which is define your stateful resources in a CloudFormation template, point to that template and stack as an entity,
Starting point is 00:19:04 and be able to protect that as an entity and restore that as an entity. This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your engineers are burned out. They're tired from pagers waking them up at 2am for something that could have waited until after their morning coffee. Ring ring. Who's there? It's Nagios, the original Call of Duty. They're fed up with relying on two or three different monitoring tools that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there is a better way.
Starting point is 00:19:39 Observability tools like Honeycomb, and very little else because they do admittedly set the bar, show you the patterns and outliers of how users experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It's great for your business, great for your engineers, and most importantly, great for your customers. Try free today at honeycomb.io slash screaming in the cloud. That's honeycomb.io slash screaming in the cloud. I am very excited about that. You have excellent reasons that you have built this, and I am not in any way, shape, or form saying that they are not good and valid,
Starting point is 00:20:21 but this is one of those scenarios I worry that you've built a beautiful iPad or something and, oh great, a new hammer is what I'm about to do with it. Because it feels like this is closer than a lot of things have come to being an easy way for me to wind up migrating an application from my old omnibus AWS account that has everything in the kitchen sink in it into a dedicated member account in the AWS organization? Because yes, you could always apply the CloudFormation template somewhere else, but what about the data contained within those resources? It feels like you have built a cloud migration story for me from AWS account to AWS account. Is that a known thing and an intended and designed use case, or have I just ultimately come up with a solution where you're about to tell me about a different service and tire that does that for me already and I've been asleep at the wheel again?
Starting point is 00:21:09 Well, you can just pull me and Wayne about our philosophy around services. And our philosophy is to build a centralized platform where it has these capabilities and really customers have the flexibility of choosing which capabilities they need for their specific use case. So sure, if you find that being able to define your AWS, a collection of AWS resources as an application stack, is your way to, for example, failover, migrate your applications from account to account, maybe you have a dedicated account to your point where it's your, you know, maybe Fort Knox account, right? Sure, Those are all use cases that now can be met with this capability. And better yet, right, talking about sort of
Starting point is 00:21:50 central governance is using AWS organizations, you can now delegate that administrator of capability from the central management account in case, hey, you don't want anyone going into that account, into member accounts, whereby these member accounts, in case they are, let's say, managing separate organizational units or OUs, can continue delegating those policies. So you can be sure, hey, across, let's say, you know, my tens of thousands of accounts, in fact, our largest customer has 50,000 accounts, right? And across 50,000 accounts at that scale, how can I be sure that each account is protecting its resources the same frequency and way as the 49,000 plus other accounts that I have? Right. It's always a question of, is this just a scratch account with test data in the end,
Starting point is 00:22:35 or is that the one that has the credit card numbers in it? It's always a tricky balance. I want to give you peace of mind. Please do. I'm going to give it my best shot. Wait, do you want to give me peace of mind or a piece of your mind? I want to give you the former, not the latter. Excellent. I will give you the of mind. Please do. I'm going to give it my best shot. Wait, do you want to give me peace of mind or a piece of your mind? Either one of those. The former, not the latter. Excellent. And I will give you the latter anyway. I look forward to it. What you described is not a misuse of a capability. Because for years, since the dawn of time, since spinning tape, people have taken backups to do restores on another system.
Starting point is 00:23:06 That has been a way people have migrated data from point A to point B. So I got out of VPC, easy to do classic into VPCs. Yeah. You take a snap, you stop, you quest the application, snapshot the RDS instance, restore that snapshot inside of a VPC, hope it doesn't take super long back in those days, and turn it back on, and yeah, hopefully you've tested it first. So in this case, if in fact somebody decides to take this capability and do a backup of their entire application state and restore that application state somewhere else, it's not an unreasonable way of doing things. However, if that's a live application, doing a backup and restore is not the same as doing a live migration or a disaster recovery scenario where you're doing block-level, change-level migration of your data from point A to point B.
Starting point is 00:23:55 Those are different problems. So if your RPO and your RTO is such that an hour is fine, six hours is fine, 12 hours is fine. Okay, this is probably a reasonable thing. But if your needs are that a second is okay, a minute is pushing it, this is not that solution. So, and then people start to think of replication as a backup strategy, which it is not. Which it is not. Yeah. Which it is not. So, people, and Nancy can speak to this very eloquently, backup, disaster recovery, these are different problems. They are related. They're sister problems.
Starting point is 00:24:33 They're not the same problem. When we first started talking, I believe you were the general manager of AWS Backup, and your portfolio has expanded, and I'm not quite sure where one service starts and the other one stops. I mean, these days it seems like the way forward is the elastic disaster recovery offering. And I don't mean to toot my own horn too loudly, but I'm something of an elastic disaster myself.
Starting point is 00:24:54 Where is the recovery? We are waiting on that. Probably after reInvent, maybe, if we're lucky. But at this point, what is the entry point for all of this? AWS Backup, I feel I understand it. It speaks the language that I think of data in. Elastic disaster recovery seems much more encompassing than that, isn't it? So to what Wayne described, if you think about solutions from a continuum of RPO requirements, right? So there's a reason why, for example, Gartner and other analysts
Starting point is 00:25:27 grouped together business continuity and resiliency with backup plus DR, right? And that's really how we should think about data protection on AWS is depending on what your RPO needs are, right? You should be able to set a central policy and be able to determine what RPO you need. So today with Elastic Disaster Recovery, we support, for example, data that's written to disk. And if you're applying an agent onto your EC2 instances and filtering each write to another replica, that's a great solution for you.
Starting point is 00:25:59 With that said, again, if you want to be able to apply that vault lock across all of your 17 plus, 18 plus accounting AWS services, I'm losing count here, right? AWS backup might be the better solution for you. But again, it goes back to that flexibility and choice. So where my team comes in is if you're a customer and you're thinking, hey, I have X bytes level of data, petabytes level of data on AWS that I want to be able to protect, what should I do? And that's really our sweet spot, is we like being the thought partner to customers of what are your requirements? What are your use cases that you want to solve for? Specifically, the compliance frameworks that you are subject to, and let us recommend the best solutions to you. It's one of those areas that I'm a big fan of
Starting point is 00:26:46 the cloud for, where it's easy to sit here and be overwhelmed by the sheer number of services that you folks have, but your customers are incredibly and wildly diverse when it comes to what their applications are, what their businesses do, whether they turn on backups or not, all kinds of other stuff. But because it's all built on the same underlying platform, it feels like there are remarkably few disaster recovery approaches that are truly unique to a specific customer. It feels like all the heavy lifting, all the work has been done countless times. It's an early experience I had with this is stop trying to answer auditor questions about AWS
Starting point is 00:27:25 like it's a data center. Just hand them the AWS documentation that hangs out in AWS Artifact and call it good because it already is written in the language that they're expecting rather than reinventing a wheel badly. Well, I'd love to make your day by saying we actually have a functionality within AWS Data Protection called AWS Backup Audit Manager that is designed to generate reports that you can hand over to your auditors. I love that. It also winds up speaking to the truism, which is this idea that, honestly, no one cares about backup. Absolutely no one. They don't.
Starting point is 00:28:00 They care a lot about restores directly after they really should have cared slightly more about backups. I mean, the people who are the most fanatical about backups, the ones who've lost data, I count myself on that list. But again, the entire theme of this year's reInvent has been about data, both in terms of accessing it and using it and making sure the right parties have access to it and leveraging it to add value rather than previous years and previous stories, which I felt a lot more like, you've got a lot of data. You probably don't know what it is, but that's right. You should sit your butt down on top of it like a sad, greedy dragon and never
Starting point is 00:28:36 let anything happen to it. Now it's, okay, what are you actually going to do? You go to that sad dragon and you have a pile of gold. Why don't you try investing it somewhere? And it feels like that is really how it is starting to play out narratively. Well, data, we've talked about this before, Corey, but second only to people in a business, data is their most important asset. Nancy refers to it as the crown jewels. Find out within your data sets, what are your crown jewels, and make sure that you understand where they are, what they are, are they properly being protected. For those who need to share them, going to our auth and access piece, make sure those folks can share in that data. Make sure they can get the value out of that data. The balance is protection and use.
Starting point is 00:29:23 You want to make sure that you protect your resources, but not constrain their appropriate use. And everything that Nancy and Nihat talked about really refers to that. Do not constrain the use of the data inappropriately by being scared. Constrain the use of the data to only those who need it, so you don't have to be scared. Yeah, if you can't access the data, you effectively don't have the data. Effectively, you have a valuable resource that's locked up and not doing any good for your business. One of the challenges that we do see, though, is that no matter what it is that we're talking about here, the world is getting bigger. Scale is increasing for almost everything. And it's long gone are the days where
Starting point is 00:30:05 you have three to maybe four web servers. Now you have so many things that are emitting telemetry at stupendous volume. And so much of the data that is being collected is fundamentally useless on some level. But there's still not a lot of awareness of what the data that people have is. It becomes a data swamp where it's, well, we have a whole bunch of load balancer logs that are operationally useful for a little while, but you don't need them from six years ago, combined with stuff that, oh yeah, we need to keep those in case the auditor comes knocking for seven years.
Starting point is 00:30:34 And people just wind up thinking about it as one discrete entity and don't even know where to start unpacking it all. Let me tell you a story, because I'm going to agree with you and disagree with you at the same time. You used the example of operational logs. AWS collects a tremendous number of data points every second across all of our services. And especially given the scale we have,
Starting point is 00:30:56 that really is expansive. It may seem like a lot of things to keep around for no reason. However, those logs end up in an operational data lake. Those logs end up being analyzed. Whether predictive analytics or ML models, it depends on the group, it depends on the size of the logs, it depends on the need. We're able to innovate, invent and innovate brand new features and capabilities, often at no cost to the customer, because of those operational logs, because a human can't possibly understand the usage patterns for all of these customers. But when you do the analytics, patterns become clear. Value becomes clear. Opportunity becomes glaring. And when we look at things like 10x improvement in this and 12x improvement in
Starting point is 00:31:45 that, where do you think these come from? That we're brilliant, we just figured out magically how to remove three lines of code? Well, maybe, sometimes. But often it's from this analysis that says, we actually have more capability in our current product that we could hand to the customer or reduce our cost. This is the value of data. This is the value of these operational logs. And often it's not because you keep them for six months. It sometimes is because you have them for seven years because that's where you find the patterns. Yeah, I wish I could, until this place,
Starting point is 00:32:16 I've never been anywhere even near that long at the same place. Like, oh, seven years from now, it's the best kind of problem. Someone else's. Yeah, I don't have that luxury anymore. I have to start thinking a bit more long-term on that. I really want to thank both of you
Starting point is 00:32:28 for being so generous with your time during reInvent, where I'm sure you have an infinite number of things you could be doing that are better than this. Thank you so much for being so generous. Of course. Thanks for having us, Corey. Corey, it's always a pleasure. It really is.
Starting point is 00:32:41 Wayne Dusso, VP at AWS. Nancy Wang, General Manager of Data Protection at AWS. And I'm cloud economist, Corey Quinn. If you've enjoyed this episode, please leave a five-star review on your podcast platform of choice. Whereas if you hated this podcast, please leave a five-star review
Starting point is 00:32:59 on your podcast platform of choice, along with a comment telling me exactly what you think about me, and that's okay because I'm going to lose the data by mistake. If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duck Bill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duck Bill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
