Screaming in the Cloud - Crafting a Modern Data Protection Strategy with Sam Nicholls
Episode Date: November 30, 2022
About Sam
Sam Nicholls: Veeam's Director of Public Cloud Product Marketing, with 10+ years of sales, alliance management, and product marketing experience in IT. Sam has evolved from his on-premises storage days and is now laser-focused on spreading the word about cloud-native backup and recovery, packing in thousands of viewers on his webinars, blogs, and webpages.
Links Referenced:
Veeam AWS Backup: https://www.veeam.com/aws-backup.html
Veeam: https://veeam.com
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at Chronosphere.
Tired of observability costs going up every year without getting additional value?
Or being locked into a vendor due to proprietary data collection, querying, and visualization?
Modern-day containerized environments
require a new kind of observability technology
that accounts for the massive increase in scale
and attendant cost of data.
With Chronosphere,
choose where and how your data is routed and stored,
query it easily,
and get better context and control.
100% open-source compatibility
means that no matter what your setup is,
they can help. Learn how Chronosphere provides complete and real-time insight to ECS, EKS,
and your microservices, wherever they may be, at snark.cloud slash chronosphere.
That's snark.cloud slash chronosphere. This episode is brought to us in part by our friends
at Pinecone. They believe that all anyone really wants is to be understood, and that includes mere users.
AI models combined with the Pinecone Vector Database let your applications understand and act on what your users want without making them spell it out.
Make your search application find results by meaning instead of just keywords. Your personalization system make picks based on relevance instead of just tags.
And your security applications match threats by resemblance instead of just regular expressions.
Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable.
Thanks to my friends at Pinecone for sponsoring this episode.
Visit pinecone.io to understand more.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
This promoted guest episode is brought to us by and sponsored by our friends over at Veeam.
And as a part of that, they have thrown one of their own to the proverbial lion.
My guest today is Sam Nicholls, director of public cloud over at Veeam.
Sam, thank you for joining me. Hey, thanks for having me, Corey, and thanks for everyone joining
and listening in. I do know that I've been thrown into the lion's den, and I am hopefully well
prepared to answer anything and everything that Corey throws my way. Fingers crossed.
I don't think there's too much room for criticizing here
to be direct. I mean, Veeam is a company that is solidly and thoroughly built around a problem that
absolutely no one cares about. I mean, what could possibly be wrong with that? You do backups,
which no one ever cares about. Restores, on the other hand, people care very much about restores.
And that's when they learn, oh, I really should have cared about backups
at any point prior to 20 minutes ago. Yeah, yeah, it's a great point. It's kind of like
taxes and insurance. It's almost like, you know, something that you have to do that you don't
necessarily want to do. But when it, you know, push comes to shove and something's burning down,
a file's been deleted, someone's made their way into your account and, you know, made a right mess in there, that's when you really kind of care about what you mentioned, which is
the recovery piece, the speed of recovery, the reliability of recovery. It's been over a decade
and I'm still sore about losing my email archives from 2006 to 2009. There's no way to get it back.
I ran my own mail server. It was an iPhone setting that said, oh yeah, automatically delete everything
in the trash folder or archive folder after 30 days. It was just a weird default setting back in that era
and it was doing that. Yeah. Painful stuff. And we learn the hard way in some of these cases. Not
that I really have much need for email from that era of my life, but every once in a while, it
still bugs me, which speaks to the point that the people who are the most fanatical about backing things up are the people who've been burned by not having a backup. And
I'm fortunate in that it wasn't someone else's data with which I had been entrusted that really
cemented that lesson for me. Yeah, yeah, it's a good point. I can remember a few years ago,
my wife migrated a very aging polycarbonate white Mac to one of the shiny new aluminum ones and thought everything was
good. As the white polycarbonate Mac becomes yellow, then yeah. All right. If that's, you know,
it's time to replace it. Yeah. So yeah. So she wiped the drive and what happened?
That was her moment where she learned the value and importance of backup and why she backs
everything up. Now, I fortunately have never gone through it, but, uh, I'm employed by a backup
vendor and that's why I care about it. But it's incredibly
important to have, of course. Oh, yes. My spouse has many wonderful qualities, but one that drives
me slightly nuts is she's something of a digital pack rat where her hard drives on her laptop will
periodically fill up. And I used to take the approach of, oh, you can be more efficient and
do the rest. And I realized, no, telling other people they're doing it wrong is generally poor practice.
Whereas just buying bigger drives is way easier. Let's go ahead and do that. It's a small price to pay for domestic tranquility. And there's a lesson in that. We can map that almost perfectly
to the corporate world where you folks tend to operate in. You're not doing home backup,
last time I checked, you are doing public cloud backup. Actually, I should ask
that. Where do you folks start and where do you stop? Yeah, no, it's a great question. You know,
we started over 15 years ago when virtualization, specifically VMware vSphere, was really the
up-and-coming thing. And, you know, a lot of folks were there trying to utilize agents to protect
their vSphere instances, just like they were doing their physical windows
and Linux boxes. And, you know, it kind of got the job done, but was it the best way of doing it? No.
And that's kind of why Veeam was pioneered. It was this agentless backup, image-based backup for
vSphere. And of course, you know, in the last 15 years, we've seen lots of transitions. Of course,
we're here screaming in the cloud with you, Corey. So AWS,
as well as a number of other public cloud vendors, we can help protect as well as a number of
SaaS applications like Microsoft 365, metadata and data within Salesforce. So
Veeam's really kind of come a long way from just virtual machines to really taking a global look
at the entirety of modern environments and how can we best protect
each and every single one of those without trying to take a square peg and fit it in a round hole?
It's a good question and a common one. We wind up with an awful lot of folks who are confused by the
proliferation of data. I mean, I'm one of them, let's be very clear here, it comes down to a problem where
backups are a multifaceted deep problem. And I don't think that people necessarily
think of it that way. But I take a look at all of the different, even AWS services that I use
for my various nonsense, and which ones can be used to store data? Well, all of them. Some of them, you have to hold it in a particularly wrong sort of way, but they all store data.
And in various contexts,
a lot of that data becomes very important.
So what service am I using?
In which account am I using it?
And in what region am I using it?
And you wind up with data sprawl,
where it's a tremendous amount of data
that you can generally only track down
by looking at your
bills at the end of the month. Okay, so where am I being charged and for what service? That seems
like a good place to start, but where is it getting backed up? How do you think about that?
So some people, I think, tend to ignore the problem, which we're seeing less and less,
but other folks tend to go to the opposite extreme and we're just going to back up
absolutely everything and we're going to keep that data for the rest of our natural lives.
It feels to me that there's probably an answer that is more appropriate
somewhere nestled between those two extremes.
Yeah, snapshots for all is a real thing and it gets very, very expensive very, very quickly.
You know, your snapshots of EC2 instances are stored on those attached EBS volumes.
Five cents per gig per month doesn't sound like a lot.
But when you're dealing with thousands of snapshots for thousands of machines, it gets out of hand very, very quickly.
And you don't know when to delete them.
Like you say, folks are just retaining them forever and dealing with this unfortunate bill shock.
So where to start is automating the lifecycle of a snapshot from its creation.
How often do we want to be creating them from the retention?
How long do we want to keep these for?
And where do we want to keep them?
Because there are other storage services outside of just EBS volumes.
And then, of course, the ultimate deletion.
And that's important even from a compliance perspective as well.
You've got to retain data for a specific number of years. I think healthcare is like seven years, but then you've also...
And then not a day more.
So it's laying out gold, silver, bronze tiers based on criticality of data compliance and really just kind of letting the machine do the rest.
And you can focus on not babysitting backup.
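The lifecycle Sam describes, tiered retention followed by eventual deletion, can be sketched in a few lines of Python. Everything here is illustrative: the tier names and retention periods are assumptions, and a real job would list and delete snapshots through boto3 rather than plain dicts. It is not how Veeam implements this, just the shape of the idea.

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention tiers; the names and day counts are assumptions,
# not anything prescribed by AWS or Veeam. "gold" is roughly seven years.
RETENTION_DAYS = {"gold": 2555, "silver": 365, "bronze": 30}

def snapshots_to_delete(snapshots, now=None):
    """Return the IDs of snapshots whose age exceeds their tier's retention.

    `snapshots` is a list of dicts like
    {"id": "snap-...", "start_time": datetime, "tier": "bronze"}.
    """
    now = now or datetime.now(timezone.utc)
    doomed = []
    for snap in snapshots:
        limit = timedelta(days=RETENTION_DAYS.get(snap["tier"], 30))
        if now - snap["start_time"] > limit:
            doomed.append(snap["id"])
    return doomed

# A real job would then act on the result, e.g. with boto3:
#   ec2 = boto3.client("ec2")
#   for sid in snapshots_to_delete(inventory):
#       ec2.delete_snapshot(SnapshotId=sid)
```

The point of pulling the decision into a pure function is exactly the "not babysitting backup" part: the policy is declared once, and a scheduled job applies it.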
What was it that led to the rise of snapshots?
Because back in my very early days, there was no such thing.
We wound up using a bunch of servers stuffed in a rack somewhere.
Virtualization was not really in play.
So we had file systems on physical disks.
And how do you back that up?
Well, you have an agent of some sort
that basically looks at all the files
and according to some rule set that it has,
it copies them off somewhere else.
It was slow.
It was fraught.
It had a whole bunch of logic
that was pushed out to the very edge.
And forget about restoring that data
in a timely fashion
or even validating a lot of those backups worked
other than via checksum.
And God help you if you had data
that was constantly in a state of flux
where anything changing during the backup run
would leave your backups in an inconsistent state.
That, on some level,
seems to have largely been solved by snapshots,
but what's your take on it?
You're a lot closer to this part of the world than I am. Yeah, snapshots, I think folks have turned
to snapshots for the speed, the lack of impact that they have on production performance. And
again, just the ease and accessibility. We have access to all different kinds of snapshots for
EC2, RDS, EFS throughout the entirety of our AWS environment. So I think
the snapshots are kind of like the default go-to for folks. They can help deliver those very,
very quick RPOs, especially in, for example, databases, like you were saying, that change
very, very quickly. And we all of a sudden are stranded with a crash consistent backup
or snapshot versus an application consistent snapshot. And then
they're also very, very quick to recover from. So snapshots are very, very appealing, but they
absolutely do have their limitations. And I think, you know, it's not a one or the other, it's that
they've got to go hand in hand with something else. And typically that is an image-based backup that
is stored in a separate location to the snapshot because that snapshot is not independent of the disk that it is protecting.
One of the challenges with snapshots is most of them are created in a copy-on-write sense.
It takes basically an instant frozen point in time. Back once upon a time, we ran MySQL databases on top of a NetApp filer, which worked surprisingly well. We would have a script
that would automatically quiesce the database so that it would be in a consistent state,
snapshot the file, and then unquiesce it, which took less than a second start to finish.
And that was awesome. But then you had this snapshot type of thing. It wasn't super portable.
It needed to reference a previous snapshot in some cases. And AWS takes the same approach,
where the first
snapshot captures every block. Then subsequent snapshots wind up only taking up as much size
as there have been changes since the first snapshots. So large quantities of data that
generally don't get accessed a whole lot have remarkably small subsequent snapshot sizes.
But that's not at all obvious from the outside and looking at these things.
They're not the most portable thing
in the world,
but it's definitely the direction
that the industry has trended in.
So rather than having a cron job
fire off an AWS API call
to take snapshots of my volumes
as sort of the baseline approach
that we all started with,
what is the value proposition
that you folks bring? And please don't say it's, well, cron jobs are hard and we have a friendlier interface for
that. I think it's really starting to look at the proliferation of those snapshots, understanding
what they're good at and what they are good for within your environment. As previously mentioned,
low RPOs, low RTOs. How quickly can I take a backup?
How frequently can I take a backup?
More importantly, how quickly can I restore?
But then looking at their limitations.
So I mentioned that they were not independent of that disk.
So that certainly does introduce a single point of failure,
as well as being not so secure.
We've kind of touched on the cost component of that as well.
So what Veeam can come in and do
is then take an image-based backup of those snapshots, right?
So you've got your initial snapshot
and then your incremental ones.
We'll take the backup from that snapshot
and then we'll start to store that elsewhere.
And that is likely going to be in a different account.
We can look at the Well-Architected Framework,
AWS deeming accounts as security boundaries.
So having that cross-account function is critically important.
So you don't have that single point of failure.
Locking down with IAM roles is also incredibly important.
So we haven't just got a big wide open door between the two.
But that data is then stored in a separate account, potentially in a separate region,
maybe in the same region, Amazon S3 storage.
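The cross-account flow Sam outlines can be sketched with the two EC2 API calls involved: the source account grants the backup account permission on a snapshot, and the backup account then copies it into its own region. The function structure and fake-able client parameter are my own framing (a sketch, not Veeam's implementation); with boto3 you would pass real EC2 clients for each account.

```python
# Sketch of a cross-account snapshot copy. Passing the client in keeps the
# flow testable offline; in real use, each would be a boto3 EC2 client
# authenticated to the respective account, locked down with scoped IAM roles.

def share_snapshot(source_ec2, snapshot_id, backup_account_id):
    """From the source account: grant the backup account read access."""
    source_ec2.modify_snapshot_attribute(
        SnapshotId=snapshot_id,
        Attribute="createVolumePermission",
        OperationType="add",
        UserIds=[backup_account_id],
    )

def copy_to_backup_account(backup_ec2, snapshot_id, source_region):
    """From the backup account: copy the shared snapshot into this account."""
    resp = backup_ec2.copy_snapshot(
        SourceSnapshotId=snapshot_id,
        SourceRegion=source_region,
        Description=f"cross-account backup copy of {snapshot_id}",
    )
    return resp["SnapshotId"]
```

Because the copy lands in a different account, compromising or deleting the source account's snapshots no longer takes the backups with it, which is the single-point-of-failure argument above.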
And S3 has the wonderful benefit
of being still relatively performant
so we can have quick recoveries,
but it is much, much cheaper.
You're dealing with 2.3 cents per gig per month instead of-
To start, and it goes down from there
with sizable volumes.
Absolutely, yeah.
You can go down to S3 Glacier
where you're looking at,
I forget how many points and zeros and nines it is,
but it's fractions of a cent per gig per month.
But it's going to take you a couple of days
to recover that.
Even infrequent access cuts that in half.
And let's be clear, these are snapshot backups.
You probably should not be accessing them
on a consistent, sustained basis.
Well, exactly.
And this is where it's kind of almost like
having your cake and eating it as well. Compliance or regulatory mandates or corporate mandates are
saying, you must keep this data for this length of time. Keeping that, let's just say it's three
years worth of snapshots and an EBS volume is going to be incredibly expensive. What's the
likelihood of you needing to recover something from two years, even two months ago? It's very, very small.
So the performance part of S3, you don't need to take as much into
consideration. Can you recover? Yes.
Is it going to take a little bit longer? Absolutely.
But it's going to help you meet those retention requirements while keeping
your backup bill low, avoiding that bill shock, right?
Spending tens and tens of thousands every single month on snapshots.
This is what I mean by kind of having your cake and eating it.
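The per-gig numbers quoted in this exchange make the "having your cake" point concrete with simple arithmetic. The prices below are the illustrative figures from the conversation (roughly us-east-1 list prices at the time); check current AWS pricing before relying on them.

```python
# Back-of-the-envelope monthly storage cost for 10 TB of backup data.
# Prices are the approximate per-GB-month figures mentioned above and
# are assumptions for illustration, not authoritative pricing.
PRICE_PER_GB_MONTH = {
    "ebs_snapshot": 0.05,        # "five cents per gig per month"
    "s3_standard": 0.023,        # "2.3 cents per gig per month"
    "s3_standard_ia": 0.0125,    # infrequent access, roughly half
    "s3_glacier_deep": 0.00099,  # fractions of a cent, days to restore
}

def monthly_cost(gb, tier):
    return gb * PRICE_PER_GB_MONTH[tier]

for tier in PRICE_PER_GB_MONTH:
    print(f"{tier:16s} ${monthly_cost(10_000, tier):,.2f}/month")
```

Ten terabytes sitting as EBS snapshots runs about $500 a month against roughly $10 in a deep archive tier, which is the bill-shock gap being described, traded against recovery speed.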
I somewhat recently have had a client where EBS snapshots are one of the driving costs
behind their bills, one of their largest single line items.
And I want to be very clear here, because if one of those people are listening to this and thinking, well, hang on, wait,
they're telling stories about us, even though they're not naming us by name. Yeah, there were
three of you in the last quarter. So at that point, it becomes clear it is not about something
that one individual company has done and more about an overall driving trend. I am personalizing
it a little bit by referring to you as one company
when there were three of you.
This is a narrative device,
not me breaking confidentiality.
Disclaimer over.
Now, when you talk to people about,
so tell me why you've got 80 times more snapshots
than you do EBS volumes.
The answer is, well, we wanted to back things up and we needed to get hourly backups to a point, then daily backups, then monthly and so on and so forth. And when this was set up, there wasn't a great way to do this
natively. And we don't always necessarily know what we need versus what we don't. And the cost
of us backing this up, well, you can see it on the bill. The cost of us deleting too much and
needing it as soon as we do, well, that cost is almost incalculable.
So this is the safe way to go. And they're not wrong in anything that they're saying.
But the world has definitely evolved since then. Yeah, yeah, it's a really great point. And again,
it just folds back into my whole having a cake and eating it conversation. Yes, you need to retain data. It gives you that kind of nice, warm, cozy feeling. It's a nice blanket on a winter's day, knowing that, irrespective of what happens, you're going
to have something to recover from. But the question is, is does that need to
be living on an EBS volume as a snapshot? Why can't it be living on much, much more
cost-effective storage that's going to give you the warm and fuzzies,
but is going to make your finance team much, much happier.
One of the inherent challenges I think people have is that snapshots by themselves are almost worthless in that I have an EBS snapshot. It is sitting there now. It's costing me an
undetermined amount of money because it's not exactly clear on a per snapshot basis exactly
how large it is. And okay, great. Well, I'm looking for a file that
was not modified since X date as it was on this time. Well, great. You're going to have to take
that snapshot, restore it to a volume and then go exploring by hand. Oh, it was the wrong one.
Great. Try it again with a different one. And after like the fifth or sixth in a row,
you start doing a binary search approach on this thing, but it's expensive. It's time consuming, it takes forever, and it's not a fun user experience at all. Part of the
problem is it seems that historically backup systems have no context or no contextual awareness
whatsoever around what is actually contained within that backup. Yeah, yeah. I mean, you kind
of highlighted two of the steps. It's more like a 10-step process to do a granular file
or folder-level recovery from a snapshot, right?
Like you say, you've got to determine the point in time
when you knew the last time that it was around.
Then you're going to have to determine the volume size,
the region, the OS.
You're going to have to create an EBS volume
of the same size region from that snapshot,
create the EC2 instance with the same OS, connect the two together, boot the EC2 instance, mount the volume, search for the files to restore, download
them manually, at which point you have your file back. It's not back in the machine where it was,
it's now been downloaded locally to whatever machine you're accessing that from. And then
you've got to tear it all down. And that is, again, like you say, predicated on the fact that
you knew exactly that that was
the right time. It might not be, and then you have to start from scratch from a different point in
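The manual dance just described maps onto a handful of EC2 API calls plus a lot of by-hand work in between. A rough sketch, with the client passed in so the flow can be exercised offline; the function names and the fake-client pattern are mine, and in real use you would pass a boto3 EC2 client and still have to SSH in, mount the device, and hunt for the file yourself.

```python
# Sketch of the manual "restore one file from a snapshot" flow.

def restore_volume_from_snapshot(ec2, snapshot_id, az, instance_id,
                                 device="/dev/sdf"):
    """Create a volume from the snapshot and attach it to a helper instance."""
    vol = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=az)
    vol_id = vol["VolumeId"]
    ec2.attach_volume(VolumeId=vol_id, InstanceId=instance_id, Device=device)
    # ...then, entirely by hand: SSH in, mount the device, search for the
    # file, download it locally, and hope this was the right point in time.
    return vol_id

def tear_down(ec2, vol_id, instance_id):
    """Undo the scaffolding once the file is (hopefully) recovered."""
    ec2.detach_volume(VolumeId=vol_id, InstanceId=instance_id)
    ec2.delete_volume(VolumeId=vol_id)
```

Note that nothing in the API flow tells you whether the snapshot actually contains the file you want; if it doesn't, the whole thing repeats from a different snapshot, which is the pain the tooling discussion below is about.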
time. So backup tooling from backup vendors that have been doing this for many, many years,
knew about this problem long, long ago, and really seek to not only automate the entirety of that process, but make the whole
e-discovery, the search, the location of those files much, much easier. I don't necessarily
want to do a vendor pitch, but I will say with Veeam, we have Explorer-like functionality,
whereby it's just a simple web browser. Once that machine is all spun up again,
automatic process, you can just search for your individual
file folder, locate it.
You can download it locally.
You can inject it back into the instance where it was through Amazon Kinesis or AWS Kinesis.
I forget the right terminology for it.
Some of it's AWS, some of it's Amazon.
But by the by, the whole recovery process, especially from a file or folder level is
much more pain-free, but also
much faster. And that's ultimately what people care about. How reliable is my backup? How quickly
can I get stuff online? Because the time that I'm down is costing me an indescribable amount of time
or money. This episode is sponsored in part by our friends at Redis, the company behind the
incredibly popular open source database. If you're tired
of managing open source Redis on your own, or if you're looking to go beyond just caching and
unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to
managed Redis service that allows you to reimagine how your geo-distributed applications process,
deliver, and store data. To learn more from the experts in Redis how to be real-time,
right now, from anywhere, visit snark.cloud slash redis. That's snark.cloud slash r-e-d-i-s.
Right, you get the idea of RPO versus RTO, recovery point objective and recovery time
objective. With an RPO, it's: great, disaster strikes right now. How long is it acceptable to have been
since the last time we backed up data
to a restorable point?
Sometimes it's measured in minutes.
Sometimes it's measured in fractions of a second.
It really depends on what we're talking about.
Payments databases, that needs to be...
the RPO basically asymptotically approaches zero.
The RTO is, okay, how long is acceptable
before we have that data restored and are back up and running?
And that is almost always a longer time, but not always.
And there is a different series of trade-offs that go into that.
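The two definitions just given reduce to simple time arithmetic: the data-loss window is measured against the RPO, the downtime against the RTO. A minimal illustration (the function name and shape are my own, for clarity only):

```python
from datetime import datetime, timedelta

def objectives_met(last_backup, disaster, restored, rpo, rto):
    """Check a recovery against stated objectives.

    data loss window = disaster - last_backup  (must fit inside the RPO)
    downtime         = restored - disaster     (must fit inside the RTO)
    All timestamps are datetimes; rpo and rto are timedeltas.
    """
    data_loss = disaster - last_backup
    downtime = restored - disaster
    return data_loss <= rpo and downtime <= rto
```

So a backup taken 10 minutes before a disaster satisfies a 15-minute RPO, while an hourly cadence cannot; and as Corey notes, a payments database effectively demands an RPO of zero, which hourly or even minutely backups can never deliver.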
But both of those also presuppose that you've already dealt with the existential question of, is it possible for us to recover this data? And that's where I know that you are, obviously,
you have a position on this that is informed by where you work. But I don't, and I will call this
out as what I see in the industry, AWS backup is compelling to me except for one fatal flaw that it
has. And that is, it starts and stops with AWS. I am not a proponent of multi-cloud.
Lord knows I've gotten flack for that position
a bunch of times.
But the one area where it makes absolute sense to me
is backups.
Have your data, in a rehydrate-the-business-level state,
backed up somewhere that is not your primary cloud provider
because your otherwise single point of failure
through a company,
through the payment instrument you have on file with that company, in the blast radius of someone who
can successfully impersonate you to that vendor, there has to be a gap of some sort for the
truly business-critical data.
Yes, egress to other providers is expensive, but you know what else is expensive?
Irrevocably losing the data that powers your business.
Is it likely?
No, but I would much rather do it than have to justify why I'm not doing it. Yeah, and I know you're not a proponent of multi-cloud and I read your newsletters and understand where you're coming from. But I think the reality is that we do live in at least a hybrid cloud world, if not multi-cloud. The number of organizations
that are sole sourced on a single cloud and nothing else is relatively small, single digit
percentage. It's around 87% that are hybrid, and the remainder of them are your favorite, multi-cloud. But again, having something that is 100% sole-sourced on a single platform or a single vendor does expose you to a certain degree of risk. So having the ability to do cross-platform backups, recoveries, migrations for whatever reason, right? Because it might not just be a disaster like you'd mentioned.
It might also just be, I don't know, the company's been taken over
and all of a sudden the preference is now towards another cloud provider
and I want you to refactor and re-architect everything
for this other cloud provider.
If all that data is locked into one platform,
that's going to make your job very, very difficult.
So we mentioned at the beginning of the call, Veeam is capable of protecting a vast number of heterogeneous workloads
on different platforms, in different environments,
on-premises, in multiple different clouds.
But the other key piece is that we always use the same backup file format.
And why that's key is because it enables portability.
If I have backups of EC2 instances that are stored
in S3, I could copy those onto on-premises disk. I could copy those into Azure. I could do the same
with my Azure VMs and store those on S3 or again on on-premises disk and any other endless combination
that goes with that. And it's really kind of centered around control and ownership of your data. We are not prescriptive by any means. You do
what is best for your organization. We just want to provide you with the tool set that enables you
to do that without steering you one direction or the other with fee structures, disparate feature sets, whatever it might be.
One of the big challenges that I keep seeing across the board is just a lack of awareness
of what the data that matters is, where you see people backing up endless fleets of web
server instances that are auto-scaled into existence and then removed.
But you can create
those things at will. Why do you care about the actual data that's on these things? It winds up
almost with a library management problem on some level. And in that scenario, snapshots are almost
certainly the wrong answer. One thing that I saw previously that really changed my way of thinking
about this was back many years ago when I was working at a startup that had just started using GitHub.
And they were paying for a third-party service that wound up backing up Git repos.
Today, that makes a lot more sense because you have a bunch of other stuff on GitHub that goes well beyond the stuff contained within Git.
But at the time, it was silly.
It was, why do that?
Every Git clone is a full copy of the entire repository history.
Just grab it off some developer's laptop somewhere.
It's like, really?
You want to bet the company slash your job
slash everyone else's job
on that being feasible and doable?
Or do you want to spend the 39 bucks a month
or whatever it was
to wind up getting that out the door now
so we don't have to think about it
and they validate that it works?
And that was really a shift in my way of thinking because,
yeah, backing up things can get expensive when you have multiple copies of the data living in
different places. But what's really expensive is not having a company anymore. Yeah, yeah,
absolutely. We can tie it back to my insurance dynamic earlier where, you know, it's something
that you know that you have to have, but you don't necessarily want to pay for it. Well, just like with insurances,
there's multiple different ways to go about recovering your data. And it's only in crunch
time. Do you really care about what it is that you've been paying for? And when it comes to
backup, could you get your backup through a Git clone? Absolutely. Could you get your data back?
How long is that going to take you? How painful is that going to be? What's going to be the impact
to the business while you're trying to figure that out versus like you say, the 39 bucks a month or
year or whatever it might be to have something purpose built for that, that is going to make
the recovery process as quick and painless as possible and just get things back up online.
I am not a big fan of the fear, uncertainty,
and doubt approach, but I do practice what I preach here in that, yeah, there is a real fear against
data loss. It's not people are coming to get you, so you absolutely have to buy whatever it is I'm
selling, but it is something you absolutely have to think about. My core consulting proposition is
that I optimize the AWS bill, and sometimes that means spending more. Okay,
that one S3 bucket is extremely important to you. And you say you can't sustain the loss of it ever.
So one zone is not an option. Where's it being backed up? Oh, it's not. Yeah, I suggest you
spend more money and back that thing up if it's as irreplaceable as you say. It's about doing the
right thing. Yeah, yeah, it's interesting. And it's going to
be hard for you to prove the value of doing that when you are driving their bill up, when you're
trying to bring it down. But again, you have to look at something that's not itemized on that
bill, which is going to be the impact of downtime. I'm not going to pretend to try and recall the
exact figures because it also varies depending on your business, your industry, the size, but the impact of downtime
is massive financially. Tens of thousands of dollars for small organizations per hour,
millions and millions of dollars per hour for much larger organizations.
The backup component of that is relatively small in comparison. So having something that is purpose
built and is going to protect your data
and help mitigate that impact of downtime, because that's ultimately what you're trying
to protect against. It is the recovery piece that you're buying is the most important piece.
Like you, I would say, at least be cognizant of it and evaluate your options and what can
you live with and what can you live without?
That's the big burning question that I think a lot of people do not have a good answer to.
And when you don't have an answer, you either back up everything or nothing.
And I'm not a big fan of doing either of those things blindly.
Yeah, absolutely.
And I think this is why we see varying different backup options as well.
You know, you're not going to try and apply the same data protection policies
to each and every single workload within your environment
because they've all got different types
of workload criticality.
And like you say, some of them might not even need
to be backed up at all just because they don't have data
that needs to be protected.
So you need something that is going to be able
to be flexible enough to apply across the entirety of your
environment, protect it with the right policy in terms of how frequently do you protect it,
where do you store it, how often or when are you eventually going to delete that,
and apply that on a workload by workload basis. And this is where the joy of things like
tags come into play as well. One last thing I want to bring up is that I'm a big fan of
watching for companies saying the quiet part out loud. And one area in which they do this,
because they're forced to by brevity, is in the title tag of their website. I go to, I pull up
Veeam.com and I hover over the tab in my browser and it says Veeam Software Modern Data Protection. And I want to call that
out because you're not framing it as explicitly backup. So the last topic I want to get into is
the idea of security, because I think it is not fully appreciated on a lived experience basis,
although people will of course agree to this when they're having ivory-tower whiteboard discussions, that every place your data lives is a potential site for a security breach. So you want to
have your data living in a bunch of places, ideally for backup and resiliency purposes,
but you also want it to be completely unworkable or illegible to anyone who is not authorized to
have access to it. How do you balance those trade-offs yourself,
given that what you're fundamentally saying is,
trust us with your holy of holies when it comes to things that power your entire business?
I mean, I can barely get some companies to agree
to show me their AWS bill,
let alone this is the data that contains all of the stuff
to destroy our company.
Yeah, yeah, it's a great question.
Before I explicitly answer that piece, I will just
say that modern data protection does absolutely have a security component to it. And I think that
backup absolutely needs to be a, I'm going to say this in air quotes, a first class citizen of any
security strategy. I think when people think about security, their mind goes to the preventative,
like how do we keep these bad people out?
This is going to be a bit of the FUD that you love, but ultimately the bad guys on the outside have an infinite number of attempts to get into your environment, and they only have to be right once
to get in and start wreaking havoc. You, on the other hand, as the good guy with your cape and
whatnot, you have got to be right each and every single one of those
times. And we as humans are fallible, right? None of us are perfect. And it's incredibly difficult
to defend against these ever-evolving, more complex attacks. So backup, if someone does get
in, having a clean, verifiable, recoverable backup is really going to be the only thing that is going to save
your organization should that actually happen. And what's key to a secure backup? I would say
separation, isolation of backup data from the production data. I would say utilizing things
like immutability. In AWS, we've got Amazon S3 Object Lock,
which gives you that write-once, read-many state
for whatever retention period you put on it.
So if an attacker is seeking to encrypt that data,
whether it's in production or in the backup,
they simply cannot.
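To make that immutability idea concrete, here is a minimal sketch using boto3, the AWS SDK for Python. The bucket and key names are hypothetical, and Object Lock must already have been enabled on the bucket at creation time; this is an illustration of the pattern Sam describes, not Veeam's implementation.

```python
import datetime
from typing import Optional


def retain_until(days: int, now: Optional[datetime.datetime] = None) -> datetime.datetime:
    """Compute the Object Lock retain-until timestamp for a retention window."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now + datetime.timedelta(days=days)


def write_immutable_backup(bucket: str, key: str, body: bytes, retention_days: int) -> None:
    # boto3 is imported lazily so the date helper above has no AWS dependency.
    import boto3

    s3 = boto3.client("s3")
    # COMPLIANCE mode: no principal, not even the account root user, can
    # shorten the retention period or delete this object version until the
    # retain-until date passes.
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",  # encrypt at rest with the default KMS key
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=retain_until(retention_days),
    )
```

An attacker who compromises production credentials can still read the bucket, but any attempt to overwrite or delete those object versions is rejected until the retention date passes.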
And then the other piece that I think
is coming more and more into play,
and it's almost table stakes, is encryption, right?
And we can utilize things like AWS KMS for that encryption, but that's there to help defend against the exfiltration attempts
because these bad guys are realizing, hey, people aren't paying me my ransom because they're just
recovering from a clean backup. So now I'm going to take that backup data. I'm going to leak the
personally identifiable information, trade secrets, or whatever on the internet. And that's going to
put the victim in breach of compliance and land them a hefty fine, unless they pay the ransom.
So encryption, so they can't read that data. So not only can they not change it, but they can't
read it, is equally important. So I would say those are the three big things for me on what's
needed for backup to make sure it is clean and recoverable.
I think that that is one of those areas where people need to put
additional levels of thought. And I think that if you have access to the production environment and
have full administrative rights throughout it, you should definitionally not, at least with that
account, ideally not you at all personally, have access to alter the backups full stop.
I would say on some level, for some particular workloads, there should not be the ability to alter
backups at all. The idea being that if you get hit with a ransomware
infection, it's pretty bad, let's be clear. But if you can get all of your data back, it's more of
an annoyance than it is, again, the existential business crisis that becomes something that
redefines you as a company if you still are a company.
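One way to sketch that separation of duties, assuming a dedicated S3 backup bucket (the account ID, bucket name, and role name below are hypothetical placeholders), is a bucket policy whose explicit Deny blocks every principal except a dedicated backup-admin role from deleting backup objects or loosening the policy itself:

```python
import json

BACKUP_BUCKET_ARN = "arn:aws:s3:::example-backup-bucket"
BACKUP_ADMIN_ROLE = "arn:aws:iam::111122223333:role/backup-admin"


def deny_backup_tampering_policy() -> str:
    """Build a bucket policy that denies backup deletion to all but one role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyBackupTampering",
                "Effect": "Deny",
                "Principal": "*",
                "Action": [
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                    "s3:PutBucketPolicy",
                ],
                "Resource": [BACKUP_BUCKET_ARN, f"{BACKUP_BUCKET_ARN}/*"],
                # An explicit Deny overrides any Allow in IAM policy
                # evaluation, so this binds even full administrators
                # unless they are the dedicated backup role.
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": BACKUP_ADMIN_ROLE}
                },
            }
        ],
    }
    return json.dumps(policy)
```

Because an explicit Deny beats any Allow, an administrator in the production account cannot remove backups without first assuming the dedicated role, which can live in a separate account entirely, which is exactly the godlike-access problem the Code Spaces story below illustrates.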
Yeah, yeah. I mean, we can turn to a number of organizations. Code Spaces always springs to mind for me. I love the Code Spaces example. It was kind of one of those precursors to...
It's amazing.
Yeah, yeah. But they were running on AWS and they had everything, production and backups,
all stored in one account. The attackers got into the account: we're going to delete your data if you don't pay
us this ransom. They were like, well, we're not paying you the ransom, because we've got
backups. Well, they deleted those too.
And unfortunately, Code Spaces isn't around
anymore. But it really
goes to show just the importance of
at least logically separating
your data across different accounts
and not having that godlike access to
absolutely everything.
When you talk about Code Spaces, I was under the impression
you were talking about GitHub Codespaces
specifically,
where they have their
developer workstations
in the cloud.
They're still very much around,
at least last time I saw,
unless you know something I don't.
Precursor to that.
I can send you the link.
You can share it with our listeners.
Oh, yes, please do.
I'd love to see that.
Yeah, absolutely.
It's been a long and strange time
in this industry.
Speaking of links for the show notes,
I appreciate your spending so much time with me.
Where can people go to learn more?
Yeah, absolutely.
I think Veeam.com is kind of the first place
that people gravitate towards.
Me personally, I'm kind of like a hands-on learning kind of guy.
So we always make free product available.
And then you can find that on the AWS marketplace.
Simply search Veeam through there.
A number of free products. We don't
put time limits on them, we don't put feature limitations on them. You can back up 10 instances,
including your VPCs, which we actually didn't talk about today, but I do think is important,
but I won't waste any more time on that. Oh, configuration of these things is critically
important. If you don't know how everything was structured and built out, you're basically trying
to re-architect from first principles based upon archaeology.
Yeah, that's a real pain. So we can help protect those VPCs, and we actually don't put any
limitations on the number of VPCs that you can protect. It's always free. So if you're going
to use it for anything, use it for that. But hands-on, marketplace, if you want more documentation,
want to learn more, want to speak to someone, Veeam.com is the place to go.
And we will, of course, include that in the show notes. Thank
you so much for taking so much time to speak with me today. It's appreciated. Thank you,
Corey. And thanks for all the listeners tuning in today. Sam Nicholls, Director of Public Cloud
at Veeam. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed
this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you
hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry,
insulting comment that takes you two hours to type out, but then you lose it because you forgot to
back it up. If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group.
We help companies fix their AWS bill
by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business
and we get to the point.
Visit duckbillgroup.com to get started.
This has been a HumblePod production.
Stay humble.