Podcast Archive - StorageReview.com - Podcast #113: Dell PowerScale and the Impact of QLC SSDs

Episode Date: October 18, 2022

Brian connects with Dell’s Product Management VP, David Noy, for this week’s podcast. David… The post Podcast #113: Dell PowerScale and the Impact of QLC SSDs appeared first on StorageReview.com.

Transcript
Starting point is 00:00:00 Hey everyone, Brian Beeler here with the Storage Review Podcast, and today we're talking all things PowerScale and what's going on with that line of products from Dell. There were some big announcements at DTW back in May, and now some of these things are coming to market, which is really exciting to see. I've got David Noy with us today. David, thanks for jumping in. Hey, thanks for having me. So yeah, I'm glad to do it. Glad to talk with you. I mean, you guys had all sorts of great announcements at DTW. Sometimes the hardware was a little bit behind
Starting point is 00:00:35 the software offerings. But before we get into all of that, just what do you do at Dell? Give us a little background on yourself. No problem. So I lead product management for our unstructured data solutions team, which is our scale-out NAS offering, so PowerScale, as well as object storage, ECS, and then the software-defined version, ObjectScale. I also lead product management for our data protection suite. So that's the PowerProtect DD product, all of our software assets around backup, including our new PowerProtect Data Manager, and things related to cyber recovery and CyberSense.
Starting point is 00:01:14 Okay. Yeah, I didn't know actually that you did backup too. One of our favorite things that you guys ever did was that little DD Virtual Edition that started to slide into VxRail and some other spots. It was just a neat little data protection widget to kind of clip into your environment. That was pretty slick. I don't know if you should take any credit for that, but that's one of our favorite things. I don't know if I can take credit for it, but I inherited it. We have 13 exabytes,
Starting point is 00:01:43 I believe now, of data protected in public cloud on that little DDVE product. Really? Wow. That's extensive. That's a lot. So take us back a little bit to Dell Tech World. I mean, obviously APEX was a big messaging driver there, and Alpine. I guess you guys like code words that begin with A, which is cool. APEX, of course, being as-a-service, Alpine being the cloudification of your operating systems and storage software.
Starting point is 00:02:18 Let's start with those. Where does PowerScale fit into those models? How are you guys thinking about the as-a-service, and then also this cloud delivery model? So we have PowerScale as a service in APEX today, file on demand, a very popular offering. For customers who are looking for a fully managed service, who want to buy consumption-based, that's the model for them, and that capability is out there right now. What we showed at Dell Tech World was
Starting point is 00:02:51 a software-defined version of the operating system that runs on PowerScale, called OneFS. And by moving to software-defined, that gives us the ability to adapt beyond just the appliance model, which is what we ship today. Even underneath APEX there are physical appliances. But as we go forward, we can actually deploy OneFS in the cloud as a cloud-native service for scale-out NAS. Yeah, that's ambitious, right? I mean, it makes sense to sort of separate the two components, but the Alpine project's a big one for you guys. I mean, that's obviously a major emphasis in the messaging, but also, as you're talking about it,
Starting point is 00:03:36 is something that you're working on delivering too. So if you think about it, this isn't just a PowerScale or UDS thing. It's across all of the Dell Technologies storage assets. We're in the process of making them all available as a software-defined consumption model, as an appliance consumption model, and as an APEX managed-service offering. So customers have their choice. If they want something in the data center, they can get it that way. If they want a fully managed offering, where they're willing to outsource that operation, they can get it that way, either managed in their data center or
Starting point is 00:04:12 hosted. And then as a cloud offering, we're doing that for file, object, block, and our data protection products. So we're at various stages of that process across the portfolio. PowerScale has come pretty far along, and so we're looking forward to what that means. But we demoed the beta, so you can guess as to how far along we are. Well, typically when Dell shows something, that means you're pretty far along in the process, because you guys don't tend to lead with half-baked solutions. As you're talking about that, I do wonder,
Starting point is 00:04:47 and I don't even know how much visibility you have into this, into what happens from a development standpoint for the code. Because taking the code off of hardware, I mean, it still obviously runs on something, but it's a different development modality than traditional storage array development. And I think there was even a big shift when Dell and EMC came together, taking some of those really traditional EMC storage appliance hardware models and transitioning them to a little more PowerEdge-y kind of server model. You had to make a shift in development mentality there. How different is this, going to a cloud-delivered model? Well, you hit the nail on the head. I mean, the first move was to adopt PowerEdge servers
Starting point is 00:05:32 as a platform for us to extend onto. And so we've had a lot of success with our all-flash servers, which are the PowerEdge-based servers. But in doing that, we had to take a different approach to how we build out OneFS, to assume that ultimately we could run on any platform. And so whether it's a PowerEdge platform, or it's a compute instance running virtualized in cloud, call it EC2 in AWS, or Azure Compute, what have you, we treat those as yet another platform. And so we abstract ourselves further from the actual hardware so that we can actually become software-defined. What we don't want to do, and you called this out earlier with the development practices, is fork the code and have two completely different code bases, one that's for cloud and software-defined and one
Starting point is 00:06:19 that's for the appliance. And so we've kept that as a common code base. The importance there is that I get asked all the time by my customers, when you launch this thing in cloud, is it going to be different from the version that you have on-prem? The answer is, it's the same code. Largely speaking, it'll be functionally equivalent to what you have on-prem.
Starting point is 00:06:46 There'll be differences in performance, probably. There may be differences in the scale. But largely speaking, it is the same code. So if you're used to doing things one way on-prem, you're going to be very used to doing it the same way in the cloud. And that commonality is actually something that's very important to our customer base. I think it's probably beyond very important. It's like mission-critical, right? Because you can't have a storage admin on-prem saying, oh, well, this is how we do it in the data center, and then spin up whatever, use some deep data feature that's within PowerScale, and then go to the cloud for some remote work and be like, oh, that's not there yet. And we've seen that before, where there's not feature parity
Starting point is 00:07:27 between the cloud version of whatever is being deployed and the on-prem, and it causes some conflict. That's right. Yep. We get the question all the time, can I replicate back to the on-prem?
Starting point is 00:07:38 Yeah, replication. If I run cyber capabilities, we talked about the ransomware defender capability in the on-prem, and I want to do cyber protection of my in-cloud assets. Can I do that? Sure. So let's stick now with the hardware a little bit.
Starting point is 00:07:55 So in May, again, you launched a couple things around OneFS and PowerScale. You had some security stuff, which we'll definitely talk about some more, but also the support for QLC, which is really interesting to me. We've done so much work around QLC. To see you guys, from a mainstream array perspective, adopt that, I think is pretty cool, and it really validates that QLC NAND is there and ready for heavy use in mainstream enterprises. Talk about that a little bit, the decision path there, and what you're seeing from an enablement standpoint for your customers. So QLC is just a different architecture of flash from TLC, in that it adds more dimensions, if you will, to the way the data is stored.
Starting point is 00:08:49 But the trade-off is typically endurance and performance. The way that we use those drives is that we're fairly sparing on the endurance. So we're pretty gentle in the way that we use the drives. We do large IOs. We try to be very careful about how we place data. Data that comes into the cluster is not immediately just dumped on the drives. We process it first. And so we've done some calculations using in-field metrics. And what we've found is that we get about 14 years of endurance out of these drives,
Starting point is 00:09:25 even hammering them in the worst possible ways that we see happen in the field, which is pretty good. I mean, for the longevity of a drive that's supposed to be low-endurance, 14 years is a pretty long time in the worst possible cases. At the same time, we have checks and balances, so we do wear-level checking to make sure that the drives haven't, for whatever reason, been inadvertently worn down. We will alert our support teams if drives have failed for whatever reason. But largely speaking, we feel fairly comfortable that even with the lower endurance
Starting point is 00:09:59 of these drives, because we're using larger-capacity drives like 15 and 30 terabyte, the lifespan of these drives will be quite good. And so I've got a couple that just happen to be sitting on the desk here. We've done a ton of work with these, the 30-terabyte part, the Solidigm drives, and they're really fantastic from the capacity bump that you get, up to 30.72 terabytes in a drive. And you guys are supporting these, I think, in the F200 and F900 nodes, 600, 900, anyway, the ones they'll fit in. So 24 of the 30.72-terabyte drives is roughly 750 terabytes in this thing, almost a petabyte in one of these 2U chassis.
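To make the density and endurance claims concrete, here's a back-of-the-envelope sketch in Python. The endurance rating, rating window, and observed write rate below are illustrative assumptions invented for the sketch, not Dell's or Solidigm's published figures; the point is the shape of the math, not the exact values.

```python
# Rough math behind the density and endurance claims above.
# All rate/rating inputs are illustrative assumptions, not published specs.

DRIVE_TB = 30.72          # usable TB per QLC drive
DRIVES_PER_NODE = 24      # drives in the 2U all-flash chassis
RATED_DWPD = 0.3          # assumed QLC endurance rating (drive writes per day)
OBSERVED_DWPD = 0.06      # assumed gentler write rate after coalescing
RATING_YEARS = 5          # assumed window the DWPD rating is quoted over

# Density: 24 x 30.72 TB ~= 737 TB raw in one 2U node, "almost a petabyte".
raw_tb = DRIVE_TB * DRIVES_PER_NODE
print(f"raw capacity per 2U node: {raw_tb:.0f} TB")

# Endurance: total rated writes (TBW), then lifespan at the observed rate.
# Lifespan scales as RATED_DWPD / OBSERVED_DWPD, so 5 rated years at a 5x
# gentler write rate stretches to 25 years; different assumptions land you
# on a figure like the 14 years quoted in the conversation.
tbw = DRIVE_TB * RATED_DWPD * 365 * RATING_YEARS
lifespan_years = tbw / (DRIVE_TB * OBSERVED_DWPD * 365)
print(f"rated TBW: {tbw:,.0f} TB; lifespan at observed rate: {lifespan_years:.0f} years")
```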
Starting point is 00:11:09 And you hit on the really important thing, and I was hoping you would do this, because we collectively know from telemetry data what the drives are seeing at the drive level, and then what you're seeing at a system level. And with that information, you can confidently go to customers and say, look, guys, it's QLC, there are going to be some trade-offs, and we can talk about that in a minute, but on the upside, your workloads aren't punishing these drives as much as you think or might be afraid that they will. So as long as you're under whatever drive-writes-per-day ceiling you guys set, then so be it. And if the drive fails, it's under warranty. It's not like the customers have much risk here outside of workload performance. Yeah, no, we've been very successful with the launch, actually. So far, no drive failures, so cross your fingers.
Starting point is 00:11:51 But look, you're right. So it's not even just that. Even under the most punishing environments that we see in the field, the real-life environments, these things will endure. And the net of it is, even with the performance profile being somewhat different from TLC drives, the way that we do a lot of prefetch and a lot of performance optimizations,
Starting point is 00:12:15 we see the exact same performance with QLC drives as we do with TLC drives. So from a customer perspective, it's great. I get a lower price point, and I get more density and equal performance. Like, what's not to love, right? Yeah. I mean, we've seen some of that performance data. In fact, we're working with your team on it. I was in Hopkinton last week, lovely fall day, by the way. I don't know if you had the team order that up for me, but doing some hands-on work with PowerScale up there.
Starting point is 00:12:46 And yeah, I mean, we all know the read performance is good. What surprised me was what we saw on the write performance, which is at times where QLC can be troubled. Because, well, I don't want to get too far down in the weeds, but the QLC drives have an indirection unit. Right now, it's 64K. That's how they want to be written to. They want nice, sequential, friendly writes to the drive to maintain performance. And I guess in the software, somewhere along the way, you guys have figured that out, to make sure that you're writing to these drives the way they want to be written to. And if you do that, if you can coalesce those writes before they go to the drive, then you're pretty much golden, right? That's exactly what we do. And we are. So, I mean, I think we're not done squeezing write performance out of these things yet.
Starting point is 00:13:33 Our engineers are working on it right now, and we might be able to squeeze out more. What's nice is we are able to squeeze out more read performance. So I think we announced it at Dell Tech World, so I'm not out of line saying it, but in our upcoming OneFS release, we'll actually give you 25 to 30% more streaming read performance with a software upgrade. So we're not done squeezing more juice out of this orange.
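For readers who want the coalescing idea in code: below is a minimal sketch of the general technique Brian describes, assuming the 64 KiB indirection unit he mentions. The class, the device interface, and the zero-padding on flush are all invented for illustration; this is the pattern, not OneFS internals.

```python
# A toy write coalescer: buffer small incoming writes, emit only large,
# aligned, sequential writes in whole indirection units (IUs).

IU = 64 * 1024  # assumed indirection unit: the granularity QLC prefers

class CoalescingWriter:
    def __init__(self, device):
        self.device = device        # anything with .write(offset, data)
        self.buffer = bytearray()
        self.next_offset = 0        # next sequential, IU-aligned offset

    def write(self, data: bytes):
        self.buffer.extend(data)
        # Flush only whole IUs, so the drive never sees a sub-IU or
        # misaligned write (which would force a read-modify-write).
        while len(self.buffer) >= IU:
            chunk = bytes(self.buffer[:IU])
            del self.buffer[:IU]
            self.device.write(self.next_offset, chunk)
            self.next_offset += IU

    def flush(self):
        if self.buffer:  # pad the tail out to a full unit
            pad = IU - len(self.buffer)
            self.device.write(self.next_offset, bytes(self.buffer) + b"\0" * pad)
            self.next_offset += IU
            self.buffer.clear()

class MemDevice:  # stand-in device that just records write sizes
    def __init__(self):
        self.writes = []
    def write(self, offset, data):
        self.writes.append((offset, len(data)))

dev = MemDevice()
w = CoalescingWriter(dev)
for _ in range(100):
    w.write(b"x" * 4096)   # 100 small 4 KiB writes from the client...
w.flush()
print(dev.writes[:3])      # ...reach the device as 64 KiB aligned writes
```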
Starting point is 00:14:24 So talk about, then, how your customers are using these drives. I mean, I'm sort of conflicted on this, because I feel like when you go with a density message of, hey, customer, with TLC we're capping out at eight-terabyte drives, or maybe up to 15.36 in a few cases, but those are really quite costly when you start to get to the large-capacity TLC drives. Be that as it may, we can now get maybe four to eight times more capacity, depending on what you're coming from, in these systems. Do you worry that customers will buy fewer nodes because they don't need them for capacity? I mean, I would be worried if they weren't doubling or tripling their footprints every year. Some of our customers are consuming this class of node for performance and density. We'll go through some of the workloads in a moment, but the amount of data that they're gathering is just nonstop. And what we're seeing is that whenever you put out a more dense node,
Starting point is 00:14:59 customers will gravitate towards it. So you put out a 15-terabyte drive, that's one thing. If you put out a 30, you guarantee that the 30 is going to outsell the 15. Really? Wow. Is that specific to unstructured, that you see that happening more? I mean, I guess because of the media explosion, everything's just, as you said, nonstop. At the end of the day, I talk to customers who tomorrow might have to pick up a six-petabyte additional data pool. And where am I going to put that in my data center? How am I going to catch all of that?
Starting point is 00:15:31 And how am I going to do it in a cost-effective way, with rack power sometimes being constrained and rack space constrained? It's a tough problem. So density rules. Now, don't get me wrong, performance is important too, and so we're continuously working on both. But we'll keep striving towards density. So what do you have to do? I mean, obviously, adopting the 30-terabyte drive helps on the density side. But as you said, the data sets are expanding, and these guys are often constrained in rack space or power or cooling or whatever.
Starting point is 00:16:09 Do you need to do a more dense two-and-a-half-inch drive chassis? Like, what else can you do to continue to help on the density messaging? I think we have to look at some new design paradigms. Certainly we're exploring a few, without going too much into it. There are several new drive form factors. But that said, I don't think that we're capped out on the maximum right now with 24 in 2U. So I think there's more that we can do
Starting point is 00:16:37 to jam more drives in there. You know, one thing that you have to think about as being a differentiator for us is that we're not doing some controller-and-shelf model, where you potentially don't have compute along with the storage and you might get, you know, misinformation about how much real density you have. We scale compute and storage together. And if you need more performance for your data, if you really need a performance-dense set of data, then you might buy smaller drives.
Starting point is 00:17:05 So you might use four terabyte or eight terabyte drives when I really have to just do a ton of performance out of a small data set. But for customers who've got lots and lots of data and that data is continuing to grow, and it's not all active at any given time, what percent of your data actually needs that performance at any given time?
Starting point is 00:17:22 Is it 20%? Is it only 10%? I'm sure it's not 80%. Then basically we'll tune the density to be as rich as possible as that percentage grows smaller, so that you're maintaining the performance but getting the cost efficiencies and the power efficiencies that density drives.
Starting point is 00:17:40 And so that's really the question we're thinking about: how much more density can we actually physically fit into a certain amount of space while preserving the performance, or even getting incrementally better in terms of performance? Yeah, you talked about form factors. Obviously, E1.L would be a great one, or E3, or some of these other ones where you can get the nice long NAND packs with plenty of cooling around them with the heat sinks. That has a lot of potential in terms of the capacity that you can jam in there, for sure. Very few mainstream storage arrays have moved off of U.2. Customers know it and like it. I mean, so we were talking about your adoption of QLC and it being very good. And I'd be curious to dive into that more, but normally we hear a lot of FUD marketing around new technologies or new shapes or whatever,
Starting point is 00:18:41 because your competitors may not have that. So they go straight to, well, you know, endurance, whatever, and try to cloud the waters a little bit. But it sounds like your customers are still adopting regardless, and really like what they're seeing from these high-density QLC drives. I'm just curious how much friction there's been in that sales process. There have been a few calls I've been on with customers, to walk them through why we're comfortable with the QLC drives. Largely, after seeing the data, which we just publish to them as plain as can be, those concerns fade away.
Starting point is 00:19:22 In fact, I haven't seen a situation yet where they haven't just been, okay, well, that looks good. That said, you know, in terms of, you know, how we're going to pack more in, there's all kinds of different ideas right now. So I know there's a few that we're pursuing. It's a little bit too early for me to call it yet, but I understand that, you know,
Starting point is 00:19:41 I fully appreciate the benefits of getting as much density and compute together at the same time as possible. So when you guys made the announcement back in May at Dell Tech World, you announced the support for the 15.36 and the 30.72 QLC drives, which was pretty neat, but also a little bit surprising. Because historically, Dell has been very much, in the storage world, about multi-sourcing everything. But when it gets to QLC, that level of competition isn't there. I mean, Solidigm has the drives I just mentioned. They're the only ones really at scale with those drives. So it was pretty obvious who you were working with there. But that's a fundamental shift, at least from what I'm aware
Starting point is 00:20:30 of, in your world. Does that matter? The multi-source issue, like, how do you address that? Or do you care anymore? Well, you always want to be multi-sourced, right? As much as you can, as much as it makes sense. And so we encourage all the drive vendors to look at 30-terabyte QLC as being a good opportunity. That said, the opportunity is just too rich to pass up for lack of another vendor there. So, you know, we have a good relationship with Solidigm, and they came through, they really did. So what can I say other than just, you know, it's solid. Right.
Starting point is 00:21:12 Look, the moment that something else comes out, we'll take a look at it too. But right now, that's what we have, and our customers are demanding that, and that's what we're giving them. Well, sure. So I guess if your customers are saying, we want greater density, and you're looking around saying, is there a good solution for this? Yes, there is. Then go for it, right? That's right. Having done some of that work, I fully expected... I'm still surprised. We haven't published it yet; we'll get there very soon. But the performance on the write side, I know we already talked about it, but I'm still stuck on it, because that was the one thing where I'm like, well, how's Dell going to go to
Starting point is 00:21:56 market with this, talking to customers about understanding how much write impact you have and block sizes? Because that's going to be really confusing and a little bit sloppy of a sales motion to say, you know, it only works on real large blocks. But I guess, again, kudos. There's not much else to say, but you guys have done a good job of figuring that out. In some ways, Brian, I would almost say it's just luck. Like, the way that OneFS writes data to these drives
Starting point is 00:22:24 happens to work really well, right? We do coalesce writes, we do a lot of write pre-processing. The data comes in and it's handled in a way that makes it possible for us to ensure that we don't cause damage going in, and we can ingest at a very high speed and make sure that we distribute that load across all the nodes in our clusters. So the advantage of that is that,
Starting point is 00:22:54 well, it turns out that we're pretty easy on the drives underneath, pretty gentle on them. It's not like, you know, data is coming in and hitting a very small set of drives. It's actually going across every node in a cluster, up to a reasonably good-sized cluster, and every drive inside. So we're pretty well distributed. And that means that every drive is just seeing a fraction of that IO. And that takes a lot of the pressure off those drives. Do the large capacities cause any new challenges? I'm just sort of thinking through what happens in OneFS if the node that was 60, 80 terabytes before is now
Starting point is 00:23:38 almost three-quarters of a petabyte. Does just that massive amount of data cause any new table indexing or, I don't know, any other challenges for you that you had to work around? Not yet. We calculated a number called mean time to data loss, which is a calculation of: given the size of the node, the number of drives inside of it, and the probability of a failure occurring with the drive or the chassis itself, if we had multiple failures at the same time, how long would it take before enough failures occur that you actually experience data loss? And we try to keep that number in the thousands of years.
Starting point is 00:24:21 Yes. You know, at thousands of years, if you had, say, 5,000 clusters, every year you might actually have that happen. But very few of our customers have 5,000 clusters. So generally speaking, we're fairly OK. Now, of course, that's the lower limit. So we always try to go beyond that. If we ever ran into a case where the size started to impact that, we would probably start to ratchet up our protection levels. And so you'd start to trade a little bit of efficiency.
Starting point is 00:24:50 And this is what everyone who uses this technique of erasure coding across drives to get durability will do: they'll start to trade off efficiency to get more durability out of the system. And so we have that lever at our disposal. Fortunately, we didn't have to use it this go-around. Oh, that's interesting. So you would over-program your resiliency if you felt like you had to deal with a less reliable part, whatever the part is. Correct. Huh. Yeah.
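For the curious, the mean-time-to-data-loss figure David mentions is typically estimated with a Markov-style approximation like the sketch below. The drive MTTF and rebuild time are illustrative assumptions, not field data; the takeaway is how steeply MTTDL climbs as you ratchet up the protection level.

```python
# Simplified MTTDL estimate: data is lost only when p+1 drives fail within
# overlapping rebuild windows. All inputs are illustrative assumptions.

from math import prod

def mttdl_years(n_drives, mttf_hours, rebuild_hours, failures_tolerated):
    p = failures_tolerated
    numerator = mttf_hours ** (p + 1)
    denominator = prod(n_drives - i for i in range(p + 1)) * rebuild_hours ** p
    return numerator / denominator / (24 * 365)

# 24 drives, an assumed 2M-hour drive MTTF, 12-hour rebuild:
for p in (1, 2, 3):
    print(f"tolerating {p} failure(s): {mttdl_years(24, 2_000_000, 12, p):.2e} years")

# Each extra tolerated failure multiplies MTTDL enormously while costing some
# capacity efficiency -- the "lever" described in the conversation.
```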
Starting point is 00:25:22 Well, so that's good. You get all the benefits now, and you haven't had to work real hard to get there, outside of your traditional qual process and the fulfillment and support arm to supply these things out to the field and deal with potential replacements or whatever. I would say the most important thing is that we did the calculations, that we did all the quality testing to make sure that this was a high-quality product that met those endurance levels that I described earlier, so that our customers who are running, I'd call them mission-critical applications.
Starting point is 00:25:57 They're not Oracle databases, but they certainly are business-critical for those customers. In some cases, if it's life sciences or healthcare, then it's very mission-critical, and they can feel comfortable with the data. So I agree, it was a fairly light engineering lift, but we had to make sure that we ran it through the paces. So for those customers that still remain worried about their write impact on the drive in terms of endurance,
Starting point is 00:26:32 we talked about how the drives have telemetry, and your system has telemetry. If they're in a current PowerScale system, is that something that you can... I don't even know how that looks in OneFS. Is that something that they can see, or something that you can tell them, hey, customer, I know you're worried about your writes, but you're at, you know, 0.25 drive writes per day, which is well below the spec? Can you see that in your systems and share that with the customers to help them feel more comfortable? Both. You can report on it through the command-line interface or APIs, and you can see it in alerts. If we were to run into a situation where a drive was experiencing errors, we would know about it, and you would as well. Right. But you could also tell them, like,
Starting point is 00:27:20 hey, looking at your current workloads, you're well within the window for what we can handle with this particular drive spec. Yeah, there are ways of reporting on that. That's right. Okay. It's like belt and suspenders, right? So the belt is, we did the testing.
Starting point is 00:27:37 We feel good about it. The suspenders are that there's enough alerting and reporting capability in the product that if you were to get near the rails, we would know and you would know. So your next challenge, then, will be to push these guys to crank out a 64-terabyte part so you can continue to pitch your density story in PowerScale with new, bigger drives, huh?
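The "suspenders" half of that can be pictured as a simple telemetry check. The sketch below is hypothetical: the field names and thresholds are invented and it is not a PowerScale API; it just shows the drive-writes-per-day comparison being described.

```python
# Compare each drive's observed write rate against its endurance ceiling
# and flag it as it approaches the rails. All names/thresholds are invented.

DWPD_CEILING = 0.3   # assumed rated drive writes per day for the QLC part
ALERT_AT = 0.8       # warn at 80% of the ceiling

def check_drive(drive: dict) -> str:
    observed_dwpd = drive["tb_written"] / (drive["capacity_tb"] * drive["days_in_service"])
    used = observed_dwpd / DWPD_CEILING
    status = "ALERT" if used >= ALERT_AT else "ok"
    return f"{drive['id']}: {observed_dwpd:.3f} DWPD ({used:.0%} of ceiling) [{status}]"

# A drive that wrote 250 TB in its first year is at ~0.02 DWPD -- far below spec.
print(check_drive({"id": "bay-07", "capacity_tb": 30.72,
                   "tb_written": 250, "days_in_service": 365}))
```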
Starting point is 00:28:06 Oh, well, that'll be the next challenge, for sure. We'll see whether we have to make any adjustments to accommodate those, but I would imagine that we're all looking for that. I think that drive vendors as well as storage vendors appreciate the value of density, and so I would imagine all of us have got that goal in mind. Yes, I suppose so. So you talked about security. That was the other thing that you guys were excited about from a OneFS perspective back at DTW. Everyone's worried about cyber resiliency and attacks and everything else.
Starting point is 00:28:37 And file data, unstructured data, is kind of one of the more dangerous spots, I would think, with so many points of exposure, all the file data that could be corrupted or attacked at any time. How much responsibility falls on the array for that, versus backup software or other protection applications? How do you view that? That's a great question. So I happen to own cyber protection products in both of those portfolios, and so it goes back to belt and suspenders. I kind of feel that it's a nuanced discussion, but it's an important one. If you're backing up data, you have a responsibility to make sure that that backup is secure.
Starting point is 00:29:29 Because that's your last line of defense for your business. Hopefully you're backing up data that requires backup because backup infrastructure is not free. And so if somebody attacks a backup and that backup is destroyed, you've just lost your very reason for having backup to begin with. Which means if somebody were to accidentally delete something or something bad happened, you have no way to recover. So let's just put that on the table that your last line of defense is your backup. And if you're backing up data, you have to find ways to make sure that that data is impregnable, that you've got the right checks and balances in place to make sure that any latent threats have been discovered and that you have a fast way to remediate against them.
Starting point is 00:30:11 If something's been corrupted in some way, shape, or form, and you find out about it, you want to have locked down your system so you can go back to a last known good copy and figure out where that was. That said, when you talk about unstructured data and file data, oftentimes it will reach a scale that backup products simply don't extend to. So if I'm looking at a customer who's got, let's use this 30-terabyte all-flash unit, and they went and built a 30-petabyte cluster out of it, the likelihood is that they're not backing that up using traditional backup infrastructure. It's just too big, right? Too cumbersome. So quite possibly they're just replicating snapshots and locking down snapshots at the secondary location.
Starting point is 00:31:02 Great. Now, if an attacker gets access to the primary and starts to encrypt data, that'll get replicated to the secondary. And so... Yeah, it just passes through. And now you're in trouble on both accounts again, right? Exactly. So just like I put a lock on my front door, if an intruder gets through the lock, I have to have a secondary security system. And so I have cameras around my home. I thought you were going to say a shotgun, but okay, a camera's fine. Yeah, a little more innocuous, but you know. Fine.
Starting point is 00:31:34 In the Midwest, we do things a little differently out here. I'll make sure not to break into your home. Thank you. The point being that what we do on the primary side will expand to include scanning, but right now it's heavily focused on the actual user behavior. Because in the case of primary, there's a lot of data to go analyze to figure out if something bad happened over the course of time. So instead, I'm looking for what's happening right now. Are people accessing data they're not supposed to? Are they behaving in ways that they're not supposed to?
Starting point is 00:32:10 And I want to be able to lock that user out of the system, revoke their privileges. I still want an air-gap solution, so that I actually have a completely remote version of the data that cannot be accessed by intruders who are vectoring in from the primary. So it's basically a behind-the-vault type of solution that opens the gate periodically and says, hey, is everything okay? Send me what you got, right? And then it locks up. So an intruder can't get in, but it keeps multiple point-in-time copies that are
Starting point is 00:32:41 immutable. So we have that type of a solution as well. If your system is small enough that you could back it up, you have the option to do both. You could use the backup with cyber protection to make sure your backups are not compromised, because typically a backup system is not just for one NAS. It might be for multiple NASes, it might be for a NAS and a couple of databases
Starting point is 00:33:03 and a few VMs that are running over in a pocket somewhere else. You still want to lock it down, unless you've dedicated that backup system to that NAS. And then on the NAS, you probably want to lock it down as well, because detecting right away that someone's breached your environment, or that somebody's behaving in a way that's aberrant, is important, because otherwise it'll just get propagated to the backup. You'll still have to restore, but you're just going to propagate the problems. The sooner you shut it down, the better. And so there's both of those solutions in place, but it's really that the user has to decide: if I'm backing up,
Starting point is 00:33:36 I have an obligation to maintain the integrity of my backups. If I'm running primary, there are oftentimes cases where I just simply cannot back it up. Therefore, I will want another way to secure it. What I like about what we've done in PowerScale is that if I look across the NAS solutions from some of our competitors, nobody has a nice vaulted solution the way that we've built it, with a full air gap. It's an inside-the-vault solution. An attacker cannot vector in.
Starting point is 00:34:05 It's a different set of administrators, potentially. And it periodically checks so that the gate is not constantly open to figure out what to do next. And that really sets us apart. Well, talk about that a little bit more from a process standpoint because you're talking about
Starting point is 00:34:25 storage admin stuff. You're also talking about ITSEC stuff at the same time. In the orgs you work with, who's really taking ownership over data security for what you do, for unstructured? And how much of that is a security issue, how much of it's a storage issue, how are orgs adjusting to deal with these risks more or less in real time as you describe it, rather than retroactively, let's go to the backups to find how far back we've got to go to get one that's not infected or whatever, where this thing wasn't lying dormant. But in the real time model, how are you seeing that work with your customers? So, you know, obviously a lot of this is being driven by the CISO.
Starting point is 00:35:15 That's their charter. That's what they're looking at. And then they're driving it top down. That said, it doesn't take long to look around on the news and see how often these breaches are occurring. If I'm an infrastructure owner, I'm going to be paranoid about my data. And it depends on the organization. But a lot of these, you know, we were talking about these high performance drives, a lot of them go into regulated industries who have penalties that they have to basically cough up or have to deal with the repercussions of their data being offline. And so it's kind of coming from both sides.
Starting point is 00:35:50 I would say that in the space of backup, the folks who are focused on backup and have that as a job title or as a responsibility tend to be more attuned to it in terms of the urgency around implementing these solutions. But I will say that at the same time in the file space, I think we have over a thousand deployments of the ransomware defender products. So it's not like it's not there. It's pretty well understood. Will we do more? I think that that will increasingly become
Starting point is 00:36:30 something that is just going to be a check-the-box item that has to be part of the order, coming from our customers, not just us. But I do think the backup administrators are ahead. I do think that the conversation starts with the CISO, and you meet in between. And, you know, so. Yeah. So I'm thinking through, from a portfolio perspective, and I know the rest of these things aren't under your umbrella exactly, but can Dell take what you've learned here on PowerScale and, for lack of a better word, port those types of security solutions to Max and Flex and all the other stuff that you offer? We haven't ever talked about this before. I'm just wondering how much of that, universalizing some of these chunks of functionality, can happen and then spread across the portfolio. So, interestingly, you asked me the question: PowerMax recently implemented some metrics that
Starting point is 00:37:23 can indicate whether or not they believe that something bad is happening in your environment as well. That gets sent up to a security index that's reported on in CloudIQ. And so all of the products will start to report on some of these metrics within CloudIQ, to show at a base level, you know, is there a score that you can look at that tells you, hey, I think there's something going on, or it looks pretty clean.
Starting point is 00:38:10 That's at least kind of your first line of defense, not your last line of defense. Obviously, you may want to get more sophisticated than that, and that's where some of these additional product offerings come in. And we do take a good, better, best approach, right? So all of our products have immutability in some way, shape, or form. So you can always turn on immutability with retention lock, prevent someone from being able to delete something, and look at some reporting metrics to figure out whether or not something's going on. The next level is the user behavior and actual content analytics, and being able to create a vault is another step. So it's almost in the other order. Immutability is kind of, how do you protect your data? Then there's isolation, which is, how do you put it in a location where an attacker can't easily vector in? And there's intelligence, which is, how do you look at either attacker behavior, or scan the data itself, to look for
Starting point is 00:38:45 something that I can notice that to the ordinary eye might not actually look like an attack? Because some of these attacks are getting so sophisticated, they might flip only a few bits, and that's enough to really mess up your data and extract a ransom, but not enough to tip anyone off that something really bad happened. So it might stay dormant in your environment for months before you actually detect it. Unless you had some AI/ML that was learning from other people who got attacked and goes, oh, wait a minute, whenever I see a pattern that looks like this, that's an attack. And so we're building all of those capabilities into all of our products.
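As a concrete (and deliberately simplified) illustration of that "intelligence" layer: real products combine many learned signals, but two classic ones are a burst of file rewrites and the near-random entropy of encrypted content. Every threshold and weight in the sketch below is an invented assumption; this is a generic illustration, not how any Dell product actually scores.

```python
# Toy ransomware-behavior score from two signals: modification burst rate
# and the entropy of sampled content (encrypted data looks near-random).

import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def suspicion_score(files_modified_per_min: float, sample: bytes) -> float:
    rate_signal = min(files_modified_per_min / 100.0, 1.0)  # rewrite bursts
    entropy_signal = shannon_entropy(sample) / 8.0          # 8 bits/byte = random
    return 0.5 * rate_signal + 0.5 * entropy_signal         # invented weights

print(f"{suspicion_score(400, os.urandom(4096)):.2f}")    # ~1.0 -> suspicious
print(f"{suspicion_score(2, b'hello world' * 400):.2f}")  # low  -> benign
```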
Starting point is 00:39:22 And each one of them will essentially feed into different levels of scoring, whether it's just a security metric that's reported on in CloudIQ or a more sophisticated product offer or maybe the best product offer, which includes an air gap vault. That's interesting. Yeah, I mean, we know that the backups are now the primary target for ne'er-do-wells that are looking to cause problems. And as you said, the data suggests pretty heavily that they will sit as quietly as possible for some period of time, because then you end up backing up, backing up, backing up. And now you could have months or days.
Starting point is 00:40:00 I mean, who knows how long that piece of nefarious code or whatever is there, just kind of waiting to be activated. But I do like the notion of taking everything you've learned and not just leaving that in one customer's library. But let's say, you know, let's be collaborative here and take these signatures and these behaviors and figure out a likelihood that this has shown up in your environment. And if so, here's what we need to go do to remediate it before it's a serious problem. I mean, that's got to be a big part of the future for getting out ahead, or as ahead as you can be, on these attacks, I would think. That's right. I mean, it's really become like an antivirus-type world out there. We have to get smart about how malware is making its way in.
Starting point is 00:40:52 This is a form of malware, right? It works its way into your system. So we try to do that as much as possible. You made a comment before, and I want to revisit it for a second, about the unstructured data at scale being really cumbersome to back up. And it's not something that I've thought a lot about. Is there a common wall that customers hit? Is it 10 petabytes?
Starting point is 00:41:19 Is there some number where they're like, okay, once we get this far, it's too hard or too expensive or too slow or whatever? I mean, I don't want to throw numbers out there and get people wrapped around them, but there is probably a scale where you're either saying, okay, you know what? I'm not going to back up everything. I'm going to be more selective about which data sets within my environment I back up. Okay. What I'll tell you is that it's not uncommon for us to see 10-to-20-petabyte-plus environments; even 200-petabyte environments are not completely uncommon. And at that scale, the backup infrastructure to support it would be enormous, for a number of different reasons, including that the change rates would be just
Starting point is 00:42:11 off the charts. So there's probably a point in that petabytes, multi-petabytes range where customers flip over to either doing selected backup of just a portion of their environment, or just using replication and snapshotting as a way to protect their data.
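A quick sanity check on why that is: even a modest daily change rate at those capacities becomes an enormous sustained backup stream. The capacities and the 2%-per-day change rate below are illustrative assumptions, not customer data.

```python
# Sustained backup throughput needed just to keep up with daily change.

def backup_stream_gbps(capacity_pb, daily_change_pct, window_hours=24):
    changed_bytes = capacity_pb * 1e15 * daily_change_pct / 100
    return changed_bytes * 8 / (window_hours * 3600) / 1e9  # gigabits/second

for pb in (1, 20, 200):
    print(f"{pb:>4} PB at 2%/day: {backup_stream_gbps(pb, 2):,.0f} Gb/s sustained")
# ~2 Gb/s at 1 PB, ~37 Gb/s at 20 PB, ~370 Gb/s at 200 PB -- around where
# snapshots plus replication start to replace traditional backup.
```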
Starting point is 00:43:06 Does this, then... I mean, we started out talking about APEX and Alpine. Alpine specifically has got to have your customers pretty excited, because now, if I can ad hoc spin up a couple petabytes of OneFS in a cloud and use that as my replication target, rather than, you know, whatever they do now, copying the infrastructure, or taking old nodes and making those the replication targets, or doing it on disk because it's more cost-effective, or whatever. Like, there are a lot of other expensive and/or suboptimal ways to do this. But Alpine seems potentially like a really great fit for these guys that are too big, or for anyone that wants to replicate this stuff off and still have the accessibility to it, you know, fail over to it if they have to. Oh, yeah. I mean, all kinds of interesting topologies pop up. There's the one-time burst of cloud, because I need to go stand up a pretty large cluster, and I've got the data, but I don't want to go buy 10,000 compute nodes for a day, so I'd rather just rent them from the cloud. There's what you said, like, oh, let's do periodic updates, and maybe we'll spin down the data during times when it's quiet and spin it back up and catch it up to where it was before. There's tiering out to low-cost, really low-cost object storage, the ultra-cold archive. So all of these things come into play.
Starting point is 00:44:16 And then there's just a pure agility play. Like, you know, I happen to be in a fortunate spot, in that Dell is a pretty large vendor of both software and hardware. And as a large vendor of hardware, we enjoy really good relationships with our suppliers, which you talked about earlier. And so typically I can supply my customers with gear when they need it, but some customers, or some of our competitors, may not have that luxury. And so if you've got an environment where you've got rapid change and it's unpredictable, where I'm looking at something that might grow by two petabytes tomorrow and seven petabytes the day after, and who knows what the day after that, one of the advantages that cloud gives them is that almost
Starting point is 00:44:50 instantaneously accessible supply chain. I don't worry about, oh, it's going to be half a year to go get my gear. Okay, well, let me just hang out and wait. If a customer doesn't have that luxury, that agility is pretty critical. Well, sure. It lets you spin up almost instantly, right? As fast as you can swipe your credit card
Starting point is 00:45:25 and be on your way. Yeah. I mean, this has been a good conversation. We've covered a bunch of stuff I didn't expect to, so I appreciate that. It's always fun to go in and learn more about what's going on. It was great to be back in the lab and in Hopkinton again. It's been a little while since we've done that, and it's great seeing some of this technology come to fruition, the hardware bits. I mean, I know you guys are really into the as-a-service and cloudification of everything, but I like to see and touch stuff still. So it's good to be hands-on with these things. And it's real. I mean, I don't know exactly how long you've been shipping QLC, but I saw it in a number of the nodes on site as we were messing around there. And yeah, it's pretty neat to see. Yeah. I mean, look, hardware is not going away, right? I mean, in fact, if anything,
Starting point is 00:46:23 for these really large environments where something like 30 terabyte makes sense, we see that the cost of running in cloud, unless you really need that agility, you know, it's not cheap. And so our customers are going away from the mentality of cloud first to like cloud when it's right,
Starting point is 00:46:46 you know? Right. So they have to do that little back-of-the-envelope calculation to say, hey, wait a minute, is this a good environment for cloud, or is this one I might keep on-prem? And at very large capacities, you know, again, I mentioned 200 petabytes, but actually it's a thing, that calculus works in favor of, wow, that's a big chunk of change I saved by staying on-prem.
Starting point is 00:47:14 And so I don't think that it's going to dry up in terms of the demands around density that we talked about earlier, performance, and large cluster sizes. All that stuff is going to be there for the foreseeable future. Well, one of our most popular social media videos of the last couple of weeks was the Buffalo Bills coach, I think it was the offensive coordinator, who was freaking out, smashing everything in his coaching booth after they couldn't get the snap off. And we clipped that, and the tagline was something like, your CFO
Starting point is 00:47:45 after they get their first cloud bill, which is kind of what you're talking about, right? Maybe that was a dramatization via meme, but it can be hard sometimes to really estimate and get a full handle on what that is. And again, what you're talking about at scale
Starting point is 00:48:05 with unstructured can be insanely expensive if that's not value optimized for your workloads and what you're trying to do as an organization. Absolutely. Yep. I mean, it's the difference between rent versus own, right? I mean, you know when you're going to go on vacation, you don't just buy a home in the place you're going to
Starting point is 00:48:26 unless you can absolutely afford that. Congratulations to you if you can. But, you know, if it's a temporary thing that you need, then you go rent it. Or if it's something that you need sporadically, that might make sense too. But Airbnb every day doesn't seem like a good way to manage your capital.
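The rent-versus-own point lends itself to the same back-of-the-envelope treatment. Both per-terabyte prices below are made-up placeholders (real quotes swing widely with tier, egress, and discounts), but the structure of the comparison is the point.

```python
# Rent (cloud) vs. own (on-prem) at steady capacity. Prices are placeholders.

CLOUD_PER_TB_MONTH = 20.0    # assumed $/TB-month for cloud file storage
ONPREM_PER_TB_MONTH = 6.0    # assumed all-in $/TB-month amortized over 5 yrs

capacity_tb = 200 * 1000     # the 200 PB environment mentioned earlier
cloud = capacity_tb * CLOUD_PER_TB_MONTH
onprem = capacity_tb * ONPREM_PER_TB_MONTH
print(f"cloud:   ${cloud:,.0f}/month")   # $4,000,000/month
print(f"on-prem: ${onprem:,.0f}/month")  # $1,200,000/month
# Steady, predictable capacity favors owning; bursty or short-lived need
# favors renting -- the vacation-home analogy above.
```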
Starting point is 00:48:44 No, especially since they outlawed the parties. I'm not an Airbnb guy anymore. They ruined my whole vibe. Better cut back on that. Well, this is great. We've got a report coming out in a couple of weeks that's going to dive way into all of this stuff and cover it. And normally I would have you on after we publish the report.
Starting point is 00:49:03 But, you know, it worked out to get you on before, so this will be a great teaser for anyone that wants to learn more about what's going on with PowerScale, what you guys are doing with QLC, and some of the performance data you talked about. We'll have some numbers to look at, too. And I think it's great, just selfishly as a storage guy, to see a vendor out there pushing the limits of what can be done and addressing customer needs with great products, and being unafraid to go do that. And so I give you guys a lot of credit for that. Well, thank you. Appreciate it. Thanks for joining us today, David.
Starting point is 00:49:42 Appreciate it. No problem. Anytime.
