Grey Beards on Systems - 148: GreyBeards talk software defined infrastructure with Anthony Cinelli and Brian Dean, Dell PowerFlex
Episode Date: May 18, 2023. Sponsored By: This is one of a series of podcasts the GreyBeards are doing with Dell PowerFlex software defined infrastructure. Today, we talked with Anthony Cinelli, Sr. Director Dell Technologies, and Brian Dean, Technical Marketing for PowerFlex. We have talked with Brian before but this is the first time we've met Anthony. They were both …
Transcript
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
This Greybeards on Storage episode is brought to you today by Dell PowerFlex Storage.
And now it is my great pleasure to introduce Brian Dean, PowerFlex Technical Marketing,
and Anthony Cinelli, Senior Director, Global PowerFlex Software Defined and Multi-Cloud Solutions.
So Brian and Anthony, why don't you tell us a little bit about yourselves and what's new with Dell PowerFlex?
Awesome. Thank you. Thank you so much, Ray.
And pumped to be on the podcast today.
PowerFlex is at a pretty exciting inflection point here for Dell Technologies.
There's just so many things going on in the world of infrastructure and cloud
where customers are consistently looking for ways to drive standardization across the data center.
They're trying to eliminate all different silos of architectures and silos of platforms that
they've brought in over the years. They're trying to simplify and consolidate all of that.
And they're trying to get some level of consistency and experience between what they're doing on-prem and what they're doing in the cloud.
And what's really exciting for PowerFlex right now is it literally lives in the middle of all of that.
And it's helping deliver some really transformational outcomes to customers, helping them drive towards these true infrastructure transformational outcomes. And we're doing it at some of the largest organizations on the planet,
for the largest data centers on the planet,
handling some of the hairiest, most mission-critical workloads that are out there.
So it's a really exciting time to be in and around the world of PowerFlex in general,
driving some of this transformation out there with customers.
And Brian, you want to talk a little bit about yourself and what's going on?
That's a really great way of putting it.
So, like you said, Brian Dean with PowerFlex Technical Marketing.
I've been familiar with the product and working with it for the last six years. And as a distributed software-defined infrastructure platform, I think it has been transformational
for the customers that have been able to make use of it.
One of the things we'd like to probably get into and clarify as we go through the work
today is, what exactly is it?
It's been marketed and positioned as high-end storage, as converged
infrastructure, or hyper-converged infrastructure, or software-defined storage. What does that even
mean? And so it's a little bit of all of these. I like to use an analogy out of like The Princess
Bride, right? So when this movie came out, it was a flop. And part of it was they didn't know how to market it. Is it a comedy? Is it an adventure story? A fantasy love story?
Well, it's all of them, but it doesn't fit into any box.
It's none of them exactly. It's a little more than all of these.
But once people started watching it, then it became its own, you know, classic hit.
It's classic.
People understood it.
Whereas with PowerFlex, I think it's the same kind of thing.
One thing is like, it's a little of all of these.
It's none of them in particular.
Once people start using it,
they see the power of it and how much it can do for them.
Anthony talked a little bit early on about some of the current issues
facing the data center today.
Would you like to maybe talk a little bit more about that
and how that plays out
and what the infrastructure, I guess, evolves into?
Yeah, sure thing.
So what we've seen is when you go to any typical customer
of any type of size or scale, you know,
they all kind of have an infrastructure that looks the same in that it's made up of a lot
of different things. And I like to break it down into like three different categories.
You know, first you'll have what I'll call the general purpose estate, right? You got a large
virtual workload environment, maybe a whole bunch of different, you know, database workloads. And typically we'll see things here like your traditional
three-tier stack. Maybe you see customers dabbling in things like hyper-converged.
This tends to be a large part of the environment and customers are after,
how do I do things cost-effectively and how do I make it simple, right? That's kind of what
they're after. But that's only one part of the environment. Then you step outside of that world and you oftentimes also find some type of specialized system, where you have a certain platform that's just a unicorn in the data center; it's just there to do one thing.
And then on the other end of the spectrum, you have this whole new emergent world of what I'll call modern scale-out workloads. These are your NoSQL databases, maybe your modern analytics workloads. And the infrastructure here also tends to look different because it needs to scale out by nature, right? So you end up with maybe a lot of servers with just direct-attached storage.
You go to any customer's environment, they usually have those three categories all at the same time.
And there's just an incredible amount of complexity there.
And you're not even talking about things like acquisitions where company A buys company B and there's a whole different environment that they have, right?
Exactly.
There's just so much complexity because workloads tend to look a little bit different and customers just end up with all these different architectures that they then need to operate. If I kind of summarize PowerFlex in one word, it's consolidation.
It has a very unique architecture that allows us to take all those different platforms and consolidate it down into one universal infrastructure that can solve for your traditional virtual workloads, can run your databases,
can deliver the specialized performance, and has this scale-out capability to also
deliver for the modern workloads. So PowerFlex is really all about consolidation. And there's just
such a focus right now for customers to think about, how do I get standardized infrastructure? How do I consolidate? How do I drive out cost? Because the reality is having all these different platforms, you end up with tremendous amounts of waste across the infrastructure.
Utilization and all that stuff. Yeah even utilization, there's just waste, right? And by driving consolidation,
it helps you drive up utilization. And that's what's so unique about PowerFlex right now.
And what's exciting about it is, again, we don't need to sell customers on the value of
consolidation, right? That's been proven in IT over 30 years. If I can go from five widgets to
one widget, I'm going to save some money. I'm going to become more agile. I'm going to be able
to move faster, all that stuff. What's so unique about PowerFlex is it has an architecture that
actually allows you to consolidate at the modern scale that customers deal with today. And that's
just a really, really exciting thing that just frankly,
nothing out there we've seen has the capability to do to the same degree.
That's a huge ask to be able to do all those sorts of things with the same
storage architecture and stuff like that. I mean, databases, right? NoSQL, big data,
AI and machine learning, that sort of stuff.
I mean, this has got very diverse, I'll call it performance characteristics, those sorts of things, right?
Totally. And I kind of group it into like there's three things you need to solve for, right?
The first one is what I'll call ecosystem supportability, right?
So today, a lot of modern software-defined
architectures, they're interesting, but they're interesting and useful only if you're running a
general-purpose virtual machine. So right out of the gate, one of the things that makes PowerFlex
super unique is I can run a virtual machine. I can run a bare metal database workload,
and I can provide persistence for a containerized workload,
whether that container is running on a VM or running on bare metal. So from an ecosystem
perspective, we can support yesterday's workloads, today's workloads, and tomorrow's workloads.
The second piece is what I'll call the architectural scale. The unique thing that PowerFlex brings to the table,
it has a truly disaggregated architecture where I can provide you a full stack value prop with
simplicity, automation, and lifecycle management like a hyper-converged appliance would, but I do that with a complete physical decoupling of compute and storage.
The value in that is I can now help drive economics at scale.
As your compute requirements grow, you simply grow compute.
As your storage requirements grow, you simply grow storage.
You have a complete decoupling in how you're able to scale those resources.
That means you're never adding or paying for a resource you don't need.
You're never licensing a resource you don't need.
And it allows you to optimize for things like database licensing, which is incredibly expensive.
The third piece, and one that ties it all together, is PowerFlex has a very, very unique IO architecture
that allows us to deliver incredible game-changing performance. And as you scale and grow the
environment, that performance will scale and grow linearly along with it. The value in that
is not that customers need the millions of IOs that PowerFlex
delivers. They don't. The value in that is that they don't need to think about or worry about
performance as they consolidate. They can truly consolidate with confidence. And that's the
superpower of PowerFlex. I now have this scale-out architecture where I can run my Oracle.
I can run physical SQL.
I can run my general-purpose VMware.
I can run my containers all on the same platform.
I can scale my compute and storage in a completely disaggregated way, driven by whatever my application requirements are. And then I know I have more than enough performance to go around
to truly consolidate all these different workloads and not have to worry about noisy neighbor,
not have to worry about, you know, huge operational burden from performance management.
All that goes away. We're going to have to get into all this technical stuff too, Anthony,
but I understand where you're going. It's kind of a huge, potentially huge system.
But I mean, on your website you talk about thousands of nodes.
Are there customers out there with these sorts of configurations?
So believe it or not, yes.
And this is what's really exciting.
The cool part is when we bring PowerFlex to a customer today, we talk about this value prop, but then the most exciting thing is no matter what that customer scale is, we would not be learning on them.
PowerFlex is already proven at the single largest scales imaginable.
It's deployed in production at four of the five largest banking institutions in the U.S.
Our single largest customer has over 800 petabytes of PowerFlex deployed,
running core banking workloads and banking applications.
Now, that doesn't mean PowerFlex is just for customers that have 100 petabytes or more, right?
Absolutely not.
But what's really exciting here is it's not often that you're able to kind of approach a truly transformational technology with a unique architecture,
yet also see it as something that's incredibly proven at the largest scales imaginable.
And that's exciting because I can now go to a customer, general enterprise, general kind of mid-market customer, and we have tremendous confidence in our ability to execute because we've already proven the value prop in these massive environments
with these super heavy hitting, hairy workloads. And again, we're happy to get into the architecture
specifics of what enables this. But what's really cool here is it's already proven at some really, really large customers.
And that's exciting.
And that's the biggest scale.
That's battle testing at the largest scale.
But it doesn't have to start in the hundreds and multi-hundreds and bigger setups.
I mean, we'll start with a four-node storage cluster.
So if I was doing some sort of an edge environment,
I could start PowerFlex with a four-node configuration and build from there?
Yeah, so PowerFlex, you know, minimums, right?
Starts at four nodes, four storage nodes.
And then we also have a concept of PowerFlex compute nodes.
And you could literally have one of those,
three of those, you know, whatever. And important to know, PowerFlex, it's not just a storage thing. It truly is about how do we not just deliver really scalable, really performant
storage, but also how do we help customers simplify and transform the operations?
So when you're buying a PowerFlex solution,
yes, you're bringing in PowerFlex storage,
but you're also running PowerFlex compute nodes
that give you the benefits of automation and lifecycle management
for the compute layer as well, inclusive of ESX, for example.
And it's that full stack value prop that becomes really interesting to customers.
So essentially, there's three form factors for PowerFlex.
We have storage nodes, we have compute nodes, and we also have hyper-converged nodes.
So whether you're doing, you know, four nodes in an edge location, or you're doing anywhere from four nodes to, you know, 14, 40, 400 and beyond nodes in a data center,
PowerFlex has the ability to solve for that, which is pretty
powerful. So this sounds like more than a product. It sounds like an operating model. Can you talk to me
about that operating model? Because when I'm thinking about my bare metal Oracle workloads, my Kubernetes bare metal solutions, my ESXi, my stuff that's running on Red Hat virtualization platforms.
These are all models of operating that I have to select and I have to be thoughtful about. What you're trying to sell me is this idea
to kind of forego those models as the primary method of addressing storage, compute,
and automation in my environment and kind of go with a PowerFlex-first model. Not quite. If I can rephrase that, it's to use all of them with PowerFlex as your sort of universal storage framework behind everything.
Maybe that's a different way of putting it.
It's not that we're replacing ESXi or RHV or something.
It's that we can simply operationalize and standardize everything underneath and behind those.
So the practical challenge, you know, on paper that sounds great, but in practice,
you know, there's minimum firmware needed to make sure that I get support on my SAP workload.
My solution has to be validated. So when I upgrade my storage
firmware, I've got to make sure that all the underlying components, and this is where the
complexity comes in, all the underlying components and requirements are in line with this. And this
is where customers get stuck. And this is where silos end up being created. How are you helping customers solve that siloed problem? Because this sounds like a great solution.
Again, I'm sure if I had an all-HCI solution with ESXi or an all-Oracle solution, it gives me that flexibility.
But when we're thinking about these mixed workloads and mixed operating models, this is where my skepticism
starts to come into play. Yeah. You bring up a great point, Keith. And this is one of those
areas that I think the uniqueness of PowerFlex becomes really interesting, right? Because
one of the challenges of HCI is exactly what you bring up. It's super simple,
as long as you're not trying to do anything outside the
norm with it of what I'll just call a straight VMware general purpose ESX type workload.
Where PowerFlex becomes interesting is it gives you that HCI-like concept, but with the flexibility
of that kind of traditional three-tier stack, where your compute and storage have the ability
to be operated somewhat
independently. And on the compute, you can run a variety of different stacks. So within PowerFlex,
there's a tool, if you will, called PowerFlex Manager. And what PowerFlex Manager does is it
provides the operational aspect of the environment. And to kind of work from the bottom of the stack up,
it will do at the storage layer, all of the storage, you know, deployment, add, remove nodes,
lifecycle management, firmware, BIOS updates of the hardware, of the software-defined storage.
All of that gets done by PowerFlex Manager. As you move up the stack to compute, you have the
ability to operate it multiple ways. You can have it fully update lifecycle and operate that VMware kind of compute node, if you will,
where PowerFlex will literally deploy ESX, all the right BIOS, firmware, and driver levels that have been pre-tested, pre-validated.
When it comes time to upgrade or apply patches, it will automate and do all of that on the system.
But then let's say you step to a workload that's maybe not running on a hypervisor, not running on VMware.
Maybe it's an Oracle running on a physical Linux host.
You have the ability for PowerFlex Manager to treat that host as what we call bare metal,
where it will go and just talk to the BIOS and firmware on it. So it'll update just
the hardware of the server, leaving the customer to operate their Oracle and Linux OS image as they
have historically. Same would apply to something like OpenShift, right? Where let's say the
customer wants to run OpenShift on bare metal, they can have PowerFlex Manager operate the server hardware, if you will, versus where I have the hypervisor integrated into the lifecycle
experience of the platform. And within the same system, using this concept known as services,
you can also have additional platforms running on top where we don't have to do all the bits,
giving you this one singular landing zone that you can have all these different stacks operating on top of according to the way you need to operate them.
So a practical problem that we've run into, and it sounds like you've thought this through, a practical problem that we run into is when we have a converged system that's fully integrated. There's ESXi. There's some type of lifecycle management tool. There's the
collapsed storage and compute. And I want to go from one version of VMware vSphere or VMware vSAN
or whatever the control plane is to the next. You solve this firmware problem of needing to go out
and get the latest and greatest firmware. That automation is there. And then it seems like in that same platform, if I understand you
correctly, if you just want to consume this as an NFS-compliant, iSCSI-compliant, standards-based
solution, you can do that. So I can connect a consistent, persistent volume to a Linux host to deliver to Kubernetes.
And I've abstracted away that storage piece of it, the software control plane of it.
And I'm just consuming it as compliant storage.
So let me back up the conversation just a touch.
I think I can answer that question and maybe some remainders from the previous one.
So what PowerFlex is, we'll call it a software-defined infrastructure.
And I like to call it a software-first architecture because even though it has to run on hardware,
in theory, in principle, at the base it is just software. You bring it some x86, some Ethernet,
and some direct-attached storage, install the right pieces of software on a compatible
operating system, and off we go. You know, there are different pieces of software that allow it to do
different things. Fundamentally, there are just three of them. There's a software-defined,
I mean, there's a storage server
that works on a node to aggregate those local disks,
bind them together with other nodes,
create storage pools and different layers of complexity.
There's a client,
and this is partly getting at what you were just asking,
a software SDC, so a storage data client,
which runs normally in, well, not normally, it runs in various operating systems or hypervisors
in the kernel and is able to map to those storage nodes and consume storage from them,
presenting volumes to the hypervisor or the operating system. And we have these for all
kinds of platforms.
There's also pieces of software that do all the metadata management, additional pieces of software that enable different features like replication or NVMe over TCP. But fundamentally, you just got
these storage creator, storage consumer, and some management layers. And it doesn't matter whether you put those on separate nodes
and they talk like a disaggregated thing, or you put them into sort of the same node
and you do a hyper-converged thing where an individual node both creates and consumes
storage in a cluster. And they don't care, right? You can mix them up.
Well, you can have some things in a cluster just providing storage, some just consuming and some doing both. That's okay with us. This allows all
that great architectural flexibility. We work with lots of different hypervisors and operating
systems, but it also provides many layers of complexity that users don't want to get into.
So I think it's the beauty of what Anthony was talking about here a minute ago
with PowerFlex Manager is now we'll take all of the complexity that's possible
and we'll provide you easy templates to deploy and manage it
along with all the hardware life cycling that it happens to be sitting on
to provide the ease of operations across the board.
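To make the three roles Brian describes a bit more concrete, here is a minimal, purely illustrative sketch; the class names, capacities, and methods are invented for this example and are not PowerFlex's actual components or APIs.

```python
# Illustrative sketch only: names are invented and do not correspond to
# PowerFlex's real software components or interfaces.

class StorageDataServer:
    """Runs on a node and contributes that node's local disks to a shared pool."""
    def __init__(self, node, disks_gb):
        self.node = node
        self.disks_gb = disks_gb            # e.g. [3840, 3840, ...] capacities

class StorageDataClient:
    """Runs in the OS or hypervisor kernel and maps volumes carved from the pool."""
    def __init__(self, node):
        self.node = node
        self.mapped_volumes = []

class StoragePool:
    """Aggregates every server's disks; volumes get spread across all of them."""
    def __init__(self, servers):
        self.servers = servers

    def total_capacity_gb(self):
        return sum(sum(s.disks_gb) for s in self.servers)

# A node can run a server (storage-only), a client (compute-only), or both
# (hyper-converged); the pool does not care which mix you choose.
storage_nodes = [StorageDataServer(f"node-{i}", [3840] * 10) for i in range(4)]
pool = StoragePool(storage_nodes)
compute = StorageDataClient("esx-host-1")
print(f"Pool raw capacity: {pool.total_capacity_gb() / 1024:.1f} TB")
```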
Now, Brian, I understand how the client and the server
can facilitate what I'll call block access.
Do you also offer file access?
Yes, that is new with version 4.0.
And does that support like standard NFS
or standard SMB services?
Yep, all the standard protocols.
You don't necessarily need a client to support that.
Is that what you're saying?
No.
We have our file controller nodes
that sit there in front of the rest of the cluster,
and all of the PowerFlex juice and scalability
sits underneath the file systems that those will serve out.
But it is from the client perspective, from the file client perspective, it looks just like any other file system that's being presented to it.
So standard CIFS, NFS, SMB, et cetera.
Right, right, right.
So all the normal file operations. Right.
And so you've got this file controller node that provides,
I'll call it, you know,
file services and uses the PowerFlex backend for its bulk storage, storage pools and that sort of stuff. Correct. Yep.
And you mentioned lifecycle management.
We're talking lifecycle management for some bare metal solutions.
Are you just talking the client side of that, the storage data client?
Or are you talking like the whole OS and firmware and hardware?
And, you know, there's plenty of, I'll call it,
different x86 systems out there,
not necessarily all of which are from Dell.
I know that's kind of a foreign concept, but do you also offer those sorts of services for non-Dell hardware?
So the lifecycle management in PowerFlex is obviously the storage node hardware and software layer.
On the compute node front, it is the server itself,
right? BIOS, firmware, et cetera. And then it also has the ability to lifecycle ESX.
For a bare metal host or a non-ESX operating system, PowerFlex Manager, we're not going to want to touch that, right? So like, again,
if a customer is running Oracle on physical Linux, that's going to be the customer's Linux OS.
They're going to be responsible for that lifecycle and patching. We're going to handle the compute node
that it runs on. Now, to the question of, hey, you know, whose x86 is it? What we have found over the years is that when it comes to software defined, customers
want the value and the consolidation that a scale out software defined architecture
can provide, but they don't necessarily want the science projectiness of it,
of mixing and matching different hardware from different vendors. So while PowerFlex as a core software-defined storage can run on any x86 hardware from any vendor out there,
the experience around the automation and the lifecycle management comes when specifically deploying it on PowerFlex nodes, PowerFlex hardware, which obviously is Dell PowerEdge based, but doing that with simplified operations by delivering it as a full stack experience
and not simply saying to a customer, hey, here's some software, go build infrastructure out of it,
because the reality is that's not core to pretty much any customer's business these days.
So they want to kind of consume the outcome that can be provided without having to really put it together or build it
themselves in any way. What does a typical, I'll call it storage node, look like in this
environment? Is there any special hardware requirements for those sorts of things,
or is it just a standard Dell PowerEdge server with SSD storage behind it? I mean... Yeah, it's actually pretty straightforward.
When we call it, quote, PowerFlex nodes, they're PowerEdge servers with SSD or NVMe in them,
standard CPUs. There's nothing special or proprietary about the hardware itself. It's standard PowerEdge.
If we can put it this way,
the hardware gets designed and tuned
to enable the software to behave at its best.
The software doesn't require particular hardware.
So you don't have to have NVRAM
or things of that nature
for special buffering caches or something like that?
So we're not doing any type of caching within the system.
And this is one of the things that's really interesting about PowerFlex's IO architecture
and how we deliver the type of consistent performance that we do.
All of the IOs in a PowerFlex system, and I'm kind of changing gears
on you in case you haven't noticed, Ray, because I was looking for an opportunity to talk about our
IO path because it's so cool. All of the IOs go directly to the underlying media in the distributed
cluster. So for example, let's say I have very simply a 10 node, 10 storage nodes in my PowerFlex
cluster, and each of those nodes have 10 NVMe devices on them. When you create a volume in
PowerFlex, that volume is evenly distributed across all 100 of those NVMe devices. And now
every IO coming from the compute nodes, all those IOs are being evenly distributed across all 100
of those NVMe devices at all times. There's no cache layer. All the reads and all the writes
are coming directly off of that persistent media. That delivers two things. Number one,
it delivers incredible amounts of performance because now I'm not relying on a cache drive or two to, you know, deliver my performance.
But the other benefit of that, more importantly, is the consistency of those IOs. There's no concept of like cache hits, cache misses,
skew, workload skew, all that stuff goes away. And when you kind of run performance testing on
PowerFlex, you literally look at it, no matter how hard you push it, latency just, it's kind of a
straight line across because there's no gimmicks in the IO path. What you see is what
you get and it is incredibly consistent. And then as you scale or grow the cluster, so if I go from
10 nodes to 20 nodes or 100 NVMe drives to 200 NVMe drives, my volume will now get automatically
redistributed across all 200 NVMe drives. And I now just doubled my IO performance.
And there's no storage controllers in the way.
I'm adding, every time I add a node,
I'm adding more storage processing power.
I'm adding more drives.
I'm adding more network bandwidth.
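As a rough back-of-the-envelope on that scaling claim, here is the arithmetic with an assumed per-device figure rather than any measured PowerFlex number.

```python
# Hypothetical per-device number, purely to illustrate the linear-scaling argument.
PER_DRIVE_IOPS = 100_000

def cluster_iops_ceiling(nodes, drives_per_node):
    # Volumes are spread evenly across every device, so in the ideal case the
    # aggregate ceiling is simply the sum of what all devices can do together.
    return nodes * drives_per_node * PER_DRIVE_IOPS

print(cluster_iops_ceiling(10, 10))   # 10 nodes x 10 drives -> 10,000,000 IOPS ceiling
print(cluster_iops_ceiling(20, 10))   # double the nodes     -> 20,000,000 IOPS ceiling
```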
Well, I think the other piece of this too
is it's not just linearity with respect to,
you're linearly, you're scaling the capacity but getting
performance to scale linearly along
with that. It's very
predictable. There aren't any
points at which if you add a couple more nodes
you start hitting a plateau or
a choke point at which now you've made it
too big it's going to start underperforming.
It will as long
as we're staying inside the theoretical
maximums, which are giant, keep growing.
And also, it doesn't start coming apart when you start filling up the storage.
And I think that's part of the result.
Another result of this architecture is that you get it 75% full and it doesn't start tipping over.
So, you know, we've all done this a really long time.
And the one thing that we know is that scale breaks everything.
The computer science is undefeated.
So let's test this model a little bit.
10 NVMe drives across 10 servers will saturate a network path. What do you guys do to help us mitigate the IO path itself?
Like, obviously the drives and the servers.
Once I get up to, you know, 16 to 32 nodes of this stuff
with all NVMe drives, I have way more storage IO than I have network.
Bandwidth, right? For sure. I mean, to be clear, we're not, you know,
we're not breaking the laws of physics, to your point. Those laws are undefeated. The cool part
is, and I'll steal an analogy from one of our great pre-sales team members that created this.
Think of PowerFlex software, almost like a bed sheet. If I put it over a bed, it takes the shape
of a bed. If I put it over a chair, it takes the shape of a chair. And the analogy for that is this.
Today, if I'm running PowerFlex on a bunch of NVMe devices with four 25 gig NICs on it, it's going to run at the speed
of those four 25 gig network ports. Tomorrow, if I'm running it over a bunch of NVMe drives with
nodes that have four 100 gig ports in them, I'm going to run at the speed of those four 100 gig
ports. So the exciting part is because the software is not the bottleneck. As hardware increases, as networks get bigger, you just get
to ride that curve of performance. As those things grow, you immediately take advantage of them. So
yes, we're not going to do anything to perform faster than four 25 gig connections will allow you.
We'll run at line speed, but we won't run faster
than that. But now as soon as you upgrade to 100 gig or 400 gigs becoming a thing, you can now
operate at those speeds immediately. And that's what's really exciting because again, there's no
storage controller kind of bottleneck. It's not a dual controller array that, hey, no matter how
much I put behind it, I can only do
what those controllers give me. It's not a cache-based architecture where you can only run as fast
as the cache allows. It truly is this distributed IO architecture where you will run at the speeds
of the network architecture, but then as that increases, so will your speed.
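The line-rate arithmetic behind that bed-sheet point looks roughly like this; the NIC counts and speeds are just the examples from the conversation, and real throughput sits below raw line rate once protocol, replication, and rebuild traffic share the same links.

```python
def node_line_rate_gbps(nic_count, nic_speed_gbps):
    # The NICs set the per-node ceiling; upgrading them raises it,
    # and nothing in the software has to change.
    return nic_count * nic_speed_gbps

for nics, speed in [(4, 25), (4, 100)]:
    gbps = node_line_rate_gbps(nics, speed)
    print(f"{nics} x {speed} GbE -> {gbps} Gb/s (~{gbps / 8:.0f} GB/s) per node")
```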
And it's not just network, right? So you can also end up in a situation where you've got, this is part of being sort of disaggregated, where you could have the
storage backend being able to provide a lot more storage than the compute you currently have
available to it can consume, right? They can be, you could have, you know, six compute nodes running
a heavy workload and they're all pegged at a hundred percent, but you have only tapped out 20% of what the storage backend can provide. So you just keep
adding compute, right? Theoretically, you can get to a point where now your compute is getting to
the point where it's starting to saturate what the storage backend can provide. So you just add more
storage. We can keep moving this in different directions to ensure that we're not network bound, CPU bound or disk bound or whatever.
So talk to me about data protection and your environment.
We haven't talked about, you know,
how you protect for drive failures or node failures and those sorts of
things.
Yeah. Great, great call out.
And this is one of the areas we're super proud of
because, and again, you know,
going back to when Brian and I first started doing this,
we used to have to talk about this in theory,
but now what's awesome is
we have a whole bunch of customer data to back it up.
And when you look across, you know,
PowerFlex's big deployments,
this is a true like tier zero,
mission critical type of resiliency platform, which is really exciting.
Right. You know, I'm sure you guys are very familiar with the long history we have with a platform like PowerMax, which is like the gold standard of uptime and resiliency.
Like PowerFlex should be thought of in that same breath from a resiliency perspective.
And we have the customer data to prove it. And it relies on the same concepts that we use to deliver IO performance, which is
many hands make light work. So the way we protect data in PowerFlex is through something we call
a parallel mesh, which is a fancy term for a many-to-many RAID 10, if you will, in that, again, we'll go back
to our example of 10 nodes with 100 NVMe devices spread across them. My volume and my protection
of that volume is done in a many-to-many fashion across all 100 of those NVMe devices. So in the event I have, for example, a drive fail,
all the remaining devices, so one drive fails,
all 99 remaining devices evenly participate in that rebuild.
Same thing, if I have a 30-node cluster, one node fails,
all 29 remaining nodes perfectly evenly participate in that rebuild. And
the outcome of that is an incredible amount of rebuild speed. You know, if you think,
you know, traditional kind of storage system, you have a, you know, call it a four terabyte
flash drive fails, you know, your rebuild is measured in hours, right? That same failure on PowerFlex,
your rebuild is measured in literally minutes.
And that's exciting.
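A back-of-the-envelope version of that rebuild comparison, with an assumed per-device rebuild rate rather than a measured PowerFlex figure.

```python
# Invented numbers purely to show the shape of the math; real rebuild times depend
# on how full the drive was, throttling policy, and available network headroom.
FAILED_DRIVE_DATA_TB = 4.0          # data actually resident on the failed drive
PER_DEVICE_REBUILD_MBPS = 200       # MB/s each surviving device contributes

def rebuild_minutes(participating_devices):
    total_mbps = participating_devices * PER_DEVICE_REBUILD_MBPS
    return FAILED_DRIVE_DATA_TB * 1_000_000 / total_mbps / 60

print(f"Rebuild onto 1 spare drive:  {rebuild_minutes(1):6.1f} minutes")
print(f"Rebuild across 99 devices:   {rebuild_minutes(99):6.1f} minutes")
```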
Depending on how big it is and how small the data set is.
Yeah, I mean, potentially seconds, right?
So it's all about the speed of rebuild
and taking advantage of that distributed architecture
in order to deliver that incredible
resiliency. And I remember literally doing this and, you know, going back years where we talk
about the math behind it, but now it's really exciting. We just, we literally have all the
customer data, the customer references running these, you know, core banking workloads at scale,
kind of proving out the resiliency of this architecture, which is exciting.
And then I'll supplement that with a couple ideas, though.
So we're actually, we're not protecting the infrastructure.
We're protecting the data.
Right.
Right.
The data and that mesh mirror copying.
Right.
That's what we're protecting.
We really couldn't care less in this context about the underlying disks or nodes.
The expectation is that things fail.
Disks fail. Disks fail,
nodes fail. And the system is designed to bend and flex around that, but not break.
So when we lose a disk, we're immediately re-protecting the data in this many-to-many
pattern that Anthony described. And as soon as we're done with that, we're done as far as any re-protection scheme is concerned.
And unlike traditional RAID where, yes, you can lose a disk or two disks or more,
but your data is protected in that first instance, but you're not really healthy as far as your data
protection goes until you've replaced the hardware and rebuilt the RAID structure.
Right, right.
So there is a hardware problem.
We don't care about that, right?
So in our context, you lose a disk,
we rebuild the data.
It may take, you know, 60 seconds, two minutes,
depends on how much data is on there,
how many nodes are contributing to this process.
Once we're done with that, that's it.
You can replace the disk at your leisure or not.
You'd be down a little capacity, but whatever.
Does that make sense?
Same with the node, right?
Another piece on resiliency that I think actually ties back to one of the points Keith brought up earlier,
which is the operational concept of resiliency.
And, you know, look, I think we're all familiar with like hyper-converged
and the rise that's had in a lot of data center environments.
And one of the challenges that a pure hyper-converged model has, especially in today's day and age where security patching seems to be happening more than ever, is every time I update a host, I'm taking compute and I'm taking storage offline.
And I've got to think about that, right? Like that's a
resiliency concept I need to think about when it comes to things like patching. And because of that,
we've seen a lot of customers say, hey, you know, I need to remain with that three-tier kind of
centralized storage model because I want the ability to patch my compute and not have to think
about what does this mean for my storage? Am I taking storage
capacity or performance offline? The beauty of PowerFlex is that it gives you the flexibility
of three-tier with the operational simplicity of HCI. So when you think about something like
patching, I'm getting all the scale out and the automation concepts that HCI kind of mainstreamed. But because my storage and compute
are physically separate, I can go patch and reboot a whole bunch of hosts. I'm doing nothing to my
storage. I'm not taking any storage capacity or performance offline. So I maintain that operational
benefit that exists in the three-tier world of patching, updating my
compute and my storage as each sees fit without an interdependency on the other.
And that's something especially larger customers have really seen a ton of value in that has
been one of the drawbacks of hyperconverged at scale.
Yeah.
So, I mean, now just to be clear, ultimately you have to update the storage nodes as well.
And during that update,
there would be sort of a cycling through the various nodes or drives in order
to perform that update while they're, I'd say the node goes offline,
if that's even a terminology kind of thing.
Into a maintenance mode where we're expecting,
there's different ways of handling it.
But yes, of course, we roll and cycle node by node
through the storage cluster to update all of the different components of it.
And since my data is basically sharded across 10 nodes,
This is a rolling update.
I'm not taking that outage, service outage to do this.
You know, because one node goes down for maintenance
because it's its turn in the upgrade cycle.
Remember, there's extra copies
on all the other fault units,
which are all the other nodes.
And so it just simply moves to the other stuff
in the meantime.
And then when that one finishes its job, we move on to the next one.
Or we can, you know, if you really needed to,
you could bring a node out completely and bring a node back in or add
extras. And the elasticity is there to rebalance anyway.
So you guys mentioned the gold standard in availability within the Dell
world, which is the VMAX.
PowerMax.
I'm sorry, the PowerMax.
Apologies.
When I think of protecting uptime with a PowerMAX, I'm generally thinking of, you know, I've
lived in an environment where I supported mission-critical SAP. I just
had two PowerMaxes, believe
it or not.
If a customer's that concerned about
the flexibility, but they want the software
defined nature of that, how do they
match the availability?
What are some of the,
I guess the question is, what's the fault
domain when I'm thinking about
availability of my services?
Right. What do you mean by fault domain?
So the fault domain of a VMAX is a VMAX. So I have a second VMAX.
This is more software defined. So if I have 10 nodes, there's kind of a cluster.
So do I have two clusters or how do I design for availability?
Yeah, great, great question. So the answer is there's a lot of ways you can do that.
So within PowerFlex, there's a concept of cluster, which is my overall system. Then within that
cluster, I have a concept of protection domains, which is grouping of nodes. So let's say I have 30 nodes. I can have a 10 node,
a 15 node, and a five node protection domain. Each of those are completely their own fault unit from
a failure perspective. So it's almost like creating software defined arrays within the
overarching cluster. And then there's a third concept called storage pools, which doesn't get
used as much, but actually allows you to drive that segregated protection down to the disk level,
if you'd like. So essentially, you can obviously create, you know, multiple PowerFlex clusters,
and then use, you know, replication, whether it's at the PowerFlex layer or the application layer,
to have data between two completely separate PowerFlex clusters, either within a site or
across sites, or you can create that segregation within the PowerFlex cluster itself at the
protection domain level with the advantage there being, all right, you know, think of,
I have application A, B, and C each running in a different protection domain.
Application B gets shipped off, shut down, retired, sent to the cloud, whatever.
I can now take those nodes in protection domain B, and I can just redeploy them into A and C.
So you get a lot of flexibility there.
The third concept we have, or fourth one rather, is a really interesting one called fault sets. And what fault sets allow you to do
is create groupings within a protection domain
to protect against specific failure scenarios.
So an example being-
Rack or something like that.
Rack level failure, exactly.
I can actually tell PowerFlex,
I have a protection domain that's spread across,
I don't know, let's call it four cabinets.
But I want to make each cabinet a fault set.
PowerFlex will ensure your data is protected outside of the fault set.
So you can actually lose an entire cabinet of nodes.
No problem.
You're still up running serving data.
Application doesn't see a thing.
And that starts to become really interesting.
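A toy sketch of the constraint a fault set imposes: the two copies of any piece of data are never placed in the same fault set (a cabinet, in this example), so losing a whole cabinet still leaves one copy online. This only illustrates the rule described here, not PowerFlex's actual placement algorithm.

```python
import random

# Four cabinets, five nodes each; each cabinet is declared a fault set.
nodes = [{"name": f"node-{rack}{i}", "fault_set": f"cabinet-{rack}"}
         for rack in "ABCD" for i in range(1, 6)]

def place_copies():
    primary = random.choice(nodes)
    # The second copy may land on any node outside the primary's fault set.
    candidates = [n for n in nodes if n["fault_set"] != primary["fault_set"]]
    return primary, random.choice(candidates)

for chunk in range(3):
    p, s = place_copies()
    assert p["fault_set"] != s["fault_set"]   # losing one cabinet never loses both copies
    print(f"chunk {chunk}: {p['name']} ({p['fault_set']}) / {s['name']} ({s['fault_set']})")
```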
And that's one of those benefits of software defined. So remember what he said earlier about the RAID 10-ish
like behavior of the protection in the background, obviously you'd never put the two relevant copies
on the same disk, but you never put the two relevant copies on the same node, right? The
idea is that you want to allow for anything to fail at any time.
But the idea here with the fault sets is you never put the relevant copies in the same fault set.
So we're okay with an entire grouping of things going down altogether and never seeing a disruption in your environment. And outside of failure, where we actually see fault sets mostly used is
with larger customers for maintenance. So we talked earlier, hey, I want to update that, I do a node at a time. I can now do a cabinet at a time or a fault set at a time
of upgrading all those nodes at the same time without any type of outage. Now, here's where
I'm going to pull up a little curveball. This fault set concept is super interesting.
It's even more interesting when we talk about the ability to deploy PowerFlex directly into the public cloud, which we now have the ability to do.
Because what I can now do is using that same fault set concept, I can deploy a PowerFlex software-defined array in Amazon, let's say, but I could deploy that across availability zones.
And I can use the fault set concept where I make each Amazon availability zone a fault set, provide AZ level failure protection at the storage layer in the cloud without needing to replicate
your data set into every single availability zone that you want protection from. So Anthony,
we're in bonus time. You kind of set me up. I was going to ask about the question around hybrid
cloud. You can't have a conversation today about hybrid data without having a conversation about being able to manage data in the public
cloud. So is this concept for AWS, is this integrated into the AWS control plane,
not control plane, am I consuming the AWS control plane natively or is there an approach to abstract it away so I can take this concept and deploy it in Google Cloud, Azure or some other cloud that I'm interested in taking the underlying cloud's capability in applying this PowerFlex model to?
Yeah, great question.
So this is software
you bring to the cloud. So in the cloud, it's not integrated into the tools or portal, if you will,
experience. It's truly, I'm using, instead of using PowerFlex nodes on-prem,
I'm creating virtual instances in the cloud that effectively become my nodes that PowerFlex then gets deployed on top of.
And I'm now using that to create a PowerFlex block storage system in the cloud. And you can kind of bring it anywhere, right? Because it's just running on top of a standard cloud instance,
allowing you to create a private block storage array in the cloud.
And that now unlocks some really, really interesting capability, right?
We talked about one, that multi-availability zone protection concept.
Another basic one is data mobility, right?
I got PowerFlex on-prem.
I got PowerFlex in the cloud.
I can now replicate. I got PowerFlex in the cloud. I can now replicate.
I got PowerFlex in Amazon, PowerFlex in Azure. I can move my data between them.
The third one, though, that becomes really interesting is performance.
So, Anthony, before you go down that path, you mentioned replication and migration in the same
breath. And to me, those are two different functions. I mean, can you explain what you're
talking about there?
Yeah, for sure.
I agree with you.
Definitely two different functions. The way to think of replication in PowerFlex in that construct is it very much moves the data from A to B.
So your kind of traditional storage replication concept, if you will.
So if I have PowerFlex in my data center, I have PowerFlex in the cloud,
I now have a very easy way to get the data from A to B, whether that's for disaster recovery
purposes or some type of migration of that data set into the cloud or vice versa.
And so when you're thinking of replication in PowerFlex, too, it's a volume-by-volume or volume-group-level operation,
so that it's not that the entire cluster
has to be configured like this is my primary
and that's my target over there.
But for any given volume,
one side of the equation is designated
as that's your source, that's your target.
So you can be running PowerFlex on both sides,
on-prem and cloud, and decide,
okay, now I'm going to move this from on-prem
over to the cloud side,
start operating from that as my primary,
scale up the storage underneath that for power,
run tests, do whatever you're looking for.
Does that make sense?
Yeah, yeah, yeah.
No, it's, yeah, certainly there.
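One way to picture the per-volume pairing Brian describes is as a small table of consistency groups, each with its own source and target system, so a single cluster can be the source for some volumes and the target for others. The field names here are invented purely for illustration.

```python
# Invented field names; the point is volume-level (not cluster-level) pairing.
replication_pairs = [
    {"volume_group": "oracle-prod", "source": "onprem-cluster", "target": "aws-cluster"},
    {"volume_group": "analytics",   "source": "aws-cluster",    "target": "onprem-cluster"},
]

for pair in replication_pairs:
    # The same cluster plays a different role for each volume group.
    role = "source" if pair["source"] == "onprem-cluster" else "target"
    print(f"{pair['volume_group']}: onprem-cluster is the {role}")
```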
We didn't talk about some of the data services around here.
I mean, so you mentioned replication, compression, snapshotting.
You mentioned RAID 10 kind of thing.
Do you guys support compression and data snapshotting?
We do.
Yep.
Data reduction, snapshots,
really high-performance penalty-free snapshots.
All the typical things, if you will, thin provisioning,
pretty standard suite of storage services.
Consider those table stakes included.
Yeah, exactly. Where it gets really interesting though,
is when you think about the cloud,
bringing some of those services and capabilities to the cloud, right, where we kind of solve both of those challenges.
And uniquely, we solve it with a scale-out architecture.
And that's what we think is so different about PowerFlex in the cloud versus maybe some other offerings out there, where it's, hey, here's a dual-controller virtual appliance that can scale to 60 terabytes if you're deploying in the cloud.
Like, that's not that interesting.
When I could take the petabyte type scale of PowerFlex,
use that to create a true enterprise storage system
in the cloud that gives me
some of those traditional storage services,
but with the scale out kind of native agility
that the cloud provides,
that becomes a really, really interesting concept that we're
getting tons of great feedback and interest in right now with customers.
I mean, imagine how, you know, we can say how elastic PowerFlex is on-premises,
but you have to still keep providing some physical nodes to do it, whereas you can keep
spinning up EC2 instances nonstop. Yeah. And maybe an interesting story from a customer that has
PowerFlex deployed in one of the hyperscalers. On-prem, they kind of run their standard thin
provisioning across their storage platforms, of which PowerFlex is one. On-prem, they kind of
stay around two to one, right? So they take advantage of over-provisioning, but look,
if they need more hardware, more capacity, it takes about a month, right? Place orders, lead times, people showing up to plug it in, et cetera.
In the cloud, they're actually running their over-provisioning rate north of 4 to 1 on PowerFlex.
And the sole reason is, well, we're comfortable doing that because if we need capacity, I click buttons and 15 minutes later, it's in my cluster. So they're kind of taking advantage of that cloud agility to drive out cost by running at a higher oversubscription rate.
And that's even more interesting because the cloud on its own does not provide thin provisioning.
When your PPA goes and provisions a four terabyte volume but then only stores 500 gig of data, you're still paying the cloud for four terabytes of storage.
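The oversubscription arithmetic behind that story looks like this, with placeholder costs rather than real cloud pricing.

```python
# Placeholder numbers only; the point is how the ratio changes the backing capacity.
LOGICAL_PROVISIONED_TB = 400          # what applications believe they have
COST_PER_BACKING_TB_MONTH = 80        # hypothetical $/TB-month for backing capacity

for label, ratio in [("on-prem, ~2:1", 2.0), ("cloud, ~4:1", 4.0)]:
    backing_tb = LOGICAL_PROVISIONED_TB / ratio
    monthly = backing_tb * COST_PER_BACKING_TB_MONTH
    print(f"{label:>14}: {backing_tb:5.0f} TB actually backed, ${monthly:,.0f}/month")
```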
You're saying that you could drive up the utilization rate from two to one to four to
one based on what parameters for the solution?
I mean, like thin provisioning or not, I understand, but that would be the same regardless of whether
it's on-prem or in the cloud.
Totally on things like compression or something like that?
No, I think the operational difference, and this is one of the things I've pinged Dell in the past about and Dell competitors,
about when you build storage arrays in a public cloud, you're trying to bring this operating model from the private data center into the public cloud,
and it doesn't work because of cost.
If I provision a four terabyte storage array,
I have to pay for four terabytes of storage, whether I use it or not.
And it becomes obscenely expensive.
And so I guess the follow-on question to you, Anthony,
I like the approach of being able to say, okay,
I'm going to provision 50 gig of real AWS instances behind my PowerFlex
and advertise two terabytes or whatever the size I want to advertise to my customers, that process, talk to me about that process of
when I actually need to back that with real instances. Is this something that I can automate?
Is there APIs in which I can, my platform engineering team can build auto scaling rules
that basically says, hey, provision additional AWS instances, and then go into
PowerFlex and assign those instances to my storage pool.
You've got the idea exactly.
And the programmability of the infrastructure is one of its other highlights.
It is software.
There's an API for everything on our side and there's APIs for
everything on the cloud provider side too. So everything can be programmed. Bingo. And that's
one of the interesting pieces right now. And we realized there are going to be customers
who want to take advantage of those APIs. They want to build that automation themselves and
great, thumbs up, ready to go.
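A sketch of what that platform-engineering automation could look like. The REST endpoints and payloads below are invented placeholders, not the documented PowerFlex or cloud-provider APIs; the point is only the shape of a capacity-watching loop that the programmability on both sides makes possible.

```python
# Hypothetical auto-scaling sketch. Endpoints, paths, and payloads are
# illustrative placeholders only -- NOT real PowerFlex or AWS APIs.
import requests

POWERFLEX = "https://powerflex.example.internal/api"   # placeholder address
THRESHOLD = 0.80                                        # expand when the pool is 80% full

def pool_utilization(session):
    stats = session.get(f"{POWERFLEX}/storage-pool/pool-1/stats").json()  # placeholder path
    return stats["used_tb"] / stats["capacity_tb"]

def expand_pool(session):
    # 1. Ask the cloud for a new storage-class instance (placeholder call).
    instance = session.post("https://cloud.example/api/instances",
                            json={"type": "storage-optimized"}).json()
    # 2. Register the new instance's devices with the storage pool (placeholder call).
    session.post(f"{POWERFLEX}/storage-pool/pool-1/add-node",
                 json={"instance_id": instance["id"]})

with requests.Session() as s:
    if pool_utilization(s) > THRESHOLD:
        expand_pool(s)
```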
Stay tuned.
You'll hear us making some announcements shortly, because for those customers
that maybe don't have the ability
to build that automation themselves
or just don't want to.
Some of that will be provided for them.
So without spoiling, stay tuned, if you will.
Okay.
Well, hey, this has been great. So Keith, any last questions for Brian or Anthony?
With, you know, partners around the edges of what they do, you know, like Terraform integrations, et cetera.
All the cloudy stuff that I can envision is possible with the platform.
Right, right, right.
So, Brian or Anthony, is there anything you'd like to say to our listening audience before we close?
For me, you know, thank you both for the time.
This was, you know, a great discussion. Great, great questions. You know, my kind of ask to the audience out there: if you're out there, PowerFlex is a really, really interesting platform right now that's growing absolutely crazy for Dell.
We'd love the opportunity to come in and talk to you about it.
So really appreciate the time.
And Brian, any closing comments from you?
I'll echo that.
Thanks for having us.
This was really great.
And to also echo something you mentioned,
Dell Tech World is coming up.
It's right on our heels here. You'll
be hearing a lot about it. If you happen to be there, we have several deep dive sessions,
ask the experts, hands-on labs, come get a hands-on look at it, ask a lot more questions
if you've got them. Yeah, I guess I have one question. It was actually a whole set of questions,
but I'll just ask one of them. Is there some sort of like a trialability capability?
Can I log on to PowerFlex.com and download the software and run it for a four-node system for some number of hundreds of gigabytes or something like that?
No.
So remember, we don't want customers to say, oh, let me take software and build my own PowerFlex system.
Right. There's a level of science-projectiness to that that we just found customers are just not that interested in.
So if you're in a position where, you know, you're designing infrastructure, you want to get hands on with PowerFlex.
We have a whole array of options, everything from very, very rich labs
with very high performance systems at Dell facilities that customers can access, or we do
have the ability to send gear to customers in their data center where they can test with their
workloads and applications specifically. So to truly kind of properly evaluate, you have the
ability to do that. And then the same extends to the cloud as well, where we have to work with customers, stand up, you know, PowerFlex in the cloud with them, where, you know, they can go and do all the testing and validation, getting a hold of some pretty robust hardware within the Dell ecosystem.
And we found that we didn't need to bring the physical nodes into our
data center, which is kind of our bread and butter.
It's a little known hack as a Dell customer to leverage these EBCs for pretty much whatever
you want to test.
Interesting. Interesting. All right. Well, this has been great, Brian and Anthony. Thanks for being on our show today. And thanks again to Dell PowerFlex for sponsoring this podcast.
Thank you both. Have a great day.
And that's it for now. Bye, Brian. Bye, Anthony. And bye, Keith.
Bye, Ray.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.