Podcast Archive - StorageReview.com - Podcast #144: 300 GB/s in One Dell PowerEdge R7725xd

Episode Date: November 24, 2025

At the SC25 event in St. Louis, Brian caught up with his long-time friend Mark Klarzynski, founder and acting CTO at PEAK:AIO. If you have been following StorageReview, then you may be familiar with our coverage of PEAK:AIO, from regular news coverage to medical AI to animal conservation.

Transcript
Starting point is 00:00:00 So fundamental to the performance we saw out of the PowerEdge R7725xd is software, and software is what makes the magic happen, in terms of not only just aggregating the storage within the box, but being able to put it out over the wire with modern protocols. And that's what you guys do. So, Mark, just two minutes on what PEAK is, for those that don't know you. So, sadly, I've spent a whole lifetime in storage. I know. I've known you for a long time.
Starting point is 00:00:33 For a long time. And most of that is developing features and functionality to get around the challenge that the hardware was never that great, or not as good as we'd want. Today, you know, we've got amazing NVMe, we've got amazing networks, we've got amazing CPUs, and as we've seen on the Dell, they put that architecture together really well. In one magical box, which is actually kind of a unicorn, right? Because you often get all of one thing or all of another. But this is really even.
Starting point is 00:01:01 They balanced it really well. And so our job there really is less complex, ideally less complex. In fact, all we do is join the dots: great NVMe, great processors, great architecture, great network. Now, the problem with that was, and what PEAK really addressed was, storage has not evolved that way.
Starting point is 00:01:22 And so we actually spent more time stripping out some of the stuff that we've been writing for the last two decades or more, to really get it smaller and closer to the hardware. And in the example of Dell, I think, as we've said, it surprised us, the level of performance that that architecture was able to deliver. Right. Well, that was the first thing, right? So we're messing with this server.
Starting point is 00:01:49 We bring in Mark and his team, get the software put on, and then you guys were blown away. Which is rare, because software guys are normally complaining about the hardware. This is one of the first times where the software guys were like, great hardware. We actually re-looked at our testing methodology. You thought you did it wrong? No, this can't be right. We'd never seen that many IOPS.
Starting point is 00:02:11 So, you know, we never thought it was practically possible. But they've made the hardware that let us do it. Well, now you've got some new challenges, right? Because Dell's so big on silicon diversity. Yeah. I mean, I know we've all built around, you know, Mellanox or Spectrum-X fabric and NICs before. You had Broadcom switches, Broadcom NICs, which was new,
Starting point is 00:02:31 but you still were able to hit the big throughput numbers over the wire. So what kind of challenge does that create for you, as a software guy, to keep up with the hardware flexibility that your partners like Dell want? And that is partly our job, because that challenge, if we don't deal with it, propagates to the reseller or the end user. But they're not going to.
Starting point is 00:02:54 Yeah, they're not going to do it. And it gets complex, whether it's MOFED, OFED, or the flow control. So we really spend a lot of time dealing with that RDMA, and how we make sure that that RDMA is supported, up to date, working correctly, working fast, so that the user doesn't have to. Now, that makes it look nice and simple. They just plug it in and everything works.
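For readers curious what "making sure RDMA is working" involves under the hood, here is a minimal illustrative sketch, assuming a Linux host with rdma-core installed so the standard ibv_devinfo utility is available. The pass/fail policy is purely hypothetical and is not PEAK:AIO's actual tooling; it simply shows the flavor of pre-flight check the conversation is describing.

```python
#!/usr/bin/env python3
"""Illustrative RDMA pre-flight check (not PEAK:AIO's code).

Assumes a Linux host with rdma-core installed, so the standard
`ibv_devinfo` utility is on PATH. Exits 0 only when every RDMA
port the fabric exposes reports PORT_ACTIVE.
"""
import re
import subprocess
import sys


def rdma_ports_active() -> bool:
    """Parse `ibv_devinfo` output and check every port state."""
    try:
        out = subprocess.run(
            ["ibv_devinfo"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"ibv_devinfo unavailable or failed: {exc}", file=sys.stderr)
        return False

    # ibv_devinfo prints one "state: PORT_ACTIVE (4)" line per port.
    states = re.findall(r"state:\s+(\S+)", out)
    if not states:
        print("no RDMA ports reported", file=sys.stderr)
        return False
    return all(s.startswith("PORT_ACTIVE") for s in states)


if __name__ == "__main__":
    sys.exit(0 if rdma_ports_active() else 1)
```

This is the kind of plumbing being wrapped up for the user: confirm the driver stack (MOFED or inbox OFED), link state, and flow control are healthy before any data moves, so the person running the box never has to.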
Starting point is 00:03:19 There's actually a lot of work that goes on to make it. It's harder to make it simple than it is to make it fast. Making it fast is easy; keeping it stable and making it simple actually takes a lot of work. But you talked about this, this is your heritage. And when you and I met the very first time, it was, gosh, at least 10 years ago, when you were working on the SCST driver stack, right? Yeah. Why do you care so much about the nitty-gritty, as we say in the States, or I don't know what the equivalent in your land is? Nitty-gritty.
Starting point is 00:03:54 Okay, that works? Okay, fantastic. Why do you care so much about such low-level functions? This is actually an interesting question and a bit of a story. After SCST was acquired, as you know, I'd semi-retired. I'd done more storage than I ever wanted to do. And I was actually consulting within the NVIDIA channel,
Starting point is 00:04:15 trying to help them sell the appropriate storage solutions by all the names we know. The problem was, those solutions didn't fit in with this new world of AI, as we call it; it's wider than the name. And it became so difficult to make those work, because at the level of performance that these new DGXs were delivering, you would normally have, you know, 500 machines with a storage team.
Starting point is 00:04:48 Now you had one professor developing an innovation who didn't even know what storage was, nor should he. Right, they never had to care before, though, really, right? They never cared, and their focus is on developing the model that's going to save the world, whatever that innovation is. Yeah, the science. Go focus on the science.
Starting point is 00:05:05 And so suddenly, data was just an input, but it had to go down one wire or two wires in gigantic amounts that would normally have been distributed throughout many racks before. So we realized pretty quickly that we had to simplify that configuration, to something that the average person would be able to, not necessarily understand, just not care about. And so the only way we could do that was to wrap it up and take ownership of that problem. Right.
Starting point is 00:05:38 And actually then learn all the new stuff that Mellanox and everybody else was doing, wrap that up so that they can just plug it in and play, and they don't really have to deal with RDMA or any of the stack down below. So that's where we ended up. That's why we ended up caring about the nitty-gritty: because they don't. So why is it then, and maybe this is a decision just on business.
Starting point is 00:06:00 I mean, PEAK is an entity, you've got the software, you can put it on pretty much anything you want, and it will work well and deliver modern protocols, as we said, high performance, reliability, all those things. Why are you not a function in someone else's product? Because it seems like you could have gone that route, conceivably. Yeah. And maybe that would be an easy one, but the reality is, in this newer world, it's more exciting. Okay. You know,
Starting point is 00:06:31 you know, we've spent our life making products for people that make their servers go faster and make more money, which is valid and a great commercial logic, right? Suddenly you're working with people that are developing brain cancer models, and it's more rewarding. So we've really enjoyed that journey. And as we've enjoyed that journey, we've seen it diversify and grow into its own, you know, completely different model and business. So how do you protect yourself from trying to do too much, though? Because it would be easy for you to say, well, we're pointed at this, we're pointed at research and enabling these outcomes. But if we just added a few more file services, if we added a couple more data things and all of these, now you're an array at that point. And you don't want to be there, I don't think. But how do you think about where you go from a development perspective?
Starting point is 00:07:21 So, what we've really spent time on, and now it's open, that's easier to talk about. We, as you know, spent with Dell the first X years condensing what would have been 10 servers into one. Right. So performance, power, stability. And by the way, that is one message that resonated very well when we were doing this work: the concept that what needed four or eight or ten boxes before now takes one. Yeah, because you never had that problem before, where you get a GPU server that sucks all your power. Right, before, you had a few CPUs and you could have as many nodes as you wanted;
Starting point is 00:07:58 no one cared. So the answer was just to add another node. Suddenly we had this world where it didn't fit. You know, it took too much power, or it physically wouldn't fit next to an MRI. So we had to condense it. That meant redeveloping RAID, redeveloping the whole stack. But then, once we'd made this software that transformed amazing hardware that you can go off and buy on a shelf somewhere into a rocket with modern-day protocols, our next job was saying, well, what if we join ten of these together, or a hundred of these together? And again, in a very similar way, we looked at this with a modern approach and go, you know, the world that we've known, which has developed parallel file systems over the last X years, did so based on, I don't know, 10 gig iSCSI,
Starting point is 00:08:49 on a handful of SSDs and a load of HDDs. Which is funny, though, because it wasn't that long ago. No. Like, it feels like it was an eternity ago, but we're talking about a couple of years. Yeah. And in those couple of years, I mean, if you look at Dell today, 300 gigs a second in one box; two years ago, that would have been 80. And so the achievement on NVMe, the whole PCIe structure.
Starting point is 00:09:15 It has just jumped gigantically, and it's doubling every time. Right. And then when you reach PCIe 6, we're going to double again. So suddenly you've got this amazing ability within one box. Let's scale them down or up, whichever way. And so the challenge we had was that the traditional approach is to make some sort of proprietary way of doing that. And that's sort of easier. The problem is, it's proprietary, and not many people these days want a proprietary option.
Starting point is 00:09:50 So we were really fortunate to work with Los Alamos National Labs, who have been looking for modern-day replacements for parallel file systems, and want that to be a standard, an open standard. And so we've spent the last 12, 18 months working on open pNFS, which we're launching now. It will be completely open, so you can go off and make your own pNFS network, which is great, because that's what it needs to do to make a new standard. And we're really focusing on productizing that and making it more into a resilient, enterprise-class solution. But I mean, it's really amazing. We both love that
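For context, pNFS is the parallel-NFS mechanism standardized with NFSv4.1 (RFC 5661): a metadata server hands clients layouts so they can read and write the data servers directly, in parallel, over a standard protocol. As a hedged sketch of what the client side of such an open deployment could look like, the snippet below mounts a hypothetical export (`peak-mds.example.com:/export/ai` is an invented name) using standard Linux NFS client options; real tuning would come from the deployment guide, not from here.

```python
#!/usr/bin/env python3
"""Illustrative client mount of a pNFS-capable NFSv4.1 export.

The server name and export path are hypothetical. The options are
standard Linux NFS client flags: vers=4.1 enables the pNFS-capable
protocol revision, nconnect adds parallel TCP connections, and the
1 MiB rsize/wsize suit large sequential AI reads. Run as root.
"""
import subprocess


def mount_pnfs(server: str, export: str, mountpoint: str) -> None:
    subprocess.run(
        [
            "mount", "-t", "nfs",
            "-o", "vers=4.1,nconnect=8,rsize=1048576,wsize=1048576",
            f"{server}:{export}", mountpoint,
        ],
        check=True,
    )


if __name__ == "__main__":
    mount_pnfs("peak-mds.example.com", "/export/ai", "/mnt/ai")
```

The appeal of the open-standard route is visible right in the sketch: the client side is just the stock kernel NFS stack, with no proprietary agent to install.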
Starting point is 00:10:30 Dell platform. Yeah, so capable. What would you want to see in the next version, if you had input on the hardware team? What would you want to see? Because it's such a struggle to allocate the lanes to the front of the box for the SSDs, because you need four lanes to get the maximum performance out of those SSDs, and then the lanes on the back end to get it out. If you had a magic wand, what do you want? If we had a magic wand, I'd have more lanes, so we can have more output. Or PCIe 6, that gives us 800 gigs, maybe, or whatever that is.
Starting point is 00:11:04 Yeah, but it's funny, by the time you get 800 gigs, you're going to say, I want 1.6. Yeah. But interestingly, what I was doing is I was looking at some of their other products, I think the 7745, and looking at that again with a different approach, and, wait, you've got everything you need in that box, in one box. And why can't our stack let it be a model, let it be an inference box or a learning box, but also be a storage box in one, as well as being able to join into a global namespace? So in many ways, I think this is where we take
Starting point is 00:11:42 what we already have. We're great, we've got the 7725 and you can stack them, but I'm really interested in some of the Dell GPU servers. What can we do slightly differently? Well, your challenge there, of course, is fewer drives, right? So the one you're speaking of, eight E3.S drives, which is fine. I mean, the capacity is now at 30 terabytes and going bigger in those systems. But I guess what would be interesting from my perspective is, if you've got the GPUs in that box, we've got our eight drives, what happens once we get beyond those eight drives?
Starting point is 00:12:18 Can we have something else in that fabric, which I think is kind of where you're going? Exactly. To expand that, but fluidly. I mean, it has to be just part of that data pipeline and the data movement. That's interesting. The simplicity and the dynamic, fluid nature of it is the hardest of the challenges, which we've hopefully achieved. So the idea being, this is maybe your starter point: you know, your 7745, with enough storage for you to set off on what you wanted.
Starting point is 00:12:49 As you grow, just put a 7725 next to it, and we will automate that expansion and the spreading of the data. So that's a sensible step for those that are saying, hey, this is an innovation, we don't know whether or not we'll want to change our mind. We don't really want to buy a 10-petabyte box at this moment, but what we want to do is not limit ourselves. So it makes sense that we look at adding that parallel file system concept
Starting point is 00:13:23 inside a single box. And one thing that you're talking about there, too, that I think is interesting. I mean, scaling as you go is a model as old as time, because nobody wants to overspend until they need it, right? And sometimes after they need it, and then they have to hurry up and make a decision. But the other thing that we're hearing is that the investment required to buy premium storage arrays for these AI workloads is scary too, and a lot of them have been purchased for ease of use. There's a couple of companies that have been really good at that.
Starting point is 00:13:52 A lot of the legacy storage companies have been less good, less dynamic in response to the needs of AI. You're trying to solve that, but you're also doing it without as much expense. Now, I don't know exactly what your bill looks like at the end, because I don't ever pay for it myself. You just give us a license key, which is the best model of all. But talk about cost a little bit and what that means from your perspective, versus whatever the series of competitors is up there.
Starting point is 00:14:20 This is actually a really interesting point because if you take, and let's just use the word AI globally, but if you take a lot of that, yes, you've got the deep training side of things and that's often big, but when they build that out into a model and I use the English National Health Service as an example, that solution,
Starting point is 00:14:41 that they roll out, that has to be within a budget that's sensible. And it also has to be within a footprint that's sensible, and a power consumption that's sensible, and maybe even be something a nurse could turn on in the morning. They don't have a data center for this. You know, this is maybe next to an MRI scanner. So the economics of it are actually really important because you can't go and spend a million on a chunk of storage when you only paid $300,000 for the GPU server, which is really what's giving them their value.
Starting point is 00:15:19 So that was partly what PEAK:AIO focused on: making it simpler, for one, condensing the footprint down. But by doing that, what you're saying is you've got one piece of hardware. More importantly, what the world considers a commodity off-the-shelf server now delivers what would have been a whole specialized, high-margin storage product. Just for the product, not even the services standing it up. Not even the services or the maintenance
Starting point is 00:15:49 and everything we've traditionally got you on. So it's just storage, it's just a box. I mean, in this case, with the Dell server or any server, that OEM can handle Tier 1 support on any hardware issue. I mean, if a drive dies, whatever, you just swap it out and carry on. But then you guys come in
Starting point is 00:16:09 for any more advanced software-side needs, right? And, you know, if there's any advantage to getting old in this industry, it's that we've learned from our mistakes. So we've already built in, A, obviously the stability, but B, then the monitoring and the predictive awareness of things. And I think in the field, in four years, we've probably seen two bug updates, which is not bad going for a company. So we've never had downtime, because we're all getting on, and we've all done this for many years.
Starting point is 00:16:43 So beyond it being simple and plug-and-play and understandable, it working and being stable was really what we spent two years playing with. Because performance is sort of easy; keeping it stable is the hard bit. Yeah, so you guys like the hard bits. So PEAK is really good at doing the hard stuff. The hard stuff. And doing it because, not just that no one else wants to, but you guys find a mission in this, especially on the scientific research side. So that's really compelling too.
Starting point is 00:17:14 And I know in talking to the Dell guys, with the work that we did together on this, they've been blown away. Because as much as you were excited about the numbers, I don't think they had seen these kinds of numbers either. So that's a great sort of bow to put on this. Yeah, and I don't mean this just as a plug for us. I've just never seen, we were, I think, 70 million IOPS, and 300 gig a second or more in the box.
Starting point is 00:17:36 And I've never seen that within a single storage node before. And in that one node, we can do that work in the box. We could do 160 gig over the wire using the high-speed NICs in the back. Super flexible, as you said. Balanced, I think, was your comment. And now we've got direction to see if we can beat this record and do more. Now we've got a challenge. And actually, extending that, you know,
Starting point is 00:18:03 just to give you a bit of insight: imagine now that we say, okay, we've got this amazingly fast box that's running at 300 gigs a second. What if we put the model inside that box? Absolutely, so it's not even using the network or the fabric. Why can't we? And that's really probably one of our next products, saying: you've made your inference model, now you've done it, so you need to ingest data, maybe, you know, send it off again after you finish. But why move a large file over a WAN or a LAN and run it on a GPU server, when you could actually do it inside the box? Because that Dell box can have a GPU in it as well. So why not? And really, we've seen the level of performance it gives us, with the additional power that that box had left, and that was a surprising bit:
Starting point is 00:18:54 we didn't use all the CPU. Right. It gives us the ability to say, actually, what else can we put in there than just storage? I mean, those AMD CPUs are wild, right? The power available, all the lanes available, and then the extra little bits of Dell engineering on top to maximize the lanes to the back of the system. So we have our challenge. We hit the best numbers ever with this project, and now we've got to go beat it. We've got to beat it.
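A rough back-of-the-envelope makes the lane struggle concrete. Using nominal PCIe Gen5 numbers (roughly 3.9 GB/s usable per lane, so about 15.6 GB/s for an x4 NVMe drive), the arithmetic below estimates what the quoted 300 GB/s implies; the figures are generic PCIe math, not the R7725xd's actual configuration.

```python
# Back-of-the-envelope PCIe lane math for the throughput discussed above.
# Assumptions (generic, not the actual R7725xd layout): PCIe Gen5 at
# ~3.9 GB/s usable per lane, each NVMe SSD on an x4 connection.

GEN5_GBPS_PER_LANE = 3.9      # approximate usable GB/s per Gen5 lane
LANES_PER_DRIVE = 4           # typical NVMe SSD link width
TARGET_IN_BOX_GBPS = 300      # in-box throughput quoted in the episode

per_drive = GEN5_GBPS_PER_LANE * LANES_PER_DRIVE   # ~15.6 GB/s per drive
drives_needed = TARGET_IN_BOX_GBPS / per_drive     # ~19-20 drives flat out
front_lanes = drives_needed * LANES_PER_DRIVE      # ~77 lanes to the bays

print(f"~{per_drive:.1f} GB/s per Gen5 x4 drive")
print(f"~{drives_needed:.0f} drives running flat out for {TARGET_IN_BOX_GBPS} GB/s")
print(f"~{front_lanes:.0f} Gen5 lanes consumed at the front of the box")
```

That is why "more lanes" is the magic-wand answer: front-end drive lanes and back-end NIC lanes compete for the same CPU lane budget, and PCIe 6 roughly doubles the per-lane figure.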
Starting point is 00:19:18 All right, well. Come back next year. Carry on. Let's go. Thanks, Mark. Great meeting. Thank you.
