Grey Beards on Systems - 134: GreyBeards talk (storage) standards with Dr. J Metz, SNIA Chair & Technical Director AMD
Episode Date: July 7, 2022. We have known Dr. J Metz (@drjmetz, blog), Chair of the SNIA (Storage Networking Industry Association) BoD, for over a decade now, and he has always been an intelligent industry evangelist. Dr. J was elected Chair of the SNIA BoD in 2020. SNIA has been instrumental in the evolution of storage over the years, working to help define …
Transcript
Hey everybody, Ray Lucchesi here.
Jason Collier here.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
And now it's my great pleasure to introduce an old friend, Dr. J Metz, Technical Director at AMD and board member of SNIA.
So, Dr. J, why don't you tell us a little bit about yourself, about SNIA, and what SNIA is up to these days?
Oh, sure. Absolutely.
So, yeah, I am a Technical Director. I have been working with AMD for a little over a year now, maybe about a year and a half.
But my background, my experience is primarily in storage and storage networking.
And as part of my responsibilities, I'm a member of SNIA and the technical rep for NVM Express.
For SNIA, I'm on the board of directors. In fact, I'm the chair of the board of directors. And I'm kind of responsible for, you know, keeping everything aligned and moving forward. That includes creating technical alliances and sitting in on general committee stuff and all that fun political stuff that a lot of people are rightfully trying to avoid in their careers.
Yeah, yeah. I can speak for myself in that respect. I'm sure Jason can as well. So what is SNIA and why is it there? And do we need SNIA anymore now that the world has changed and all that stuff?
That's a fair question. I think it's one of the things that a lot of organizations are trying to ask themselves at this point in time. It's like, what is the relevance here? So SNIA
does a number of different things that many people may not even realize. So it started off as the
Storage Networking Industry Association. And storage networking was the primary focus for a
very, very long period of time. But in the last 15 years or so, SNIA has been working on everything from persistent memory to computational storage to security. We've just recently announced that the DNA Storage Alliance has joined us as a technical affiliate. And we're really trying to make sure that we wind up being all things storage for the industry, both in terms of the vendors, those companies large and small who have a story to tell,
but also for the end users. We've got a lot of material that is educational in nature, vendor-neutral presentations in particular.
And that's a big part of our charter, which is to remain vendor neutral even though we are a vendor-based organization. And then last, but certainly not least, we do a lot of standards. We are a standards development organization, and we do things like the Swordfish standard. SFF is a technical affiliate for SNIA, so we do a lot of hardware standards.
You've probably seen, you know, E1.S, E3.S, and so on and so forth.
You know, the kind of form factors that people kind of take for granted, that's all part of what SNIA does. And then we've got other initiatives that involve power consumption, power and cooling, green energy. We've got a lot of memory and data movement initiatives, like RDMA and DMA.
And basically, if it involves memory and storage from a capacity perspective or a usability
perspective, we have something involved in that.
We couple that with hackathons and plug fests and education
and wrap it all up in a nice, neat little bow for the SNIA dictionary,
which is the de facto storage taxonomy.
Yeah, I was giving that dictionary to people that are trying to learn about storage.
I said, you should look through here.
You'll find everything you need.
What about CXL? CXL seems to be a hot topic these days. I noticed you mentioned NVM Express as part of your duties at AMD. Where does CXL fit in the SNIA world?
So both NVM Express and CXL, as well as a few other organizations, are technical alliance members with SNIA.
So we do a lot of joint operations with, it makes it sound like it's a military thing,
but we do a lot of joint work with both organizations.
And it's a very careful relationship with all these other alliances because nobody wants
to step on anyone else's toes, right?
So we have to make sure that when SNIA does something for, you know, interoperability or architectures or APIs, we're able to communicate with the organizations that own the technology: CXL, NVM Express, JEDEC for memory, and so on and so forth. So the nature of the relationship is
primarily a matter of going back and forth and creating a negotiated order for the technology.
A good example for this is computational storage.
So, you know, computational storage is kind of a focal point when you start to really dig into how a lot of this stuff works.
Now, the lingua franca for computational storage is NVMe.
So all of the work that's being done in computational storage for NVMe is done in
NVM Express, the actual organization. But the architecture, the security model, the API model, all of those tend to be in SNIA.
But one of the things about computational storage that's really getting a lot of people's
imagination going is what happens when you want to do peer-to-peer communication between these
devices without going through the host.
So like GPUDirect, only computational storage direct, kind of thing? Is that what you're talking about?
In a manner of speaking, in a manner of speaking. I mean,
the principle is the same, even though the technical details are a little bit different.
But yes, right now, if you want to have a computational storage device and it sits on a PCIe link, that host owns the computational storage device.
That's the way it works right now.
But if I want to put 24 of these things into a shelf and I don't necessarily want to go tromboning back into the host and back into another device,
you can't. Not yet. But it's one of those things that people want to do.
And it makes a lot of sense that we've got the technologies to be able to do that over a
transport like CXL, which has different kinds of caching mechanisms and memory access mechanisms, as well as the standard
one-to-one relationship as well. And then on top of that, SNIA is working on a new
DMA technology called SDXI, which is a memory-to-memory data mover. And that is
one of the things that we are also talking about possibly doing over in computational storage, which has implications for
CXL. So if you want to get all these things to work together,
you're going to have to have those relationships. Right, right, right.
And so obviously the standards generation and that sort of stuff is pretty important.
And SNIA has been a driver of that for many, many years.
I remember doing like a plug fest,
a Fibre Channel plug fest down in Colorado Springs
and stuff like that.
So you guys must be supporting those sorts of activities
in a continuing fashion.
Oh, absolutely.
But so where was I going with this?
So do you see other standard protocols? I mean, RDMA over InfiniBand or RDMA over Ethernet or RDMA over TCP, those sorts of things, do they come out of SNIA? Or is SNIA in conjunction with IETF? How does that play out?
No, well, so like I said before, you want to make sure that those groups that own their technologies don't feel like they're being infiltrated, right?
Or they're having their toes stepped on.
So RDMA is really something that came from the InfiniBand Trade Association.
And RoCE is an InfiniBand Trade Association discussion.
Putting that over TCP is an IETF thing, so IETF has a relationship with InfiniBand. And if you wanted to have a connection with NVMe, for instance, and iWARP, which is what RDMA over TCP is, then you would have the relationship between IETF and NVM Express. Right now there is no official iWARP transport for NVMe, but if there was, that's where
that would go. However, if you want to do storage services, and if you wanted to do some sort of
management of this from the storage or the server entity, that would be a place where you could
implement a SNIA technology like Swordfish. That's how you would actually kind of tie all of these things together conceptually.
I always thought, quite frankly, this is not a good sign here, but I always thought Swordfish
was dead.
It had kind of gotten a lot of press early on and then kind of went quiet for a long
time.
Is it coming back?
It is.
I think it's one of the most quietly successful technologies.
Quietly successful? That's interesting.
Well, yeah. I mean, I'm kind of being a little facetious here because you have a good point.
I think part of the reason why Swordfish has had difficulty catching on is simply the fact that, in and of itself,
Swordfish is a little bit difficult to wrap your head around.
And the reason for that is because Swordfish is a management tool.
And if you don't have something that you need to manage,
you really wouldn't think about it all that much.
But if you want to do Ethernet attached SSDs, right?
How do you manage those?
Well, you manage those with Swordfish.
If you want to do NVMe over Fabrics with TCP to those SSDs,
how do you manage it?
Well, you use Swordfish.
If you wanted to have a single management structure,
schema to manage both your servers and your storage,
you'd use Redfish and Swordfish.
They go hand in hand together.
So if you don't think of it in terms of, I have this thing that I need to
be able to manage simply, then Swordfish probably isn't going to come up in the conversation.
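For a sense of what that management model looks like in practice, here is a minimal sketch of walking a Redfish/Swordfish service to list storage subsystems. The management address and credentials are placeholders, and real services differ in which collections they expose, but the /redfish/v1 service root, the Systems collection, and the per-system Storage collection come from the DMTF Redfish schema that Swordfish extends.

```python
# Hedged sketch: walk a Redfish/Swordfish service and print the health of
# each storage subsystem. Address and credentials below are placeholders.
import requests

BASE = "https://192.0.2.10"     # hypothetical management endpoint
AUTH = ("admin", "password")    # hypothetical credentials

def get(path):
    # Redfish/Swordfish resources are plain JSON over HTTPS
    r = requests.get(BASE + path, auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json()

root = get("/redfish/v1")                          # Redfish service root
systems = get(root["Systems"]["@odata.id"])        # computer systems collection
for sys_ref in systems["Members"]:
    system = get(sys_ref["@odata.id"])
    storage = get(system["Storage"]["@odata.id"])  # per-system storage collection
    for member in storage["Members"]:
        subsystem = get(member["@odata.id"])
        print(subsystem["Id"], subsystem.get("Status", {}).get("Health"))
```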
But management is one of those things that everybody loves to complain about,
but they don't necessarily like to plan for. And I think one of the reasons that
we are still a big believer in Swordfish is because of the fact that when you get down into it from a management perspective, it really makes sense.
It's just a very flexible way of managing servers and storage.
But it always seems so, so complex to use, to be in that framework, and to support it and stuff like that. I mean, most of the storage vendors I have talked to over the years have kind of looked
at it and played a little bit with it.
But to a large extent, they've gone their own way.
I mean, it's too big.
It's too big.
Maybe.
I don't know.
It's so interesting, all the interrelations between all this stuff, even the trade associations themselves.
I mean, Dr. J, you should probably come up with a trade association that maps the interrelations between all the other trade associations, right?
But yeah, there's a lot to it.
And it's a fair criticism. I think that when the project started off, we had a number of things that we had to do.
First of all, we had to replace the CIM approach that SMI-S was doing, because it was getting a little long in the tooth.
And it was getting very difficult to implement this management structure in modern systems, disaggregated systems, that kind of thing.
And then I don't think it was a matter of feature creep as much as it was a matter of the people who were developing these technologies didn't know how to communicate it to the general masses, right?
There was no, I mean, they have a great marketing logo
and they've got a great marketing name for it.
But when it came time to do it, the issue really is,
okay, well, how do I put this into practice
and where can I plug and play?
And unfortunately, it was never really a situation
where the modern technologies,
like those companies that want to do
server-based storage devices or containers
or those kinds of things,
knew how to apply this in their environments.
And it was a matter, I think,
of people being so close to the technology
that it seemed intuitive to them.
And then you wind up with a communication problem
more than a technology problem.
I don't want to beat Swordfish over the head too much.
I mean, there's lots of things.
So talk to me about the Storage Developer Conference.
I think that's going on here in a couple of weeks.
Is that true?
Yeah, well, in a couple of months.
It's in September.
Okay.
And what do you guys do there?
So the Storage Developer Conference is a technical conference for, it's the largest technical storage conference.
SNIA has these SDCs that go around the world.
We've got them in the United States.
We've got them in Europe.
We have them in India.
And they kind of rotate around the globe during the course
of the year. The major one for us in the U.S. is in September, and we do a lot of things that you
would probably find in other technical events, except we are far more vendor neutral. That's first and foremost. So you're
not going to get a lot of commercials for products. And we are also a bit more in the weeds.
So we're talking specifically to people who develop storage, storage devices, storage software,
storage virtualization. Those are the people who actually get into the weeds with this,
and you start at a very low level for these. And then we've got the, it depends on the year,
but we've got hackathons and plugfests for various technologies. Off the top of my head,
I don't remember what's being handled, but in the past, we've had SMB technologies.
We've had persistent memory hackathons.
We've had, you know, programming model hackathons.
And it winds up being a hands-on part of the conversation as well as those presentations.
I just got to say, Dr. J, how cool that was because, you know, one of my former companies, we ended up going to one of the, I believe it was one of the plug fests, the one you're talking about, basically the SMB-based stuff. We were sitting
there and all the folks that basically wrote Samba and that were highly involved in that,
you know, everybody from Microsoft to the open source community were there at that. And we were
working on integrating that into a product that we were developing. And, you know, it's kind of
one of those things, we'd been roadblocked on a few things for like three, four months. And then we go to that. And then
like literally over the week that we're actually at it, like we had made so many breakthroughs
in the product. I mean, it was phenomenal for the development team to actually go to that and we
could share what, what we had been doing, but then also basically that, that co-learning and then
basically getting to meet the other developers that you're usually talking back and forth to on message boards and basically forming that personal interaction.
It was awesome.
Yeah, I think from a developer perspective, these sorts of things are just great.
I mean, I've been to the SDC conference a couple of times.
It is pretty intense, technical, and it's a good session for meeting people and stuff like that.
I haven't been to a plugfest or a hackathon myself,
but I understand to some extent what those sorts of things could be.
That's very interesting.
Well, you get to talk to the people who are actually working on this.
I mean, I could come up with all kinds of examples,
but I mean, Sage Weil was doing presentations on Ceph before, you know, way back when he was starting with Inktank. Sagi Grimberg, who was the author of NVMe over TCP, did presentations. You've got a bunch of the people who are hands-on developers for NVMe; you want to ask about how NVMe works? We've got Eric Hibbard, who is a security guru, who is helping develop many of the security standards, both inside of SNIA and outside of SNIA.
I mean, these are the people who are writing the stuff that people will be using.
And they're there in person and talking about the stuff that they're working on and why.
You know, I mean, if there's any kind of time when somebody has complained about, well, why do they do this?
Well, they're there.
You can ask them yourself if you've got the cojones to do it.
Right.
So to speak.
Oh, God.
Yeah.
So the other question.
So you mentioned standards and stuff like that. So something like NFS or SMB or, you know, Fibre Channel, those have their own standards bodies and you guys just interact with them?
Yeah, we have alliances with them.
So Fibre Channel, for example, is developed in a group called T11.
It's a part of the INCITS standards group.
And basically the Fibre Channel Industry Association is the marketing arm.
But they've got an interesting relationship where the FCIA creates requirements documents that are
then put into practice in T11, and T11 uses the FCIA to help market that. And so it's
kind of a nice circular kind of a thing. But when you look at
Fibre Channel, it's not just Fibre Channel. I mean, it's a storage network and it's a storage
solution. And there is a strong relationship between FCIA, always has been, and SNIA. And
we always have the open door for different types of technologies.
We've had Fibre Channel come in and talk.
We've had the iWARP people come in and talk.
Even though we don't actually have anything to do with the technology being developed, we still see this as a storage technology that needs to be discussed and promoted, especially in the absence of anyone being
able to have all the information all the time. In other words, if you want to know what a technology
does, you have two choices. You can go talk to a vendor, which is what some people do,
and you get the vendor's perspective on what the technology does. But if you want to know what may be best for you, you always have to run the risk of whether or not it is a factual,
informational kind of a thing, right? There's always a kind of a sense of a bias. Or you can
go to an organization that kind of collates all of the information and allows you to find out what
you need to find out when you need to find out and make up your own mind. So we, you know, as,
as chair of SNIA, I've been absolutely adamant, and just vicious from time to time, that we will do vendor-neutral presentations and shows, right?
I have personally shut down presentations that were nothing more than advertisements. And it also has to be technologically neutral, right?
Right. That's tough to do.
It is. It is. So if Fibre Channel were to come in, right, and somebody were to say,
oh, Fibre Channel is dead, Fibre Channel is awful, I don't want to see that in that presentation
because it is simply not true and it's not fair to start making those kinds of statements when, you know, the other side of the story isn't there to defend themselves.
Right. And you're not helping anybody. And a developer conference is not a trade conference.
If you want to go off and you want to promote your product, promote your service, promote your technology, I'm all for that.
But you're not going to come into a developer conference and say our technology is better and then give some sort of spiel as to why the other technology is really bad.
And that's something that I've been absolutely adamant about.
And sure, sometimes things fall through, but I have worked very hard to create a culture where it's going to be caught by more than just me.
I've been through sessions where I've presented and I had to have it pre-approved for vendor and technical neutrality, which was kind of interesting from my perspective, but that's an old story.
So, you know, with the cloud that's come out and everybody's moving all their stuff into the cloud, where does SNIA fit in the cloud space? I mean, are you doing anything to help, you know, standardize cloud services, or how does that play out in your mind?
Realistically, we have involvement with several organizations, and we're starting up new ones. So for example, SNIA has not traditionally or historically had a relationship with CNCF, but we're just now starting to have some conversations there. I think part of the issue is that before I became chair,
we were very traditionalist focused, meaning that infrastructure was where we spent a lot of the
time. In the last two years though, our horizons have been expanded
to include everything from containers to blockchain to the security aspect that goes
across the board and the new memory technology. So we're moving beyond that into some of the
upper layers of storage. And cloud is a bit of an interesting thing, because the hyperscalers have their own way of doing things, and they don't need a standard.
They don't need, well, basically, they're big enough that they can actually run their own show, and people will flock to them and do whatever they want to do anyway. But where SNIA comes into play, and where SNIA becomes very useful, is that there are still several layers between what the cloud providers want you to do and where people currently are, right? Even now.
So one of the things that we do is we help guide people in being able to understand
A, whether or not migration is the right thing for them, because there are actual security issues that are often ignored when trying to move into the cloud. And one of the things that we help with is trying to navigate those waters.
We've been issuing a bunch of different advisories, for instance, about changes in the security policies and regulations that could affect companies as a result of either staying or
going into the cloud.
This includes international ones as well.
Yeah, I mean, GDPR and all that stuff.
Yep.
And if you want to have an API that's kind of available in a, I don't want to say free, but in kind of an open way,
you know, the SNIA APIs for doing things like object storage or doing things like security or computational storage or whatever it winds up being are something that can be, and often are, implemented inside of the cloud.
Something like S3 has become almost a de facto object storage standard. Is it something that SNIA could take on
to try to make that more of a real standard
rather than just a de facto standard?
No.
Well, it can't because it's AWS's property.
So do you have your own object storage standards
that SNIA has developed?
Yeah, CDMI, yeah.
CDMI, yeah, I've been there.
Who all is using CDMI?
Do Amazon, Azure, and Google Cloud use CDMI?
No, but there are some places around the world
that have a requirement for open non-proprietary formats.
And those organizations can use, and do use, CDMI.
But no, I mean, as far as the bulk of the market,
it's difficult to say market
because CDMI is not really a marketplace.
It's an open object storage format.
You know, S3 has a market.
For instance, Ceph has a market.
You know, CDMI doesn't have a market because we're not selling it, you know, but it is being used in environments that cannot be pinned to any one particular technology.
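For a flavor of what CDMI looks like on the wire, here is a hedged sketch of creating and reading back a data object. The host and container path are invented, but the headers and JSON shape follow the CDMI (ISO/IEC 17826) RESTful interface for data objects.

```python
# Hedged sketch of the CDMI REST interface (ISO/IEC 17826): create a data
# object in a container, then read it back. Host and paths are made up.
import requests

BASE = "https://cloud.example.com/cdmi"   # hypothetical CDMI endpoint
HEADERS = {
    "X-CDMI-Specification-Version": "1.1.1",
    "Content-Type": "application/cdmi-object",
    "Accept": "application/cdmi-object",
}

# Create (or replace) a data object; the body carries the value plus mimetype
body = {"mimetype": "text/plain", "value": "hello, object storage"}
r = requests.put(BASE + "/mycontainer/hello.txt", json=body, headers=HEADERS)
r.raise_for_status()

# Read it back; CDMI returns JSON with the value and standard metadata
obj = requests.get(BASE + "/mycontainer/hello.txt", headers=HEADERS).json()
print(obj["mimetype"], obj["value"])
```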
You mentioned containers and stuff like that. What would be, you know, what's your end goal between you and CNCF, or SNIA and CNCF?
I think one of the things that we're missing at this point in the conversation is that there's a separation between the people who are living in the software world and the actual storing of data on media. And, you know, one of the things that Jason has done that I absolutely love
is he has this wonderful metal as a service demonstration, where as part of the process,
he explains how many different layers of abstraction you have between the application
running, all those layers in between, from the containers down into the virtualized environment, down into the RAID environment, to the physical layer itself. And each and every one of these layers
is an opportunity for people to not know what's going on beyond it. So I have a feeling that one
of the things, my gut instinct tells me that one of the reasons why containers have had a very hard
time getting storage to work consistently is that there just simply isn't a good understanding.
You think they've had a hard time getting storage to work consistently?
I mean, for a long time containers couldn't care less about storage, right?
I mean, it was all services and databases. Now the world is doing something with containers and/or buying companies that do their own storage that runs Kubernetes-native.
Yeah, but they're abstracting it away is the key.
Yeah, abstracting it away.
So what?
It's not like direct access.
It's not like physical IO to it.
But who wants to do physical IO?
They want to do files.
They want to do blocks.
They want to do object. They want to do Ceph without Ceph, you know, to some extent.
Ah, but no, that's not really true. See, because Jason, no, no, Jason's right. The thing is that
they have managed to solve an abstraction problem by creating another layer of abstraction.
And you're absolutely incorrect when you say they don't want to have direct access to the
hardware or the IO. And that's one of the reasons why SDXI is so important is because it allows those abstracted
entities, the applications, containers, to be able to use the hardware itself to do the memory copy.
In other words, you're not telling 17 different layers of abstraction to do the copy and then it
tells two friends and tells two friends and tells two friends. You really think that they want to have physical access to the hardware, which has been virtualized and contained?
You know, it's through a container interface and all this stuff to do memory to storage IO?
Oh, absolutely. And you were actually talking about the hyperscalers in particular.
This is one of the things that they are very keen on doing.
I think the problem winds up being a longitudinal one, Ray. I think that the
issue is that over time, you put so many layers of paint on a wall, the wall starts to crumble
under its own weight. There was this problem in Chicago with their highways: they put so many layers on the concrete that, you know, trailers were starting to be decapitated under the bridges.
A very good example. And I think that in storage in particular, the issue is that
yes, you can create another layer of abstraction, but you wind up doing
a couple of different things. One, you really hurt your performance, truly hurt your performance.
And that leads to the second thing, which is all of a sudden, all you want is faster.
You don't want better. You want faster. You want faster chips. You want faster memory. You want
faster transports because that's all you can see. You don't want any intelligence in the chips. You
don't want any intelligence in the storage. You don't want any intelligence in the transport,
but that's exactly what CXL is. That's exactly what SDXI is. That's exactly what NVMe is.
It's intelligence that gets washed away by all these layers of abstraction.
And yes, I think that the virtualization people and the applications people and the containers
people and definitely the hyperscalers and the operating system people want to be able
to have better access into what's going on inside the hardware.
I mean, this is what I do on a day-to-day basis is talk to those very people about how
can we do this in a fashion that is ubiquitous across the board so they don't have to rewrite a new driver every single time.
And I would say, Ray, to your point, does the user or the application developer care about the underlying storage?
No, they do not.
However, they do care about performance.
The hyperscalers are the ones that care about all of this stuff because guess what?
They're the ones that have got to make it perform to, basically, the standard the application developer wants.
And they can't do it with a layer on a layer on a layer on a layer on a layer on a layer on a layer.
You look at what's going on for an IO operation in a container.
I mean, it's going through, you know, Kubernetes.
It's going through Docker.
It's going through CSI.
It's going to, you know, some storage device someplace.
It's a container on a VM
that is then sitting on top of a hypervisor.
Oh yeah, I forgot about the VM.
Yeah, yeah.
So it's a VM cluster and stuff like that.
It's ugly.
It is ugly, but it's, you know,
to the large extent, these software guys,
I don't think they really
want to deal with the hardware. The software guys don't. And you are completely correct about that.
The software guys don't want to deal with it. But at the same time, if they don't want to deal with
it, guess what? They're going to get, you know, like a thousand milliseconds of latency.
That would be a second, Jason.
So, but don't forget what your question was, Ray.
I mean, your question started off with how can SNIA help those organizations, right?
And the thing is that that's how SNIA can help these organizations.
They don't want to deal with having to figure out a way to communicate with the hardware, so they don't.
They just slap that on. But if there is a way that they can access an API that directly accesses the hardware,
they'll be fine with that, because now all of a sudden they're doing a call to SDXI. Yeah. So
they don't have to worry about the underlying architecture at all. They just have to know what
the APIs are to be able to do that. But if they don't know that exists, then it's not going to
help them. And that's where the relationship between SNIA and people like CNCF can really shine.
You mentioned SDXI at least four times in this conversation.
I don't know what SDXI is.
Absolutely. So it's the Smart Data Accelerator Interface.
It is a way of being able to identify from the hardware, you know, different memory regions, and you can use
that hardware to do the copying of memory from one region to another. So for instance, let's suppose
you've got, I'm going to make a very simple example of this, and I hope it isn't incorrect as a matter of oversimplification, but let's suppose you've got a storage VM
and you've got, you know, a regular VM with an application that needs to access that storage.
It does a read or something like that.
Perfect example. So effectively, think about the process that you need to go through in order to copy from the storage VM into the regular application VM, all those different layers. Instead of doing that, basically what happens is the application will contact, for lack of a better word, the hardware and issue an SDXI command to move the data from one memory location to the other that's accessible to both.
So effectively what winds up happening is that you are not asking the abstraction layers to do the copy.
You're telling the hardware to move the data from one
memory location to another memory location and you're bypassing all of
that. There's no pachinko chip of commands and you can do a lot of other
things here that are very very cool. One of the things you can do is you can
zero out memory very easily. So if you need to reclaim some of that
memory space, zeroing out the memory is one of those operations that takes a lot of time and can really be annoying to an administrator.
You know, the virtualization hypervisor guys in particular think this is a big deal.
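As an illustration of the idea only, and not the actual SDXI descriptor format defined in the SNIA specification, a data mover like this boils down to software posting small work descriptors, copies or fills, that the hardware executes directly between two address ranges:

```python
# Illustrative only: NOT the real SDXI descriptor layout, just the shape of
# the idea. Software fills in small descriptors; the data-mover hardware
# performs the copy (or fill/zero) directly between two address ranges,
# with no per-layer software copies in between.
from dataclasses import dataclass
from enum import Enum

class Op(Enum):
    COPY = 1        # move bytes from source to destination
    WRITE_IMM = 2   # write an immediate pattern (e.g. zeros) over a range

@dataclass
class Descriptor:
    op: Op
    src: int        # source address (unused for WRITE_IMM)
    dst: int        # destination address
    nbytes: int
    pattern: int = 0

class Ring:
    """Stand-in for a hardware submission ring."""
    def __init__(self):
        self.entries = []
    def submit(self, d: Descriptor):
        self.entries.append(d)   # real hardware: write entry, ring a doorbell

# Copy 4 KiB from a "storage VM" buffer to an "application VM" buffer,
# then zero the source: two descriptors, zero intermediate software copies.
ring = Ring()
ring.submit(Descriptor(Op.COPY, src=0x1000, dst=0x9000, nbytes=4096))
ring.submit(Descriptor(Op.WRITE_IMM, src=0, dst=0x1000, nbytes=4096, pattern=0))
```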
So is SDXI a SNIA standard, or a SNIA task force, or working group?
It is.
It is a technical working group, otherwise known as a TWG.
And it is going to be, it's in its final stages of review.
It's in editing right now for the final review.
We anticipate that it's probably going to be released as a 1.0, probably in Q3, maybe Q4 of 2022.
But yeah, it's going to be submitted as an ISO standard.
You think that somebody like Kubernetes or Docker Swarm
or those kinds of guys will start taking advantage of this
to do storage IO?
But in this case, it's memory to memory?
Yeah, I think they will. I think that once they understand what it is,
and I think that once they understand how they can use it, more importantly,
then I believe that's exactly what they're going to do. I think they're probably going to build it in.
I honestly think that they will because it works on any hardware that supports it.
It's not proprietary under any circumstances.
You don't need anything, you know,
you don't need a license or any of the kind of stuff to use it.
You know, and I think that once they realize
that that performance that they've been waiting for
for every new ASIC that comes down the road,
they're all of a sudden going to find
that they can get the exact performance
that they're looking for on existing software/hardware combos. Well, I should say, I've got to be careful about that.
You have to have SDXI in the hardware in order to access it, right? And right now,
that's currently under development by a number of different vendors. But once it's there,
then the performance is more than just whether or not you have more cores or whether or not you have, you know, a faster transfer rate across the cores or any of that kind of stuff.
It's really a matter of, you know, being able to eliminate the waste that happens in abstractions.
And I think, yeah, I think they're going to really like it.
And then, of course, they can use Swordfish to manage it.
Good luck with that.
I'm kidding. I'm kidding.
I'm kidding.
I know.
I know.
So, so SDXI is the memory-to-memory interface, primarily for storage-to-application kind of thing.
You're bypassing all the world's virtualization.
You're bypassing, you know, five layers of operating system here, going down and coming back up on the other side.
Is that, is it kind of like NVMe?
No, no, I can see the parallels.
The parallels are very, very strong.
They're very strong with this one.
No, but the thing is that SDXI is byte-addressable and NVMe is block-addressable.
Effectively, SDXI
talks memory
and NVMe uses memory semantics
to talk block,
which are two different things,
but it gets a little bit into the weeds as to what that really means.
No, no, no. I understand.
I get you.
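A toy way to see the byte- versus block-addressable distinction on a POSIX system, using a memory-mapped file as a stand-in for byte-addressable memory and fixed-size reads as a stand-in for a block device (the path and block size are arbitrary):

```python
# Toy illustration: byte-addressable access touches a single byte directly;
# block-addressable access must read the whole containing block.
import mmap, os

BLOCK = 4096
path = "/tmp/demo.bin"
with open(path, "wb") as f:
    f.write(os.urandom(2 * BLOCK))

# Byte-addressable: load exactly one byte at an arbitrary offset
with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)
    one_byte = mem[4100]                 # a single-byte load
    mem.close()

# Block-addressable: to get that same byte you must read its whole block
fd = os.open(path, os.O_RDONLY)
block = os.pread(fd, BLOCK, BLOCK)       # read block #1 (bytes 4096..8191)
os.close(fd)
assert block[4100 - BLOCK] == one_byte
```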
What about memory?
What is SNIA doing in the memory space?
That's the other question.
You know, I think this is storage.
It's obviously a storage – let's say it's a storage VM talking to a user VM, and it is memory-to-memory at some point, but it's not storage.
It needs to be block or file.
Well, at some level, sure. I think one of the things, well, hang on a second, because we're talking about using the right tool in the right place, okay?
Well, I want to make sure that the listeners do too. I don't want people to get the impression that I'm trying to say there's a panacea for all the world's problems here. What I'm trying to say
is that as we see the collision of compute and storage and networking and transports, we are putting things into places that traditionally they haven't been before.
We have not before had compute so close to storage or storage so close to compute.
We've not had the memory semantics in storage capacity until we had NVM Express. We've never had the compute so
close to the networks like we do with SmartNICs and DPUs right now, or XPUs. So what the question
really becomes, just because I can, does that mean that I should? And in some cases, the answer is no.
And in some cases, the answer is yes. So realistically, if you want to do a truly
composable or disaggregated system or whatever buzzword you want to choose, but if you want to actually do it, you have to know exactly where the tools are going to fit.
And that means that storage is becoming more memory-like, memory is becoming more storage-like.
That's the whole concept of persistent memory.
It's memory that acts like storage and storage that acts like memory at a high level.
Now, there are people that will disagree with that and say that no, memory is always memory and storage is always storage. But the reality is that we're seeing a collision of where these
functions are done inside of an environment. It could be inside of a server. It could be inside
of an ASIC. It could be inside of a card. It could be inside the network. And it could be inside of a storage device. And when that happens, there is no one ring to rule them all.
There's only using the technologies. Sorry.
Well, Swordfish would be the...
No, that's a really good point because that's actually what it would do. It would be one ring
to rule them all. But ultimately, when you want to apply it, I would not apply SDXI to do file-based access.
Right?
That's not what I would want to do.
But isn't this what you're saying is that memory is becoming more storage-like?
So you're thinking that shared memory, right? If I look at CXL and what it has the potential to do is to provide sort of a shared memory environment, it becomes more of a storage device?
Is that what you're saying?
Just look at Intel PMEM, right?
Well, PMEM is sitting on a server someplace, okay?
So, yeah, all the cores in that server can access it, but it's not sitting out, you know, on some shared memory
environment yet. Now, maybe CXL is the answer to that.
But that's it. That's exactly the point, Ray, is that CXL is looking to do that very thing. And as a matter of fact, it already is. I mean, we do have companies that have products with NVDIMM-N that are doing that very thing. They're doing shared, I mean, if you talk about, you know, MemVerge and Liqid, for example,
they're doing that very thing.
They're doing shared memory pools.
So the issue is that they're not creating
or recreating storage as memory,
and they're not recreating memory as storage.
What they're doing right now
is they're creating kind of a middle tier of storage,
which has memory characteristics
and storage characteristics,
which buys them granularity. It gives them the ability to do things that they couldn't do with just a
static storage device or a static memory device. And people are starting to think about, well,
what happens if I do share this? What if I do share this over CXL? What if I do want to have
these different devices that can do computational and storage at the same time and share that information with each other?
So you effectively wind up with this, I don't want to reinvent the wheel, but at the same time, the one that I've got isn't going to do it.
So, yeah, you've got a situation where the memory and the storage do wind up having very similar characteristics, not exactly overlapping, but enough of an overlap so that you can actually
share the, you know, the dynamics of the two, and yet still be unique enough that you can have a
completely different way of doing something like memory pooling, as an example.
And that's why I do what I do. I love my job because of the fact that I get to see which of these things take off, or don't, for a variety of reasons, some of which are not technological.
Some of it's political, some of it's competitive, some of it's marketing, right? I mean, and yet at the same time, there is a general trend, right?
There's a general trend to better use memory, to better use processing, to better use storage. The whole thing about SmartNICs.
Here's a good example.
Just think of SmartNICs.
In a SmartNIC, basically, it's a peripheral that currently sits on a PCIe device that connects into a host and into the network.
So what does that do?
That separates out the networking from the processing by putting compute next to the network that doesn't exist inside of a host processor.
We have now created a new tier of network processing that is away from the host and away from the network. It's in a brand new place.
Put storage on that? We already are.
Every single SmartNIC vendor, every single SmartNIC vendor, has an NVMe story. Every one of them.
Yeah, no, I know. I know. It is amazing what's going on there.
Yeah. And so the thing is that one of the things we're doing in SNIA is saying, all right, look, since everybody's talking about, you know, this collision of networking and storage on the SmartNICs, why don't we start creating a task force for XPUs, right?
Because different people call them different things,
IPUs, DPUs, NPUs, whatever, APUs, whatever.
So what happens if you want to connect this
in a ubiquitous way to any kind of storage device?
Well, that's what SNIA is working on with the task force.
We want to basically be a place where people can come and say,
I don't want to have 18 different drivers that I support.
Because that's really what's going to happen if we're not careful.
All right.
So, Jason, any last questions for Dr. J?
I have no last questions for Dr. J.
I talk to him all the time.
I know.
It's unfair, Jason, unfair. Dr. J, anything you'd like to say to our listening audience before we
close? Well, yeah, first of all, I want to thank you very much for inviting me here. I
really enjoy these conversations. It's fantastic. And I also appreciate the fact that you've given
me an opportunity to kind of explain what SNIA has been doing.
And I don't think that I've had quite this wonderful an opportunity.
And I'm glad that I was able to take advantage of it.
Well, this has been great, Dr. J.
Thank you very much for being on the show today.
Thank you very much.
And that's it for now.
Bye, Dr. J.
Bye-bye.
And bye, Ray.
Bye, Jason. Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify,
as this will help get the word out.