Grey Beards on Systems - 129: GreyBeards talk composable infrastructure with GigaIO's Matt Demas, Field CTO
Episode Date: February 8, 2022. We haven't talked composable infrastructure in a while now, but it's been heating up lately. GigaIO has some interesting tech and I've been meaning to have them on the show, but scheduling never seemed to work out. Finally, we managed to sync schedules and have Matt Demas, field CTO at GigaIO (@giga_io), on our show.
Transcript
Hey everybody, Ray Lucchesi here.
Jason Collier here.
Welcome to the next episode of the Greybeards on Storage podcast, a show where you get Greybeard storage and system bloggers to talk with system vendors and other experts
to discuss upcoming products, technologies, and trends affecting the data center today.
And now it is my pleasure to introduce Matt Demas, CTO of GigaIO.
So Matt, why don't you tell us a little bit about yourself and what's going on at GigaIO?
Sure.
Thanks, Ray.
So my name is Matt Demas, as you said.
I'm the CTO running kind of the technical strategy for how the company's going to move forward.
We've been doing a lot of really cool, interesting things
in the realm of composability.
You may have heard of composability in the past
from companies like HP and Dell composing drives.
I'm sure you guys have talked about that in the past,
but we're not only doing kind of storage,
we're taking the whole realm of what makes up a system,
disaggregating and recomposing it back together. And now we've been implementing things like
composable memory. And as CXL comes out, it's a pretty exciting field for us to be in.
Composable memory? Where does that come from? This is a whole different world for me.
It's not a DIMM anymore. It is a DIMM, but it's on a box someplace out in the world.
Yeah, so it's actually a DIMM in another box. One area we're seeing a lot of interest in is people saying, I've got these old servers I could throw away. Those old servers have a lot of memory in them. Can I go give that memory to a new server? And we're saying, absolutely, let's go do it.
That's where CXL fits in, huh?
Actually, we're doing it before CXL.
We're actually partnering.
Yeah, we're partnering with our friends over at MemVerge, and we are implementing a capability to allow you to actually compose DRAM directly into your system
and then let your system actually see that DRAM as if it was natively installed prior to CXL.
I would like to know more about that
MemVerge relationship. What are you guys doing with MemVerge?
Yeah, so that's fairly new for us, to be clear. A lot of the stuff that we've been talking about from a memory perspective is aimed at customers that are trying to do a lot of this work today. Having 10 or 20 terabytes of memory inside of a server isn't for everybody, but those are certainly the customers that we're looking for right now. So we've been working with them for the last little while, getting into the beta stage now, where we actually compose memory into systems and they run their MemVerge software. And then from there, they're able to go address that memory space
and go do all the memory management capabilities that they have.
So I can do things from just doing load store of memory into remote servers,
all the way up to the checkpointing and snapshotting capabilities
that MemVerge offers, where I can create snapshots,
move those snapshots from one server to another.
And soon enough, we'll be driving to the point where we are allowing multiple servers
to share the same memory so they can all access the same big pool simultaneously.
Does this use like the PMEM interface kind of thing to talk to storage
or talk to DIMMs out on this?
It's got to be a PCIe kind of extension, right?
Or something like that, right?
Yeah, yeah.
What is this box you guys got?
Yeah, so GigaIO has natively been a memory fabric
from the beginning,
meaning that we have a PCIe switch and PCIe interconnects,
but the way it communicates
is I have memory in every device out there, right?
I have memory in storage, I have memory in GPUs, I have memory in obviously servers, right? So what
we do is we allow connections to talk directly from one memory space to another memory space.
So if I want to compose GPUs, it's talking to the GPU's memory. It's not just creating
a logical path, it's talking directly to their memory. And so we're able to go utilize that across that PCI fabric, or what we call our memory fabric.
And it allows us to go pull remote pools of memory from a distant server, and then allow us to go
capture that and utilize it as memory living on that initial host.
So once we create that connection, that's what GigaIO does.
We create that connection and allow that system to see the remote memory.
And then from there, MemVerge takes over and says, hey, I see the memory and I'm going
to treat it just like I do my normal PMEM.
And so MemVerge does what it normally does.
It's just utilizing our remote memory access.
In the server, but on the back end of that is real memory someplace on this memory fabric.
Exactly.
And because we're running across PCIe as well,
our latencies are so low that you don't really see a performance hit.
So we're seeing customers that are not going to be able to get to 20 or 30 terabytes of DRAM on a box without having to go buy one of these crazy...
20 or 30 terabytes of DRAM on a box? What are we doing with this thing? Is it like Redis gone mad or something, or I don't know, SAP, whatever, HANA? 30 terabytes seems like a lot.
I also want to know, when you're talking about latency, what kind of latencies are you talking about?
Yeah.
So we're talking similar latencies to traditional high bandwidth memory.
So you're talking 300 nanoseconds, right? You're talking HBM stuff.
Exactly. We're talking that type of latency. The traditional latency you see with HBM is about 300 nanoseconds, and we're talking right in that same realm.
Normally DRAM on a server or something like that is probably, what, an order of magnitude faster than that?
It generally is about 40 to 50 nanoseconds, yes. So it is faster. And we're talking about PCIe Gen 4, right? As Gen 5 comes out, then we hit less than 100 nanoseconds, and then I really don't see a difference between composed and non-composed.
So we'll be able to offer full scale-out composed memory where it's almost imperceptible here in the very near future with PCIe Gen 5. And that's before CXL hits.
Exactly.
So we will have a CXL memory appliance,
but even without implementing the CXL functionality,
we will have this capability.
So what are the customers and workloads
that you're looking at deploying,
like that you're deploying today?
And then what are you looking at deploying
as far as customer workloads when CXL
pretty much hits with the 2.x spec and then when the 3.x spec hits? I can only imagine
that the customer workloads grow up significantly, right? That's exactly right. So today, I really
kind of focus more on the AI and HPC types of workloads, mainly because those are the workloads where single systems need lots of memory, right? You're also going to see areas like Spark, things like that, that we'll likely be starting to work with here soon.
So large single systems that need lots of memory.
Then you're going to see in the near future, you start taking that same concept where I
can dynamically apply memory and add or move it to a host on demand.
And then you get into virtualization environments and you start saying, well, what if my VMware cluster has 32 nodes in it? Instead of having to put a terabyte of memory in every box where I'm only actually utilizing about 40 percent of it, or choosing to put 500 gigs in the box and running at 80 or 90 percent, I'm in that kind of weird spot now with how large the memory space has to be in each box to be ideal or optimal. So I'm able to go set them to a much lower memory amount in each node and then have a memory pool available to any node in the cluster.
So if a VM starts to run
and I get higher than I want to be on a single node, instead of having to go try to move VMs around to optimize, I can just simply compose memory to it.
And it'll automatically just grab memory from a pool.
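[Editor's note: a back-of-the-envelope sketch of the sizing trade-off described above, using figures from the conversation (32 nodes, 500 GB versus 1 TB per node, roughly 40 percent utilization in the over-provisioned case); the shared pool size is a hypothetical number chosen for illustration, not a GigaIO specification.]

# Illustrative arithmetic only; the pool size below is an assumed figure.
NODES = 32

# Static sizing: 1 TB per node, but only ~40% typically in use.
static_total_tb = NODES * 1.0            # 32 TB provisioned
static_used_tb = static_total_tb * 0.40  # ~12.8 TB actually used

# Composed sizing: 500 GB per node plus a shared pool any node can borrow from.
pool_tb = 4.0                            # hypothetical shared pool
composed_total_tb = NODES * 0.5 + pool_tb

print(f"static provisioned:   {static_total_tb:.1f} TB (~{static_used_tb:.1f} TB used)")
print(f"composed provisioned: {composed_total_tb:.1f} TB")
print(f"DRAM saved:           {static_total_tb - composed_total_tb:.1f} TB")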
This whole memory stuff is brand new to me.
I mean, I've been, you know, composability, GPUs, storage, you know, networking cards, things of that nature.
Okay, but.
Come on, you are a greybeard on storage.
I guess. But you know, I saw
you do support composability
of storage and GPUs and things
of that nature, right? I mean, it's not just memory.
That is absolutely
right. Because as I said before, every
device has memory. So we
talk to them all, right? If I'm going to go
compose GPUs, I'm going to write directly to
the HBM inside that GPU. I can even let GPUs talk to each other across the fabric. So I can do
things like with DGXs, for example, or with OEMs.
Yeah. So you have your GPUDirect RDMA, which is what's out there, what everybody kind of knows about today.
That's done with InfiniBand or high speed Ethernet.
But when that happens, it has bounce buffers on every single host that it has to go hit in order for those GPUs to talk to each other.
And that's because the RDMA protocol doesn't allow it to talk directly memory to memory between these GPU types. It's got to translate from the memory layer out to the IB layer, communicate over IB, then back down again to the memory stack and over to the GPU. It tries to optimize some of that workload.
And it does a decent job, but it's kind of like when you saw GDS, right, or GPUDirect Storage. When they were able to go take the R out of RDMA with GPUDirect Storage, the claims are it made storage access five times faster. So we take the R out of RDMA when we do GDR, or GPUDirect RDMA; we allow that to be a
DMA now. So same types of things apply
because I no longer have to go hit all these bounce buffers.
I no longer have to translate all these protocols.
I'm talking directly from memory of host one
to the memory of the GPU in host two.
It doesn't have to go through the host at all,
bypasses the kernel altogether
and goes directly to the remote GPU.
That's really advantageous for an AI kind of environment
where you need a gaggle of GPUs just to keep the training
and inferencing activities going on,
but you never know where you really need them kind of thing, right?
Right, exactly.
And then you take the value of composability where I can say,
I have training going on, but just because training is going on doesn't mean,
or sorry, if I'm not doing training, if I'm doing inferencing,
if I'm doing, if I'm preparing my data,
those GPUs don't have to be in that box yet.
So I could, those GPUs could be posed to a different box while that boat,
all this box is preparing data. So I compose this to it,
let it ingest all the data, tag it, label it.
And there's no GPUs being used there.
They're being used somewhere else.
And then when it's time to actually train,
let me bring the GPUs to it.
And then they're able to be utilized.
So I get maximum efficiency out of all those GPUs I use.
At the same time, I can let those GPUs all talk to each other using full DMA capabilities.
Yeah.
I was going to say, having seen this technology, it is so cool, especially that whole composability piece. And you know what, Matt, can you describe a little bit about the hardware, because there's a significant piece of hardware that you have and that you sell. Can you tell us what you sell: basically, what do you plug into the server, what kind of switches do you connect to, and then what kind of devices can you connect into to get to those GPUs that you were talking about?
Yeah, yeah, absolutely.
So we start off with, as I said, the memory fabric. Its core basis is that PCIe switch. Inside the PCIe switch, I have a bunch of PCIe Gen 4 x4 ports, and I have a COM Express module in the back that actually runs all of our software. From there, I plug in any other types of devices I want. So I have an HBA that plugs directly into servers, and those don't really care what the OEM is.
Just plug the HBA in.
It connects into that fabric.
Then those servers can all talk to each other.
Anything that lives inside those servers can all talk to each other.
So I can talk to the drives
that live in the server next to it.
I can talk to the GPUs that may live in that server.
But if I keep them in the server, they're still kind of locked to that sheet metal, meaning that if I want to go build a new server for something unique, a unique workload,
I have to still communicate across nodes. And that's fine. That works great.
But I also have another option. And that other option is going to be using what we call
our pooling appliances. Some people would call them JBOGs, just a bunch of GPUs. We call them accelerator pooling appliances and storage pooling appliances.
And those are just chassis that are built for power, cooling, and uplink of PCIe devices.
So I can put GPUs, FPGAs, vector engines, whatever other types of devices you may want to have,
even NICs in there.
What is the size of the power supply on that thing?
It has more than one, let's just say that.
So I can imagine. Also, I do kind of want to rewind just a little bit. That card that you are sticking in the server that is connecting into your thing, is it basically like a PCIe card that is literally just transferring PCIe into your PCIe switch?
That is exactly right, and a great point to bring up. We don't have offloads. We don't have to translate anything. It is native PCIe. What that means is if I'm going to compose a device across that card, it doesn't have to translate to anything. If I plug a GPU into our APA and that goes through our fabric to the HBA, it's communicating PCIe the entire way.
So it is literally talking as if it's plugged directly
into the server.
Another silly IT nerd question.
What does that cable look like?
So the cable comes in two form factors. One is going to be your copper cable, and that actually looks just like a SAS cable. If you're used to connecting storage arrays and filers, it looks just like a SAS cable. If you are going to go longer distances, you're going to use our fiber option, and that's going to look very similar to an AOC from Mellanox.
Give me a second. The storage stuff, how does this play out?
Effectively, there's a gaggle of NVMe SSDs in a 1U pooling appliance, and then I have uplinks from there. And I assign how many drives I want to go to what server. And basically, in a matter of five seconds, those physical drives are electrically connected to that remote server.
And that server has full DMA capability. That server owns those drives
as if they were plugged directly into the box. That is by far the most performant way to go
connect a drive. In fact, we ran some tests with Optane, and we use Optane because of its latency characteristics. We found that when doing the full composition, I was able to do full reads and writes onto that composed Optane while adding only one more microsecond of latency over if it was just locally installed.
Oh,
that is nice.
But you know, the problem is it's like a 10 microsecond latency for Optane. So, okay, now it's 11.
It's not great, but it's still pretty damn nice.
11 microseconds in storage is awesome.
Yeah, of course.
It is, it is.
No doubt, no doubt.
But we've been talking about nanoseconds this whole time,
so it seems so slow.
Bear with me for a second.
Now, let's say I have a server that has five of these NVMe SSDs,
and I want to now move two of those to another server, what has to happen here?
I mean, (a) do the two servers have to be rebooted, or (b) can it be done non-disruptively? And (c), you must have some sort of software orchestrating all this stuff, right?
Yeah, exactly. I mean, one of the great things that NVMe-oF really did back in the day was give you that hot add, hot remove capability. Though it really was more NVMe than NVMe-oF: with the implementation of NVMe into servers, it forced all these kernels to be able to handle hot adds and hot removes in a much different way than they used to.
If I plugged a PCIe device into a server, what, eight years ago,
that server's gone.
But because NVMe drives had to be able to be pulled out and plugged in at the front of a server without that kernel crashing,
all those pieces of code have been put in place now where I can hot add and hot remove
devices.
The hot swap support made this all available for NVMe SSDs.
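[Editor's note: a minimal sketch of the Linux side of the hot-add flow described above, assuming a Linux host with root privileges. The sysfs paths are standard kernel interfaces, not GigaIO tooling, and the fabric-side composition step itself is not shown; on most kernels the hotplug event is handled automatically and the manual rescan is just a fallback.]

import os

def rescan_pci_bus() -> None:
    """Ask the kernel to re-enumerate the PCIe bus (requires root)."""
    with open("/sys/bus/pci/rescan", "w") as f:
        f.write("1")

def list_nvme_namespaces() -> list[str]:
    """Return namespace block devices (e.g. nvme0n1), skipping partitions."""
    devs = os.listdir("/sys/class/block")
    return sorted(d for d in devs if d.startswith("nvme") and "p" not in d)

if __name__ == "__main__":
    before = set(list_nvme_namespaces())
    rescan_pci_bus()  # pick up any drives the fabric just attached
    after = set(list_nvme_namespaces())
    print("newly visible namespaces:", sorted(after - before))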
How does this work for GPUs?
I mean, are GPUs hot swappable?
Yeah, so it really depends on the exact OS.
Some OSs support it.
Some are a little quirky, meaning that if I have a PCIe hub composed to it, I can add more
GPUs and remove them without an issue. But if I'm kind of adding a whole new set of GPUs to it,
I'll have to go restart the system. To be honest, though, when you're talking about GPUs,
you're talking about drivers, right? When you have to talk about those drivers,
all those drivers have to be restarted anyway when you add or remove a GPU.
So the idea of having to restart the server
versus restart the service
is not
really that big of an issue.
And when
it comes down to it, those
are the components that don't fail that often.
Exactly. And when
they do, it's because you've overstressed them
heavily.
You need to look at my crypto mine with the...
Yeah, well, Ray, you are excluded from this because you do all kinds of silly things that you shouldn't do.
No doubt.
Well, actually, on that note, you think about it.
If you're putting those GPUs inside your server, they're fighting for cooling, right?
They're fighting the CPU, they're fighting the memory, they're fighting the disk for cold air.
And so by putting those hot devices inside of our GPU chassis, or sorry, our accelerator pooling appliances,
I'm able to really increase the life of both my GPUs and my servers, because they're not fighting and battling for that cold air the entire time.
Matt, Matt, if you're arguing the point that Ray needs your appliances, like, I don't think
you need to argue that.
Yeah.
It's like, that's a different question.
Yeah.
Yeah.
Ray is going to totally agree with that.
Let's go back to customers.
How does this play out in an HPC environment and things of that nature that, you know, you would think like these supercomputer environments could really benefit from a
gaggle of GPUs sitting in, you know, a rack or two that could be allocated to wherever they need to
be allocated. Oh, come on, tell us about TACC, really. That's what Ray's asking.
Yeah. I mean, it's definitely a huge advantage for a lot of HPC customers, right? So the idea
of being able to be dynamic. And you'll talk to some people in the HPC space and they'll
kind of fight against it because it's not what they're used to. And a lot of people are kind of set in the way they do things.
But when you look at what HPC is today and how AI has merged with HPC, and the fact that most of these systems, especially the larger ones, are not built to do one problem.
They're built to do hundreds of problems.
And so they expect to have different challenges all the time.
The idea of having a homogeneous compute environment makes no sense because if everything
has to be the same, that means instead of trying to solve a problem the right way, I have to go
change my code and make it adjust to the hardware. And so I'm not writing the code that I want to
write. I'm writing the code that I have to write. And so what we really enable is that ability to software-define your hardware.
And so a lot of these universities are really starting to see this capability where I can now
say yes to my customers instead of saying, well, we could, but you got to change this,
this and this, and we got to go buy something that looks like this in order to go make that happen.
Give me your wallet and we'll talk to you in nine months. Right? And so instead of having to go do that, they're able to say yes, or at most, hey, buy that new card that you want to have. I'll
add it to the fabric and then I'll say yes. And so we're talking about a couple of weeks instead of
nine months to a year.
Matt, Matt, I got a lot of friends and coworkers I need to introduce you to.
Half of them, I think you already know.
Yeah, right, right, right, right.
Talk to me a little bit about the software you must have. Is this like an operating console or something that you talk to your composability solution with, or is it API driven?
Yeah. So we made a conscious effort early on to say, you don't need another GUI, right? We want to be transparent. So what we've done is we've made everything Redfish based. I can do
all of my composition through the same API that you're already using to go
manage your hardware. So since we're moving hardware around and creating hardware connections
between devices, it makes sense that Redfish is the API that was chosen. So we actually don't
have a GUI in our environment. So today everything is Redfish API driven, and we've actually
integrated with a bunch of partners. And when I say integrated,
we didn't build a plugin. They actually came to us and asked to go integrate our capabilities
into their software because they saw the value it would bring to their end customers.
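[Editor's note: a hedged sketch of the Redfish-style composition pattern being described, following the general DMTF CompositionService model of resource blocks bound to systems. The endpoint host, credentials, and exact payloads here are illustrative assumptions; GigaIO's actual Redfish service layout may differ.]

import requests

BASE = "https://fabric-manager.example.com/redfish/v1"  # hypothetical fabric manager
AUTH = ("admin", "password")                            # placeholder credentials

def list_resource_blocks() -> list[dict]:
    """List the resource blocks (GPUs, drives, memory) the fabric knows about."""
    r = requests.get(f"{BASE}/CompositionService/ResourceBlocks", auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json().get("Members", [])

def attach_block_to_system(system_id: str, block_uri: str) -> None:
    """Bind a resource block to a composed system by patching its ResourceBlocks links."""
    payload = {"Links": {"ResourceBlocks": [{"@odata.id": block_uri}]}}
    r = requests.patch(f"{BASE}/Systems/{system_id}", json=payload, auth=AUTH, verify=False)
    r.raise_for_status()

if __name__ == "__main__":
    for block in list_resource_blocks():
        print(block.get("@odata.id"))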
Yeah. So Bright, for example, Bright Cluster Manager has integrated GigaIO. Obviously, you saw some big news from those guys this week.
So you have Bright, and you have some Slurm integrations through a couple of different partners. Being that Slurm is an open source product, it's something people can do, and a few of our partners have actually integrated us into Slurm.
We also have a company called Ctrl IQ. You may have heard of them, Greg Kurtzer's new
company. They are building a product called Fuzzball and Fuzzball is already certifying or
implementing us in their 1.0 release set to come out here shortly. And that's actually gonna be a
cloud native HPC toolkit. Yeah, it's good tech.
It's fun to see innovations and basically innovators spurring other innovators to innovate.
That's a lot of usage of the word innovate.
Awesome. I'm done, I'm done. It's like, I get so excited about this stuff when I see startups fueling startups. That's the number one thing that a founder can be proud of.
Yeah. All right, all right. Let me get back to
the scheme here. So how does something like this work with Red Hat or VMware or Nutanix, those kinds of guys? I mean, how does this play out in that space?
Yeah. So some of that is in the works right now.
We've done some testing with VMware, for example, and we have beta
code with ESX that allows us to go compose. And so we can compose it actually in ESX, we can do it
without any reboot at all when adding GPUs and devices into ESXi. So that's super exciting. We're
waiting to see what comes further from that relationship.
Then you have things like Red Hat; we've been in talks there. The easiest way to go implement that today is actually through Supermicro, which has integrated a product called SuperCloud Composer. They're kind of getting into that software business now, which is nice to see. Their first release of it is their kind of platform management software, and they've integrated GigaIO into that as well. So you can manage your whole rack to data center worth of systems. And that's Supermicro systems, Dell systems, HP systems.
It kind of manages them all.
But you can actually compose your devices across those systems using that tool set as well.
So from an ESX perspective, what you're doing is you're actually messing with ESX's hardware in real time, which is not something you typically see.
So you and VMware are providing a capability to make this sort of thing happen with GPUs and NVMe SSDs, I guess, huh?
Yeah. I mean, you see, VMware has been pretty excited about trying to get this composability
aspect to work, right? They've made acquisitions to go do that. And obviously, we're still in early
stages with those guys and we have it working. We can actually go work with some customers and show them how to use it.
Still waiting to see kind of what those next steps look like
with VMware.
Pretty excited about what that'll do
for the enterprise market.
Yeah, but none of these guys really deal
with the memory size of things.
So when you start talking about being able
to expand an ESX solution from 512 gigs to a terabyte or two in real time, it's a different world, I would think.
It is. And these are conversations that are likely to be had soon. So I can't really talk too much on what that looks like today, because I'll be honest, it doesn't look like anything yet today.
But I have a feeling it will be soon.
I'm thinking SAP HANA and Redis and all these guys are driving bigger and bigger servers these days.
And having the ability to do something like this would be something VMware and those guys would want, truthfully.
No, absolutely.
And not only that, I think it's more about just traditional data center flexibility, right?
We've been told that composability is the VMware of today, right?
That ability to be as flexible as you need to be to meet your customers' demands,
that's what VMware was founded to do, right? Take that server and let you say yes all the time, because I was able to take something big and make it all these small things and be very flexible. That's the purpose of virtualization, and we're hopefully going to help take that to the next level.
And I think you're well positioned to take it pretty much to the next level.
And one of the things I've always looked at is, where do virtual machines go to get to the next level? And I think it's when you can create a virtual machine that's bigger than the physical constructs of what that virtual machine runs on.
Boy. Yeah.
Yep, dude. So when you create a virtual machine, you've got a machine with a terabyte of RAM, but you can create a virtual machine with two terabytes of memory, that's something special. And that's where CXL is going to come in. That's where everything that you guys are doing at GigaIO is going to be. That's going to push computing forward.
I totally agree.
And the way I see it right now with CXL is VMware with the first generation of CXL
won't be able to do anything with it, right?
From that perspective,
they won't be able to go share anything across servers.
Now, with that said, GigaIO can, and we actually have designs to go do that. So we will be having CXL-enabled sharing even on PCIe Gen 5 with CXL 1.1 support inside the servers. We've figured out how to do that. So CXL will be coming in a shared arena here right along with the PCIe Gen 5 servers as they come out.
Hey, Matt, besides the CXL standards and stuff like that,
there are other standards organizations in the composability space.
Do you guys play in that environment as well?
Yeah, so obviously we've been on the CXL consortium since the beginning.
And it's really the OCP piece where more of the composability work is driving into. So Redfish has been a big part of it.
That's why everything's also been really focused on Redfish,
but you're going to see a lot more from us here working with OCP and the
composable aspects of it.
So let's talk big things.
What's the biggest memory pooling appliance
that you guys support at this point?
And how many servers is it potentially distributed over?
Well, that's the thing, it's kind of whatever your imagination comes up with. I mean, there are limits.
Imagine a pretty big world here.
There are limits, but basically I can create a certain number of memory windows that I can go mount memory to for that server. It gets kind of technical. I can create so many of them based on the BIOS of that server, but how much memory I put into each of those windows is configurable. So if I have servers that each have a terabyte of memory in my memory pooling appliance...
It's like a virtual page space. So you've got a physical
page space that you're managing on the server itself, but the virtual page space behind it
used to be on storage. Now it's sitting on a memory device off a PCIe fabric.
Is that what you're telling me?
That's exactly right.
And then you use MemVerge and their technology to literally keep it hot and
cold dynamically.
Hot and cold memory.
Yep.
Oh yeah.
Well, it's more than memory pages, right? I'm bringing the warmer pages up whenever they're needed, and I'm dynamically trying to keep everything in the fastest memory. But when it's not in the fastest memory, it's still in really fast memory. It didn't have to drop down to microseconds; it's still well within the nanosecond range.
So Ray, how gray is that beard feeling in storage now?
Tell me about it. I've been doing virtual memory for about four decades here. I was talking like 16 gigs. That was awesome.
Yeah, tell me about it. No, this is a different world. So the MemVerge thing.
It actually plugs in sort of like, it has PMEM sitting on the server, and then how's that connected to the fabric? I guess I'm trying to understand. In the past, MemVerge was just a couple of PMEM modules and DRAM, and it would carve it up for you internally in the server, but there was no external version of that in the old days.
That's exactly right. And today they still offer that capability, right? PMEM is just a tier of storage, and remote memory is going to be another tier of that storage. Basically, if I have PMEM on the system, to be honest, the latency between PMEM and remote memory is pretty similar; we're just faster on the backside of it. And it gives you the option, I could even compose PMEM remotely across that PCIe fabric if I wanted to. So you can choose DRAM or PMEM as that remote memory.
Hey, so Matt, with that, what is that latency?
What's the latency differential?
From PMEM versus composed DRAM?
It's actually pretty similar. They both run right around 300 nanoseconds.
I keep thinking there should be a plug-in to the DIMM with a PCIe bus floating back behind it or something like that. Is that how this works? I'm just trying to understand how the... so it's all logical?
It's all PCIe.
There's no real plug-in other than...
It's all PCIe. That's the beauty of the architecture. The DIMMs slot in the server and they all talk over PCIe.
And it's the MemVerge software that makes that happen, as well as your composability software.
Yeah.
I mean, we provide the transport.
We actually create the connectivity.
And from MemVerge's perspective, it sees the memory we connect just the same as if it sees the PMEM living on its own server. And so it just accesses it and says,
hey, all right, I'm going to make you a different tier
than the PMEM living on me.
And then if I were to compose more memory
from either a farther away server or from PMEM on another server,
it would make that a different tier with its own characteristics.
And then it'll kind of page
according to the performance characteristics of the memory
that's on that system.
Right, right.
So you guys have tightly integrated this solution with MemVerge, it appears.
It is getting tighter by the day.
I was getting ready to go there, Ray.
I'm just like, you guys keep saying MemVerge a lot.
Yeah, but it's a total solution here.
Yeah, it is a total solution.
I mean, it's a fantastic solution.
So what is to come of your organization and MemVerge?
Oh, I would not comment on that one.
Not going there.
All right, so back to the talk here. So how is this thing sold? Do you sell through partners only? Are you direct sales?
Yeah. So we are a
partner only organization. So we do have a direct sales team, but that direct sales team still
will only work with partners. What are some of your bigger partners then I guess?
Yeah. So from a channel perspective, from a federal perspective, we have federal integrators from
CTG Federal to Cambridge Computing to ID Technologies.
And then from more of the commercial side we have ICC, and we have Advanced Data Systems, who we just did some stuff with SDSC, the San Diego Supercomputer Center.
So it's an ever growing list. Our distribution right now, we're going through Arrow. We are trying to keep it fairly small. And our partners are always going to be those partners that value technology first and want to kind of drive the latest and greatest and the new cool stuff.
I'm not looking for a partner that's looking to just make a phone call, say, hey, you need a server.
Here you go.
I can quote you a server.
Arrow's a great disty, by the way.
I was just like, those guys are awesome.
Yeah, yeah. I love them. So.
So, I mean, it seems like this is almost targeted primarily at HPC, but there's a commercial side of this as well, right?
Oh, absolutely. So you have HPC, you also have the AI side of that. Right.
And so it's definitely merging together. And as the memory piece comes out farther, you're going to see a lot more things like traditional databases and in-memory databases that are going to be more in focus. And then you're also going to see some more of this DevOps stuff that I'm really excited about.
Right. That ability to go compose devices to a container as they spawn is really cool.
You didn't say anything about using Kubernetes and all this stuff.
So wait a minute.
So I can change the pod configuration on the fly to run the containers?
Well, today we do that through Bright.
So Bright controls all of that for us.
And so all of that works well there.
But honestly, it's all API driven, so it can be scripted also.
So, but yeah, as you create a new container,
I can go compose devices for that container.
I'm allowed to go say those devices are only for that specific container and go.
And literally let you change your code sets immediately.
It's just the right way to go
if you're trying to be a true DevOps,
very flexible environment.
Ultimately, we will be your cloud, right?
That is our goal, is to give you all the cloud flexibility
without having the price that comes with it.
That's okay. Okay, so great segue. So from a cloud perspective, when you're talking about the mega data centers out there in the world, if I wanted to basically take a look at your technology, what clouds could I go to?
I would say I can't.
Not disclosed?
Yeah, I can't disclose it yet.
Fair enough.
They have very strong NDAs.
Yeah, I know.
Trust me, I know.
Okay, Matt, you mentioned the money word.
How much does something like this cost, and how is it charged for? I mean, obviously there's storage and there's GPUs and there's memory, and all that's charged at whatever it's charged at. But then there's this rack device that you actually are supporting, and it's obviously your own PCIe switch and controller, right? Or something, right?
Yeah, yeah, no, absolutely. So it's all relatively inexpensive. Obviously, of course,
I'm going to say that. But basically what we found is because you gain so much utilization
of all your devices, you go from having 30% GPU utilization to 70%, right? We generally end up actually
selling less hardware overall. A lot of times it's actually servers that we sell less of
because you're able to go reconfigure your hardware to match your unique needs. And you
end up spending a lot less in a composed solution. And the amount of jobs you can run actually significantly increases.
So it's hard to say what it costs because generally, like I said,
you'll end up reconfiguring your design to have less of this and less of that, and still go meet the same job requirements.
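[Editor's note: rough arithmetic behind the utilization claim above. The 30 percent and 70 percent figures come from the conversation; the fleet size is a hypothetical number used only for illustration.]

gpus_static = 100                        # hypothetical static fleet size
util_static, util_composed = 0.30, 0.70  # utilization figures from the conversation

useful_gpu_hours = gpus_static * util_static       # work the static fleet actually delivers
gpus_composed = useful_gpu_hours / util_composed   # cards needed at the higher utilization

print(f"static fleet:   {gpus_static} GPUs at {util_static:.0%} utilization")
print(f"composed fleet: {gpus_composed:.0f} GPUs at {util_composed:.0%} utilization")
print(f"reduction:      {1 - gpus_composed / gpus_static:.0%} fewer GPUs for the same work")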
Do you have any examples of that that you can provide, a.k.a. any kind of total cost of ownership documents?
My crypto mine has got like six GPUs per server. And, you know, I've got like one or two with one GPU, one or two with four, you know, things like that. I'd kind of like to spread them all across all the servers.
Right, right.
Exactly.
So all those things are possible from a TCO perspective.
We do have a TCO calculator that we could show.
And what I love about it is we actually show a plot map of a whole bunch of jobs being done, with certain sizes for each of those jobs, and show you what it would look like with a certain static architecture versus what it looks like composed, as far as those jobs completing. You're using a couple of definable characteristics for those jobs. And you'll literally see, you can then pull back certain sets of hardware and go, I'm still doing more jobs, still doing more jobs. All right, now we're finally breaking even. And you see how much less hardware you can do that with, which is of course significant power savings to boot, not to mention just
hardware costs.
I've got a lot of friends in HPC that will love that, right, only because they have been through that. They went through the procurement cycle of, oh, we have to put GPUs in every node that we're deploying in this supercomputer platform that we're putting out there. However, the people that are developing the algorithms are not developing the algorithms for GPUs. So there is this giant lag in where that stuff is actually used. And being able to compose that infrastructure, which is exactly what you guys are doing, being able to compose it based on what assets you have available and how you allocate those, that is the gold mine of this.
Now, this has been tried many times. Composable infrastructure,
you know, my first toe stepping into the water of composable infrastructure was with the SGI
Origin 3000. That was a great system.
Yep. That's why you have a gray beard.
Yeah, that's why I have a gray beard. That's why I'm on Greybeards on Storage. I remember that thing. It was a great system. But honestly, what it did, I mean, guess what, it was a PCI switch. Well, it wasn't PCI at the time; it was SGI's proprietary stuff at the time.
But it's exactly the same thing you're offering now.
How is what you're offering now different?
Well, I mean, even Intel tried it, right?
So Intel had tried doing the same thing,
but what they were doing it on was PCIe Gen 2.
And the challenge was latencies just could not keep up with what was actually required.
So you're talking about almost microsecond latencies at the time and composing resources
over that type of distance with that type of latency just caused too many errors in the hardware.
Not to mention, we've now implemented non-transparent bridging. So that NTB is how we
are able to go talk memory to memory. And a lot of the kernels for operating systems haven't really
enabled that until fairly recently. So a lot of that communication path using NTB is fairly new.
So a lot of this stuff really wasn't an option. It wasn't truly an option to do
it the way we did it. They tried, but they ended up finding scenarios where when they composed,
they were not able to have GPUs talk to each other, for example. Like Dell had this C410X
GPU chassis and everybody loved it.
I was actually working at Dell at the time.
And it was a really cool looking box.
I thought it was going to do really well.
But what they found was because they couldn't have those GPUs talk to each other,
it just fell apart and it just died on the vine.
So a lot of hype, a lot of people really excited about it. But it's just some of those core technology features just weren't there yet.
And we're finally at a spot that we can get there.
And then with CXL really coming down the pipe, people are already starting to have a vision in their heads of what this could look like.
And so we're just going to make that vision come to reality.
So I completely agree with everything you're saying, and I really would love to see this push forward. I cannot wait to see what the
next generation of this technology is going to look like. So Jason, any last questions for Matt
before we leave? No. Matt, anything you'd like to say to our listening audience before we close?
You know, I said a lot.
I feel like I'm going to get in trouble after I get off of this thing.
But, you know, it was worth it.
I really enjoyed the time.
And I look forward to doing this again sometime.
All right.
Well, Matt, this has been great.
Thanks for being on our show today.
All right. Thank you. And that's it for now. Bye, Jason.
Bye now. Hey, Matt, thanks. Awesome conversation.
Yeah. Thanks, you guys. It was a lot of fun. Until next time.
Next time, we will talk to another system storage technology person. Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.