Grey Beards on Systems - 129: GreyBeards talk composable infrastructure with GigaIO's Matt Demas, Field CTO
Episode Date: February 8, 2022. We haven't talked composable infrastructure in a while now, but it's been heating up lately. GigaIO has some interesting tech and I've been meaning to have them on the show, but scheduling never seemed to work out. Finally, we managed to sync schedules and have Matt Demas, field CTO at GigaIO (@giga_io), on our show.
Transcript
Hey everybody, Ray Lucchesi here.
Jason Collier here.
Welcome to the next episode of the Greybeards on Storage podcast, a show where you get Greybeard storage and system bloggers to talk with system vendors and other experts
to discuss upcoming products, technologies, and trends affecting the data center today.
And now it is my pleasure to introduce Matt Demas, CTO of GigaIO.
So Matt, why don't you tell us a little bit about yourself and what's going on at GigaIO?
Sure.
Thanks, Ray.
So my name is Matt Demas, as you said.
I'm the CTO running kind of the technical strategy for how the company's going to move forward.
We've been doing a lot of really cool, interesting things
in the realm of composability.
You may have heard of composability in the past
from companies like HP and Dell composing drives.
I'm sure you guys have talked about that in the past,
but we're not only doing kind of storage,
we're taking the whole realm of what makes up a system,
disaggregating and recomposing it back together. And now we've been implementing things like
composable memory. And as CXL comes out, it's a pretty exciting field for us to be in.
Composable memory? Where does that come from? This is a whole different world for me.
It's not a DIMM anymore. It is a DIMM, but it's on a box someplace out in the world.
Yeah, so it's actually a DIMM in another box. One area we're seeing a lot of interest in is people saying, I've got these old servers I could throw away. Those old servers have a lot of memory in them. Can I go give that memory to a new server? And we're saying, absolutely, let's go do it.
That's where CXL fits in, huh?
Actually, we're doing it before CXL.
We're actually partnering.
Yeah, we're partnering with our friends over at MemVerge, and we are implementing a capability to allow you to actually compose DRAM directly into your system
and then let your system actually see that DRAM as if it was natively installed prior to CXL.
I would like to know more about that
MemVerge relationship. What are you guys doing with MemVerge?
Yeah, so that's fairly new for us, to be clear. A lot of the stuff that we've been talking about from a memory perspective is aimed at customers that are trying to do a lot of this work today. Having 10 or 20 terabytes of memory inside of a server isn't for everybody, but those are certainly the customers that we're looking for right now. So we've been working with them for the last little while, getting into the beta stage now, where we actually compose memory into systems and they run their MemVerge software. And then from there, they're able to go address that memory space
and go do all the memory management capabilities that they have.
So I can do things from just doing load store of memory into remote servers,
all the way up to the checkpointing and snapshotting capabilities
that MemVerge offers, where I can create snapshots,
move those snapshots from one server to another.
And soon enough, we'll be driving to the point where we are allowing multiple servers
to share the same memory so they can all access the same big pool simultaneously.
Does this use like the PMEM interface kind of thing to talk to storage
or talk to DIMMs out on this?
It's got to be a PCIe kind of extension, right?
Or something like that, right?
Yeah, yeah.
What is this box you guys got?
Yeah, so GigaIO has natively been a memory fabric
from the beginning,
meaning that we have a PCIe switch and PCIe interconnects,
but the way it communicates
is I have memory in every device out there, right?
I have memory in storage, I have memory in GPUs, I have memory in obviously servers, right? So what
we do is we allow connections to talk directly from one memory space to another memory space.
So if I want to compose GPUs, it's talking to the GPU's memory. It's not just creating
a logical path, it's talking directly to their memory. And so we're able to go utilize that across that PCI fabric, or what we call our memory fabric.
And it allows us to go pull remote pools of memory from a distant server, and then allow us to go
capture that and utilize it as memory living on that initial host.
So once we create that connection, that's what GigaIO does.
We create that connection and allow that system to see the remote memory.
And then from there, MemVerge takes over and says, hey, I see the memory and I'm going
to treat it just like I do my normal PMEM.
And so MemVerge does what it normally does.
It's just utilizing our remote memory access.
In the server, but on the back end of that is real memory someplace on this memory fabric.
Exactly.
And because we're running across PCIe as well,
our latencies are so low that you don't really see a performance hit.
So we're seeing customers that are not going to be able to get to 20 or 30 terabytes of DRAM on a box without having to go buy one of these crazy...
20 or 30 terabytes of DRAM on a box? What are we doing with this thing? Is it like Redis gone mad or something, or I don't know, SAP, whatever, HANA? 30 terabytes seems like a lot.
I also want to know, when you're talking about latency, what kind of latencies are you talking about?
Yeah.
So we're talking similar latencies to traditional high bandwidth memory.
So you're talking 300 nanoseconds, right? You're talking HBM stuff.
Exactly. We're talking that type of latency. The traditional latency you see with HBM is about 300 nanoseconds, and we're talking right in that same realm.
Normally DRAM on a server or something like that is probably, what, an order of magnitude faster than that?
It generally is about 40 to 50 nanoseconds, yes. So it is faster. And we're talking about PCIe Gen 4, right? As Gen 5 comes out, then we hit less than 100 nanoseconds, and then I really don't see a difference between composed and non-composed.
So we'll be able to offer full scale-out composed memory where it's almost imperceptible here in the very near future with PCIe Gen 5. And that's before CXL hits.
Exactly.
So we will have a CXL memory appliance,
but even without implementing the CXL functionality,
we will have this capability.
So what are the customers and workloads
that you're looking at deploying,
like that you're deploying today?
And then what are you looking at deploying
as far as customer workloads when CXL
pretty much hits with the 2.x spec and then when the 3.x spec hits? I can only imagine
that the customer workloads grow up significantly, right? That's exactly right. So today, I really
kind of focus more on the AI and HPC types of workloads, mainly because those are the workloads where single systems need lots of memory, right? You're also going to see areas like Spark, things like that, that we'll likely be starting to work with here soon.
So large single systems that need lots of memory.
Then you're going to see in the near future, you start taking that same concept where I
can dynamically apply memory and add or move it to a host on demand.
And then you get into virtualization environments and you start saying, well, what if my VMware cluster has 32 nodes in it? Instead of having to put a terabyte of memory in every box where I'm only actually utilizing about 40 percent of it, or choosing to put 500 gigs in the box and running at 80 or 90 percent, I'm in that kind of weird spot now with how large the memory space has to be in each box to be ideal or optimal. So I'm able to go set them to a much lower memory amount in each node and then have a memory pool available to any node in the cluster.
So if a VM starts to run
and I get higher than I want to be on a single node, instead of having to go try to move VMs around to optimize, I can just simply compose memory to it.
And it'll automatically just grab memory from a pool.
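[Editor's note: a back-of-the-envelope sketch of the sizing trade-off described above, using figures from the conversation (32 nodes, 500 GB versus 1 TB per node, roughly 40 percent utilization in the over-provisioned case); the shared pool size is a hypothetical number chosen for illustration, not a GigaIO specification.]

# Illustrative arithmetic only; the pool size below is an assumed figure.
NODES = 32

# Static sizing: 1 TB per node, but only ~40% typically in use.
static_total_tb = NODES * 1.0            # 32 TB provisioned
static_used_tb = static_total_tb * 0.40  # ~12.8 TB actually used

# Composed sizing: 500 GB per node plus a shared pool any node can borrow from.
pool_tb = 4.0                            # hypothetical shared pool
composed_total_tb = NODES * 0.5 + pool_tb

print(f"static provisioned:   {static_total_tb:.1f} TB (~{static_used_tb:.1f} TB used)")
print(f"composed provisioned: {composed_total_tb:.1f} TB")
print(f"DRAM saved:           {static_total_tb - composed_total_tb:.1f} TB")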
This whole memory stuff is brand new to me.
I mean, I've been, you know, composability, GPUs, storage, you know, networking cards, things of that nature.
Okay, but.
Come on, you are a greybeard on storage.
I guess. But you know, I saw
you do support composability
of storage and GPUs and things
of that nature, right? I mean, it's not just memory.
That is absolutely
right. Because as I said before, every
device has memory. So we
talk to them all, right? If I'm going to go
compose GPUs, I'm going to write directly to
the HBM inside that GPU. I can even let GPUs talk to each other across the fabric. So I can do
things like with DGXs, for example, or with OEMs.
Yeah. So you have your GPUDirect RDMA, which is what's out there, what everybody kind of knows about today.
That's done with InfiniBand or high speed Ethernet.
But when that happens, it has bounce buffers on every single host that it has to go hit in order for those GPUs to talk to each other.
And that's because the RDMA protocol doesn't allow it to talk directly memory to memory between these GPU types. It's got to translate from the memory layer out to the IB layer, communicate over IB, then back down again to the memory stack and over to the GPU. It tries to optimize some of that workload.
And it does a decent job, but it's kind of like when you saw GDS, right, or GPUDirect Storage. When they were able to go take the R out of RDMA with GPUDirect Storage, the claims are it made storage access five times faster. So we take the R out of RDMA when we do GDR, or GPUDirect RDMA; we allow that to be a
DMA now. So same types of things apply
because I no longer have to go hit all these bounce buffers.
I no longer have to translate all these protocols.
I'm talking directly from memory of host one
to the memory of the GPU in host two.
It doesn't have to go through the host at all,
bypasses the kernel altogether
and goes directly to the remote GPU.
That's really advantageous for an AI kind of environment
where you need a gaggle of GPUs just to keep the training
and inferencing activities going on,
but you never know where you really need them kind of thing, right?
Right, exactly.
And then you take the value of composability where I can say,
I have training going on, but just because training is going on doesn't mean,
or sorry, if I'm not doing training, if I'm doing inferencing,
if I'm doing, if I'm preparing my data,
those GPUs don't have to be in that box yet.
So I could, those GPUs could be posed to a different box while that boat,
all this box is preparing data. So I compose this to it,
let it ingest all the data, tag it, label it.
And there's no GPUs being used there.
They're being used somewhere else.
And then when it's time to actually train,
let me bring the GPUs to it.
And then they're able to be utilized.
So I get maximum efficiency out of all those GPUs I use.
At the same time, I can let those GPUs all talk to each other using full DMA capabilities.
Yeah.
I was going to say, having seen this technology, it is so cool, especially that whole composability piece. And you know what, Matt, can you describe a little bit about the hardware, because there's a significant piece of hardware that you have and that you sell. Can you tell us what you sell: basically, what do you plug into the server, what kind of switches do you connect to, and then what kind of devices can you connect into to get to those GPUs that you were talking about?
Yeah, yeah, absolutely.
So we start off with, as I said, the memory fabric. Its core basis is that PCIe switch. Inside the PCIe switch, I have a bunch of PCIe Gen 4 x4 ports, and I have a COM Express module in the back that actually runs all of our software. From there, I plug in any other types of devices I want. So I have an HBA that plugs directly into servers, and those don't really care what the OEM is.
Just plug the HBA in.
It connects into that fabric.
Then those servers can all talk to each other.
Anything that lives inside those servers can all talk to each other.
So I can talk to the drives
that live in the server next to it.
I can talk to the GPUs that may live in that server.
But if I keep them in the server, they're still kind of locked to that sheet metal, meaning that if I want to go build a new server for something unique, a unique workload,
I have to still communicate across nodes. And that's fine. That works great.
But I also have another option. And that other option is going to be using what we call
our pooling appliances. Some people would call them JBOGs, just a bunch of GPUs. We call them accelerator pooling appliances and storage pooling appliances.
And those are just chassis that are built for power, cooling, and uplink of PCIe devices.
So I can put GPUs, FPGAs, vector engines, whatever other types of devices you may want to have,
even NICs in there.
What is the size of the power supply on that thing?
It has more than one, let's just say that.
So I can imagine. Also, I do kind of want to rewind just a little bit. That card that you are sticking in the server that is connecting into your thing, is it basically like a PCIe card that is literally just transferring PCIe into your PCIe switch?
That is exactly right, and a great point to bring up. We don't have offloads. We don't have to translate anything. It is native PCIe. What that means is if I'm going to compose a device across that card, it doesn't have to translate to anything. If I plug a GPU into our APA and that goes through our fabric to the HBA, it's communicating PCIe the entire way.
So it is literally talking as if it's plugged directly
into the server.
Another silly IT nerd question.
What does that cable look like?
So the cable comes in two form factors. One is going to be your copper cable, and that actually looks just like a SAS cable. If you're used to connecting storage arrays and filers, it looks just like a SAS cable. If you are going to go longer distances, you're going to use our fiber option, and that's going to look very similar to an AOC from Mellanox.
Give me a second. The storage stuff, how does this play out?
Effectively, there's a gaggle of NVMe SSDs in a 1U pooling appliance, and then I have uplinks from there. And I assign how many drives I want to go to what server. And basically, in a matter of five seconds, those physical drives are electrically connected to that remote server.
And that server has full DMA capability. That server owns those drives
as if they were plugged directly into the box. That is by far the most performant way to go
connect a drive. In fact, we ran some tests with Optane, and we use Optane because of its latency characteristics. We found that when doing the full composition, I was able to do full reads and writes onto that composed Optane while adding only one more microsecond of latency over if it was just locally installed.
Oh,
that is nice.
But you know, the problem is it's like a 10 microsecond latency for Optane. So, okay, now it's 11.
It's not great, but it's still pretty damn nice.
11 microseconds in storage is awesome.
Yeah, of course.
It is, it is.
No doubt, no doubt.
But we've been talking about nanoseconds this whole time,
so it seems so slow.
Bear with me for a second.
Now, let's say I have a server that has five of these NVMe SSDs,
and I want to now move two of those to another server, what has to happen here?
I mean, (a) do the two servers have to be rebooted, or (b) can it be done non-disruptively? And (c), you must have some sort of software orchestrating all this stuff, right?
Yeah, exactly. I mean, one of the great things that NVMe-oF really did back in the day was give you that hot add, hot remove capability. Though it really was more NVMe than NVMe-oF: with the implementation of NVMe into servers, it forced all these kernels to be able to handle hot adds and hot removes in a much different way than they used to.
If I plugged a PCIe device into a server, what, eight years ago,
that server's gone.
But because NVMe drives had to be able to be pulled out and plugged in at the front of a server without that kernel crashing,
all those pieces of code have been put in place now where I can hot add and hot remove
devices.
The hot swap support made this all available for NVMe SSDs.
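[Editor's note: a minimal sketch of the Linux side of the hot-add flow described above, assuming a Linux host with root privileges. The sysfs paths are standard kernel interfaces, not GigaIO tooling, and the fabric-side composition step itself is not shown; on most kernels the hotplug event is handled automatically and the manual rescan is just a fallback.]

import os

def rescan_pci_bus() -> None:
    """Ask the kernel to re-enumerate the PCIe bus (requires root)."""
    with open("/sys/bus/pci/rescan", "w") as f:
        f.write("1")

def list_nvme_namespaces() -> list[str]:
    """Return namespace block devices (e.g. nvme0n1), skipping partitions."""
    devs = os.listdir("/sys/class/block")
    return sorted(d for d in devs if d.startswith("nvme") and "p" not in d)

if __name__ == "__main__":
    before = set(list_nvme_namespaces())
    rescan_pci_bus()  # pick up any drives the fabric just attached
    after = set(list_nvme_namespaces())
    print("newly visible namespaces:", sorted(after - before))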
How does this work for GPUs?
I mean, are GPUs hot swappable?
Yeah, so it really depends on the exact OS.
Some OSs support it.
Some are a little quirky, meaning that if I have a PCIe hub composed to it, I can add more
GPUs and remove them without an issue. But if I'm kind of adding a whole new set of GPUs to it,
I'll have to go restart the system. To be honest, though, when you're talking about GPUs,
you're talking about drivers, right? When you have to talk about those drivers,
all those drivers have to be restarted anyway when you add or remove a GPU.
So the idea of having to restart the server
versus restart the service
is not
really that big of an issue.
And when
it comes down to it, those
are the components that don't fail that often.
Exactly. And when
they do, it's because you've overstressed them
heavily.
You need to look at my crypto mine with the...
Yeah, well, Ray, you are excluded from this because you do all kinds of silly things that you shouldn't do.
No doubt.
Well, actually, on that note, you think about it.
If you're putting those GPUs inside your server, they're fighting for cooling, right?
They're fighting the CPU, they're fighting the memory, they're fighting the disk for cold air.
And so by putting those hot devices inside of our GPU chassis, or sorry, our accelerator pooling appliances,
I'm able to really increase the life of both my GPUs and my servers, because they're not fighting and battling for that cold air the entire time.
Matt, Matt, if you're arguing the point that Ray needs your appliances, like, I don't think
you need to argue that.
Yeah.
It's like, that's a different question.
Yeah.
Yeah.
Ray is going to totally agree with that.
Let's go back to customers.
How does this play out in an HPC environment and things of that nature that, you know, you would think like these supercomputer environments could really benefit from a
gaggle of GPUs sitting in, you know, a rack or two that could be allocated to wherever they need to
be allocated. Oh, come on, tell us about TACC, really. That's what Ray's asking.
Yeah. I mean, it's definitely a huge advantage for a lot of HPC customers, right? So the idea
of being able to be dynamic. And you'll talk to some people in the HPC space and they'll
kind of fight against it because it's not what they're used to. And a lot of people are kind of set in the way they do things.
But when you look at what HPC is today and how AI has merged with HPC, and the fact that most of these systems, especially the larger ones, are not built to do one problem.
They're built to do hundreds of problems.
And so they expect to have different challenges all the time.
The idea of having a homogeneous compute environment makes no sense because if everything
has to be the same, that means instead of trying to solve a problem the right way, I have to go
change my code and make it adjust to the hardware. And so I'm not writing the code that I want to
write. I'm writing the code that I have to write. And so what we really enable is that ability to software-define your hardware.
And so a lot of these universities are really starting to see this capability where I can now
say yes to my customers instead of saying, well, we could, but you got to change this,
this and this, and we got to go buy something that looks like this in order to go make that happen.
Give me your wallet and we'll talk to you in nine months. Right? And so instead of having to go do that, they're able to say yes, or at most, hey, buy that new card that you want to have. I'll
add it to the fabric and then I'll say yes. And so we're talking about a couple of weeks instead of
nine months to a year.
Matt, Matt, I got a lot of friends and coworkers I need to introduce you to.
Half of them, I think you already know.
Yeah, right, right, right, right.
Talk to me a little bit about the software you must have. Is this like an operating console or something that you talk to your composability solution with, or is it API driven?
Yeah. So we made a conscious effort early on to say, you don't need another GUI, right? We want to be transparent. So what we've done is we've made everything Redfish based. I can do
all of my composition through the same API that you're already using to go
manage your hardware. So since we're moving hardware around and creating hardware connections
between devices, it makes sense that Redfish is the API that was chosen. So we actually don't
have a GUI in our environment. So today everything is Redfish API driven, and we've actually
integrated with a bunch of partners. And when I say integrated,
we didn't build a plugin. They actually came to us and asked to go integrate our capabilities
into their software because they saw the value it would bring to their end customers.
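[Editor's note: a hedged sketch of the Redfish-style composition pattern being described, following the general DMTF CompositionService model of resource blocks bound to systems. The endpoint host, credentials, and exact payloads here are illustrative assumptions; GigaIO's actual Redfish service layout may differ.]

import requests

BASE = "https://fabric-manager.example.com/redfish/v1"  # hypothetical fabric manager
AUTH = ("admin", "password")                            # placeholder credentials

def list_resource_blocks() -> list[dict]:
    """List the resource blocks (GPUs, drives, memory) the fabric knows about."""
    r = requests.get(f"{BASE}/CompositionService/ResourceBlocks", auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json().get("Members", [])

def attach_block_to_system(system_id: str, block_uri: str) -> None:
    """Bind a resource block to a composed system by patching its ResourceBlocks links."""
    payload = {"Links": {"ResourceBlocks": [{"@odata.id": block_uri}]}}
    r = requests.patch(f"{BASE}/Systems/{system_id}", json=payload, auth=AUTH, verify=False)
    r.raise_for_status()

if __name__ == "__main__":
    for block in list_resource_blocks():
        print(block.get("@odata.id"))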
Yeah. So Bright, for example, Bright Cluster Manager has integrated GigaIO. Obviously, you saw some big news from those guys this week.
So you have Bright, and you have some Slurm integrations through a couple of different partners. Being that Slurm is an open source product, it's something people can do, and a few of our partners have actually integrated us into Slurm.
We also have a company called Ctrl IQ. You may have heard of them, Greg Kurtzer's new
company. They are building a product called Fuzzball and Fuzzball is already certifying or
implementing us in their 1.0 release set to come out here shortly. And that's actually gonna be a
cloud native HPC toolkit. Yeah, it's good tech.
It's fun to see innovations and basically innovators spurring other innovators to innovate.
That's a lot of usage of the word innovate.
Awesome. I'm done, I'm done. It's like, I get so excited about this stuff when I see startups fueling startups. That's the number one thing that a founder can be proud of.
Yeah. All right, all right. Let me get back to
the scheme here. So how does something like this work with Red Hat or VMware or Nutanix, those kinds of guys? I mean, how does this play out in that space?
Yeah. So some of that is in the works right now.
We've done some testing with VMware, for example, and we have beta
code with ESX that allows us to go compose. And so we can compose it actually in ESX, we can do it
without any reboot at all when adding GPUs and devices into ESXi. So that's super exciting. We're
waiting to see what comes further from that relationship.
Then you have things like Red Hat; we've been in talks there. The easiest way to go implement that today is actually through Supermicro, which has integrated a product called SuperCloud Composer. They're kind of getting into that software business now, which is nice to see. Their first release of it is their kind of platform management software, and they've integrated GigaIO into that as well. So you can manage your whole rack to data center worth of systems. And that's Supermicro systems, Dell systems, HP systems.
It kind of manages them all.
But you can actually compose your devices across those systems using that tool set as well.
So from an ESX perspective, what you're doing is you're actually messing with ESX's hardware in real time, which is not something you typically see.
So you and VMware are providing a capability to make this sort of thing happen with GPUs and NVMe SSDs, I guess, huh?
Yeah. I mean, you see, VMware has been pretty excited about trying to get this composability
aspect to work, right? They've made acquisitions to go do that. And obviously, we're still in early
stages with those guys and we have it working. We can actually go work with some customers and show them how to use it.
Still waiting to see kind of what those next steps look like
with VMware.
Pretty excited about what that'll do
for the enterprise market.
Yeah, but none of these guys really deal
with the memory size of things.
So when you start talking about being able
to expand an ESX solution from 512 gigs to a terabyte or two in real time, it's a different world, I would think.
It is. And these are conversations that are likely to be had soon. So I can't really talk too much on what that looks like today, because I'll be honest, it doesn't look like anything yet today.
But I have a feeling it will be soon.
I'm thinking SAP HANA and Redis and all these guys are driving bigger and bigger servers these days.
And having the ability to do something like this would be something VMware and those guys would want, truthfully.
No, absolutely.
And not only that, I think it's more about just traditional data center flexibility, right?
We've been told that composability is the VMware of today, right?
That ability to be as flexible as you need to be to meet your customers' demands,
that's what VMware was founded to do, right? Take that server and let you say yes all the time, because I was able to take something big and make it all these small things and be very flexible. That's the purpose of virtualization, and we're hopefully going to help take that to the next level.
And I think you're well positioned to take it pretty much to the next level.
And one of the things I've always looked at is, where do virtual machines go to get to the next level? And I think it's when you can create a virtual machine that's bigger than the physical constructs of what that virtual machine runs on.
Boy. Yeah.
Yep, dude. So when you create a virtual machine, you've got a machine with a terabyte of RAM, but you can create a virtual machine with two terabytes of memory, that's something special. And that's where CXL is going to come in. That's where everything that you guys are doing at GigaIO is going to be. That's going to push computing forward.
I totally agree.
And the way I see it right now with CXL is VMware with the first generation of CXL
won't be able to do anything with it, right?
From that perspective,
they won't be able to go share anything across servers.
Now, with that said, GigaIO can, and we actually have designs to go do that. So we will be having CXL-enabled sharing even on PCIe Gen 5 with CXL 1.1 support inside the servers. We've figured out how to do that. So CXL will be coming in a shared arena here right along with the PCIe Gen 5 servers as they come out.
Hey, Matt, besides the CXL standards and stuff like that,
there are other standards organizations in the composability space.
Do you guys play in that environment as well?
Yeah, so obviously we've been on the CXL consortium since the beginning.
And it's really the OCP piece where more of the composability work is driving into. So Redfish has been a big part of it.
That's why everything's also been really focused on Redfish,
but you're going to see a lot more from us here working with OCP and the
composable aspects of it.
So let's talk big things.
What's the biggest memory pooling appliance
that you guys support at this point?
And how many servers is it potentially distributed over?
Well, that's the thing, it's kind of whatever your imagination comes up with. I mean, there are limits.
Imagine a pretty big world here.
There are limits, but basically I can create a certain number of memory windows that I can go mount memory to for that server. It gets kind of technical. I can create so many of them based on the BIOS of that server, but how much memory I put into each of those windows is configurable. So if I have servers that each have a terabyte of memory in my memory pooling appliance...
It's like a virtual page space. So you've got a physical
page space that you're managing on the server itself, but the virtual page space behind it
used to be on storage. Now it's sitting on a memory device off a PCIe fabric.
Is that what you're telling me?
That's exactly right.
And then you use MemVerge and their technology to literally keep it hot and
cold dynamically.
Hot and cold memory.
Yep.
Oh yeah.
Well, it's more than memory pages, right? I'm bringing the warmer pages up whenever they're needed, and I'm dynamically trying to keep everything in the fastest memory. But when it's not in the fastest memory, it's still in really fast memory. It didn't have to drop down to microseconds; it's still well within the nanosecond range.
So Ray, how gray is that beard feeling in storage now?
Tell me about it. I've been doing virtual memory for about four decades here. I was talking like 16 gigs. That was awesome.
Yeah, tell me about it. No, this is a different world. So the MemVerge thing.
It actually plugs in sort of like, it has PMEM sitting on the server, and then how's that connected to the fabric? I guess I'm trying to understand. In the past, MemVerge was just a couple of PMEM modules and DRAM, and it would carve it up for you internally in the server, but there was no external version of that in the old days.
That's exactly right. And today they still offer that capability, right? PMEM is just a tier of storage, and remote memory is going to be another tier of that storage. Basically, if I have PMEM on the system, to be honest, the latency between PMEM and remote memory is pretty similar; we're just faster on the backside of it. And it gives you the option, I could even compose PMEM remotely across that PCIe fabric if I wanted to. So you can choose DRAM or PMEM as that remote memory.
Hey, so Matt, with that, what is that latency?
What's the latency differential?
From PMEM versus composed DRAM?
It's actually pretty similar. They both run right around 300 nanoseconds.
I keep thinking there should be a plug-in to the DIMM with a PCIe bus floating back behind it or something like that. Is that how this works? I'm just trying to understand how the... so it's all logical?
It's all PCIe.
There's no real plug-in other than...
It's all PCIe. That's the beauty of the architecture. The DIMMs slot in the server and they all talk over PCIe.
And it's the MemVerge software that makes that happen, as well as your composability software.
Yeah.
I mean, we provide the transport.
We actually create the connectivity.
And from MemVerge's perspective, it sees the memory we connect just the same as if it sees the PMEM living on its own server. And so it just accesses it and says,
hey, all right, I'm going to make you a different tier
than the PMEM living on me.
And then if I were to compose more memory
from either a farther away server or from PMEM on another server,
it would make that a different tier with its own characteristics.
And then it'll kind of page
according to the performance characteristics of the memory
that's on that system.
Right, right.
So you guys have tightly integrated this solution with MemVerge, it appears.
It is getting tighter by the day.
I was getting ready to go there, Ray.
I'm just like, you guys keep saying MemVerge a lot.
Yeah, but it's a total solution here.
Yeah, it is a total solution.
I mean, it's a fantastic solution.
So what is to come of your organization and MemVerge?
Oh, I would not comment on that one.
Not going there.
All right, so back to the talk here. So how is this thing sold? Do you sell through partners only? Are you direct sales?
Yeah. So we are a
partner only organization. So we do have a direct sales team, but that direct sales team still
will only work with partners. What are some of your bigger partners then I guess?
Yeah. So from a channel perspective, from a federal perspective, we have federal integrators from
CTG Federal to Cambridge Computing to ID Technologies.
And then from more of the commercial side we have ICC, and we have Advanced Data Systems, who we just did some stuff with SDSC, the San Diego Supercomputer Center.
So it's an ever growing list. Our distribution right now, we're going through Arrow. We are trying to keep it fairly small. And our partners are always going to be those partners that value technology first and want to kind of drive the latest and greatest and the new cool stuff.
I'm not looking for a partner that's looking to just make a phone call, say, hey, you need a server.
Here you go.
I can quote you a server.
Arrow's a great disty, by the way.
I was just like, those guys are awesome.
Yeah, yeah. I love them. So.
So, I mean, it seems like this is almost targeted primarily at HPC, but there's a commercial side of this as well, right?
Oh, absolutely. So you have HPC, you also have the AI side of that. Right.
And so it's definitely merging together. And as the memory piece comes out farther, you're going to see a lot more things like traditional databases and in-memory databases that are going to be more in focus. And then you're also going to see some more of this DevOps stuff that I'm really excited about.
Right. That ability to go compose devices to a container as they spawn is really cool.
You didn't say anything about using Kubernetes and all this stuff.
So wait a minute.
So I can change the pod configuration on the fly to run the containers?
Well, today we do that through Bright.
So Bright controls all of that for us.
And so all of that works well there.
But honestly, it's all API driven, so it can be scripted also.
So, but yeah, as you create a new container,
I can go compose devices for that container.
I'm allowed to go say those devices are only for that specific container and go.
And literally let you change your code sets immediately.
It's just the right way to go
if you're trying to be a true DevOps,
very flexible environment.
Ultimately, we will be your cloud, right?
That is our goal, is to give you all the cloud flexibility
without having the price that comes with it.
That's okay. Okay, so great segue. So from a cloud perspective, when you're talking about the mega data centers out there in the world, if I wanted to basically take a look at your technology, what clouds could I go to?
I would say I can't.
Not disclosed?
Yeah, I can't disclose it yet.
Fair enough.
They have very strong NDAs.
Yeah, I know.
Trust me, I know.
Okay, Matt, you mentioned the money word.
How much does something like this cost, and how is it charged for? I mean, obviously there's storage and there's GPUs and there's memory, and all that's charged at whatever it's charged at. But then there's this rack device that you actually are supporting, and it's obviously your own PCIe switch and controller, right? Or something, right?
Yeah, yeah, no, absolutely. So it's all relatively inexpensive. Obviously, of course,
I'm going to say that. But basically what we found is because you gain so much utilization
of all your devices, you go from having 30% GPU utilization to 70%, right? We generally end up actually
selling less hardware overall. A lot of times it's actually servers that we sell less of
because you're able to go reconfigure your hardware to match your unique needs. And you
end up spending a lot less in a composed solution. And the amount of jobs you can run actually significantly increases.
So it's hard to say what it costs because generally, like I said,
you'll end up reconfiguring your design to have less of this and less of that, and still go meet the same job requirements.
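[Editor's note: rough arithmetic behind the utilization claim above. The 30 percent and 70 percent figures come from the conversation; the fleet size is a hypothetical number used only for illustration.]

gpus_static = 100                        # hypothetical static fleet size
util_static, util_composed = 0.30, 0.70  # utilization figures from the conversation

useful_gpu_hours = gpus_static * util_static       # work the static fleet actually delivers
gpus_composed = useful_gpu_hours / util_composed   # cards needed at the higher utilization

print(f"static fleet:   {gpus_static} GPUs at {util_static:.0%} utilization")
print(f"composed fleet: {gpus_composed:.0f} GPUs at {util_composed:.0%} utilization")
print(f"reduction:      {1 - gpus_composed / gpus_static:.0%} fewer GPUs for the same work")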
Do you have any examples of that that you can provide, a.k.a. any kind of total cost of ownership documents?
My crypto mine has got like six GPUs per server. And, you know, I've got like one or two with one GPU, one or two with four, you know, things like that. I'd kind of like to spread them all across all the servers.
Right, right.
Exactly.
So all those things are possible from a TCO perspective.
We do have a TCO calculator that we could show.
And what I love about it is we actually show a plot map of a whole bunch of jobs being done, with certain sizes for each of those jobs, and show you what it would look like with a certain static architecture versus what it looks like composed, as far as those jobs completing. You're using a couple of definable characteristics for those jobs. And you'll literally see, you can then pull back certain sets of hardware and go, I'm still doing more jobs, still doing more jobs. All right, now we're finally breaking even. And you see how much less hardware you can do that with, which is of course significant power savings to boot, not to mention just
hardware costs.
I've got a lot of friends in HPC that will love that, right, only because they have been through that. They went through the procurement cycle of, oh, we have to put GPUs in every node that we're deploying in this supercomputer platform that we're putting out there. However, the people that are developing the algorithms are not developing the algorithms for GPUs. So there is this giant lag in where that stuff is actually used. And being able to compose that infrastructure, which is exactly what you guys are doing, being able to compose it based on what assets you have available and how you allocate those, that is the gold mine of this.
Now, this has been tried many times. Composable infrastructure,
you know, my first toe stepping into the water of composable infrastructure was with the SGI
Origin 3000. That was a great system.
Yep. That's why you have a gray beard.
Yeah, that's why I have a gray beard. That's why I'm on Greybeards on Storage. I remember that thing. It was a great system. But honestly, what it did, I mean, guess what, it was a PCI switch. Well, it wasn't PCI at the time; it was SGI's proprietary stuff at the time.
But it's exactly the same thing you're offering now.
How is what you're offering now different?
Well, I mean, even Intel tried it, right?
So Intel had tried doing the same thing,
but what they were doing it on was PCIe Gen 2.
And the challenge was latencies just could not keep up with what was actually required.
So you're talking about almost microsecond latencies at the time and composing resources
over that type of distance with that type of latency just caused too many errors in the hardware.
Not to mention, we've now implemented non-transparent bridging. So that NTB is how we
are able to go talk memory to memory. And a lot of the kernels for operating systems haven't really
enabled that until fairly recently. So a lot of that communication path using NTB is fairly new.
So a lot of this stuff really wasn't an option. It wasn't truly an option to do
it the way we did it. They tried, but they ended up finding scenarios where when they composed,
they were not able to have GPUs talk to each other, for example. Like Dell had this C410X
GPU chassis and everybody loved it.
I was actually working at Dell at the time.
And it was a really cool looking box.
I thought it was going to do really well.
But what they found was because they couldn't have those GPUs talk to each other,
it just fell apart and it just died on the vine.
So a lot of hype, a lot of people really excited about it. But it's just some of those core technology features just weren't there yet.
And we're finally at a spot that we can get there.
And then with CXL really coming down the pipe, people are already starting to have a vision in their heads of what this could look like.
And so we're just going to make that vision come to reality.
So I completely agree with everything you're saying, and I really would love to see this push forward. I cannot wait to see what the
next generation of this technology is going to look like. So Jason, any last questions for Matt
before we leave? No. Matt, anything you'd like to say to our listening audience before we close?
You know, I said a lot.
I feel like I'm going to get in trouble after I get off of this thing.
But, you know, it was worth it.
I really enjoyed the time.
And I look forward to doing this again sometime.
All right.
Well, Matt, this has been great.
Thanks for being on our show today.
All right. Thank you. And that's it for now. Bye, Jason.
Bye now. Hey, Matt, thanks. Awesome conversation.
Yeah. Thanks, you guys. It was a lot of fun. Until next time.
Next time, we will talk to another system storage technology person. Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out. Thank you.