Utilizing Tech - 4x11: Taking CXL To the Next Level with IntelliProp
Episode Date: January 16, 2023

CXL is being rolled out in production and more reliability, scalability, and security features are being added all the time. This episode of Utilizing Tech focuses on enterprise-grade CXL with John Spiers, CEO of IntelliProp, talking about the ongoing evolution of CXL. IntelliProp is bringing a CXL fabric solution to market that enables memory expansion outside the server over a fabric. CXL 3.0 introduces memory fabrics, but it will take more development to bring features like high availability, routing, peer-to-peer, failover, and re-routing while preserving cache coherency and enabling management, and IntelliProp is working to bring all of this to the spec. As we discussed in our last episode with Dan Ernst from Microsoft Azure, CXL delivers acceptable memory latency, and John expects that this will continue with fabrics to some extent. IntelliProp will profile memory and enable tiered memory that matches application needs. The company is also working to enable advanced features, including sharing memory between systems, in association with the CXL Consortium.

Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Nathan Bennett: https://www.twitter.com/vNathanBennett

Guest: John Spiers, CEO of IntelliProp: https://www.linkedin.com/in/johnwspiers/

Follow Gestalt IT and Utilizing Tech Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789

Tags: #UtilizingCXL #CXLFabric #MemoryLatency @UtilizingTech #CXL @Intelliprop
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on Compute Express Link, or CXL,
a new technology that promises to revolutionize enterprise computing.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as my co-host is Nathan Bennett.
Hey, Stephen, how are you? Pretty good. Glad to be recording some episodes here in 2023 after the Intel Sapphire
Rapids announcement, which is pretty awesome.
We've been talking quite a lot about bringing CXL to market and making it real.
I mean, that's the point of utilizing tech.
And one of the aspects of that that occurred to me is that, you know, a lot of this stuff to the outside, especially can look a
little bit, I don't know, just experimental, new. Even with the, you know, Genoa and Sapphire
Rapids supporting it, there's still some holes. You know, we're still looking at CXL 1.1. But I
think there's a lot of promise here to bring this thing to the enterprise. What do you think? Absolutely. We're seeing adoption from platforms, especially Intel and
AMD. I mean, as you've mentioned, that's kind of like step one. Now we're seeing additional
expansion from, hey, this is something conceptual to something purposeful, something actually
physical in our hands to this is a platform that multiple other people can now
start developing on. And then it'll just continue to grow from there. It's like the dominoes are
falling finally, and we're going to see more and more of those dominoes continue to fall.
Yeah, absolutely. And that's, I think, where we're going here with this episode. So today,
we've got a special guest here, somebody that I've known for a long time, John Spiers, CEO of IntelliProp,
to talk about how they're working with the CXL Consortium to basically add in a lot of
enterprise class features, high availability, security, reliability, sharing of resources,
and so on, and really push this spec to the next level. Welcome to the show, John.
Thanks, guys. I really appreciate you taking the time to do this, and I'm very excited to talk about CXL and what IntelliProp is doing in the space.
Our history has been developing ASICs throughout all these years, as Stephen highlighted earlier,
and since 2017, we've been 100% focused on
building our CXL fabric. And we're really excited that we have that FPGA version today.
And customers are working on running workloads on it in our lab. And we also have a few pods
in the field where customers are testing it.
So John, tell me a little bit specifically about the product. What is the nuts and bolts? Like, what are you making? An HBA, a switch, a chip, what is it?
Our main focus is building a CXL fabric. And as you know, the first versions of CXL were really
focused on expanding memory within the server, inside the server. And so, adding, you know, devices that look like SSDs,
but they're memory devices that plug into the front of the chassis
and connect via CXL.
So that's really what you're going to see kind of as the first wave of adoption.
We're focused on external expansion.
So, you know, connecting CXL servers to a switch fabric
and then that switch fabric in turn is connected to arrays of CXL devices. So, expanding outside
the server at the rack level, at the multi-rack level, and across the data center. So we're focused on
developing that core switching fabric. In order to connect
to that switching fabric, you need to have an HBA in your host. And what that does is it tells the
CPU's CXL that's on the motherboard that, you know, we plumb all that memory that's out there
connected to the fabric up to the server through that card. So the server
sees the memory just growing. It doesn't know we're managing it externally in a fabric.
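To make that "the server just sees its memory grow" idea concrete, here is a minimal Python sketch of the kind of address-window translation such an HBA performs, stitching fabric-attached devices into one growing host address range. All class names, fields, and numbers here are hypothetical illustrations, not IntelliProp's actual design.

```python
# Illustrative sketch only: models an HBA exposing fabric-attached memory
# as one contiguous host physical address range. All names are hypothetical.

from dataclasses import dataclass

@dataclass
class FabricWindow:
    host_base: int      # start of the range the host CPU sees
    size: int           # bytes in this window
    device_id: int      # which fabric-attached memory device backs it
    device_offset: int  # offset within that device

class HbaAddressMap:
    """Translates host physical addresses to (device, offset) on the fabric."""

    def __init__(self):
        self.windows: list[FabricWindow] = []
        self.next_base = 0x1_0000_0000  # pretend expansion starts at 4 GiB

    def attach(self, device_id: int, size: int) -> FabricWindow:
        """Hot-add a fabric memory device; the host sees memory 'just grow'."""
        win = FabricWindow(self.next_base, size, device_id, 0)
        self.windows.append(win)
        self.next_base += size
        return win

    def translate(self, host_addr: int) -> tuple[int, int]:
        """Route a host access to the backing fabric device."""
        for w in self.windows:
            if w.host_base <= host_addr < w.host_base + w.size:
                return w.device_id, w.device_offset + (host_addr - w.host_base)
        raise ValueError(f"address {host_addr:#x} is not fabric-backed")

hba = HbaAddressMap()
hba.attach(device_id=7, size=256 * 2**30)   # a 256 GiB fabric device
print(hba.translate(0x1_0000_1000))          # -> (7, 4096)
```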
Yeah. And I think that from the, well, some listeners may be amazed at the idea of external
memory and the idea of pooled memory. If so, maybe listen to some more episodes of the show.
But I think those of you that have listened to a lot of episodes of the show might say,
cool, that's what CXL version 3 does, right?
But of course, I think there's a lot more that you need to do in order to make this
thing a reality, right?
Yeah, version 3 is really focused on being able to pool these devices together within a server and virtualize them to some extent.
It also has some ability to do external expansion to the server, but it's not a fully functioning fabric yet. The fabric working group, which is part of CXL, is working on building all the other features
and capabilities that make it a fully functioning, high-availability fabric. As you know,
to be enterprise class, you need failover capability, high availability. So if
a switch goes down, you're still up and running. If a port goes down, there's path failover and you're still up and running. You also need things
like security. You need packet-based routing, and you need to be able to connect switches together, daisy chain switches together, so you can expand beyond what can connect into a single switch and expand outside the rack. All these features and capabilities are in the works, and we're excited that IntelliProp is kind of leading that effort and has a lot of that already incorporated into our switch design.
What about taking maybe a tiered approach?
Is that something that y'all are investigating as well?
Being able to provide a higher level of performance
for some workloads, whereas other workloads
are lower performance.
Is that something that y'all are looking at providing
as well with this platform?
We kind of characterize different tiers of memory.
And so, you know, you have on-processor memory, obviously,
which is the fastest memory, you know,
the kind of the one to 40 nanosecond kind of memory.
And then you have the DIMM, you know,
memory that's on the motherboard connected into your DIMM sockets.
And then you have CXL connected devices within the server,
and then you have memory devices connected outside the server.
Each one of these has a latency profile.
We're trying to hit that one NUMA hop latency,
and our modeling and design has successfully achieved that,
which is 140 nanoseconds to 240 nanoseconds.
And we're excited that with a single switch hop,
we can achieve those types of latencies.
Now, if you go past the rack or do a multiple switch hop
through multiple switches,
obviously each switch hop adds latency.
But the way we're looking at this is
applications require different latencies, and I believe you'll even see
flash devices with a CXL interface. And, you know, flash devices are in the
10 to 40 microsecond kind of latency range.
But for some applications, that's just fine for memory.
And it really depends on what you're doing.
And so our fabric actually has a capability
to keep all these statistics
on all the different memory tiers.
And then the application can assign itself
to the memory that it needs based on whatever latency profile is required. So, you know, we're excited that, you know, we see
the kind of the space playing out as multiple tiers of memory. You'll have
non-volatile tiers, volatile tiers, and it'll all be managed and byte addressable
and make your programming models very simple and work with a variety of applications.
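As a rough illustration of that tier-matching idea, here is a small hypothetical Python sketch built around the latency figures quoted in this episode (on-package memory at roughly 1 to 40 nanoseconds, one switch hop at 140 to 240 nanoseconds, CXL-attached flash at 10 to 40 microseconds). The tier table and selection function are invented for illustration and are not IntelliProp's API.

```python
# Hypothetical sketch of latency-based tier selection, using the rough
# figures quoted in the episode. Not a real API.

MEMORY_TIERS = [
    # (tier name, typical latency in nanoseconds)
    ("on-package",          40),        # ~1-40 ns
    ("local DIMM",          100),       # motherboard DIMM sockets
    ("CXL in-server",       170),       # CXL device in a front drive bay
    ("CXL fabric, 1 hop",   240),       # ~140-240 ns, one switch hop
    ("CXL-attached flash",  40_000),    # ~10-40 us, fine for some workloads
]

def pick_tier(max_latency_ns: int) -> str:
    """Return the slowest acceptable tier, keeping faster memory free."""
    acceptable = [(name, lat) for name, lat in MEMORY_TIERS
                  if lat <= max_latency_ns]
    if not acceptable:
        raise ValueError("no tier meets this latency requirement")
    return max(acceptable, key=lambda t: t[1])[0]

print(pick_tier(250))        # -> 'CXL fabric, 1 hop'
print(pick_tier(100_000))    # -> 'CXL-attached flash'
```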
So if I'm hearing you correctly, and correct me if I'm wrong here, because I just want to take something
you said and boil it down to a little bit of brass tacks.
Customers are going to be able to take their workloads and say, okay, so this workload gets the in-the-box direct memory,
whereas this workload may get a switch fabric type memory and this one may get a multi-switch fabric memory.
Are you seeing that as the tiered approach?
Yeah, the data will be tiered across multiple classes of memory,
and you'll see
management frameworks that make it all seamless. And memory can be dynamically allocated based on
latency requirements. And it's all connected to a single manageable fabric, and
the idea is to make it very simple and easy to deploy.
That's fantastic.
So right now in CXL, it's very exciting,
and there's a bunch of different numbers being thrown across, right?
So we've had the Sapphire Rapids announcement.
We've had the Genoa announcement.
And so we're seeing platforms coming up for CXL,
and we're seeing numbers like Interconnect One being
available on some of these platforms. And, you know, CXL 3 is coming out and all
these different things, and PCIe 5 and PCIe 6. Where are y'all targeting this
platform to start really getting its traction and moving towards the
market? I hear you say that there are some customers
that are currently working with it or some pods out there.
Where are y'all seeing this actually hitting its stride?
Yeah, so obviously the major pain point
is applications with large data sets.
So there's a lot of data analytics applications out there.
There's AI applications. There's databases. As you know, databases are real memory hogs. In
fact, you know, a lot of databases are spread across multiple nodes because the
amount of memory isn't sufficient in a single node. So you have to have multiple
nodes and then slice and dice the data across multiple nodes,
process that data and then stitch it all back together.
So CXL and having a single memory pool
and having multiple servers or GPUs accessing the same data
in the same pool makes it much more efficient,
makes it faster.
You could even argue that it requires fewer compute nodes to deliver the results. So yeah, we're working with
customers: hyperscalers, defense agencies, you know, higher ed
institutions that are working on HPC systems. They're all excited about what CXL is going to bring to the market.
Do you think, as somebody who's been in the storage industry, and I'm seeing a lot of
analogies here to like Fiber Channel and iSCSI and the things that we all went through with
building storage fabrics, do you think that there's going to come a time when system memory
is going to have the kind of advanced features that storage has in storage-connected devices, things like snapshots, mirrors, clones, you know,
sharing, dynamic allocation, and also, you know, literally multiple nodes
sharing the same memory areas. Do you think that's all coming to memory? And is this
what you're working toward as well? Yeah, most definitely. I mean, we see, you know, it's very, very similar to SAN and NAS.
I mean, if you remember the early days of storage, everybody stuffed disk drives in a server.
One server had too many disk drives.
This one didn't have enough.
You couldn't reach over and grab the surplus disk in
the other server. Well, the same thing's happening with memory, and it's very difficult to
utilize memory across multiple compute domains. If you put it in a central pool, like you did disks and
SSDs, now all your servers can share it. And you'll see a lot of the same features
and capabilities. First of all, to make it enterprise class, you need high availability,
path failover. You need it very secure, so you need encryption, packet-based routing. You need
to be able to daisy chain switches together to expand within the rack, across racks, across the data center.
And of course, all this needs to be highly available and performant,
just like SAN and NAS are today, right?
And so, to your point, a lot of these features like remote replication, snapshots,
deduplication, I don't know.
I mean, some of this stuff may be prohibitive because it adds too much latency.
But then again, you know, those types of technologies could benefit memory.
At the end of the day, memory is storage, right?
It's volatile storage.
And, you know, it needs to be much faster than non-volatile storage.
But again, there's different tiers.
A lot of people even use, you know, disks or SSDs as a RAM cache, right, for their applications today.
And so, you know, you'll see a lot of the same features and capabilities.
In fact, these same features and capabilities, I think, are required to make it a viable
solution to the enterprise.
Yeah, so I think a lot of that is really awesome. And just like NAS and SAN and all these different network solutions that we see out there, it brought other complexities that
we had to kind of navigate like, okay, well, where is it? How is it running? And how do we make sure
that if it fails, nothing bad happens? What are you all trying to kind of catch at the curb,
making sure that you're learning from those experiences
and implementing that within your platform?
Some of our history, you know, our history started with Gen-Z.
Now Gen-Z is part of the CXL Consortium, and Gen-Z was designed to be a highly
available, enterprise-class fabric that
connected memory. And so a lot of these high-end features and things that you're talking about,
like path failover and redundancy and the way we do encryption and port-based routing
and things like that, were all designed in Gen-Z to work at scale.
Even having 802.3 as a transport within the fabric allows you to go rack to rack.
And so a lot of these features and technologies were developed for years in Gen-Z.
And the exciting part about it is Gen-Z is now part of the CXL Consortium.
And so as you work in what's called the fabric group that's building these switch fabrics
or designing the spec for these switch fabrics, there's talk of, well, we can grab that from
Gen-Z.
Oh, Gen-Z has that.
We can pull that in.
And so a lot of that's happening.
It's very exciting.
We think we're ahead of the game because we have a lot of that incorporated already.
You know, when IntelliProp really started, we chaired the Gen-Z Consortium.
So we're kind of ahead on what all those features do and what they're capable of and have incorporated some of that stuff into our design already.
We see CXL 3.0 kind of being the starting point for fabric, fully functional fabrics.
You'll probably see a 3.1, some dot releases associated with 3.0 that add specific features.
But I think we're kind of predicting that a fully functional, fully featured, enterprise-class fabric might not happen until CXL version 4.
But we're working real hard and
working with the consortium to make sure that happens. Yeah, I was going to go there next.
Thanks for bringing that up because I know that the consortium is really, really interested in
bringing these features in. As you mentioned, all the Gen Z work is now part of the CXL effort.
And I think that that's tremendous. I'm really excited to think that you don't have to start from scratch. You don't have to re-engineer it all.
And you can, and folks like yourselves who were working on Gen Z can bring your knowledge,
your experience, and of course, things like protocols and patents even into CXL. Is that
right? I mean, that's really what you're working on in the fabrics group.
Yeah, that's exactly right.
We're working on a switch fabric that has targeted latencies
that we talked about earlier
and has a lot of these high-end features to make it enterprise class so that it can be deployed in, you know, clouds, public clouds, private clouds, high-end enterprises, and be kind of a general purpose switch fabric, much like fiber channel switch fabrics are today.
In your experience in doing this stuff, I mean, what do you think are the big features that are needed in order to bring fabrics to fruition?
You know, what are the kind of low-hanging fruit or maybe the longer reaches that they need to add to the CXL spec?
Well, the first thing is downward compatibility to older versions of CXL.
Well, we've heard that that's going to be in there.
I mean, that seems to be a very strong emphasis from the CXL consortium.
Yeah.
And so at first it wasn't with 3.0.
And so we're excited that that's changing.
Being able to support peer-to-peer sharing of memory.
So GPUs like to communicate among themselves and share memory,
and we call that peer-to-peer.
So it needs to support that.
Multi-level switching, so being able to connect switches together,
have multiple levels of switching is needed to expand beyond one switch.
802.3 as a transport.
I think that's very important because this is very high-speed technology,
and the cable, just the cable itself introduces latency.
So, you know, I've heard numbers like for every six inches of cable
with Gen 6 PCIe, you'll need a retimer.
Well, you know, that's kind of ridiculous.
So, you know, I mean, and that makes people instantly think, well, you know, this isn't
a viable solution.
Maximum scalability may be, you know, constrained to a rack.
Well, you know, if you have 802.3 as a transport, now you can go rack to rack and across data centers without having to worry about having retimers everywhere.
And so, you know, we think that's required.
You need path failover and hot plug.
As you know, hot plug is, you know, hot plugging new switches, hot plugging new memory arrays, taking memory arrays offline without the system
crashing. Just like storage today, you can hot plug drives, hot plug arrays sometimes, and, you
know, hot plug ports on switches without the system crashing. There's also, you know, shared
memory pools across compute domains. So, you know, if you have a pool and have multiple
servers sharing that same data set,
you can't have them step on each other and corrupt memory.
And so you need features that allow you
to share the same memory pool and have coherency, have ordering and all those types
of things.
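To make that "stepping on each other" problem concrete, here is a toy Python sketch of exclusive ownership over regions of a shared pool. Real CXL coherency is handled in hardware at cache-line granularity; everything here, including the names, is a simplified hypothetical illustration.

```python
# Toy illustration of multi-host access to one shared memory pool.
# Real CXL coherency is done in hardware; this just shows the idea of
# refusing writes from a host that doesn't hold ownership of a region.

class SharedPool:
    def __init__(self, regions: int):
        self.owner = [None] * regions   # which host may write each region
        self.data = [0] * regions

    def acquire(self, host: str, region: int) -> bool:
        """Grant exclusive write ownership if the region is free."""
        if self.owner[region] is None:
            self.owner[region] = host
            return True
        return False

    def write(self, host: str, region: int, value: int) -> None:
        if self.owner[region] != host:
            raise PermissionError(f"{host} does not own region {region}")
        self.data[region] = value

    def release(self, host: str, region: int) -> None:
        if self.owner[region] == host:
            self.owner[region] = None

pool = SharedPool(regions=4)
assert pool.acquire("server-a", 0)
assert not pool.acquire("server-b", 0)   # can't step on server-a's region
pool.write("server-a", 0, 42)
pool.release("server-a", 0)
```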
And then, in-band hardware acceleration for fabric management. There's a big debate about having in-band management versus out-of-band management. In the storage world, out-of-band
management is the only way to go, right? Because you don't want management in your data path.
Well, with CXL, if you're building it at a chip level, you want it in-band because out-of-band has that effect of slowing things down.
Being in-band and part of the ASIC framework makes it very fast and
efficient and makes it scalable. It's very hard to scale this stuff without in-band management.
So that's a big one itself. And then robust security, hardware-enforced security, is needed. There's elements of that in the spec already,
but we think that it needs more.
And then having packet-based routing,
you want to be able to route your packets
without decrypting them.
So you encrypt the data but not the header
so you can route the packet
without decrypting the whole thing. Because if you're encrypting and decrypting on both
sides of the switch, you know, you're adding a lot of latency. So there are little subtle features. Look, I
covered, you know, a half dozen or so of them, but there's even more than that. This stuff's complex. As you guys know,
you know, SANs and NAS were very complex too. It took the industry years to get it right.
I think it's going to take years to get this right as well, but we're excited to be kind of
leading that effort. And we think our first silicon will be a viable enterprise class
fabric switch.
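The encrypt-the-payload-but-not-the-header idea John describes maps naturally onto AEAD ciphers, where the routing header travels as authenticated but unencrypted associated data. Here is a minimal sketch using the Python cryptography package with AES-GCM; the packet layout is invented for illustration and is not the CXL wire format.

```python
# Sketch of header-in-the-clear packet protection with AES-GCM.
# The header stays readable for switch routing but is still authenticated,
# so tampering with it makes decryption fail. Layout is illustrative only.

import os
import struct
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

def seal(src: int, dst: int, payload: bytes) -> bytes:
    header = struct.pack(">HH", src, dst)      # plaintext routing header
    nonce = os.urandom(12)
    ct = aead.encrypt(nonce, payload, header)  # header is AAD: bound, not hidden
    return header + nonce + ct

def route(packet: bytes) -> int:
    """A switch reads the destination without touching the ciphertext."""
    _, dst = struct.unpack(">HH", packet[:4])
    return dst

def open_packet(packet: bytes) -> bytes:
    header, nonce, ct = packet[:4], packet[4:16], packet[16:]
    return aead.decrypt(nonce, ct, header)     # fails if header was altered

pkt = seal(src=1, dst=42, payload=b"cache line")
assert route(pkt) == 42
assert open_packet(pkt) == b"cache line"
```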
Yeah. On that note, I know that y'all have already rolled out an FPGA of the ASIC,
which is great because it gives you the chance to make changes and so on and
really kind of test it out and see how it's working and make sure that you're
really, you know, kind of ready to go. When are we going to start seeing silicon ASICs in the market?
So you're going to see a lot of ASICs around CXL 2.0, 3.0 that allow you to expand memory in the server and maybe even do a direct attached array kind of thing.
So DAS, or I guess you could call it DAM, direct attached memory. We're calling our, you know, memory array, network attached memory, NAM,
you know, which is kind of a follow-on to the SAN and NAS world, right?
But I think those technologies are probably a year out.
You know, we're going to have our first silicon
towards the end of this year, Q1 next year,
and have customers start testing it.
We also have a network attached memory array design
that you can connect to your switch.
So imagine an HBA, kind of like a fiber channel HBA,
connected to the switch,
and then multiple NAM arrays connected to the switch,
all shared by the servers.
That's incredible. How big would such a thing get, theoretically? How much memory could you
put into something like that? Well, you know, you can imagine a 24-drive 2U, although memory
generates a lot of heat, more so than an SSD.
So there's thermal issues with high-density arrays of memory.
But we're working on solving those thermal issues. But you can imagine a 24-drive 2U.
And so if they're 256 gigabyte or 512 gigabyte modules, right? You can do the math on that, right?
But you can get fairly large arrays.
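Doing that math, assuming John's illustrative 24-drive 2U shelf:

```python
# Back-of-the-envelope capacity for the 2U NAM shelf John describes.
drives = 24
for module_gb in (256, 512):
    total_tb = drives * module_gb / 1024
    print(f"{drives} x {module_gb} GB modules = {total_tb:.0f} TB per 2U shelf")
# -> 24 x 256 GB modules = 6 TB per 2U shelf
# -> 24 x 512 GB modules = 12 TB per 2U shelf
```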
So, John, we've just seen the announcement of AMD Genoa and Intel Sapphire Rapids,
which are the first two platforms to support CXL in production,
which is awesome because these are going to take over the entire industry in the next year. What's your reaction to these server platforms now that all the work that you've been
doing is finally in the hands of end users? Yeah, we're very excited because CXL is not
really viable until the servers support it. So now the servers do support it. And people ask,
well, it's only version 1.1 on a server. What does that really
mean? Well, what that means is it's capable of connecting to memory within the server,
you know, CXL memory within the server. So Samsung, Micron, SK Hynix, all these guys are
building the memory modules that plug in like an SSD in front of the server.
And so you'll be able to hot add memory to the server through a drive bay, which is very exciting.
And so that essentially buys you expansion within the server, memory expansion within the server outside the DIMM slots.
And that's what CXL 1.1 is all about, expanding memory within the server.
What we do is we come in and put our card in there with our ASIC on it.
And now you can connect to a memory fabric with higher versions of CXL.
We go grab that memory that's out there in the pool connected to our switch,
and we present it to that chip on the motherboard, that 1.1 chip, and it sees its memory pool just
grow. It doesn't know that 15 NAM arrays connected to the switch aren't drives plugged into the
server. It knows, you know, it sees it as the same thing.
So we're plumbing all that external memory up into the server
and CXL 1.1 can access it, make it all byte addressable,
just like it does the stuff that's inside the server.
And so the spec of the, you know,
these servers isn't that much of a concern.
You know, ultimately, say 10 years down the road, maybe CXL version 5 is supported by
the server, right, for example. Well, now you won't need an HBA in the server anymore. You can
connect directly to the fabric switch and everything will just work. But that'll be a ways down the
road. Yeah. So just to be clear, these platforms, AMD's and Intel's new server platforms, even though, I mean, I don't want people to be spooked, even though it says 1.1, that's going to be forward compatible with a lot of the things we've been talking about, right?
Yeah, very much so.
Excellent. Well, thank you so much. This has been a great update. It's great to kind of hear where the fabrics aspect of CXL is going. It fits really nicely into some of the episodes
we've recorded recently.
We did have AMD join us.
Also, I'll call out the last episode
where Dan Ernst from Microsoft Azure
talked about what you talked about with memory latency,
and the fact that the memory latency
is actually pretty good and pretty useful with CXL.
Yeah, for most applications, it works just fine.
So, John, thank you so much for joining us here on Utilizing CXL.
As we wrap, where can people connect with you
and continue the conversation on CXL and other advanced topics?
You can access our website at intelliprop.com.
And I'm john.spiers at intelliprop.com.
And, you know, we're an early stage startup.
We haven't invested a lot of money on our website.
But we have some of our early FPGA-based products depicted there and talk about, you know, what our capabilities are.
And then we have a white paper and other collateral material and some videos actually showing, for example, at Supercomputing 22,
we were composing memory into a server using Liqid's software.
So the beauty of our solution is we have an API, a rich API, and a lot of these composable management frameworks, like Liqid's,
can talk to our APIs, which allows them to compose
memory into servers. And we're
working with members and some others on that front.
But yeah, so there's videos
of us composing memory
from our FPGA fabric
into servers, and that's pretty exciting.
It is, absolutely.
Nathan,
anything new? What's going on with you?
Anyone can find me on Twitter at VNathanBennett and at Mastodon at VNathanBennett at AWSCommunity.social.
And I'm also trying to spin up a bunch of YouTube videos, so people can find me on YouTube at VNathanBennett.
And as for me, you can find me at SFoskett on most social media, including the Twitter and the Mastodon. Also, I will point out that we are putting together a CXL-themed Tech Field Day in March.
So if you go to techfieldday.com, you'll see that we have announced Tech Field Day 27.
It's going to be March 8th, my birthday, and 9th.
And hopefully we're going to have some of the folks in the CXL community, some of the companies you've just heard about presenting to the Tech Field Day audience, including folks like myself and Nathan. And we'll
put the video live stream of that on LinkedIn. We'll upload the video to YouTube. And if you'd
like to be part of that, please drop me a line. You can find me, as I said, at SFoskett on Twitter.
That's probably the easiest way to find me. So thanks for listening to the Utilizing CXL podcast, part of the Utilizing Tech podcast series. If you enjoyed this discussion, please do subscribe
in your favorite podcast application and give us a rating and a nice review, if you've got one in
you. This podcast was brought to you by gestaltit.com, your home for IT coverage from across
the enterprise. For show notes and more episodes, though, go to utilizingtech.com or find us on
Twitter at utilizingtech.
Thanks for listening, and we'll see you next week.