@HPC Podcast Archives - OrionX.net - @HPCpodcast-77: Adrian Cockcroft on Future Architectures
Episode Date: November 30, 2023
Adrian Cockcroft joins us again after SC23 to discuss TOP500 trends, the AI-HPC crossover, chiplets, and the emergence of UCIe and CXL advancements. Be sure to listen to previous episodes with Adrian: Episode 36 on HPC in cloud and sustainability data, and Episode 55 on decarbonization and ESG.
Transcript
Didn't get a chance to witness how Smarter revolutionizes HPC with Lenovo at SC23?
Check out their Inside HPC booth video to get caught up on the latest from Lenovo and HPC.
Visit insidehpc.com/lenovo-at-sc23.
When I make predictions about the future, I like to come back a year later and say,
well, how did I do? And I'm aiming to do it again next year because some of these things
could take a few years to work through. Two requirements of vision are that it is compelling
and it is inevitable. If you get both of them, then you've got the beginning of a good vision.
We're seeing AI applications emerging as one of the use cases for running on HPC systems.
So I can build myself a super large chip that's got all these chiplets glued together.
A year ago, the UCIe consortium was just launching.
This year they've got over 130 companies signed up for it.
In a world like that, what is a computer? What is a system anymore?
From OrionX in association with InsideHPC, this is the @HPCpodcast. Join Shaheen Khan and Doug
Black as they discuss supercomputing technologies and the applications, markets, and policies that shape them. Thank you for being with us.
Hi, everyone.
I'm Doug Black.
And Shaheen, great to be with you again.
Excellent to connect after SC23.
Yeah, we're recording this a few days after SC, and I think we're still buzzing from the
show.
So much went on.
And to help us sort things through, we have a great repeat guest with us today, Adrian Cockcroft.
He's a partner and analyst at OrionX, obviously a colleague of Shaheen's. He's a
consultant providing advisory services. Going further back, Adrian was a VP at AWS for several
years. So I want to add, and this is actually a good segue to encourage our listeners to go listen to our previous episodes because I was listening to one from a couple of years ago, and it's still very much valid.
So related to that, Adrian was our guest first back in September of 2022 in episode number 36, where we talked about cloud, obviously given his background at AWS and his previous background as a distinguished engineer at Sun
Microsystems and at eBay and at Netflix and such. And then we talked about HPC data, especially in
the climate modeling world. And then that was a segue into environment, sustainability, governance,
ESG. We also touched on the Netflix journey from on-prem to cloud. And then Adrian, you were back in May of
2023, just a few months ago, when we talked specifically about decarbonization, renewable
energy, and then drill down into ESG. So this one is really about a recap on SC23 and what's
becoming an annual paper that Adrian has been writing about system architecture and future trends.
Yes. And the article Adrian wrote last year was a hit and really interesting. So Adrian,
thanks so much for joining us. And where should we all begin? Should we start with the top 500?
Yeah, I think that makes sense. The story I wrote last year was sort of about coming back to
supercomputing after roughly 20 years away. I worked with Shaheen at Sun in
2003, 2004, and then Sun laid off our entire team. So I went off and went to eBay and went off to
do things that weren't HPC for a bit. But I've always been interested in systems architecture,
what's happening next, and looking at the sort of evolution as it goes through. And I wrote this
story after sort of seeing everything happening at supercomputing last year. CXL was a big deal and it looked like it was
a really interesting basis for next generation architecture. So we'll talk a bit about how that's
looking next. And then the other point I was looking at around the sort of computer architecture
and what was going on in the top 500 list was that the workloads don't run
that efficiently on current systems. I mean, the real world workloads and looking at how do we
build architectures where they're going to be more efficient at running real workloads rather than
necessarily just LINPACK. And that was the general idea. And I think the other thing we were talking
about there was whether we would eventually see more custom CPUs dedicated to the HPC market versus the off-the-shelf CPUs,
which are really not built for HPC. They're built for running AI workloads and things like that.
So I think there's a few different areas discussed last year, and we can talk a bit about how it
looked a year later. When I make predictions about the future, I like to come back a year later and say, well, how did I do? And I'm aiming to do it again next year because some
of these things could take a few years to work through. Well, I respect that, because so many
people who write predictions are never held accountable for them, especially not by themselves.
Well, I'm enjoying not having a corporate PR department telling me I can't say things like
this, because as a VP at AWS, you could not make predictions.
It's a very dangerous place. But I like to just say, well, this is what I think
is going to happen, you know, looking at the patterns I've seen over a
40-year career in the industry. Well, the trick with predictions is to predict stuff that's
already happened. That helps, but also you can predict what will happen fairly well;
predicting when is hard. Yes, that's very true. Sometimes you can predict when, but you can't predict what.
The Heisenberg uncertainty principle applies to predictions. It's a when or a what. I'm pretty good
at figuring out what will happen. Sometimes I've had to wait a lot longer than I wanted to for the
when. That's right. I once saw a presentation from a futurist who i think was notably self-confident he said i don't predict the future i analyze it also there's alan k's
best way to predict the future is to invent it invented that well some of that in the cloud
space there was some of that going on with all the work we did at netflix we were just putting
stuff out there well if you don't know when it's going to happen, then that becomes vision, right? If then the two requirements of vision are that it
is compelling and it is inevitable. If you get both of them, then you've got the beginning of
a good vision. Which, I used to jokingly say, is another word for you're not going to see it
anytime soon. So what is your takeaway from SC? So there were a couple of things. The
top 500, there were a couple of updates in the top end of the list that were interesting. One, the HP Cray Frontier system came
out last year as the first real exaflop scale system. So that was interesting. And there was
a bit more. That system is now delivering real results. It's being run. And that's been a very
successful product. And that's an HPE Cray, but AMD-based, system. And roughly at the same time,
an HPE Cray,
but Intel-based, system called Aurora
has been under development
for a few years,
but taking a lot longer.
And that has now been partially installed
and it came in as a second place entrant
with 585 petaflops.
And I think that one sort of,
you sort of squint at it. In some sense, it's successful
because they've got it going. But in other ways, it's really a year or two later than it should
have been. And they still seem to be doing quite a lot of work with a big team to get it up to
speed. So that's sort of interesting from that point of view. And then the third place was
Microsoft Azure. The system's called Eagle. It's Intel CPU, NVIDIA H100 GPUs
on InfiniBand. It's sort of their cloud system. I think it's probably the system they use for
running AI workloads for OpenAI and people like that. But they got 561 petaflops. And I went to
the BoF session where they were talking about both systems. The Aurora guy got up and said, I've been
working on this and we have this big team and it's all working out and we're fine. And then the Microsoft guy basically said, well,
we had a few people and a few days and we just put this together. We started small and we just
kept scaling it and it just kept going. And we just stopped when we ran out of hardware that
we could find on the standard build of the standard image operating system image. And
they didn't have any custom stuff. They weren't fixing bugs. They were just running the benchmark and throwing more hardware at it. And this was a huge
contrast in sort of, well, okay, you've got a big team working for years to try and build this
machine and somebody else going, yeah, it took us a few, we spent a few weeks, but it wasn't
a massive amount of work for a lot of people. And I think that was, for me, a big interesting
contrast. And it shows that
there's sort of these tipping points where the availability of a supercomputer exascale system
is basically, do you have enough money? And then the system will just appear in front of you in
the cloud. And we've got systems now with, at least on HPL, the same kind of performance as
the biggest systems. But they are primarily built for AI and they don't have all of the exascale software stack that's been really built around the needs of supercomputer applications.
So it's an easier problem to solve. But then also we're seeing AI applications emerging as one of
the use cases for running on HPC systems, things like Aurora and Frontier. They're running
the AI workloads as well. So I think, as the workloads blend, it's going to just be easier to deploy a much bigger supercomputer. And as a
prediction, I'm expecting we're going to see more of this in the future: it was easy to run,
so let's just put out big numbers in the TOP500 list.
There was a social media post that said Microsoft team had three days to do this.
And like you said, they could have done more if they had access to more resources and had
maybe a little bit more time.
And I think like you're saying, it's the power of using standard, already pre-debugged,
as you put it, just cookie cutter your way into happiness.
And in my mind, the Microsoft team keeps getting kudos from me for doing what they're doing, which is great.
And they participate because I think other cloud providers in principle can do it too, but they're not playing.
But it shows where the floor is.
I think it shows that the floor for readily available performance is now easily into the multi-hundred petaflops.
And as you're saying, if you show up with a big enough checkbook, you can just get your way. So that really puts into question what the goal should be for the national centers
and government-funded centers. Yeah. What are the implications of that,
that you can spin up a capability that's, what, 25 petaflops below where Aurora is now? You can
do that in three days? They had a graph showing their scaling,
and they had about 85% scaling still going on, in terms of: if you just kept adding resources, 85% of it
was going to the benchmark bottom line. So they just needed to find a few more
racks and they would have been above Aurora. And it will be interesting to see whether Google and others do it too.
And NVIDIA had another system somewhere in the top 10 as well that was basically their own cloud-based system.
So I think that they're all going to, if they just take the time in between their big AI runs, they're running an AI training for a month, and then they take three days to run an HPL, and then they go back to running AI training.
They just treat that as marketing or something.
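To put rough numbers on that scaling point, here is a quick back-of-the-envelope sketch in Python. Only the two HPL scores (561 and 585 petaflops) and the roughly 85% scaling figure come from the discussion above; the per-rack contribution is a hypothetical illustrative value.

```python
# Back-of-the-envelope: how much more hardware would Eagle have needed
# to pass Aurora on HPL, at ~85% incremental scaling efficiency?
# The per-rack peak figure is hypothetical, for illustration only.

eagle_hpl_pf = 561.0        # Eagle's HPL result at SC23 (petaflops)
aurora_hpl_pf = 585.0       # Aurora's HPL result at SC23 (petaflops)
scaling_efficiency = 0.85   # ~85% of added peak reaches the HPL bottom line
rack_peak_pf = 2.0          # hypothetical peak petaflops per extra rack

gap_pf = aurora_hpl_pf - eagle_hpl_pf                # 24 PF shortfall
delivered_per_rack_pf = rack_peak_pf * scaling_efficiency

racks_needed = gap_pf / delivered_per_rack_pf
print(f"Gap to Aurora: {gap_pf:.0f} PF")
print(f"Each extra rack delivers ~{delivered_per_rack_pf:.2f} PF on HPL")
print(f"Racks needed to pass Aurora: ~{racks_needed:.0f}")  # ~14 at these numbers
```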
Maybe we'll see more of those next year? Well, I think a big issue for the HPC community has been risk management, because we've all seen that movie a few times before: you rely on a particular vendor
to do something for you, and then their business takes them in a slightly different direction,
and suddenly the piece you were really relying on is no longer available. So that really means
that you have to have the vigilance to make sure that HPC workloads are covered and fed, and that
technology keeps moving in that direction. Today, there's pretty good alignment between HPC and AI,
but can that be guaranteed 10 years from now? And if not, then we need to continue to have our
fingers in that pie, if not the entire hand. Yeah. Some of this comes down to what are the
architectural differences if you're really optimizing for the HPC workloads of finite element analysis and things like that, CFD and all the
weather modeling and all that. Different workloads, and there's different optimizations.
One thing we've seen is that the balance of CPU versus GPU may not be right if you just take the stuff you're using for your AI workloads.
So one of the things we saw there was GigaIO, who have figured out how to get 32 GPUs on a single CPU in a single node.
And that was last year. That was their super cluster or super node.
This year, they doubled that to 64.
So eight is the typical number you see in the cloud instances, sort of the
standard number of GPUs that you get per CPU, but they're up to 64. And they called it a
super duper node, which I found amusing. So it's good marketing to come up with a fun name. So the
64 GPU super duper node, you're putting that on a fabric, you've got something that's really aimed
more at something that's going to be much more compute intensive on the float side. And if you can optimize your workload to run in
that environment, then that's at least one sort of direction that seems to be interesting.
That just strikes me as astounding power within one node.
Yeah. And so if you can do it in one node, do it in one node. That's like Amdahl's law,
right? You don't want to communicate unless you have to. If you can run it on a single machine, you've got the highest bandwidth. It's all coherent. Everything is
running within one box. So 64 GPUs on a single box is an interesting architecture.
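Adrian's "do it in one node" point is essentially Amdahl's law applied to communication overhead. A minimal sketch, with a made-up non-parallel fraction, shows why keeping the whole job coherent in one box is attractive:

```python
# Amdahl's law: speedup on n processors when a fraction p of the work
# parallelizes and the rest (serial work plus communication) does not.
# The 5% overhead below is an illustrative assumption, not a measurement.

def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

p = 0.95  # hypothetical: 5% serial/communication overhead
for n in (8, 64, 512):
    print(f"n={n:4d}  speedup={amdahl_speedup(p, n):6.2f}")
# With 5% non-parallel work, 512 processors yield only ~19x,
# which is why minimizing communication pays off so quickly.
```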
You know, this raises the point of how big your application is and how long it's going to run.
Because if you have a single killer app that's going to run forever, well, then you go and
optimize it all the way. But if your workload or your application is changing all the time,
like it does in the financial services when they tinker with the application on a daily basis.
So the application is never stable enough for you to super optimize for it. And if that's the case,
then you need an infrastructure that is able to dance with you and that prevents you from going all the way.
So when you have these big, quote, imbalanced configurations where you've got like very few
CPUs and massive memory or very few CPUs and massive number of GPUs or vice versa,
then the question becomes, how often can I use that particular configuration? And do I have enough capacity on my data center in varying
modes of computation for me to compose my way into what I need at any given time? I think all
of that becomes a bit of a complexity that essentially sets the tone for your center.
Are you a special purpose center? Are you a general purpose center? That sort of a thing.
And one of the challenges with GPUs is that they're moving so fast that if you buy a whole bunch of them, are they still useful in a year,
two years, three years? At what point do you need, if you're depreciating over five years,
like people tend to do with their standard machines, a five-year-old GPU is just a waste
of time at this point. So do you have to depreciate over two or three years, something like that,
because of the rate of change?
So there are some interesting cost-of-ownership problems with GPUs that you might
want to think about; this is part of the rent versus buy thing, right?
Just use the latest one whenever the new one comes out.
That's right.
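The depreciation question reduces to simple arithmetic. Here is a hedged sketch comparing cost per unit of work for a GPU kept six years versus one refreshed every two years; the purchase price, energy price, and the 4x-per-generation gain are all invented for illustration.

```python
# Rough GPU cost-of-ownership sketch: keep one GPU for six years versus
# refresh every two years. Assumes each generation is 4x faster at the
# same ~500 W. Price, energy rate, and the 4x gain are all hypothetical.

horizon_years = 6
refresh_years = 2
gpu_price = 30_000.0        # hypothetical $ per GPU
power_kw = 0.5              # ~500 W, constant across generations
energy_price = 0.10         # hypothetical $/kWh
hours_per_year = 8760
energy_cost_per_year = power_kw * energy_price * hours_per_year

# Strategy A: buy once, keep six years (performance stays at 1x).
cost_a = gpu_price + energy_cost_per_year * horizon_years
work_a = 1.0 * hours_per_year * horizon_years        # work units at 1x

# Strategy B: replace every two years; each new GPU is 4x the last.
cost_b, work_b, perf = 0.0, 0.0, 1.0
for _ in range(horizon_years // refresh_years):
    cost_b += gpu_price + energy_cost_per_year * refresh_years
    work_b += perf * hours_per_year * refresh_years
    perf *= 4.0

print(f"Keep 6 years : ${cost_a / work_a:.3f} per work unit")
print(f"Refresh 2 yrs: ${cost_b / work_b:.3f} per work unit")
```

At these made-up numbers the short refresh cycle wins handily, largely because the energy bill stays flat while the work delivered per watt keeps growing.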
Then you've got the energy costs.
Yeah.
So the energy cost of running, I mean, the older ones are slower, but they use the same
amount of energy, right?
The power consumption is sort of limited by the packaging. So it's a 500 watt chip and it's a 500 watt chip three years later, but
you've got whatever, 10, 20 times the performance or something. So it's not cost effective
or energy effective to use the old models. So there were a couple of things that happened also at the
conference. NVIDIA had some announcements around their Grace Hopper combined superchip,
the GH, I think. And they also recently announced
that they're going to be releasing on an annual basis instead of every two years. They're sort
of starting up additional engineering teams. So this seems like, hey, we're doing well,
we've got plenty of money, let's go and double down and push even faster because they can see
things they can do. So that's an interesting development that's going to push this space forward even faster in the
future. I don't know if you've heard about that, Shaheen. Yes. In fact, Carl Freund was our guest
a few episodes ago, and we talked about this. I was jokingly saying that Intel used to have tick-tock
and NVIDIA seems to be having tick-tock. And it is also a little bit of a controversial move, because some of the other chip vendors are saying,
is that even possible?
Is it practical?
Given that you have to align your chip design
with fab technology that is coming.
So maybe at the end, it becomes like a tick-tock again,
but it remains to be seen for the moment though.
As I said, in my wrap-up episode,
there's just so many chips out there
that it is impossible
for the traditional average data center to know which one to pick.
There are literally tens of them.
And now each vendor has three separate, four separate offerings.
So figuring out what to use when for what and what to standardize on is really a very
difficult task.
And I think that's a market opportunity for those who can test drive these and be able to advise customers. If you didn't get a chance to visit the Lenovo booth
at SC23, or you just want to see it again, check out their Inside HPC booth video. Visit
insidehpc.com/lenovo-at-sc23 to view the video now.
One of the recurring things over decades has been, well, I can build a specialized attached processor or something that will do a better job at that thing. And you have a year
or two before the general purpose things effectively catch up because there's a bigger
market for them. So this specialization means that you have to stay
ahead of the general purpose solution. We've seen many generations of that over the years.
But I think this is a case right now where there's all of these specialized solutions and that,
well, you get a win for a year or two, but then whatever you can buy that's going to be a general
purpose thing in huge volume in a year or two is going to be potentially better. And then that leads into kind of the next topic I think we should talk about,
which is chiplets and custom CPUs and the ability to sort of think about a next generation HPC
oriented supercomputing architecture, where instead of buying a processor and GPU,
you're basically saying, well, I'm going to do a custom chiplet for the core of
my accelerator GPU thing, maybe vector based or something like Fugaku has. And then I can have my
own processor and I can over build the memory subsystem or the IO or whatever I feel like I
need. But then I can surround that with standard chiplets for all of the other bits of the system,
whether it's high bandwidth memory or
IO or whatever. So I can build myself a super large chip that's got all these chiplets glued
together. And a year ago, the UCIe consortium was just launching. This year, they've got over
130 companies signed up for it. And I was talking to somebody who actually works at Intel who's driving this, saying they will happily take chiplets from wherever you happen to have bought them.
And Intel will build that onto a substrate for you with all the interfaces, meaning that this stuff should all just work.
And I think that's going to be really interesting development in the next year or two.
Envisioning a menu that you pick from and you say, I want two of these and three of
those. And if you're building a cell phone, maybe you want a radio. And if you're building a
supercomputer, you want a couple of 64-bit pieces there, but somehow you can formulate your ideal
configuration of all these chiplets. And then you have somebody like Intel or TSMC lay it out on the
substrate for you, and then UCIe connects them. So one natural question is,
if that's the world, and we already see that world with the Apple M1, we already saw it a couple of
years ago. You look at your MacBook Air and pretty much everything is on that chip. You get outside
of that chip and you've got the screen and the power supply and the keyboard and that's about it,
right? So in a world like that, what is a computer? What is a system anymore? It probably is
substantially just that big substrate, isn't it? Well, what we've really done here is we've moved
from the PCB being the interconnect. There's a limit to how fast you can go over a printed
circuit board, right? And we're just saying, well, the substrate gives
you higher pin density. The pins are much closer together. And
there are two different versions of the substrate, but one of them is more
active. Effectively, you're doping the silicon in the substrate itself so that when
the things land on it, there's some active components there, as far as I could see by
squinting at some diagrams anyway. But what you're basically doing is saying, I'm going to have more
bandwidth because I effectively got more pins and I've got lower drive needs.
So overall, you're just going to go faster.
So just think of it as shrinking circuit boards: the substrate
becomes the new circuit board.
So architecturally, it's not that different.
You've just managed to shrink it together in a way that lets you build a system that's
going to be faster, cheaper, and use less power.
Oh, well, better price performance, for sure. I mean, can we paraphrase software is
eating the world as chips are eating the world? I think the world is eating chips is probably
more of a phrase that people would resonate with. But I think the key thing here is the
standardization means that, in the same way as you built a circuit board that
integrated parts from different suppliers,
and each chip was from one supplier,
we're now moving to this world where you can integrate chiplets from lots of suppliers
onto one substrate.
And that becomes the new way that I think things are going to be built.
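To illustrate the "menu" idea in the abstract, here is a toy sketch of composing a package from chiplets sourced from multiple suppliers; every part name, supplier, and wattage is hypothetical.

```python
# Toy "chiplet menu": pick parts from different suppliers and have a
# packaging house integrate them on one substrate. Every part, number,
# and supplier here is hypothetical, purely to illustrate the idea.

from dataclasses import dataclass

@dataclass
class Chiplet:
    name: str
    supplier: str
    power_w: float

menu = {
    "vector_core": Chiplet("custom vector accelerator", "in-house", 150.0),
    "hbm_stack":   Chiplet("HBM memory stack",          "vendor A",  30.0),
    "io_die":      Chiplet("PCIe/CXL I/O die",          "vendor B",  40.0),
    "radio":       Chiplet("RF modem (the phone case)", "vendor C",   5.0),
}

# A supercomputer-ish order: two compute chiplets, four HBM stacks, one I/O die.
order = ["vector_core"] * 2 + ["hbm_stack"] * 4 + ["io_die"]
parts = [menu[p] for p in order]
print(f"{len(parts)} chiplets on one substrate, "
      f"~{sum(c.power_w for c in parts):.0f} W package")
```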
Well, and actually, as you're saying, the real big shift is going to be in the supply
chain because these substrates are being manufactured by fab companies, not by PCB companies.
As we have said in other conversations, you're going from the motherboard to the mother chip.
And then you put it all on a water-cooled plate to keep the thing from melting.
So the other thing is the stacking in 3D as well.
So you can get things like memory chips, you can stack them on top of each other,
and then you have the problem that the ones in the middle melt because they're surrounded, right?
So how do you get the heat out?
So there's a bunch of issues around cooling, but it isn't just a flat substrate with chips laid out on it.
There's also some vertical stacking capability.
And people are laying out these chips and then dropping memory, SRAM or HBM chips, on top.
They're scattering them where they need to be.
So there's something interesting, because when you start saying I can construct a machine in three dimensions,
your path lengths start shrinking and you can go faster and everything gets easier to do.
There was some talk at the conference that maybe the biggest, or at least a very widely
focused-on, topic was cooling. In reference to your last comment,
is that something you're following, liquid cooling technologies and techniques? It's not something I've been
following that much, but it's definitely a plumbing conference. There are whole sections
of the expo that are pipes and plumbing and pumps and things. But I think that certainly as you
build more and more dense machines, the thing is, how do you cool it? And if you look at a
typical data center and you say, what is your power budget per rack? And these systems are getting to the point where you can't
deliver the rack because you have to have an empty space around it in all directions because that
rack is just going to get too hot. So I think we're seeing next generation data centers include
either liquid cooling or having sort of extension racks, which are the liquid-to-air
interface, and which take up a large amount of rack space just dealing with the cooling issues.
So I think that that seems to be the trend, but I didn't follow any specific new launches at the
event.
No, there was a lot, as you said, Adrian. And for those of us who walk into a hardware store and look at pipes, you've never seen such
clean, shiny pipes, properly color-coded, and they snap-fit just so. And then
there is a whole supply chain of that sort of plumbing, from copper-based fittings and joints
all the way to massive refrigeration units. So there were several booths on the exhibit floor
and several talks about this, because guess what? Liquid cooling is coming, and there's
really no other way to get around some of this until quantum computing shows up or something.
The power is going to be dense and it needs to be taken out, like you said.
Let's talk about quantum for a bit because I think that's a natural segue.
My mental image of a quantum computer before I went this year was this big plumbing thing with lots of superconducting stuff
and pumps and vacuums. And somebody said, yeah, the IBM pictures look like they call them
chandeliers and just big golden pipes in all directions. And I was thinking, you're never
going to have that in a standard enterprise data center,
right? That's a super specialized thing. And then we went along and we're looking at one of the vendors there, and they say, no, it's just a normal rack and it isn't superconducting, and
everything just works and you just stick it in your data center and access it, just put the right
kind of workloads on it. So maybe Shaheen, you're a bit deeper into this space. You can sort of
summarize a bit more about what we saw there. Yeah, I think we probably need to come back to this
topic and do a deeper dive. We had Bob Sorensen as a guest a few episodes ago to take us through
the market forecast that Hyperion has done and SC23 had a quantum village. So the modalities of
quantum computing are still out there. We haven't got to a transistor moment yet.
As I like to joke, we haven't even got to a Betamax VHS moment.
There are like four or five different quote modalities or approaches that are hovering
around.
But there's definitely a lot of progress in room temperature photonics and neutral atom
modalities.
Now, some of these room temperature approaches still need refrigeration somewhere.
They may or may not. So the whole system is more of what you want to look at. But fundamentally,
quantum computing promises huge energy savings. It goes literally from hundreds and thousands of
watts to a few single digit sort of watts. So that benefit is really what is going to drive it in my
mind. Yeah. So they were talking about having, I forget what, tweezers that you'd use to move an
individual atom. Laser tweezers.
Laser tweezers. So you basically pick an individual atom, put it next to another atom,
zap it with a laser so they become entangled, and then push those entangled atoms off somewhere else
and they have an array of these atoms. I have a physics degree from a very long time ago,
and that sounds sort of plausible, but exactly. The engineering...
No, they're doing it, yeah.
I mean, it's just a bit mind-boggling that you actually had to push around individual atoms
with lasers. Sounds, yeah, well, I'm glad someone's figured out how to do that,
but it sounds very cool. But it seemed like they're getting larger and larger arrays,
and they're available from cloud providers, and people are actually starting to use these to try things out.
So, what was the vendor we were talking to that gave the demo?
We talked to QuEra, who do a lot of really fun work, and it's just a delight to delve
into all of them, really. Quandela was there, QuiX was there. The market is probably 20 or 30 vendors that are building various aspects or systems, and
they're all very interesting.
So I expect that we will have examples of actual applications running, but they may
be set up, the machine may be set up just to do one application really well as the research
moves from the lab towards a real product.
Yeah, there was a report on the market which said that for the next five years at least,
the amount of money being spent on R&D is vastly more than the amount of revenue in the space from
selling machines. So it's developing and it's growing, but there's a point maybe five, ten
years into the future where you actually are getting
some payback on the R&D. But at the moment, it's a good way to sink a few billion dollars into
things that you're not going to get payback from for a long time.
Well, the promise is so intoxicating that if you're a government or if you're a major corporation,
you kind of can't afford not to play.
And that's encouraging because not too long ago, it was 10 to 15 years away.
If we're now saying five years away, that's progress, right?
Yeah, I vaguely remember them saying it was the late 2020s, but yeah,
it was certainly far enough out into the future that your predictions are probably
wildly wrong.
But the trend was certainly increasing R&D spend and increasing revenue.
It's just that the R&D spend is higher than the revenue at the moment.
Now, if you're a government, you also think about that, even if I don't get a quantum computer out
of this, I'm going to get a whole bunch of really advanced technology that's going to help in other
as yet unforeseen ways. So you think it's a good spend of money because you're fundamentally
investing into advanced technologies that will be useful for the future. So you kind of hedge
your risk a little bit down that path. Yeah. There are some kinds of optimization problems. When they
were talking about using this for finance, for optimization, scheduling, transport,
there's applications that are not just crazy science things, but things that look like they are real world useful algorithms that people are trying out.
Now, maybe we can conclude with another view of the interconnect.
Because one of our walkaways last year was that UCIe was coming for chiplets, and then PCIe and CXL were coming on strong.
And then soon after, there was the Ultra Ethernet Consortium.
And you pointed to some of the optimizations that cloud providers had done
to traditional Ethernet with flow control and such.
We sort of walked away with a view that the interconnect hierarchy
in the data center was going to develop in a particular way.
Do you still see it that way?
And how would you describe it now?
Yeah, it looked last year as if CXL was emerging as the way most people were looking at
the future. I think from this year, it's sort of the only game in town. So it's definitely a
stronger pitch. Although last year, CXL 3.0 had just come out, which specified a fabric standard.
Everyone was very keen on that and thought that in a couple of years, we'd have a fabric.
This year, actually, during the conference, CXL 3.1 came out.
So they had to revise the spec.
And they said, OK, the previous attempt at getting fabrics wasn't quite right.
They got some feedback.
They added some more capabilities to the fabric, some related to trusted computing and security. Because if you're
doing multi-tenant on a fabric, you've got to be very careful about who can see what on the fabric.
And then there was some more around fabric management. So then if you're trying to design
chips, you need the standard to have stopped moving, really. So that means people developing
fabric management into chipsets, the spec just landed. So it's going to be another two or three years before
those come out. So I think the sort of net for CXL was that the basic CXL capabilities are on
a good track, lots of good support. We'll start seeing those more and more, but the more advanced
fabric management things are a little bit further out than we thought they were last year. But
hopefully this reset, this updated 3.1 spec is going to be the right thing to go
build on. So the thing about fabric management is if you have a bank of CXL memory that you're
sharing across a whole load of nodes, you really, right now, you can share the memory capacity
and allocate it to different nodes. So if one of your nodes needs a few terabytes of memory,
say you have a few terabytes of CXL memory, another one needs less,
you can sort of play around with the capacity.
But if you want to share that memory dynamically
between two machines,
then you need much more fabric management.
You're using it as shared memory across nodes.
And that's really where we need the fabric management
to make it more dynamic
and to have the ability to control what those nodes see
and the coherency across them.
So I think that's the ultimate, really interesting use case.
But in the meantime, we're starting to see test systems and products and things like
that coming out that are looking at CXL, and all these different chipsets as well.
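To make the capacity-pooling idea concrete, here is a minimal sketch of a fabric manager carving a shared bank of CXL memory into per-node allocations. It is purely illustrative, not any vendor's API; true dynamic sharing of the same region, with coherency, is what the CXL 3.x fabric-management features are for.

```python
# Minimal sketch of CXL-style capacity pooling: a fabric manager carves
# a shared memory bank into per-node allocations. Illustrative only,
# not a real CXL fabric-management API.

class CxlMemoryPool:
    def __init__(self, capacity_tb: float) -> None:
        self.capacity_tb = capacity_tb
        self.allocations: dict[str, float] = {}

    def free_tb(self) -> float:
        return self.capacity_tb - sum(self.allocations.values())

    def allocate(self, node: str, tb: float) -> None:
        if tb > self.free_tb():
            raise MemoryError(f"pool exhausted: {self.free_tb():.1f} TB free")
        self.allocations[node] = self.allocations.get(node, 0.0) + tb

    def release(self, node: str) -> None:
        self.allocations.pop(node, None)

pool = CxlMemoryPool(capacity_tb=8.0)
pool.allocate("node-a", 4.0)   # one node needs a few terabytes
pool.allocate("node-b", 1.0)   # another needs less
print(pool.allocations, f"-- {pool.free_tb():.1f} TB free")
```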
Now, Adrian, I know you saw a demonstration at SC of PCIe 6.0 with some pretty amazing performance numbers.
Could you talk about that a little bit?
Yeah, most of the systems that are actually shipping right now are PCIe 5.
And that runs at, I think it's 32 gigabytes per second per channel, per 16-lane channel.
And then when you get to PCIe 6, they changed the signaling to have four levels on the
wire instead of two. So typical digital signals are on-off; what they do for PCIe 6 is they go
to basically 0, 1, 2, 3, and they're switching between those. And then they announced PCIe 7 earlier this year,
and they were talking about that at the event. So PCIe 7 takes the PCIe 6 thing and just doubles
the clock rate on everything. So they basically figured out how to generate and detect
these four-level signals quickly enough that they can actually run everything at twice the data rate.
So PCIe 7 is going to come out quite a few years from now. But what they're trying to do is
release a new version of PCIe every few years that's twice as fast as the previous version.
And that's kind of the path
they're on. So what we were seeing at the show was there are processors available now, like the
Intel Sapphire Rapids fourth-generation Xeon, that have four, or kind of five; it's got 80 lanes,
which is technically five. But basically people seem to be talking about having four either PCIe 5
or CXL 1.1 16-lane channels. So people were doing a bunch of demos
with that. The fifth-generation Emerald Rapids and the Granite Rapids CPUs, coming out I think next year,
will support CXL 2.0 on PCIe 5. And the NVIDIA Grace CPU is also stated to support CXL 2.0,
but NVIDIA hasn't been talking that much about CXL. I think it's part of the NVIDIA
chip-to-chip interconnects, but they are pushing their own NVLink sort of right now rather than
confusing people by talking about CXL as well. So then we need to get to some more next generation
CPUs that actually have PCIe 6.0, because the CPU is managing the coherence of this. And so if you
don't have a CPU that knows how to talk this new version of CXL, then you're not going to get the performance. It's not going to make sense. But it just
keeps doubling up. So it's 256 gigabytes per second on a PCIe 7 connector, 128 on PCIe 6,
and I guess 64 on PCIe 5, on each connector. And then each connector runs in one direction. So
you have another connector coming in the other direction, two wires to basically go in each direction, point to point.
And that's how it's managed. And you can go a meter or two, sort of roughly within a rack, with
this. Anyway, so that's kind of what's been going on. There were plenty of demos. There are people like
Micron doing shared-memory modules that are CXL-based, and CXL switching.
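The doubling pattern reduces to simple arithmetic: per-lane transfer rate times lane count, divided by eight bits per byte, ignoring encoding overhead. A quick sketch reproducing the x16 numbers quoted above:

```python
# Approximate PCIe x16 bandwidth per generation, one direction,
# ignoring encoding overhead: (GT/s per lane) * 16 lanes / 8 bits.
# PCIe 6 doubles the rate via PAM4 (four signal levels); PCIe 7
# doubles the clock again on top of that.

lanes = 16
rates_gts = {"PCIe 5.0": 32, "PCIe 6.0": 64, "PCIe 7.0": 128}

for gen, gts in rates_gts.items():
    gb_per_s = gts * lanes / 8          # GB/s, one direction
    print(f"{gen}: {gts} GT/s/lane -> ~{gb_per_s:.0f} GB/s per x16 link")
# ~64, ~128, and ~256 GB/s: twice the bandwidth each generation.
```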
So CXL 2.0 is really where I think this becomes useful.
And 3.0, as well as supporting fabrics, also supports multi-level switches.
You can start cascading switches together and build something that's a bit more complex.
Okay, Adrian, great to be with you again.
Thanks so much for joining us.
And I want to tell all of our listeners that Adrian's article,
which will touch on a lot of the topics we got into today, will be appearing on the InsideHPC
site if it's not already on our site. So with that, thanks so much.
Yeah, thanks for having us.
All right. Thank you all. Yeah. Happy Thanksgiving, everybody. And see you next time.
Cheers. Thanks.
That's it for this episode of the @HPCpodcast. Every episode is
featured on insidehpc.com and posted on orionx.net. Use the comment section or tweet us with any
questions or to propose topics of discussion. If you like the show, rate and review it on Apple
Podcasts or wherever you listen. The @HPCpodcast is a production of OrionX in association with Inside HPC.
Thank you for listening.