@HPC Podcast Archives - OrionX.net - @HPCpodcast-77: Adrian Cockcroft on Future Architectures
Episode Date: November 30, 2023
Adrian Cockcroft joins us again after SC23 to discuss TOP500 trends, the AI-HPC crossover, chiplets, and the emergence of UCIe and CXL advancements. Be sure to listen to previous episodes with Adrian: Episode 36 on HPC in cloud and sustainability data, and Episode 55 on decarbonization and ESG.
Transcript
Didn't get a chance to witness how Smarter revolutionizes HPC with Lenovo at SC23?
Check out their Inside HPC booth video to get caught up on the latest from Lenovo and HPC.
Visit insidehpc.com/lenovo-at-sc23.
When I make predictions about the future, I like to come back a year later and say,
well, how did I do? And I'm aiming to do it again next year because some of these things
could take a few years to work through. Two requirements of vision are that it is compelling
and it is inevitable. If you get both of them, then you've got the beginning of a good vision.
We're seeing AI applications emerging as one of the use cases for running on HPC systems.
So I can build myself a super large chip that's got all these chiplets glued together.
A year ago, the UCIe consortium was just launching.
This year they've got over 130 companies signed up for it.
In a world like that, what is a computer? What is a system anymore?
From OrionX in association with InsideHPC, this is the @HPCpodcast. Join Shaheen Khan and Doug
Black as they discuss supercomputing technologies and the applications, markets, and policies that shape them. Thank you for being with us.
Hi, everyone.
I'm Doug Black.
And Shaheen, great to be with you again.
Excellent to connect after SC23.
Yeah, we're recording this a few days after SC, and I think we're still buzzing from the
show.
So much went on.
And to help us sort things through, we have a great repeat guest with us today, Adrian Cockcroft.
He's a partner and analyst at OrionX, obviously a colleague of Shaheen's. He's a
consultant providing advisory services. Going further back, Adrian was a VP at AWS for several
years. So I want to add, and this is actually a good segue to encourage our listeners to go listen to our previous episodes because I was listening to one from a couple of years ago, and it's still very much valid.
So related to that, Adrian was our guest first back in September of 2022 in episode number 36, where we talked about cloud, obviously given his background at AWS and his previous background as a distinguished engineer at Sun
Microsystems and at eBay and at Netflix and such. And then we talked about HPC data, especially in
the climate modeling world. And then that was a segue into environment, sustainability, governance,
ESG. We also touched on the Netflix journey from on-prem to cloud. And then Adrian, you were back in May of
2023, just a few months ago, when we talked specifically about decarbonization, renewable
energy, and then drill down into ESG. So this one is really about a recap on SC23 and what's
becoming an annual paper that Adrian has been writing about system architecture and future trends.
Yes. And the article Adrian wrote last year was a hit and really interesting. So Adrian,
thanks so much for joining us. And where should we all begin? Should we start with the top 500?
Yeah, I think that makes sense. The story I wrote last year was sort of about coming back to
supercomputing after roughly 20 years away. I worked with Shaheen at Sun in
2003, 2004, and then Sun laid off our entire team. So I went off and went to eBay and went off to
do things that weren't HPC for a bit. But I've always been interested in systems architecture,
what's happening next, and looking at the sort of evolution as it goes through. And I wrote this
story after sort of seeing everything happening at supercomputing last year. CXL was a big deal and it looked like it was
a really interesting basis for next generation architecture. So we'll talk a bit about how that's
looking next. And then the other point I was looking at around the sort of computer architecture
and what was going on in the top 500 list was that the workloads don't run
that efficiently on current systems. I mean, the real world workloads and looking at how do we
build architectures where they're going to be more efficient at running real workloads rather than
necessarily just LINPACK. And that was the general idea. And I think the other thing we were talking
about there was whether we would eventually see more custom CPUs dedicated to the HPC market versus the off-the-shelf CPUs,
which are really not built for HPC. They're built for running AI workloads and things like that.
So I think there's a few different areas discussed last year, and we can talk a bit about how it
looked a year later. When I make predictions about the future, I like to come back a year later and say, well, how did I do? And I'm aiming to do it again next year because some
of these things could take a few years to work through. Well, I respect that, because so many
people who write predictions are never held accountable for them, especially not by themselves.
Well, I'm enjoying not having a corporate PR department telling me I can't say things like
this, because as a VP at AWS, you could not make predictions.
It's a very dangerous place. But I like to just say, well, this is what I think
is going to happen, you know, looking at the patterns I've seen over a
40-year career in the industry. Well, the trick with predictions is to predict stuff that's
already happened. That helps, but also you can predict what will happen fairly well;
predicting when is hard. Yes, that's very true. Sometimes you can predict when, but you can't predict what.
The Heisenberg uncertainty principle applies to predictions. It's a when or a what. I'm pretty good
at figuring out what will happen. Sometimes I've had to wait a lot longer than I wanted to for the
when. That's right. I once saw a presentation from a futurist who i think was notably self-confident he said i don't predict the future i analyze it also there's alan k's
best way to predict the future is to invent it invented that well some of that in the cloud
space there was some of that going on with all the work we did at netflix we were just putting
stuff out there well if you don't know when it's going to happen, then that becomes vision, right? If then the two requirements of vision are that it
is compelling and it is inevitable. If you get both of them, then you've got the beginning of
a good vision. Which, I used to jokingly say, is another word for you're not going to see it
anytime soon. So what is your takeaway from SC? So there were a couple of things. The
top 500, there were a couple of updates in the top end of the list that were interesting. One, the HP Cray Frontier system came
out last year as the first real exaflop scale system. So that was interesting. And there was
a bit more. That system is now delivering real results. It's being run. And that's been a very
successful product. And that's an HPE Cray, but AMD-based, system. And roughly at the same time,
an HPE Cray,
but Intel-based, system called Aurora
has been under development
for a few years,
but taking a lot longer.
And that has now been partially installed
and it came in as a second place entrant
with 585 petaflops.
And I think that one sort of,
you sort of squint at it. In some sense, it's successful
because they've got it going. But in other ways, it's really a year or two later than it should
have been. And they still seem to be doing quite a lot of work with a big team to get it up to
speed. So that's sort of interesting from that point of view. And then the third place was
Microsoft Azure. The system's called Eagle. It's Intel CPU, NVIDIA H100 GPUs
on InfiniBand. It's sort of their cloud system. I think it's probably the system they use for
running AI workloads for OpenAI and people like that. But they got 561 petaflops. And I went to
the BoF session where they were talking about both systems. The Aurora guy got up and said, I've been
working on this and we have this big team and it's all working out and we're fine. And then the Microsoft guy basically said, well,
we had a few people and a few days and we just put this together. We started small and we just
kept scaling it and it just kept going. And we just stopped when we ran out of hardware that
we could find on the standard build of the standard image operating system image. And
they didn't have any custom stuff. They weren't fixing bugs. They were just running the benchmark and throwing more hardware at it. And this was a huge
contrast in sort of, well, okay, you've got a big team working for years to try and build this
machine and somebody else going, yeah, it took us a few, we spent a few weeks, but it wasn't
a massive amount of work for a lot of people. And I think that was, for me, a big interesting
contrast. And it shows that
there's sort of these tipping points where the availability of a supercomputer exascale system
is basically, do you have enough money? And then the system will just appear in front of you in
the cloud. And we've got systems now with, at least on HPL, the same kind of performance as
the biggest systems. But they are primarily built for AI and they don't have all of the exascale software stack that's been really built around the needs of supercomputer applications.
So it's an easier problem to solve. But then also we're seeing AI applications emerging as one of
the use cases for running on HPC systems, things like Aurora and Frontier. They're running
the AI workloads as well. So I think, as the workloads blend, it's going to just be easier to deploy a much bigger supercomputer. And as a
prediction, I'm expecting we're going to see more of this in the future: it was easy to run,
so let's just put out big numbers in the TOP500 list.
There was a social media post that said Microsoft team had three days to do this.
And like you said, they could have done more if they had access to more resources and had
maybe a little bit more time.
And I think like you're saying, it's the power of using standard, already pre-debugged,
as you put it, just cookie cutter your way into happiness.
And in my mind, the Microsoft team keeps getting kudos from me for doing what they're doing, which is great.
And they participate because I think other cloud providers in principle can do it too, but they're not playing.
But it shows where the floor is.
I think it shows that the floor for readily available performance is now easily into the multi-hundred petaflops.
And as you're saying, if you show up with a big enough checkbook, you can just get your way. So that really puts into question what the goal should be for the national centers
and government-funded centers. Yeah. What are the implications of that,
that you can spin up a capability that's, what, 25 petaflops below where Aurora is now? You can
do that in three days? They had a graph showing their scaling,
and they had about 85% scaling still going on, in terms of: if you just kept adding resources, 85% of it
was going to the benchmark bottom line. So they just needed to find a few more
racks and they would have been above Aurora. And it will be interesting to see whether Google and others do it too.
And NVIDIA had another system somewhere in the top 10 as well that was basically their own cloud-based system.
So I think that they're all going to, if they just take the time in between their big AI runs, they're running an AI training for a month, and then they take three days to run an HPL, and then they go back to running AI training.
They just treat that as marketing or something.
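To put rough numbers on that scaling point, here is a quick back-of-the-envelope sketch in Python. Only the two HPL scores (561 and 585 petaflops) and the roughly 85% scaling figure come from the discussion above; the per-rack contribution is a hypothetical illustrative value.

```python
# Back-of-the-envelope: how much more hardware would Eagle have needed
# to pass Aurora on HPL, at ~85% incremental scaling efficiency?
# The per-rack peak figure is hypothetical, for illustration only.

eagle_hpl_pf = 561.0        # Eagle's HPL result at SC23 (petaflops)
aurora_hpl_pf = 585.0       # Aurora's HPL result at SC23 (petaflops)
scaling_efficiency = 0.85   # ~85% of added peak reaches the HPL bottom line
rack_peak_pf = 2.0          # hypothetical peak petaflops per extra rack

gap_pf = aurora_hpl_pf - eagle_hpl_pf                # 24 PF shortfall
delivered_per_rack_pf = rack_peak_pf * scaling_efficiency

racks_needed = gap_pf / delivered_per_rack_pf
print(f"Gap to Aurora: {gap_pf:.0f} PF")
print(f"Each extra rack delivers ~{delivered_per_rack_pf:.2f} PF on HPL")
print(f"Racks needed to pass Aurora: ~{racks_needed:.0f}")  # ~14 at these numbers
```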
Maybe we'll see more of those next year? Well, I think a big issue for the HPC community has been risk management, because we've all seen that movie a few times before: you rely on a particular vendor
to do something for you, and then their business takes them in a slightly different direction,
and suddenly the piece you were really relying on is no longer available. So that really means
that you have to have the vigilance to make sure that HPC workloads are covered and fed, and that
technology keeps moving in that direction. Today, there's pretty good alignment between HPC and AI,
but can that be guaranteed 10 years from now? And if not, then we need to continue to have our
fingers in that pie, if not the entire hand. Yeah. Some of this comes down to what are the
architectural differences if you're really optimizing for the HPC workloads of finite element analysis and things like that, CFD and all the
weather modeling and all that. Different workloads, and there's different optimizations.
One thing we've seen is that the balance of CPU versus GPU may not be right if you just take the stuff you're using for your AI workloads.
So one of the things we saw there was GigaIO, who have figured out how to get 32 GPUs on a single CPU in a single node.
And that was last year. That was their super cluster or super node.
This year, they doubled that to 64.
So eight is the typical number you see in the cloud instances, sort of the
standard number of GPUs that you get per CPU, but they're up to 64. And they called it a
super duper node, which I found amusing. So it's good marketing to come up with a fun name. So the
64 GPU super duper node, you're putting that on a fabric, you've got something that's really aimed
more at something that's going to be much more compute intensive on the float side. And if you can optimize your workload to run in
that environment, then that's at least one sort of direction that seems to be interesting.
That just strikes me as astounding power within one node.
Yeah. And so if you can do it in one node, do it in one node. That's like Amdahl's law,
right? You don't want to communicate unless you have to. If you can run it on a single machine, you've got the highest bandwidth. It's all coherent. Everything is
running within one box. So 64 GPUs on a single box is an interesting architecture.
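Adrian's "do it in one node" point is essentially Amdahl's law applied to communication overhead. A minimal sketch, with a made-up non-parallel fraction, shows why keeping the whole job coherent in one box is attractive:

```python
# Amdahl's law: speedup on n processors when a fraction p of the work
# parallelizes and the rest (serial work plus communication) does not.
# The 5% overhead below is an illustrative assumption, not a measurement.

def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

p = 0.95  # hypothetical: 5% serial/communication overhead
for n in (8, 64, 512):
    print(f"n={n:4d}  speedup={amdahl_speedup(p, n):6.2f}")
# With 5% non-parallel work, 512 processors yield only ~19x,
# which is why minimizing communication pays off so quickly.
```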
You know, this raises the point of how big your application is and how long it's going to run.
Because if you have a single killer app that's going to run forever, well, then you go and
optimize it all the way. But if your workload or your application is changing all the time,
like it does in the financial services when they tinker with the application on a daily basis.
So the application is never stable enough for you to super optimize for it. And if that's the case,
then you need an infrastructure that is able to dance with you and that prevents you from going all the way.
So when you have these big, quote, imbalanced configurations where you've got like very few
CPUs and massive memory or very few CPUs and massive number of GPUs or vice versa,
then the question becomes, how often can I use that particular configuration? And do I have enough capacity on my data center in varying
modes of computation for me to compose my way into what I need at any given time? I think all
of that becomes a bit of a complexity that essentially sets the tone for your center.
Are you a special purpose center? Are you a general purpose center? That sort of a thing.
And one of the challenges with GPUs is that they're moving so fast that if you buy a whole bunch of them, are they still useful in a year,
two years, three years? At what point do you need, if you're depreciating over five years,
like people tend to do with their standard machines, a five-year-old GPU is just a waste
of time at this point. So do you have to depreciate over two or three years, something like that,
because of the rate of change?
So there are some interesting cost-of-ownership problems with GPUs that you might
want to think about; this is part of the rent versus buy thing, right?
Just use the latest one whenever the new one comes out.
That's right.
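The depreciation question reduces to simple arithmetic. Here is a hedged sketch comparing cost per unit of work for a GPU kept six years versus one refreshed every two years; the purchase price, energy price, and the 4x-per-generation gain are all invented for illustration.

```python
# Rough GPU cost-of-ownership sketch: keep one GPU for six years versus
# refresh every two years. Assumes each generation is 4x faster at the
# same ~500 W. Price, energy rate, and the 4x gain are all hypothetical.

horizon_years = 6
refresh_years = 2
gpu_price = 30_000.0        # hypothetical $ per GPU
power_kw = 0.5              # ~500 W, constant across generations
energy_price = 0.10         # hypothetical $/kWh
hours_per_year = 8760
energy_cost_per_year = power_kw * energy_price * hours_per_year

# Strategy A: buy once, keep six years (performance stays at 1x).
cost_a = gpu_price + energy_cost_per_year * horizon_years
work_a = 1.0 * hours_per_year * horizon_years        # work units at 1x

# Strategy B: replace every two years; each new GPU is 4x the last.
cost_b, work_b, perf = 0.0, 0.0, 1.0
for _ in range(horizon_years // refresh_years):
    cost_b += gpu_price + energy_cost_per_year * refresh_years
    work_b += perf * hours_per_year * refresh_years
    perf *= 4.0

print(f"Keep 6 years : ${cost_a / work_a:.3f} per work unit")
print(f"Refresh 2 yrs: ${cost_b / work_b:.3f} per work unit")
```

At these made-up numbers the short refresh cycle wins handily, largely because the energy bill stays flat while the work delivered per watt keeps growing.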
Then you've got the energy costs.
Yeah.
So the energy cost of running, I mean, the older ones are slower, but they use the same
amount of energy, right?
The power consumption is sort of limited by the packaging. So it's a 500 watt chip and it's a 500 watt chip three years later, but
you've got whatever, 10, 20 times the performance or something. So it's not cost effective
or energy effective to use the old models. So there were a couple of things that happened also at the
conference. NVIDIA had some announcements around their Grace Hopper combined superchip,
the GH, I think. And they also recently announced
that they're going to be releasing on an annual basis instead of every two years. They're sort
of starting up additional engineering teams. So this seems like, hey, we're doing well,
we've got plenty of money, let's go and double down and push even faster because they can see
things they can do. So that's an interesting development that's going to push this space forward even faster in the
future. I don't know if you've heard about that, Shaheen. Yes. In fact, Carl Freund was our guest
a few episodes ago, and we talked about this. I was jokingly saying that Intel used to have tick-tock
and NVIDIA seems to be having tick-tock. And it is also a little bit of a controversial move, because some of the other chip vendors are saying,
is that even possible?
Is it practical?
Given that you have to align your chip design
with fab technology that is coming.
So maybe at the end, it becomes like a tick-tock again,
but it remains to be seen for the moment though.
As I said, in my wrap-up episode,
there's just so many chips out there
that it is impossible
for the traditional average data center to know which one to pick.
There are literally tens of them.
And now each vendor has three separate, four separate offerings.
So figuring out what to use when for what and what to standardize on is really a very
difficult task.
And I think that's a market opportunity for those who can test drive these and be able to advise customers. If you didn't get a chance to visit the Lenovo booth
at SC23, or you just want to see it again, check out their Inside HPC booth video. Visit
insidehpc.com/lenovo-at-sc23 to view the video now.
One of the recurring things over decades has been, well, I can build a specialized attached processor or something that will do a better job at that thing. And you have a year
or two before the general purpose things effectively catch up because there's a bigger
market for them. So this specialization means that you have to stay
ahead of the general purpose solution. We've seen many generations of that over the years.
But I think this is a case right now where there's all of these specialized solutions and that,
well, you get a win for a year or two, but then whatever you can buy that's going to be a general
purpose thing in huge volume in a year or two is going to be potentially better. And then that leads into kind of the next topic I think we should talk about,
which is chiplets and custom CPUs and the ability to sort of think about a next generation HPC
oriented supercomputing architecture, where instead of buying a processor and GPU,
you're basically saying, well, I'm going to do a custom chiplet for the core of
my accelerator GPU thing, maybe vector based or something like Fugaku has. And then I can have my
own processor and I can over build the memory subsystem or the IO or whatever I feel like I
need. But then I can surround that with standard chiplets for all of the other bits of the system,
whether it's high bandwidth memory or
IO or whatever. So I can build myself a super large chip that's got all these chiplets glued
together. And a year ago, the UCIe consortium was just launching. This year, they've got over
130 companies signed up for it. And I was talking to somebody who actually works at Intel who's driving this, saying they will happily take chiplets from wherever you happen to have bought them.
And Intel will build that onto a substrate for you with all the interfaces, meaning that this stuff should all just work.
And I think that's going to be really interesting development in the next year or two.
Envisioning a menu that you pick from and you say, I want two of these and three of
those. And if you're building a cell phone, maybe you want a radio. And if you're building a
supercomputer, you want a couple of 64-bit pieces there, but somehow you can formulate your ideal
configuration of all these chiplets. And then you have somebody like Intel or TSMC lay it out on the
substrate for you, and then UCIe connects them. So one natural question is,
if that's the world, and we already see that world with the Apple M1, we already saw it a couple of
years ago. You look at your MacBook Air and pretty much everything is on that chip. You get outside
of that chip and you've got the screen and the power supply and the keyboard and that's about it,
right? So in a world like that, what is a computer? What is a system anymore? It probably is
substantially just that big substrate, isn't it? Well, what we've really done here is we've moved
from the PCB being the interconnect. There's a limit to how fast you can go over a printed
circuit board, right? And we're just saying, well, the substrate gives
you higher pin density. The pins are much closer together. And
there are two different versions of the substrate, but one of them is more
active. Effectively, you're doping the silicon in the substrate itself so that when
the things land on it, there's some active components there, as far as I could see by
squinting at some diagrams anyway. But what you're basically doing is saying, I'm going to have more
bandwidth because I effectively got more pins and I've got lower drive needs.
So overall, you're just going to go faster.
So just think of it as shrinking circuit boards: the substrate
becomes the new circuit board.
So architecturally, it's not that different.
You've just managed to shrink it together in a way that lets you build a system that's
going to be faster, cheaper, and use less power.
Oh, well, better price performance, for sure. I mean, can we paraphrase software is
eating the world as chips are eating the world? I think the world is eating chips is probably
more of a phrase that people would resonate with. But I think the key thing here is the
standardization means that, in the same way as you built a circuit board that
integrated parts from different suppliers,
and each chip was from one supplier,
we're now moving to this world where you can integrate chiplets from lots of suppliers
onto one substrate.
And that becomes the new way that I think things are going to be built.
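To illustrate the "menu" idea in the abstract, here is a toy sketch of composing a package from chiplets sourced from multiple suppliers; every part name, supplier, and wattage is hypothetical.

```python
# Toy "chiplet menu": pick parts from different suppliers and have a
# packaging house integrate them on one substrate. Every part, number,
# and supplier here is hypothetical, purely to illustrate the idea.

from dataclasses import dataclass

@dataclass
class Chiplet:
    name: str
    supplier: str
    power_w: float

menu = {
    "vector_core": Chiplet("custom vector accelerator", "in-house", 150.0),
    "hbm_stack":   Chiplet("HBM memory stack",          "vendor A",  30.0),
    "io_die":      Chiplet("PCIe/CXL I/O die",          "vendor B",  40.0),
    "radio":       Chiplet("RF modem (the phone case)", "vendor C",   5.0),
}

# A supercomputer-ish order: two compute chiplets, four HBM stacks, one I/O die.
order = ["vector_core"] * 2 + ["hbm_stack"] * 4 + ["io_die"]
parts = [menu[p] for p in order]
print(f"{len(parts)} chiplets on one substrate, "
      f"~{sum(c.power_w for c in parts):.0f} W package")
```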
Well, and actually, as you're saying, the real big shift is going to be in the supply
chain because these substrates are being manufactured by fab companies, not by PCB companies.
As we have said in other conversations, you're going from the motherboard to the mother chip.
And then you put it all on a water-cooled plate to keep the thing from melting.
So the other thing is the stacking in 3D as well.
So you can get things like memory chips, you can stack them on top of each other,
and then you have the problem that the ones in the middle melt because they're surrounded, right?
So how do you get the heat out?
So there's a bunch of issues around cooling, but it isn't just a flat substrate with chips laid out on it.
There's also some vertical stacking capability.
And people are laying out these chips and then dropping memory, SRAM or HBM chips, on top.
They're scattering them where they need to be.
So there's something interesting, because when you start saying I can construct a machine in three dimensions,
your path lengths start shrinking and you can go faster and everything gets easier to do.
There was some talk at the conference that maybe the biggest, or at least a very widely
focused-on, topic was cooling. In reference to your last comment,
is that something you're following, liquid cooling technologies and techniques? It's not something I've been
following that much, but it's definitely a plumbing conference. There are whole sections
of the expo that are pipes and plumbing and pumps and things. But I think that certainly as you
build more and more dense machines, the thing is, how do you cool it? And if you look at a
typical data center and you say, what is your power budget per rack? And these systems are getting to the point where you can't
deliver the rack because you have to have an empty space around it in all directions because that
rack is just going to get too hot. So I think we're seeing next generation data centers include
either liquid cooling or having sort of extension racks, which are the liquid-to-air
interface, and which take up a large amount of rack space just dealing with the cooling issues.
So I think that that seems to be the trend, but I didn't follow any specific new launches at the
event.
No, there was a lot, as you said, Adrian. And for those of us who walk into a hardware store and look at pipes, you've never seen such
clean, shiny pipes, properly color-coded, and they snap-fit just so. And then
there is a whole supply chain of that sort of plumbing, from copper-based fittings and joints
all the way to massive refrigeration units. So there were several booths on the exhibit floor
and several talks about this, because guess what? Liquid cooling is coming, and there's
really no other way to get around some of this until quantum computing shows up or something.
The power is going to be dense and it needs to be taken out, like you said.
Let's talk about quantum for a bit because I think that's a natural segue.
My mental image of a quantum computer before I went this year was this big plumbing thing with lots of superconducting stuff
and pumps and vacuums. And somebody said, yeah, the IBM pictures look like they call them
chandeliers and just big golden pipes in all directions. And I was thinking, you're never
going to have that in a standard enterprise data center,
right? That's a super specialized thing. And then we went along and we're looking at one of the vendors there, and they say, no, it's just a normal rack and it isn't superconducting, and
everything just works and you just stick it in your data center and access it, just put the right
kind of workloads on it. So maybe Shaheen, you're a bit deeper into this space. You can sort of
summarize a bit more about what we saw there. Yeah, I think we probably need to come back to this
topic and do a deeper dive. We had Bob Sorensen as a guest a few episodes ago to take us through
the market forecast that Hyperion has done and SC23 had a quantum village. So the modalities of
quantum computing are still out there. We haven't got to a transistor moment yet.
As I like to joke, we haven't even got to a Betamax VHS moment.
There are like four or five different quote modalities or approaches that are hovering
around.
But there's definitely a lot of progress in room temperature photonics and neutral atom
modalities.
Now, some of these room temperature approaches still need refrigeration somewhere.
They may or may not. So the whole system is more of what you want to look at. But fundamentally,
quantum computing promises huge energy savings. It goes literally from hundreds and thousands of
watts to a few single digit sort of watts. So that benefit is really what is going to drive it in my
mind. Yeah. So they were talking about having, I forget what, tweezers that you'd use to move an
individual atom. Laser tweezers.
Laser tweezers. So you basically pick an individual atom, put it next to another atom,
zap it with a laser so they become entangled, and then push those entangled atoms off somewhere else
and they have an array of these atoms. I have a physics degree from a very long time ago,
and that sounds sort of plausible, but exactly. The engineering...
No, they're doing it, yeah.
I mean, it's just a bit mind-boggling that you actually had to push around individual atoms
with lasers. Sounds, yeah, well, I'm glad someone's figured out how to do that,
but it sounds very cool. But it seemed like they're getting larger and larger arrays,
and they're available from cloud providers, and people are actually starting to use these to try things out.
So, what was the vendor we were talking to that gave the demo?
We talked to QuEra, who do a lot of really fun work, and it's just a delight to delve
into all of them, really. Quandela was there, QuiX was there. The market is probably 20 or 30 vendors that are building various aspects or systems, and
they're all very interesting.
So I expect that we will have examples of actual applications running, but they may
be set up, the machine may be set up just to do one application really well as the research
moves from the lab towards a real product.
Yeah, there was a report on the market which said that for the next five years at least,
the amount of money being spent on R&D is vastly more than the amount of revenue in the space from
selling machines. So it's developing and it's growing, but there's a point maybe five, ten
years into the future where you actually are getting
some payback on the R&D. But at the moment, it's a good way to sink a few billion dollars into
things that you're not going to get payback from for a long time.
Well, the promise is so intoxicating that if you're a government or if you're a major corporation,
you kind of can't afford not to play.
And that's encouraging because not too long ago, it was 10 to 15 years away.
If we're now saying five years away, that's progress, right?
Yeah, I vaguely remember them saying it was the late 2020s, but yeah,
it was certainly far enough out into the future that your predictions are probably
wildly wrong.
But the trend was certainly increasing R&D spend and increasing revenue.
It's just that the R&D spend is higher than the revenue at the moment.
Now, if you're a government, you also think about that, even if I don't get a quantum computer out
of this, I'm going to get a whole bunch of really advanced technology that's going to help in other
as yet unforeseen ways. So you think it's a good spend of money because you're fundamentally
investing into advanced technologies that will be useful for the future. So you kind of hedge
your risk a little bit down that path. Yeah. There are some kinds of optimization problems. When they
were talking about using this for finance, for optimization, scheduling, transport,
there's applications that are not just crazy science things, but things that look like they are real world useful algorithms that people are trying out.
Now, maybe we can conclude with another view of the interconnect.
Because one of our walkaways last year was that UCIe was coming for chiplets, and then PCIe and CXL were coming on strong.
And then soon after, there was the Ultra Ethernet Consortium.
And you pointed to some of the optimizations that cloud providers had done
to traditional Ethernet with flow control and such.
We sort of walked away with a view that the interconnect hierarchy
in the data center was going to develop in a particular way.
Do you still see it that way?
And how would you describe it now?
Yeah, it looked last year as if CXL was emerging as the way most people were looking at
the future. I think from this year, it's sort of the only game in town. So it's definitely a
stronger pitch. Although last year, CXL 3.0 had just come out, which specified a fabric standard.
Everyone was very keen on that and thought that in a couple of years, we'd have a fabric.
This year, actually, during the conference, CXL 3.1 came out.
So they had to revise the spec.
And they said, OK, the previous attempt at getting fabrics wasn't quite right.
They got some feedback.
They added some more capabilities to the fabric, some related to trusted computing and security. Because if you're
doing multi-tenant on a fabric, you've got to be very careful about who can see what on the fabric.
And then there was some more around fabric management. So then if you're trying to design
chips, you need the standard to have stopped moving, really. So that means people developing
fabric management into chipsets, the spec just landed. So it's going to be another two or three years before
those come out. So I think the sort of net for CXL was that the basic CXL capabilities are on
a good track, lots of good support. We'll start seeing those more and more, but the more advanced
fabric management things are a little bit further out than we thought they were last year. But
hopefully this reset, this updated 3.1 spec is going to be the right thing to go
build on. So the thing about fabric management is if you have a bank of CXL memory that you're
sharing across a whole load of nodes, you really, right now, you can share the memory capacity
and allocate it to different nodes. So if one of your nodes needs a few terabytes of memory,
say you have a few terabytes of CXL memory, another one needs less,
you can sort of play around with the capacity.
But if you want to share that memory dynamically
between two machines,
then you need much more fabric management.
You're using it as shared memory across nodes.
And that's really where we need the fabric management
to make it more dynamic
and to have the ability to control what those nodes see
and the coherency across them.
So I think that's the ultimate, really interesting use case.
But in the meantime, we're starting to see test systems and products and things like
that coming out that are looking at CXL, and all these different chipsets as well.
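To make the capacity-pooling idea concrete, here is a minimal sketch of a fabric manager carving a shared bank of CXL memory into per-node allocations. It is purely illustrative, not any vendor's API; true dynamic sharing of the same region, with coherency, is what the CXL 3.x fabric-management features are for.

```python
# Minimal sketch of CXL-style capacity pooling: a fabric manager carves
# a shared memory bank into per-node allocations. Illustrative only,
# not a real CXL fabric-management API.

class CxlMemoryPool:
    def __init__(self, capacity_tb: float) -> None:
        self.capacity_tb = capacity_tb
        self.allocations: dict[str, float] = {}

    def free_tb(self) -> float:
        return self.capacity_tb - sum(self.allocations.values())

    def allocate(self, node: str, tb: float) -> None:
        if tb > self.free_tb():
            raise MemoryError(f"pool exhausted: {self.free_tb():.1f} TB free")
        self.allocations[node] = self.allocations.get(node, 0.0) + tb

    def release(self, node: str) -> None:
        self.allocations.pop(node, None)

pool = CxlMemoryPool(capacity_tb=8.0)
pool.allocate("node-a", 4.0)   # one node needs a few terabytes
pool.allocate("node-b", 1.0)   # another needs less
print(pool.allocations, f"-- {pool.free_tb():.1f} TB free")
```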
Now, Adrian, I know you saw a demonstration at SC of PCIe 6.0 with some pretty amazing performance numbers.
Could you talk about that a little bit?
Yeah, most of the systems that are actually shipping right now are PCIe 5.
And that runs at, I think it's 32 gigabytes per second per channel, per 16-lane channel.
And then when you get to PCIe 6, they changed the signaling to have four levels on the
wire instead of two. So typical digital signals are on-off; what they do for PCIe 6 is they go
to basically 0, 1, 2, 3, and they're switching between those. And then they announced PCIe 7 earlier this year,
and they were talking about that at the event. So PCIe 7 takes the PCIe 6 thing and just doubles
the clock rate on everything. So they basically figured out how to generate and detect
these four-level signals quickly enough that they can actually run everything at twice the data rate.
So PCIe 7 is going to come out quite a few years from now. But what they're trying to do is
release a new version of PCIe every few years that's twice as fast as the previous version.
And that's kind of the path
they're on. So what we were seeing at the show was there are processors available now, like the
Intel Sapphire Rapids fourth-generation Xeon, that have four, or kind of five; it's got 80 lanes,
which is technically five. But basically people seem to be talking about having four either PCIe 5
or CXL 1.1 16-lane channels. So people were doing a bunch of demos
with that. The fifth-generation Emerald Rapids and the Granite Rapids CPUs, coming out I think next year,
will support CXL 2.0 on PCIe 5. And the NVIDIA Grace CPU is also stated to support CXL 2.0,
but NVIDIA hasn't been talking that much about CXL. I think it's part of the NVIDIA
chip-to-chip interconnects, but they are pushing their own NVLink sort of right now rather than
confusing people by talking about CXL as well. So then we need to get to some more next generation
CPUs that actually have PCIe 6.0, because the CPU is managing the coherence of this. And so if you
don't have a CPU that knows how to talk this new version of CXL, then you're not going to get the performance. It's not going to make sense. But it just
keeps doubling up. So it's 256 gigabytes per second on a PCIe 7 connector, 128 on PCIe 6,
and I guess 64 on PCIe 5, on each connector. And then each connector runs in one direction. So
you have another connector coming in the other direction, two wires to basically go in each direction, point to point.
And that's how it's managed. And you can go a meter or two, sort of roughly within a rack, with
this. Anyway, so that's kind of what's been going on. There were plenty of demos. There are people like
Micron doing shared-memory modules that are CXL-based, and CXL switching.
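The doubling pattern reduces to simple arithmetic: per-lane transfer rate times lane count, divided by eight bits per byte, ignoring encoding overhead. A quick sketch reproducing the x16 numbers quoted above:

```python
# Approximate PCIe x16 bandwidth per generation, one direction,
# ignoring encoding overhead: (GT/s per lane) * 16 lanes / 8 bits.
# PCIe 6 doubles the rate via PAM4 (four signal levels); PCIe 7
# doubles the clock again on top of that.

lanes = 16
rates_gts = {"PCIe 5.0": 32, "PCIe 6.0": 64, "PCIe 7.0": 128}

for gen, gts in rates_gts.items():
    gb_per_s = gts * lanes / 8          # GB/s, one direction
    print(f"{gen}: {gts} GT/s/lane -> ~{gb_per_s:.0f} GB/s per x16 link")
# ~64, ~128, and ~256 GB/s: twice the bandwidth each generation.
```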
So CXL 2.0 is really where I think this becomes useful.
And 3.0, as well as supporting fabrics, also supports multi-level switches.
You can start cascading switches together and build something that's a bit more complex.
Okay, Adrian, great to be with you again.
Thanks so much for joining us.
And I want to tell all of our listeners that Adrian's article,
which will touch on a lot of the topics we got into today, will be appearing on the InsideHPC
site if it's not already on our site. So with that, thanks so much.
Yeah, thanks for having us.
All right. Thank you all. Yeah. Happy Thanksgiving, everybody. And see you next time.
Cheers. Thanks.
That's it for this episode of the @HPCpodcast. Every episode is
featured on insidehpc.com and posted on orionx.net. Use the comment section or tweet us with any
questions or to propose topics of discussion. If you like the show, rate and review it on Apple
Podcasts or wherever you listen. The @HPCpodcast is a production of OrionX in association with Inside HPC.
Thank you for listening.