@HPC Podcast Archives - OrionX.net - @HPCpodcast-74: Karl Freund, AI Chips

Episode Date: November 8, 2023

Karl Freund, founder and principal analyst at Cambrian-AI Research, joins us to discuss the, well, "Cambrian explosion" that we are witnessing in AI chips, the general state of the AI semiconductor market, and the competitive landscape in deep learning, inference, and software infrastructure in support of AI.

Audio: https://orionx.net/wp-content/uploads/2023/11/074@HPCpodcast_Karl-Freund_AI-Chips_20231107.mp3

Transcript
Starting point is 00:00:00 Are you attending SC23 in Denver, Colorado? Lenovo would like to count on you to visit booth 601 on the show floor at the Colorado Convention Center, November 13th through 16th, 2023. You can also visit lenovo.com slash HPC to learn more about Lenovo's HPC solutions. Suddenly there's this massive pile of money on the table. Everybody's running for that pile of money as fast as they can. And we're not talking millions, we're not talking hundreds of millions, we're talking billions or tens of billions of dollars of semiconductor revenue. Forget the system side, forget all the software that's being sold. Just semiconductors alone are going to be in excess of $10 billion.
Starting point is 00:00:44 Definitely attracts a crowd. If I were at Intel, I'd be pounding my fist on the table saying, screw HPC. No offense. The bigger market's going to be AI. So you've got to nail AI. If you can also do a good job at HPC, wonderful. But these chips that have massive 64-bit float
Starting point is 00:01:02 are just going to be dead ends. So my feeling is that even if you're going to focus on AI alone, it behooves you to really recognize that HPC is the discipline underneath it. And while you may not have to go all out on 64-bit, you also cannot look like you're abandoning HPC. From OrionX in association with InsideHPC, this is the @HPCpodcast. Join Shaheen Khan and Doug Black as they discuss supercomputing technologies and the applications, markets, and policies that shape them. Thank you for being with us. Hey, everybody. I'm Doug Black. Shaheen, great to be with you again. Great to be here. And we're going to talk about something that we rarely talk about and that's AI. I'm just joking. Yeah, we have with us, really excited to be talking with Karl Freund. He is the founder and principal analyst at Cambrian-AI Research, covering AI chips, GPUs that have direct bearing on this whole craze
Starting point is 00:02:08 centered on generative AI, large language models. Karl, welcome. Thank you very much, Doug. Pleasure to be here. Yeah. And just to start, I mean, we know you've been in the industry for a long time. You were saying you've been around and seen a lot of architectures come and go, but fill us in a little bit. That's a nice way of saying I'm old. That's a nice way of saying I'm really old. Shaheen and I used to work together at Cray. That's right.
Starting point is 00:02:30 Back in the day, right, Shaheen? That's right. Back in the days before we sold to SGI. Boy, those were heady days. Yeah, I've worked for Hewlett Packard. I've worked for Cray. I've worked for IBM for 10 years. I worked for AMD and did a couple of startups, one of which your audience may be familiar with, Calzada.
Starting point is 00:02:47 We were building an arm-based SOC for the data center. And for the last six or seven years, I've just been an analyst focusing almost exclusively on HPC and AI and primarily on the hardware that it takes to run AI efficiently. So you can check out my work on camrian-ai.com and see all the articles I've posted. But mostly I just try to help semiconductor companies articulate their strategy and story and help amplify their story or criticize it if need be. Right on, right on. I love the name of your company, Cambrian. Of course, that's like the explosion we're observing.
Starting point is 00:03:23 So right there, we have a very fertile ground to have a lot of good discussion here. That's for sure. Yeah, when I first decided to go out on my own, I thought, well, what should I name this company? And it's just an explosion of AI. Jensen Wong coined the term Cambrian explosion of AI when he's actually referring to AI models, not semiconductors. But I the side, it was equally applicable to semiconductors all trying to compete with Jensen. That's the name of the company. Sure is. Sure is. Maybe that's the place to start. Yeah. Well, it's funny. For six or seven years now, venture capitalists have been presented with slide decks from startups that they all read the same. There's really
Starting point is 00:04:02 some minor differences in memory compute versus at memory compute. Like that's not a difference, but they're all saying the same thing, which is the hypothesis of, hey, you know, NVIDIA just got lucky. GPs are really good at parallel processing of matrix operations, but it's got a bunch of other stuff on there you don't really need for AI. So we'll just build, ship this just good for AI. And we've seen dozens of companies go after NVIDIA with that kind of approach and haven't seen many succeed. A few exceptions. I think Cerebras had a recent large win that's worth about a hundred million dollars. That's good. But there's not many other ones out there that are seen as even being competitive.
Starting point is 00:04:41 It's starting to change. Obviously H100 is at the king of the hill right now. So along comes Gaudi 2 and says, hey, we're almost as good as an H100. And Gaudi 3 is coming around the corner. AMD says, well, we're almost as good as an H100. And MI300 is just around the corner. And meanwhile, NVIDIA is saying, yeah, wait till you see what I can show you next spring. So everything's just around the corner, right? So it's kind of hard to assess competitive position of products that haven't been launched yet.
Starting point is 00:05:13 There's definitely stealing this, I think, from Andrew Feldman. Suddenly, there's eight years of AI research and products and a reasonable amount of revenues measured in the hundreds of millions of dollars. Suddenly, there's a massive pile of money on the table. Everybody's running for that pile of money as fast as they can. And we're not talking millions, we're not talking hundreds of millions, we're talking billions or tens of billions of dollars of semiconductor revenue. Forget the system side, forget all the software that's being sold, just semiconductors alone are going to be in excess of $10 billion. Definitely attracts a crowd. It sure does. I think IDC had a report this past week that they expect just the generative AI part of it to be like $143 billion or something in four years with a CAGR of 73.3%, which is just
Starting point is 00:05:59 astounding. It's mind boggling. It just indicates that it's definitely a frenzy and people are stampeding towards it right and you know what because of that you know people like amd and intel with with gaudy and startups like cerebrus and others say hey you know what if i could just get two to five percent in the market my investors would be happy it's easier said than done yeah but it never works out that way, does it? 2% usually doesn't. Yeah, it's not a sustainable position. It's not sustainable.
Starting point is 00:06:30 Tell us, you know, let's start with the H100. We constantly hear that NVIDIA is commanding very high prices and lead times are so long. I mean, what's going to break that logjam? What's going to enable a more plentiful supply of these advanced GPUs? Well, time. I mean, time will enable more wafer starts and will enable more cost-loss facilities to do the bonding required for multi-die packages, right? I mean, those are the primary limitations. If you solve one without solving the other, it's not going to do any good. Unfortunately Unfortunately for the industry, everybody's using that same technology.
Starting point is 00:07:05 Everybody's trying to do, you know, 3D stacking of HBM or HBM3 onto their ASIC or GPU, whether it's Gaudi or MI300 or NVIDIA. The exception would be Cerebrus. Cerebrus doesn't use HBM. And so they seem to be unconstrained supply. And that may be one of the reasons why their customers in the
Starting point is 00:07:25 United Arab Emirates decided to go with a supercomputer of Cerebers instead of waiting in line. So I think the only thing that's going to solve it, honestly, Doug, is just time. Time to get more supply. Demand's certainly not abating. There's other technologies on the horizon, but they too will have supply constraints. Now, you could say that a silver lining of the US government's restrictions on high performance technology to China will actually create supply. It's now available for Western countries because it's not going to ship to China. That's kind of a strange way to look at it, but it's probably true. It's probably true. So Carl, as you know, because we've talked about this in the past, we did this Epic AI survey like five years ago.
Starting point is 00:08:06 And at that time, we could count like 27 different chips or projects around the world focused on AI, including the folks that we're talking about. And then I met with a friend of mine who is an executive, used to be semiconductor companies. And he said, you're probably off by a factor of three. That in reality, there's probably like 100. And then recently, I heard that even that number may be too conservative. So just how many projects are going on around the world focused on here's a specialized AI chip that's just going to kill it for my app. And therefore, there's no need for any other chip. And my app is a sufficiently big killer app that's going to give me the volume to do it.
Starting point is 00:08:50 You know, I think separate the market into two big chunks, right? There's training almost exclusively done in very large data centers. And then there's inference processing, which can either be done in data centers, or in cloud or enterprise data center or at the edge. And I think if there are hundreds of companies building AI chips now, it's a function of two things. First of all, the opportunity edge is massive and it's highly differentiated. So you could have a solution that's good for image processing, that's not good for audio processing,
Starting point is 00:09:21 that's not good for text processing and natural language processing. So there's lots of different combinations of power performance and area that can target a different segment of edge inference processing. Now, most of the startups that I work with, they've all kind of done a student body left away from data center training, and they're either focusing on data center inference or edge inference, predominantly edge inference. There's no 800-pound gorilla in edge inference. Maybe Jetson's an 800-pound gorilla, maybe not. There's more opportunity and lower barriers of entry.
Starting point is 00:09:54 The software stacks required are much, much easier to amass. You know, you basically need to run a handful of good models and do them very efficiently and take advantage of every trick you got in the book to make your chip sing and dance. Now, I said there's two things driving it. First of all, is that market opportunity on the edge. Second is chiplets. And so you no longer have to build a large, monolithic, expensive, multi-hundred- dollar project to enter the market. You can go to somebody like Tenstorrent or Sci-5 and buy IP for things like RISC-V cores or from Tenstorrent, you can buy their TenSix core accelerator, and then you just provide the glue and hopefully some sort of secret
Starting point is 00:10:39 sauce that will turn that into a Blockbuster chip. Now, that's the story. We haven't seen a lot of blockbusters chips with enough secret sauce to attract people to them yet. But I think in the edge AI inference, especially for large language models, we're in the very early stage here. We're in the first inning of that market. People are still trying to figure out what you can use a large language model in the edge for. There's got to be something, right? They're all trying to find that something that allow them to do large language models or smaller large language models, I should say, you know, 10 billion parameters or something like that and run them on the edge on their chips. Will they be successful?
Starting point is 00:11:18 History would say no. I mean, the history of all the companies trying to compete with NVIDIA is littered with hundreds of millions or billions of dollars of venture capital that basically went up in smoke. I mean, some companies have had to significantly write down their holdings in AI startups because, you know, suddenly after five years, they have total revenue of $5 million. Total revenue of $5 million. You're kidding. That's the sad truth is they can generate a lot of PowerPoint, but they can't seem to generate a lot of revenue. So that could be a reason why they're gravitating to edge the IoT end of the world, because that's more fragmented and uncontested. The problem is that, as I like to joke, everything is a thing. And so IoT doesn't
Starting point is 00:12:01 really lend itself to a clean consolidation. Everything is like so specialized that you may use the volume to make it sustainable, right? Exactly right, Shaheen. It's kind of like FPGAs, right? FPGAs, they don't have any large market outside of Microsoft. They have hundreds of very, very small markets. There's not a killer app in spite of a lot of people who attempt to create one out of FPJs. And I think similar situation will unfold for Edge AI. There's a lot of interesting use cases, but fact of the matter is, Qualcomm Snapdragon probably has the best AI on the Edge right now. Most people use
Starting point is 00:12:35 it without even knowing it, which is fine, right? You take a picture with an Android phone today, you don't know that you're using AI. You just know it produces really good pictures in the dark. And of course, that's all AI. I think the best AI is perhaps hidden. Right. Okay. That's a really good point. Sticking with sort of the major, the three big GPU vendors, Intel, AMD, and NVIDIA. Let's start with Intel. Shaheen and I were talking about this episode coming up. We frankly aren't sure. We have Ponte Vecchio and we have Gaudi 2. And there have been other acquisitions that Intel has made and spun back out like Mobileye
Starting point is 00:13:11 and a couple of others. So they basically ended up having a lot of choices. What do you see them do? Well, it's not clear to me. Obviously with the convergence to Falcon Shores, the convergence of GPU and the Gaudaudy architecture they have not articulated what that means gaudy 3 is probably going to be a pretty amazing chip quite frankly the question will be is it a dead end and if it is nobody's going to buy if they can show a path from there to the converged product line of gpu and habana labs then they've got a shot they can say look here's your path start with gaudy 2 then go to Gaudi 3, and then you go to what they should have called Gaudi 4, because I think Gaudi's got a lot more going for it right now than Ponte
Starting point is 00:13:52 Vecchio. Ponte Vecchio's got great 64-bit float, which is perfect for national labs and other high performance computing centers. But for AI, it's totally useless for that format. And that's what you're spending probably 30% of your die area on 64-bit float. That die area could have been used for 8-bit enter, 16-bit float, or something more useful for AI. And I think that's where they're headed. I really do. I'm not sure, but I suspect, given the interest in Gaudi 2 and the more concentrated messaging around Gaudi 2 from
Starting point is 00:14:23 Intel in the last two months, I suspect that they're going to make this try to look like Gaudi 4. It also has 64-bit floating. I don't know how you fit that all into a die, but that's their challenge. They've got two architectures. They're both good. Neither are good enough to give a knockout blow to NVIDIA. So maybe if they combine them, in theory, they can have enough weight behind that punch and they can make a dent.
Starting point is 00:14:44 But it's going to be 2025 before we see it. So we'll see if Gaudi 3 can save the day. If so, I think they're going to have to paint a clear roadmap because it's all about software. And if I have to report, if I have to retune, then why would I spend the time and effort on Gaudi 3? In spite of the fact that I think Gaudi 3 is going to be a pretty amazing chip. Yeah. And Gaudi 2 is getting good grades from people who are using it. So I think what you're saying is very plausible path and therefore maybe exactly what they should do if they're not. Yeah. Right. It sounds like an amalgamation of these various efforts kind of coagulating.
Starting point is 00:15:19 Yeah, it does. Honestly, given the growth rates and the numbers that Shaheen just shared, but this is about five minutes ago, if I were were at intel i'd be pounding my fist on the table saying screw hpc no offense the bigger market's gonna be ai so you gotta nail ai if you can also do a good job at hpc wonderful but you know these chips that have massive 64-bit float are just it's just gonna be dead ends right so let me have a slightly different perspective not completely misaligned i think that like look at ibm they went after ai alone they sort of look like they're abandoned hpc and six years later they have neither you know power 10 is a great chip really should do a lot better than it is doing but somehow really hasn't done that so contrast that with amd which has high performance computing as the corporate mantra should do a lot better than it is doing, but somehow really hasn't done that. So contrast
Starting point is 00:16:05 that with AMD, which has high performance computing as the corporate mantra, and they are thriving. So my feeling is that even if you're going to focus on AI alone, it behooves you to really recognize that HPC is the discipline underneath it. And while you may not have to like go all out on 64 bit, you also cannot look like like go all out on 64-bit, you also cannot look like you're abandoning HPC. Yeah, it's got to be, I think we're in alignment there, Gene. If you look at what the MI250 did, it's got too much 64-bit float to build a good AI chip. You could do okay AI, but you're not going to do great AI, mostly because of the lack of support for low precision math. So there's got to be a balance, right? And the balance is probably a little more focused on AI, a little less focused
Starting point is 00:16:50 if you're AMD on 64-bit float. We'll find out when MI300 is actually announced. It's been teased so much. I feel like I'm in Las Vegas. Right. Exactly. But it sounds like a good chip. It really does. Now you could argue maybe they went overboard with too many dies because the secret problem that nobody's talking about with chiplet architectures is that you have to dedicate some die area for the chip-to-chip communication. That takes up space that could be used for SRAM or ALUs. And so you're going to need to take a performance hit, but you could get some cost
Starting point is 00:17:25 savings. So I don't know. If you look at MI300, the biggest question in my mind is, will it have a transformer engine? If you believe what NVIDIA is saying, and I tend to, the transformer engine is going to give you two to three X performance improvements over the same chip without transformer engine. So if what they have is that same chip without transformer engine, they're going to be half the performance of, let's say, maybe not H100, but HNext 100, which will come out contemporaneously with the MI300, right? And then you say, well, now let's talk about HBM. Well, the MI300 seems to have a higher capacity and bandwidth of HBM. Three, NVIDIA has addressed that with their Grace Hopper version with HBM3e, but we'll have to see where the benchmarks land.
Starting point is 00:18:10 Speaking of which, I would be shocked if AMD released public benchmarks. Unfortunately, they've never stood up any benchmarks for MLPerf, and I don't expect them to start. Yeah, a couple of comments I wanted to make. One is on 64-bit support. In total agreement with you, there was that Chinese chip, Baron, was it? That didn't even have 64-bit. And God knows if it can be manufactured now that they might not get allocation from TSMC.
Starting point is 00:18:36 But it was an indication of, in fact, I was contemplating an article saying the end of 64-bit computing sort of thing. On the other hand, HPC people are using lower precision to do HPC, and they're using AI to do HPC. So I think that is kind of a synergy between HPC and AI because HPC enables it, but also takes advantage of it. And that's kind of an interesting thing, right? I agree. I was going to mention that I believe the fastest simulation you can run is the one you don't run, it's the one you estimate. Okay. So instead of going through the effort of actually running a full simulation, I can take all the runs I've done in the past 10 years, I can use those to train a neural network model, and I can estimate what a different set of starting conditions would produce
Starting point is 00:19:25 if I ran a simulation. And the results have been astounding. Exactly. So now you do the last mile for real. Yeah, and you do the last mile for real. Exactly. Smarter revolutionizes HPC. If you're attending SC23 in Denver at the Colorado Convention Center,
Starting point is 00:19:44 stop by booth 601 from November 13th through 16th to learn how. Lenovo is hosting a number of booth sessions covering the latest industry topics, including sustainability, generative AI, genomics, weather, storage, hybrid cloud, and more. You'll also find interactive demos featuring an AI avatar, digital twins, HPC cluster management software, and Neptune liquid cooling. Be sure to visit booth 601 and visit lenovo.com slash HPC to learn more. By the way, Carl, I did read your article in June in Forbes about the MI300. And as you just mentioned, the lack of a transformer engine, I guess is that event in June in San Francisco that AMD held. They did announce a partnership with Hugging Face around a transformer. So I don't know if you saw that announcement.
Starting point is 00:20:37 Yeah, I did. I did. And I think it's interesting. I had a conversation with an executive from AMD a couple months ago. I said, look, I think you've got a great chip coming, but the world thinks you don't have software. How do you respond? And he said, go to Hugging Face and see how many models are already available and optimized to run on the MI250. And it's pretty darn impressive. You know, hundreds of models.
Starting point is 00:20:58 And so the CUDA moat is kind of looking shallow right now, I think, between OpenTriton and PyTorch2. Do you really need CUDA? Well, you do if you're going to run, you know, weather simulation codes or NASTRAN. Sure, you need it. But do you need it to do large language models? The answer is no, you don't.
Starting point is 00:21:18 But it will make it faster if you use it. The same is true for AMD's ROKM software, which is optimized blast libraries, right, to accelerate linear algebra and other important algorithms. You can do a Triton port and run it on an MI250 or soon an MI300, and you'll get good performance. And if you plug in Rockham software libraries underneath it, you should get better performance. So it really gives a lot of flexibility to the development community to port quickly and easily, and then do the tuning with either CUDA or Rockham or one API with Intel. Although I should point out one API is not
Starting point is 00:21:58 available yet on Gaudi. So not on Gaudi. That's right. Now, if anybody can execute, NVIDIA can. But there's also like a qualitative shift when you have no competition versus when you have some competition. And I think that phase is going to be a little bit different for them, isn't it? Well, I was thinking the same thing. And then, you know, a couple of weeks ago, NVIDIA did an investment tour and they were showing a new roadmap that I had never seen before, which doubles the cadence of GPU releases. I mean, really? You can go to a one-year cadence of GPU technology, and I think with the advancements you're going to see in HPM, the advancements you're going to see in chiplet interconnect technology, and two or three nanometer fabrication technology. I would bet on the company that's got a yearly cadence of new technology coming out to take advantage of this new underlying enabling tech. Is that like Intel's old TikTok or is that a TickTick?
Starting point is 00:22:56 I don't know. They haven't disclosed much about it, so I guess it's a TickTick. But this rapid cadence, it sounds like you're getting the sense NVIDIA is kind of stealing a march on everybody. Yeah, I think what happened is kind of speculate a little bit here. NVIDIA saw, looked in the rear view mirror and saw a Gaudi 3 coming. They saw MI300 and the interest it's getting at places, it's attracting in places like OpenAI and elsewhere. And they said, well, we better do something. And Jensen probably looked around and
Starting point is 00:23:24 said, well, I have more money than God. If you want to start up a whole other engineering team and do two chips instead of one, let's go for it. That's right. Have money, we'll spend. Exactly. Now, another question for you. So the lead times for H100s, A100s, and of course, they've done the L40s to alleviate some of that pressure. And like you said, they're trying to get more allocation and all of that in time is going to work. But for the moment, if you're not like a big time famous customer, you may have to wait quite a while to get your hands on something like this. And I feel like that is probably strategically not a good thing for NVIDIA, because it's causing some people to look
Starting point is 00:24:05 at alternatives when otherwise they would not have. So I think that's an opportunity for AMD and Intel for sure. But also for the next tier after that, Samba Nova, Graphcore, Grok, Onteder, and I'm sure I'm missing a few others, Cerberus, of course, as you mentioned, and then perhaps even some tier behind them. Do you see that? Is that like something that they should worry about? Again, you have to segment the market. If you're talking about training, I don't think they have to worry about it. I really don't. That's a good point. If you're talking about inference processing, let's say, let's take data center inference processing. It's a huge market, right? I mean, it takes 16 H100s to answer one chat GPT query.
Starting point is 00:24:46 That's not sustainable. Something's got to change. Something's got to give. And so I think what AMD's considering now is really focusing on that inference opportunity where they've got an advantage over NVIDIA, not parity, but potentially an advantage with their high bandwidth memory that could give them some breathing room. Yeah, I think that because of the lack of availability, because of the expense,
Starting point is 00:25:12 and now because potentially poorer performance, let's put it this way, fewer, you can get the job done with fewer MI300s and H100s. That's going to save you a lot of money. That's going to significantly drop your TCO. And that's why NVIDIA said, oh, hey, we got the Grace Hopper with an equivalent memory capacity as well.
Starting point is 00:25:29 I thought, well, why did you do that? Why did you make that just Grace Hopper and not, let's say, a multi-die Hopper, right? You can put two Hoppers on a super chip as well as a Grace and a Hopper. Why did you do that? I don't know. It's all about memory.
Starting point is 00:25:44 In fact, I think part of the reason people use multiple GPUs is because they need the memory, a grace and a hopper it's memory isn't that i don't know it's all about memory in fact i think part of the reason people use multiple gpus is because they need the memory not the compute and if you're on grace hopper you can get access to the memory that's on the cpu so it's a lower cost way to get access to memory isn't that probably what it is right well i think we'll see i mean unfortunately the war in israel preempted Jensen's keynote at his AI day in Israel, right? And I was anxious to hear what a system looks like. Because if you think about it, system design, if you're HPE or Dell or Lenovo, Supermicro, Penguin, all these guys have got to rethink system design. Because all of a sudden, for an AI supercomputer, there's no DRAM.
Starting point is 00:26:26 That's right. That's right. Whoa, there's no DRAM? Yeah, there's no piece. And potentially with chips like Infabrica's chip, which I've written about in Forbes, if you haven't seen it, I'd encourage you to take a look at it, audience, but that's going to completely change the backend away from PCIe and give you something that's got a Bluefield chip on it. The DPU kind of a thing? Yeah. It's got DPU, it's got a Grace and a Hopper, and then it's got sort of an outbound chip that's aggregating all of the access to remote nodes and obviating the need for PCIe, obviating the need
Starting point is 00:26:59 for standalone NICs to talk to the network. Whoa, that's going to be a very different system design than anything we've ever seen before. And when I think about what's going to happen in supercomputing in a month, that's what I'm looking for. I'm looking for, show me some examples of what one of these things look like. Yeah, me too. I agree. I put something on LinkedIn as a teaser a few weeks ago that a massive change in system design is coming. And it's going to be three, four years before it pans out. But once it's done, oh man, it's going to be three, four years before it pans out. But once it's done, oh man, it's going to be very different. Very different.
Starting point is 00:27:29 Very different. If that happens, who will be the losers? That's a good question. If you're on the edge, you're not using HPM anyway, so you don't care. If you're on the edge, you're using some kind of low power DIMM, LPDDR5 and stuff like that. You know, it's the people going after data center where, and you mentioned it, it's all about the memory bandwidth, right? I mean, answering a query on chat GPT, the GPU is probably less than 20% utilized.
Starting point is 00:27:56 And I've heard much less than 20% utilized. So if you can get more memory capacity and bandwidth, primarily capacity per GPU, you could use far fewer GPUs and dramatically lower the cost of inference processing. So who are the losers? The losers are the guys who, when you do that, no longer have the GPU headroom. So if you're running along and you're doing just fine because your memory capacity limited, and suddenly if you're not memory capacity limited, and there's a lot of technologies we can dive into here that could get you there. And all of a sudden, you're going to be GPU limited, and then you lose. Right. And I'm glad you have some view of what the utilization is, because for years, I've been asking and not really getting an answer. I sort of was joking that
Starting point is 00:28:38 because their multi instance GPU, their make capability started out with like seven partitions. And I think it's like 10 now is it i don't know to me that sort of indicated that oh it's probably between 10 and 15 that's right if mig can help you out that means you're probably underutilized with the and if they put seven in there that means 15 if they put 10 in there that's 10 you know so that was like reverse engineering that probably based on no real data. Yeah. Yeah. I think it's probably pretty accurate there. So Carl, it's obviously early days. And I think the whole picture here is it's really,
Starting point is 00:29:14 NVIDIA is out ahead, but it's really a mishmash and things could go in different directions. And you've got edge and data center with different needs and requirements. But we are going to put in front of you a crystal ball that we'll ask you to look into. Looking out 24, 36 months, even 48 months, 60 months, you know, where do you say within the data center? How do you see things shaking out? Is this on-prem or cloud or how is this all going to come together in terms of, again, this generative AI, big AI? Yeah. So in terms of big AI and data
Starting point is 00:29:46 center, if I look out three to five years, I would be shocked if the entire market outside of NVIDIA had 20% share. So NVIDIA five years from now will have some 80% share of that data center market, both inference and training. And everybody else is going to fight over that other 20%, which, by the way, will be billions of dollars of revenue. So they're fine. It's not like they're going to fold up their tents and go away. They've got a tremendous opportunity in front of them. But given the software advantage that NVIDIA has, and now given the kind of dual-track roadmap with two engineering teams vying for the next tape-out, I'd be shocked if NVIDIA has less than 80% share. That's really very, very interesting, because that also means that if you're a different player, it's less about beating NVIDIA and more about serving the part of the market that you can
Starting point is 00:30:38 and have a good time do, right? You know, exactly right. Look at what Tenstorm's done, right? They're not going head to head with NVIDIA. Jim Keller's team, they're saying, we will give you the components from which you can design your own solution. LG, Samsung, Kia, build something that's really bespoke for the problem you're trying to solve, and we'll give you the technology with which to build it. And UCI makes it easy to bolt these chiplets together and get something out with less than, you know, a couple hundred million dollars. I think that is what's going to fuel the revolution in the edge. It's not going to be another NVIDIA for the edge.
Starting point is 00:31:16 That's not the way it's going to play out. It's going to play out where you're going to see a lot of custom chips where all these companies want to do something very specific for their customers, for their smart televisions, their handsets, their washing machines and microwave ovens, their automobiles. They're all going to want to build something small. Honda's not going to want to do the same thing Toyota does, right? So they're all going to do different things. And the idea that you could all build those things from chiplets, from vendors like Tenstorm, there'll be many others, then you're going to see that market fragment significantly. So the commonality is IP and chiplets rather than
Starting point is 00:31:50 a finished product. Correct. Correct. You don't have to go build your own RISC-V cores. I mean, you can go buy those from Sci-5 or TenStorm or anybody else, right? They become commodities. Would it be fair to ask you, with all the hype around generative AI and the talk of, you know, it could be as big a deal in its own way as PC and the internet, do you have views on that? The ultimate impact we're looking at, is this being overhyped or not? I think in the short term, perhaps it was overhyped. But, you know, you've got over 100 million users already for chat GPT. There's an insatiable demand for this kind of service. Now, it could be wrong.
Starting point is 00:32:28 You know, it's funny. I asked Bard a question about Supercomputing 23, and it told me it was in Dallas. I said, no, it's not. I know the answer to that question. It says, oh, sorry. Yeah, you're right. It's in Denver. I'm like, yeah, duh. But in spite of those problems,
Starting point is 00:32:48 if you look at how enterprises will adapt and deploy, not adopt, adapt large language models for their specific data sets, and they'll do fine tuning and they'll get rid of all the parameters that have to do with who won the World Series or what happened in a war in Africa 10 years ago. All that's gone. So now I don't need a trillion parameters. I just need 10 billion parameters or maybe 20 billion parameters. I can do some real work. And I can build chatbots.
Starting point is 00:33:17 I can improve customer service. I can improve productivity of my coders. Inside, the opportunities are just endless. So it's funny. There was a recent buzz this week on the internet that, man, GPT-5, maybe it won't happen, and blah, blah, blah. People are coming up with reasons why they would not do GPT-5.
Starting point is 00:33:35 I don't think they need GPT-5. They need to monetize what they got. They got GPT-4. It's freaking amazing. Lana 2 is fantastic for meta. Go monetize those models instead of focusing hundreds of millions of dollars on running a 10 trillion or 100 trillion parameter model to challenge the human brain.
Starting point is 00:33:55 That's interesting research. That's good science. I love it. But in terms of making money right now, there are plenty of tools out there to build from and do something very useful today. Excellent. Thank you, Karl. We could go on forever. And I really appreciate you making time and having such a good, lively discussion and sharing your insights. And yeah.
Starting point is 00:34:14 Yeah, that was great. It was fun. I'll see you in Denver in a couple of weeks. I look forward to in Denver and at the famous Dead Architecture Society. Dead Architecture Society. I'm finally going to be able to attend one because between COVID and family illness, I wasn't able to attend the last four. Yeah, that's right. I'm looking forward to getting back into it.
Starting point is 00:34:32 Definitely look forward to that. Excellent. All right. We'll see you then. Thanks for the opportunity to chat, guys. Take care. Thank you. That's it for this episode of the At HPC podcast.
Starting point is 00:34:43 Every episode is featured on InsideHPC.com and posted on OrionX.net. Use the comment section or tweet us with any questions or to propose topics of discussion. If you like the show, rate and review it on Apple Podcasts or wherever you listen. The At HPC podcast is a production of OrionX in association with Inside HPC.
Starting point is 00:35:03 Thank you for listening.