@HPC Podcast Archives - OrionX.net - @HPCpodcast-74: Karl Freund, AI Chips
Episode Date: November 8, 2023. Karl Freund, founder and principal analyst at Cambrian-AI Research, joins us to discuss the, well, "Cambrian explosion" that we are witnessing in AI chips, the general state of the AI semiconductor market, and the competitive landscape in deep learning, inference, and software infrastructure in support of AI.
Transcript
Are you attending SC23 in Denver, Colorado?
Lenovo would like to count on you to visit booth 601 on the show floor at the Colorado Convention Center, November 13th through 16th, 2023.
You can also visit lenovo.com slash HPC to learn more about Lenovo's HPC solutions.
Suddenly there's this massive pile of money on the table. Everybody's running for that pile of money as fast as they can.
And we're not talking millions, we're not talking hundreds of millions, we're talking
billions or tens of billions of dollars of semiconductor revenue.
Forget the system side, forget all the software that's being sold.
Just semiconductors alone are going to be in excess of $10 billion.
Definitely attracts a crowd.
If I were at Intel, I'd be pounding my fist on the table
saying, screw HPC.
No offense.
The bigger market's going to be AI.
So you've got to nail AI.
If you can also do a good job at HPC, wonderful.
But these chips that have massive 64-bit float
are just going to be dead ends.
So my feeling is that even if you're going to focus on AI alone, it behooves you to really recognize that HPC is the discipline underneath it. And while you may not have to go all out on
64-bit, you also cannot look like you're abandoning HPC. From OrionX in association with InsideHPC,
this is the At HPC podcast. Join Shaheen Khan and Doug Black as they discuss supercomputing
technologies and the applications, markets, and policies that shape them. Thank you for being with
us. Hey, everybody. I'm Doug Black. Shaheen, great to be with you again. Great to be here.
And we're going to talk about something that we rarely talk about and that's AI. I'm just joking.
Yeah, we have with us, really excited to be talking with Karl Freund. He is the founder and principal analyst at Cambrian-AI Research, and we're going to be talking about AI chips, GPUs that have direct bearing on this whole craze centered on generative AI, large language models. Karl, welcome.
Thank you very much, Doug. Pleasure to be here.
Yeah. And just to start, I mean, we know you've been in the industry for a long time. You were
saying you've been around and seen a lot of architectures come and go, but fill us in a little
bit. That's a nice way of saying I'm old.
That's a nice way of saying I'm really old.
Shaheen and I used to work together at Cray.
That's right.
Back in the day, right, Shaheen?
That's right.
Back in the days before we sold to SGI.
Boy, those were heady days.
Yeah, I've worked for Hewlett Packard.
I've worked for Cray.
I've worked for IBM for 10 years.
I worked for AMD and did a couple of startups, one of which your audience may be familiar with, Calxeda. We were building an Arm-based SoC for the data center.
And for the last six or seven years,
I've just been an analyst focusing almost exclusively on HPC and AI
and primarily on the hardware that it takes to run AI efficiently.
So you can check out my work on cambrian-ai.com and see all the
articles I've posted. But mostly I just try to help semiconductor companies articulate their
strategy and story and help amplify their story or criticize it if need be. Right on, right on.
I love the name of your company, Cambrian. Of course, that's like the explosion we're observing.
So right there, we have a very
fertile ground to have a lot of good discussion here. That's for sure. Yeah, when I first decided
to go out on my own, I thought, well, what should I name this company? And it's just an explosion
of AI. Jensen Huang coined the term Cambrian explosion of AI when he was actually referring to AI models, not semiconductors. But I decided it was equally applicable to the semiconductors all trying to compete with Jensen. That's the name of the company. Sure is. Sure is. Maybe that's
the place to start. Yeah. Well, it's funny. For six or seven years now, venture capitalists have
been presented with slide decks from startups that all read the same. There are really only some minor differences, in-memory compute versus at-memory compute. Like that's not a difference, but they're all saying the same thing, which is the hypothesis of, hey, you know, NVIDIA just got lucky. GPUs are really good at parallel processing of matrix operations, but they've got a bunch of other stuff on there you don't really need for AI. So we'll just build a chip that's just good for AI. And we've seen dozens of
companies go after NVIDIA with that kind of approach and haven't seen many succeed. A few
exceptions. I think Cerebras had a recent large win that's worth about a hundred million dollars.
That's good. But there's not many other ones out there that are seen as even being competitive.
It's starting to change. Obviously H100 is the king of the hill right now.
So along comes Gaudi 2 and says, hey, we're almost as good as an H100.
And Gaudi 3 is coming around the corner.
AMD says, well, we're almost as good as an H100.
And MI300 is just around the corner.
And meanwhile, NVIDIA is saying, yeah, wait till you see what I can show you next spring.
So everything's just around the corner, right?
So it's kind of hard to assess competitive position of products that haven't been launched yet.
I'm definitely stealing this, I think, from Andrew Feldman. There's been eight years of AI research and products and a reasonable amount of revenue, measured in the hundreds of millions of dollars.
Suddenly, there's a massive pile of money on the table. Everybody's running for that pile of money as fast as they can. And we're not talking millions, we're not talking hundreds of
millions, we're talking billions or tens of billions of dollars of semiconductor revenue.
Forget the system side, forget all the software that's being sold, just semiconductors alone
are going to be in excess of $10 billion. Definitely attracts a crowd.
It sure does. I think IDC had a report this past week that they expect just the generative AI part
of it to be like $143 billion or something in four years with a CAGR of 73.3%, which is just
astounding. It's mind-boggling. It just indicates that it's definitely a frenzy and people are stampeding towards it, right?
And you know what, because of that, people like AMD and Intel with Gaudi, and startups like Cerebras and others, say, hey, you know what, if I could just get two to five percent of the market, my investors would be happy.
It's easier said than done. Yeah, but it never works out that way, does it?
2% usually doesn't.
Yeah, it's not a sustainable position.
It's not sustainable.
Tell us, you know, let's start with the H100.
We constantly hear that NVIDIA is commanding very high prices and lead times are so long.
I mean, what's going to break that logjam?
What's going to enable a more plentiful supply of these advanced GPUs?
Well, time. I mean, time will enable more wafer starts and will enable more
CoWoS capacity to do the bonding required for multi-die packages, right? I mean,
those are the primary limitations. If you solve one without solving the other,
it's not going to do any good. Unfortunately for the industry, everybody's using that same technology.
Everybody's trying to do, you know,
3D stacking of HBM or HBM3 onto their ASIC or GPU,
whether it's Gaudi or MI300 or NVIDIA.
The exception would be Cerebras.
Cerebras doesn't use HBM.
And so they seem to be unconstrained in supply.
And that may be one of the reasons
why their customers in the
United Arab Emirates decided to go with a supercomputer from Cerebras instead of waiting
in line. So I think the only thing that's going to solve it, honestly, Doug, is just time. Time to get
more supply. Demand's certainly not abating. There's other technologies on the horizon,
but they too will have supply constraints. Now, you could say that a silver lining of the US government's
restrictions on high performance technology to China will actually create supply. It's now
available for Western countries because it's not going to ship to China. That's kind of a strange
way to look at it, but it's probably true. It's probably true. So Karl, as you know,
because we've talked about this in the past, we did this Epic AI survey like five years ago.
And at that time, we could count like 27 different chips or projects around the world focused
on AI, including the folks that we're talking about.
And then I met with a friend of mine who is an executive who used to be at semiconductor companies.
And he said, you're probably off by a factor of three.
That in reality, there's probably like 100. And then recently, I heard that even that number may
be too conservative. So just how many projects are going on around the world focused on here's a
specialized AI chip that's just going to kill it for my app. And therefore, there's no need for any other chip.
And my app is a sufficiently big killer app that's going to give me the volume to do it.
You know, I think you have to separate the market into two big chunks, right? There's training, almost exclusively done in very large data centers. And then there's inference processing, which can be done either in a cloud or enterprise data center or at the edge.
And I think if there are hundreds of companies building AI chips now,
it's a function of two things.
First of all, the opportunity at the edge is massive and it's highly differentiated.
So you could have a solution that's good for image processing,
that's not good for audio processing,
that's not good for text processing and natural language processing.
So there's lots of different combinations of power performance and area that can target a
different segment of edge inference processing. Now, most of the startups that I work with,
they've all kind of done a student body left away from data center training, and they're either
focusing on data center inference or edge inference, predominantly edge inference.
There's no 800-pound gorilla in edge inference.
Maybe Jetson's an 800-pound gorilla, maybe not.
There's more opportunity and lower barriers of entry.
The software stacks required are much, much easier to amass.
You know, you basically need to run a handful of good models and do them very efficiently
and take advantage of every
trick you got in the book to make your chip sing and dance. Now, I said there's two things driving
it. First of all, is that market opportunity on the edge. Second is chiplets. And so you no longer
have to build a large, monolithic, expensive, multi-hundred-million-dollar project to enter the market. You can go to somebody like Tenstorrent or SiFive and buy IP for things like RISC-V cores, or from Tenstorrent you can buy their Tensix core accelerator, and then you just provide the glue and hopefully some sort of secret sauce that will turn that into a blockbuster chip. Now, that's the story. We haven't seen a lot of blockbuster chips with enough secret sauce to attract people to them yet.
But I think in the edge AI inference, especially for large language models, we're in the very early stage here.
We're in the first inning of that market.
People are still trying to figure out what you can use a large language model in the edge for.
There's got to be something, right?
They're all trying to find that something that allows them to do large language models, or smaller large language models I should say, you know, 10 billion parameters or something
like that and run them on the edge on their chips.
Will they be successful?
History would say no.
I mean, the history of all the companies trying to compete with NVIDIA is littered with hundreds of millions or billions of dollars of venture capital that basically went up in smoke.
I mean, some companies have had to significantly write down their holdings in AI startups because, you know, suddenly after five years, they have total revenue of $5 million.
Total revenue of $5 million.
You're kidding.
That's the sad truth is they can generate a lot of PowerPoint, but they can't seem to generate a lot of revenue. So that could be a
reason why they're gravitating to the edge, the IoT end of the world, because that's more fragmented
and uncontested. The problem is that, as I like to joke, everything is a thing. And so IoT doesn't
really lend itself to a clean consolidation.
Everything is like so specialized that you may not have the volume to make it sustainable, right?
Exactly right, Shaheen.
It's kind of like FPGAs, right?
FPGAs, they don't have any large market outside of Microsoft.
They have hundreds of very, very small markets.
There's not a killer app, in spite of a lot of people who attempted to create one out of FPGAs. And I think a similar situation will unfold for edge AI. There's a lot of interesting use cases, but the fact of the
matter is, Qualcomm Snapdragon probably has the best AI on the Edge right now. Most people use
it without even knowing it, which is fine, right? You take a picture with an Android phone today,
you don't know that you're using AI. You just know it produces really good pictures in the dark.
And of course, that's all AI. I think the best AI is perhaps hidden.
Right. Okay. That's a really good point.
Sticking with sort of the major, the three big GPU vendors, Intel, AMD, and NVIDIA. Let's start
with Intel. Shaheen and I were talking about this episode coming up. We frankly aren't sure. We have
Ponte Vecchio and we have Gaudi 2.
And there have been other acquisitions that Intel has made and spun back out like Mobileye
and a couple of others. So they basically ended up having a lot of choices. What do you see them doing?
Well, it's not clear to me. Obviously, with the convergence to Falcon Shores, the convergence of the GPU and the Gaudi architecture, they have not articulated what that means. Gaudi 3 is probably going to be a pretty amazing chip, quite frankly. The question will be, is it a dead end? And if it is, nobody's going to buy it. If they can show a path from there to the converged product line of the GPU and Habana Labs, then they've got a shot. They can say, look, here's your path. Start with Gaudi 2, then go to Gaudi 3, and then you go to what they should have called Gaudi 4, because I think Gaudi's got a lot more going for it right now than Ponte Vecchio.
Ponte Vecchio's got great 64-bit float, which is perfect for national labs and other high
performance computing centers.
But for AI, that format is totally useless. And yet you're spending probably 30% of your die area on 64-bit float. That die area could have been used for 8-bit integer, 16-bit float, or something more
useful for AI. And I think that's where they're headed. I really do. I'm not sure, but I suspect,
given the interest in Gaudi 2 and the more concentrated messaging around Gaudi 2 from
Intel in the last two months, I suspect that they're going to try to make this look like a Gaudi 4 that also has 64-bit floating point.
I don't know how you fit that all into a die, but that's their challenge.
They've got two architectures.
They're both good.
Neither are good enough to give a knockout blow to NVIDIA.
So maybe if they combine them, in theory,
they can have enough weight behind that punch and they can make a dent.
But it's going to be 2025 before we see it. So we'll see if Gaudi 3 can save the day. If so,
I think they're going to have to paint a clear roadmap because it's all about software. And
if I have to re-port, if I have to re-tune, then why would I spend the time and effort on Gaudi 3?
In spite of the fact that I think Gaudi 3 is going to be a pretty amazing chip.
Yeah. And Gaudi 2 is getting good grades from people who are using it. So I think what you're
saying is a very plausible path and therefore maybe exactly what they should do, if they aren't already.
Yeah. Right.
It sounds like an amalgamation of these various efforts kind of coagulating.
Yeah, it does. Honestly, given the growth rates and the numbers that Shaheen just shared about five minutes ago, if I were at Intel, I'd be pounding my fist on the table saying, screw HPC. No offense. The bigger market's going to be AI, so you've got to nail AI. If you can also do a good job at HPC, wonderful. But these chips that have massive 64-bit float are just going to be dead ends, right?
So let me offer a slightly different perspective, not completely misaligned. Look at IBM. They went after AI alone, they sort of looked like they were abandoning HPC, and six years later they have neither. Power10 is a great chip that really should be doing a lot better than it is, but somehow hasn't. So contrast that with AMD, which has high performance computing as the corporate mantra, and they are thriving. So my feeling is that even if you're going to focus on AI alone, it behooves you to really recognize that HPC is the discipline underneath it. And while you may not have to go all out on 64-bit, you also cannot look like you're abandoning HPC.
Yeah, it's got to be, I think we're in alignment there, Shaheen. If you look at what the MI250 did, it's got too much 64-bit float to build a good
AI chip. You could do okay AI, but you're not going to do great AI, mostly because of the lack
of support for low precision math. So there's got
to be a balance, right? And the balance is probably a little more focused on AI, a little less focused
if you're AMD on 64-bit float. We'll find out when MI300 is actually announced. It's been teased so
much. I feel like I'm in Las Vegas. Right. Exactly. But it sounds like a good chip. It really does.
Now you could argue maybe they went overboard with too many dies
because the secret problem that nobody's talking about with chiplet architectures
is that you have to dedicate some die area for the chip-to-chip communication.
That takes up space that could be used for SRAM or ALUs.
And so you're going to need to take a performance hit,
but you could get some cost
savings. So I don't know. If you look at MI300, the biggest question in my mind is,
will it have a transformer engine? If you believe what NVIDIA is saying, and I tend to,
the transformer engine is going to give you two to three X performance improvements over the same
chip without transformer engine. So if what they have is that same chip without a transformer engine, they're going to be half the performance of, let's say, maybe not H100, but H-Next,
which will come out contemporaneously with the MI300, right? And then you say, well, now let's
talk about HBM. Well, the MI300 seems to have a higher capacity and bandwidth of HBM3. NVIDIA has addressed that with their Grace Hopper version with HBM3e,
but we'll have to see where the benchmarks land.
Speaking of which, I would be shocked if AMD released public benchmarks.
Unfortunately, they've never stood up any benchmarks for MLPerf,
and I don't expect them to start.
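Editor's note: to make the "transformer engine" point above a bit more concrete, here is a minimal sketch of what that capability looks like from software, using NVIDIA's Transformer Engine PyTorch API as the example. The layer sizes and the single training step are illustrative only; no code appears in the episode itself.

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # A drop-in replacement for torch.nn.Linear whose matrix multiplies can run in FP8
    # on hardware with a transformer engine (Hopper-class GPUs).
    layer = te.Linear(768, 3072, bias=True)
    inp = torch.randn(2048, 768, device="cuda")

    # FP8 execution is scoped to this context; outside it the layer runs in higher precision.
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = layer(inp)

    out.sum().backward()  # gradients flow as usual

The two-to-three-times speedup Karl cites comes from running these matrix multiplies in 8-bit formats; whether MI300 ships an equivalent path is exactly the open question discussed above.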
Yeah, a couple of comments I wanted to make.
One is on 64-bit support.
In total agreement with you, there was that Chinese chip, Biren, was it?
That didn't even have 64-bit.
And God knows if it can be manufactured now that they might not get allocation from TSMC.
But it was an indication of, in fact, I was contemplating an article saying the end of 64-bit computing sort of thing. On the other hand,
HPC people are using lower precision to do HPC, and they're using AI to do HPC. So I think that
is kind of a synergy between HPC and AI because HPC enables it, but also takes advantage of it.
And that's kind of an interesting thing, right? I agree. I was going to mention that I
believe the fastest simulation you can run is the one you don't run, it's the one you estimate.
Okay. So instead of going through the effort of actually running a full simulation,
I can take all the runs I've done in the past 10 years, I can use those to train a neural network
model, and I can estimate what a different set of starting conditions would produce
if I ran a simulation.
And the results have been astounding.
Exactly.
So now you do the last mile for real.
Yeah, and you do the last mile for real.
Exactly.
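Editor's note: a minimal sketch of the surrogate-model idea described above, where a small neural network trained on archived simulation runs estimates the outcome of new starting conditions, and only the most promising candidates get the full "last mile" simulation. File names, dimensions, and training details are hypothetical.

    import numpy as np
    import torch
    import torch.nn as nn

    # Hypothetical archive of past runs: starting conditions -> quantities of interest.
    X = torch.tensor(np.load("past_run_inputs.npy"), dtype=torch.float32)   # [n_runs, n_params]
    y = torch.tensor(np.load("past_run_outputs.npy"), dtype=torch.float32)  # [n_runs, n_outputs]

    # A small MLP is often enough to act as a surrogate for an expensive solver.
    surrogate = nn.Sequential(
        nn.Linear(X.shape[1], 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, y.shape[1]),
    )
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for _ in range(200):                      # fit the surrogate to the archive
        opt.zero_grad()
        loss = loss_fn(surrogate(X), y)
        loss.backward()
        opt.step()

    # Estimate the result of new starting conditions without running the solver at all.
    new_conditions = torch.rand(1, X.shape[1])
    print(surrogate(new_conditions))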
Smarter revolutionizes HPC.
If you're attending SC23 in Denver at the Colorado Convention Center,
stop by booth 601 from November 13th through 16th to learn how.
Lenovo is hosting a number of booth sessions covering the latest industry topics, including sustainability, generative AI, genomics, weather, storage, hybrid cloud, and more.
You'll also find interactive demos featuring an AI avatar, digital twins, HPC cluster management software, and Neptune
liquid cooling. Be sure to visit booth 601 and visit lenovo.com slash HPC to learn more.
By the way, Karl, I did read your article in June in Forbes about the MI300. And as you just
mentioned, the lack of a transformer engine, I guess at that event in June in San Francisco that AMD held, they did announce a partnership with Hugging Face around transformers. So I don't know if you saw that announcement.
Yeah, I did. I did. And I think it's interesting. I had a conversation with an executive from AMD
a couple months ago. I said, look, I think you've got a great chip coming, but the world thinks you don't have software.
How do you respond?
And he said, go to Hugging Face
and see how many models are already available
and optimized to run on the MI250.
And it's pretty darn impressive.
You know, hundreds of models.
And so the CUDA moat is kind of looking shallow right now, I think, between OpenAI's Triton and PyTorch 2.
Do you really need CUDA?
Well, you do if you're going to run, you know,
weather simulation codes or NASTRAN.
Sure, you need it.
But do you need it to do large language models?
The answer is no, you don't.
But it will make it faster if you use it.
The same is true for AMD's ROCm software, which has optimized BLAS libraries,
right, to accelerate linear algebra and other important algorithms. You can do a Triton port
and run it on an MI250 or soon an MI300, and you'll get good performance. And if you plug in
ROCm software libraries underneath it, you should get better performance. So it really gives
a lot of flexibility to the development community to port quickly and easily, and then do the tuning
with either CUDA or ROCm or oneAPI with Intel. Although I should point out oneAPI is not available yet on Gaudi. So not on Gaudi. That's right. Now, if anybody can execute, NVIDIA can. But there's also like a qualitative shift when you have no competition versus when you have some competition.
And I think that phase is going to be a little bit different for them, isn't it?
Well, I was thinking the same thing.
And then, you know, a couple of weeks ago, NVIDIA did an investment tour and they were showing a new roadmap that I had never seen before, which doubles the cadence of GPU releases.
I mean, really?
You can go to a one-year cadence of GPU technology, and I think with the advancements you're going to see in HBM, the advancements you're going to see in chiplet interconnect technology, and two or three nanometer fabrication technology, I would bet on the company that's got a yearly cadence of new technology
coming out to take advantage of this new underlying enabling tech.
Is that like Intel's old tick-tock, or is that a tick-tick?
I don't know.
They haven't disclosed much about it, so I guess it's a tick-tick.
But this rapid cadence, it sounds like you're getting the sense
NVIDIA is kind
of stealing a march on everybody. Yeah, I think what happened, and I'll speculate a little bit here, is NVIDIA looked in the rear-view mirror and saw a Gaudi 3 coming. They saw MI300 and the interest it's attracting in places like OpenAI
and elsewhere. And they said, well, we better do something. And Jensen probably looked around and
said, well, I have more money than God. If you want to start up a whole other engineering
team and do two chips instead of one, let's go for it. That's right. Have money, we'll spend.
Exactly. Now, another question for you. So the lead times for H100s, A100s, and of course,
they've done the L40s to alleviate some of that pressure.
And like you said, they're trying to get more allocation and all of that in time is going to
work. But for the moment, if you're not like a big time famous customer, you may have to wait
quite a while to get your hands on something like this. And I feel like that is probably
strategically not a good thing for NVIDIA, because it's causing some people to look
at alternatives when otherwise they would not have. So I think that's an opportunity for AMD
and Intel for sure. But also for the next tier after that, SambaNova, Graphcore, Groq, Untether AI, and I'm sure I'm missing a few others, Cerebras, of course, as you mentioned, and then perhaps even
some tier behind them. Do you see that? Is that like something that
they should worry about? Again, you have to segment the market. If you're talking about
training, I don't think they have to worry about it. I really don't. That's a good point. If you're
talking about inference processing, let's say, let's take data center inference processing.
It's a huge market, right? I mean, it takes 16 H100s to answer one ChatGPT query.
That's not sustainable.
Something's got to change.
Something's got to give.
And so I think what AMD's considering now is really focusing on that inference opportunity
where they've got an advantage over NVIDIA, not parity, but potentially an advantage with
their high bandwidth memory that could give them some breathing room.
Yeah, I think that because of the lack of availability,
because of the expense,
and now because of potentially poorer performance,
let's put it this way,
you can get the job done with fewer MI300s than H100s.
That's going to save you a lot of money.
That's going to significantly drop your TCO.
And that's why NVIDIA said,
oh, hey, we got the Grace Hopper
with an equivalent memory capacity as well.
I thought, well, why did you do that?
Why did you make that just Grace Hopper
and not, let's say, a multi-die Hopper, right?
You can put two Hoppers on a super chip
as well as a Grace and a Hopper.
Why did you do that?
I don't know.
It's all about memory.
In fact, I think part of the reason people use multiple GPUs is because they need the memory, not the compute. And if you're on Grace Hopper, you can get access to the memory that's on the CPU. So it's a lower cost way to get access to memory. Isn't that probably what it is?
Right. Well, I think we'll see. I mean, unfortunately, the war in Israel preempted Jensen's keynote at his AI day in Israel, right?
And I was anxious to hear what a system looks like.
Because if you think about it, system design, if you're HPE or Dell or Lenovo, Supermicro, Penguin, all these guys have got to rethink system design.
Because all of a sudden, for an AI supercomputer, there's no DRAM.
That's right. That's right.
Whoa, there's no DRAM? Yeah, there's no piece. And potentially with chips like Enfabrica's chip,
which I've written about in Forbes, if you haven't seen it, I'd encourage you to take a look at it,
audience, but that's going to completely change the backend away from PCIe and give you something
that's got a Bluefield chip on it.
The DPU kind of a thing? Yeah.
It's got DPU, it's got a Grace and a Hopper, and then it's got sort of an outbound chip that's
aggregating all of the access to remote nodes and obviating the need for PCIe, obviating the need
for standalone NICs to talk to the network. Whoa, that's going to be a very different system design
than anything we've ever seen before. And when I think about what's going to happen in supercomputing
in a month, that's what I'm looking for. I'm looking for, show me some examples of what one
of these things look like. Yeah, me too. I agree. I put something on LinkedIn as a teaser a few
weeks ago that a massive change in system design is coming. And it's going to be three, four years before it pans out.
But once it's done, oh man, it's going to be very different.
Very different.
Very different.
If that happens, who will be the losers?
That's a good question.
If you're on the edge, you're not using HBM anyway, so you don't care.
If you're on the edge, you're using some kind of low power DIMM,
LPDDR5 and stuff like that.
You know, it's the people going after data center where, and you mentioned it, it's all about the memory bandwidth, right?
I mean, answering a query on ChatGPT, the GPU is probably less than 20% utilized.
And I've heard much less than 20% utilized.
So if you can get more memory capacity and bandwidth, primarily capacity per GPU, you could use far
fewer GPUs and dramatically lower the cost of inference processing. So who are the losers?
The losers are the guys who, when you do that, no longer have the GPU headroom. So if you're
running along and you're doing just fine because you're memory capacity limited, and suddenly you're not memory capacity limited, and there's a lot of technologies we can dive into here that could get you there, all of a sudden you're going to be GPU limited,
and then you lose. Right. And I'm glad you have some view of what the utilization is,
because for years, I've been asking and not really getting an answer. I sort of was joking that
because their multi-instance GPU, their MIG capability, started out with like seven partitions. And I think it's like 10 now, is it? I don't know. To me that sort of indicated that, oh, it's probably between 10 and 15.
That's right. If MIG can help you out, that means you're probably underutilizing the GPU.
And if they put seven in there, that means 15%; if they put 10 in there, that's 10%. You know, so that was like reverse engineering it, probably based on no real data.
Yeah. I think it's probably pretty accurate there.
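Editor's note: a rough back-of-the-envelope sketch of the memory-capacity argument above. For large-language-model inference, the accelerator count is often set by how much HBM is needed to hold the weights plus KV cache, not by compute, which is why more memory per GPU translates directly into fewer GPUs. All figures are illustrative assumptions, not measured values.

    import math

    def min_gpus_for_model(params_billion: float,
                           bytes_per_param: float = 2.0,   # FP16/BF16 weights
                           kv_cache_gb: float = 40.0,      # assumed KV-cache budget
                           hbm_per_gpu_gb: float = 80.0) -> int:
        # Smallest number of GPUs whose combined HBM holds the weights plus KV cache.
        weights_gb = params_billion * bytes_per_param       # 1e9 params * bytes / 1e9 bytes per GB
        return math.ceil((weights_gb + kv_cache_gb) / hbm_per_gpu_gb)

    # e.g. a hypothetical 175-billion-parameter model served at 16-bit precision:
    for hbm in (80, 141, 192):   # illustrative per-GPU HBM capacities
        print(f"{hbm} GB/GPU -> {min_gpus_for_model(175, hbm_per_gpu_gb=hbm)} GPUs")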
So Karl, it's obviously early days. And I think the whole picture here is it's really,
NVIDIA is out ahead, but it's really a mishmash and things could go in different directions.
And you've got edge and data center with different needs and requirements. But we are going to put in front of you a crystal ball that we'll ask you to look into.
Looking out 24, 36 months, even 48 or 60 months, you know, within the data center, how do you see things shaking out?
Is this on-prem or cloud or how is this all going to come together in terms of, again,
this generative AI, big AI?
Yeah. So in terms of big AI and data
center, if I look out three to five years, I would be shocked if the entire market outside of NVIDIA
had 20% share. So NVIDIA five years from now will have some 80% share of that data center market,
both inference and training. And everybody else is going to fight over that other 20%, which by the way, will be billions of dollars of revenue. So they're fine. It's not like they're
going to fold up their tents and go away. They got a tremendous opportunity in front of them.
But given the software advantage that NVIDIA has, now given the kind of dual-threaded roadmap with
two engineering teams vying for the next tape out, I'd be shocked if NVIDIA has less
than 80% share. That's really very, very interesting because that also means that if you're a different
player, it's less about beating NVIDIA and more about serving the part of the market that you can
and have a good time doing it, right? You know, exactly right. Look at what Tenstorrent's done, right? They're not going head to head with NVIDIA. Jim Keller's team, they're saying, we will give you the components from which
you can design your own solution. LG, Samsung, Kia, build something that's really bespoke for
the problem you're trying to solve, and we'll give you the technology with which to build it.
And UCIe makes it easy to bolt these chiplets together and get something out with less than, you know,
a couple hundred million dollars.
I think that is what's going to fuel the revolution in the edge.
It's not going to be another NVIDIA for the edge.
That's not the way it's going to play out.
It's going to play out where you're going to see a lot of custom chips
where all these companies want to do something very specific
for their customers, for their smart televisions, their handsets, their washing machines and microwave
ovens, their automobiles. They're all going to want to build something small. Honda's not going
to want to do the same thing Toyota does, right? So they're all going to do different things.
And with the idea that you can build all those things from chiplets, from vendors like Tenstorrent, and there'll be many others, you're going to see that market fragment significantly. So the commonality is IP and chiplets rather than
a finished product. Correct. Correct. You don't have to go build your own RISC-V cores. I mean,
you can go buy those from SiFive or Tenstorrent or anybody else, right? They become commodities.
Would it be fair to ask you, with all the hype around generative AI and the
talk of, you know, it could be as big a deal in its own way as PC and the internet, do you have
views on that? The ultimate impact we're looking at, is this being overhyped or not? I think in
the short term, perhaps it was overhyped. But, you know, you've got over 100 million users already
for ChatGPT. There's an insatiable demand for this kind of service.
Now, it could be wrong.
You know, it's funny.
I asked Bard a question about Supercomputing 23, and it told me it was in Dallas.
I said, no, it's not.
I know the answer to that question.
It says, oh, sorry.
Yeah, you're right.
It's in Denver.
I'm like, yeah, duh. But in spite of those problems,
if you look at how enterprises will adapt and deploy, not adopt, adapt large language models
for their specific data sets, and they'll do fine tuning and they'll get rid of all the parameters
that have to do with who won the World Series or what happened in a war in Africa 10 years ago.
All that's gone.
So now I don't need a trillion parameters.
I just need 10 billion parameters or maybe 20 billion parameters.
I can do some real work.
And I can build chatbots.
I can improve customer service.
I can improve productivity of my coders.
Inside, the opportunities are just endless.
So it's funny.
There was a recent buzz this week on the internet that, man, GPT-5, maybe it won't happen
and blah, blah, blah.
People are coming up with reasons
why they would not do GPT-5.
I don't think they need GPT-5.
They need to monetize what they got.
They got GPT-4.
It's freaking amazing.
Llama 2 is fantastic for Meta.
Go monetize those models instead of focusing hundreds of millions of dollars
on running a 10 trillion or 100 trillion parameter model
to challenge the human brain.
That's interesting research.
That's good science.
I love it.
But in terms of making money right now,
plenty of tools out there to build from and do something very useful today.
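Editor's note: a minimal sketch of the "adapt, not adopt" pattern Karl describes, fine-tuning a smaller open model on an organization's own data rather than training a frontier-scale model. The model name, file paths, and hyperparameters are hypothetical placeholders; in practice, parameter-efficient methods such as LoRA are often layered on top to cut the hardware required further.

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)

    base = "meta-llama/Llama-2-7b-hf"        # placeholder: any ~7-10B open model
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Hypothetical company-internal corpus exported as plain text files.
    data = load_dataset("text", data_files={"train": "internal_docs/*.txt"})
    tokenized = data.map(lambda b: tok(b["text"], truncation=True, max_length=1024),
                         batched=True, remove_columns=["text"])

    args = TrainingArguments(output_dir="llm-finetuned",
                             per_device_train_batch_size=1,
                             gradient_accumulation_steps=16,
                             num_train_epochs=1,
                             bf16=True,
                             logging_steps=50)
    trainer = Trainer(model=model, args=args,
                      train_dataset=tokenized["train"],
                      data_collator=DataCollatorForLanguageModeling(tok, mlm=False))
    trainer.train()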
Excellent.
Thank you, Karl. We could go on forever. And I really appreciate you making time and having
such a good lively discussion and sharing your insights. And yeah.
Yeah, that was great.
It was fun. I'll see you in Denver in a couple of weeks.
I look forward to seeing you in Denver and at the famous Dead Architecture Society.
Dead Architecture Society. I'm finally going to be able to attend one
because between COVID and family illness,
I wasn't able to attend the last four.
Yeah, that's right.
I'm looking forward to getting back into it.
Definitely look forward to that.
Excellent.
All right.
We'll see you then.
Thanks for the opportunity to chat, guys.
Take care.
Thank you.
That's it for this episode of the At HPC podcast.
Every episode is featured on InsideHPC.com
and posted on OrionX.net.
Use the comment section or tweet us with any questions
or to propose topics of discussion.
If you like the show, rate and review it on Apple Podcasts
or wherever you listen.
The At HPC podcast is a production of OrionX
in association with Inside HPC.
Thank you for listening.