@HPC Podcast Archives - OrionX.net - @HPCpodcast-97: Addison Snell on HPC, AI, Hyperscalers – In Depth
Episode Date: February 6, 2025. In this In-Depth feature of the @HPCpodcast, Addison Snell, co-founder and CEO of Intersect360, joins Shahin and Doug as they discuss a wide range of topics in HPC, AI, and quantum computing. Audio: https://orionx.net/wp-content/uploads/2025/02/097@HPCpodcast_ID_Addison-Snell_Intersect360-HPC-AI-Market_20250205.mp3
Transcript
That's now over $100 billion in annual spending and will reach near $200 billion by 2028.
I would like to think we still hold on to proof and peer review and things like that,
knowing how you got to the answer, show your work. AI is still a bit of a
black box there. From OrionX in association with Inside HPC, this is the At HPC podcast.
Join Shaheen Khan and Doug Black as they discuss supercomputing technologies and the applications,
markets, and policies that shape them. Thank you for being with us. Hi, everyone. I'm Doug Black of Inside HPC.
This is the At HPC podcast with Shaheen Khan of OrionX.net. And with us today is our special guest,
HPC AI industry analyst, Addison Snell. He is founder and CEO of Intersect 360.
And Addison, it's great to be with you again. Doug, Shaheen, great to be talking to you.
Really great to have you here. I've been looking forward to this. Thanks for joining.
Superb. So we're here to talk about the major trends driving the HPC AI industry and to look
ahead at what you think could be some of the more significant developments to come in 2025
and the rest of the decade. But Addison, let's start with your bread and butter
research. Please share with us the size and growth figures for the HPC AI industry in 2024
and your projections for this year. Yeah, thanks, Doug. We did release a market update prior to the
SC24 conference, where we added dramatically to our outlook for 2024. Now, it's only January
2025. We still have to dot the i's, cross the t's. Not all vendors are reporting yet on the
complete calendar year 2024. So this is all still officially a forecast and not a final number.
But the big thing that we added was the effect of the hyperscale companies' spending on AI, which we grew 6x over where it was, to where that's now
over $100 billion in annual spending and will reach near $200 billion by 2028. Some of the
factors that went into that, first of all, Twitter or X.AI or whatever you want to call the conglomerate of companies
there under Elon really made the move to become a tier one hyperscale company. They had not been in
that tens of billions of dollars per year range. And then with the Colossus supercomputer going in,
they are now. So that added to it. Microsoft, Meta, both accelerated
their spending. So it's really zoomed ahead. Now, at the same time, we did also add to the forecast
for the on-premises portion. We added $16.5 billion to the five-year forecast in aggregate
over the full year. And it would have looked like a lot had the hyperscale not so
dominated the conversation there. So we've upgraded the five-year compound annual growth rate for the
on-premises or non-hyperscale portion, to where that's now going to be over $65 billion by 2028.
But that's peanuts compared to what's going on in hyperscale. So that's what we're seeing.
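As a quick sanity check on those hyperscale figures: growing from roughly $100 billion to roughly $200 billion by 2028 implies a compound annual growth rate near 19 percent. A minimal sketch, using only the round numbers quoted above; the dollar values are the podcast's, the formula is just standard CAGR:

```python
# Implied compound annual growth rate (CAGR) from the forecast above:
# ~$100B in hyperscale AI spending (2024) growing to ~$200B by 2028.
spend_2024 = 100e9
spend_2028 = 200e9
years = 2028 - 2024

cagr = (spend_2028 / spend_2024) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # -> Implied CAGR: 18.9%
```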
And it seems to be in line with what other people are seeing with the real breathtaking growth of
AI at hyperscale. Is that in line with what you guys are seeing? Yeah, I'd say so. Certainly
from Hyperion. I think a big change or a change you both have been through is that you've brought AI in with HPC. So by definition,
an expansion in revenues is going to be seen, but you're also really changing,
would it be fair to say, Addison, how you're sizing and also what's involved in your market
analysis by the inclusion of AI? Yeah, that's absolutely right on. And we've been talking about the merged
HPC AI market, at least outside of hyperscale. And I'll come back to that in a second. And that's
because we found that in our research, they're really inseparable. Because you could look at,
people want to say, what's the AI market? And we look at the Frontier and El Capitan supercomputers:
these are HPC supercomputers.
They'd be part of the HPC market.
Are they not also part of the AI market?
Do they not do any AI research there?
Of course they do.
And then at the other end of the spectrum, you could look at something like Tesla and
the Dojo supercomputer or some of their investments.
You look at these and call them AI supercomputers.
Does Tesla not also do any
scientific or engineering workloads? Do they not do crash test analysis of their cars or aerodynamics?
I bet they do, right? So there's this spectrum of it's mostly HPC, it's mostly AI. But if you start
to try to divide up those budgets and say, how much is each? It's a shared infrastructure. Now,
what really is different is this hyperscale spending, which is the majority: over three quarters of
all data center spending now is going to hyperscale AI, or hyperscale in general.
And that is a pure AI use case. And that's the primary segmentation now for our clients. Do you want to count that or
not? Is someone like Meta a prospective customer, or do you want to just focus on the non-cloud,
non-hyperscale, on-premises infrastructure market, which is a blend of HPC and AI?
And we're going to do another round of budget surveys this quarter to look at where are those
budgets the same or different? How much is HPC?
How much is AI? What's the relative growth rate? Because the AI portion is growing more,
but they're inextricable from a market size. Yeah, I totally see it that way as well.
I think it really is impossible to separate them. And one thing I would add too, is that
when you look at what AI means, it's A, lots and lots
of matrix multiplies.
So that looks like matrix algebra to me.
And also some of the algorithms that are used for sensitivity analysis or reasoning or whatnot,
they all have scientific roots in them.
They're all complicated math and including partial differential equations for some things.
So it's really impossible to say what is AI and what is not AI.
I think it's all kind of HPC at the end of the day.
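A minimal NumPy sketch of that point, with purely illustrative dimensions: one transformer-style feed-forward layer reduces to dense matrix multiplies, the same GEMM kernel that sits at the heart of classic HPC codes.

```python
# "Lots and lots of matrix multiplies": a transformer-style
# feed-forward layer is two dense matmuls (GEMMs) plus a cheap
# elementwise nonlinearity. Dimensions are illustrative only.
import numpy as np

tokens, d_model, d_ff = 512, 1024, 4096
x = np.random.randn(tokens, d_model)                 # activations
W1 = np.random.randn(d_model, d_ff) / np.sqrt(d_model)
W2 = np.random.randn(d_ff, d_model) / np.sqrt(d_ff)

h = np.maximum(x @ W1, 0.0)   # matmul, then ReLU
y = h @ W2                    # matmul again
print(y.shape)                # (512, 1024)
```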
Addison, do you want to comment?
I agree with that.
And we see HPC users who are adding more accelerators into their configurations.
They're trying to serve both types of workloads.
We're going to see AI integrated into HPC. But at this point, I think we also have to be on the lookout not for convergence as much as divergence of HPC and AI, because we're really seeing a difference now in what kinds of processing elements and how many accelerators per node are really applicable. Something that's really optimized for AI is
going to have more GPUs and a lower precision. And something that's really configured for the
scientific and technical computing is going to have fewer GPUs at higher precision. So as much
as we've been talking about the convergence of HPC and AI for several years now, I think now
we've got to be on the lookout for divergence. But don't you think that those AI optimized configurations will also be used in kind of new
approaches to traditional HPC? Yeah, I do think that. And I think that's
potentially a big transition point or inflection point for HPC to watch out for in the long run
is if the supply side really goes toward accelerators and
lower precision, then to what extent does HPC need to respond and say, these are the configurations
available to us, and this is how we have to do HPC going forward, right? HPC has always been
hitched to the wagon of a larger enterprise data center market, and it has to go with available
technologies that are for something broader. What's changed is what that larger market looks
like. It's not general enterprise computing anymore. It's now AI is the big market. And
if that sounds desperate, just remember we've been here before with clusters, right? When clusters burst onto the scene in the late 90s, there were plenty of dyed-in-the-wool
HPC diehards who said, that's not real HPC.
It's going to be inefficient.
The utilization's really low.
What a pain to get everything into MPI.
It doesn't suit all applications.
And it took a decade or so, but clusters won out because they
were the dominant technology and HPC had to learn how to adapt. We could really see that with low
precision processing elements going forward. Yeah. I know Doug has told me the story of
your conversations, the two of you some years ago, when you'd made a comment about science
and engineering being insatiable. And science never stops. That's it. That's it. Science never
stops. Right. So it's a great way of putting it. And I think that's what fueled clustering was that
the problems kept getting bigger and bigger until clustering was the only way to go, regardless of how big a
node you had. I think that's exactly right. In fact, there was some comedian recently,
I should look up who it was, who said, science doesn't know everything. If it did, it would stop.
So that's a joke, but it's also right to the point. Science doesn't know everything.
In fact, science is all about the stuff we don't know. Exactly. We've got to learn more and then that's not going to end. So in a sense,
there's always the need for more computing to solve the next problem.
This reminds me of a quote from a Matt Damon movie. I forget the movie,
but he's dealing with something and he says, I'm going to science the hell out of this. That's from The Martian.
It's a great movie. And it's one of my favorite books that I've read recently. It's one of the
few novels I've read more than once. And it's been at least twice and maybe three times. I find it
so entertaining. Yeah. We often hear that AI will disappear into everything over the long term, and also that AI is really an HPC workload.
You're also saying, Addison, that really there are two divergent paths here: there's
traditional HPC, and then AI is a different animal, and there'll still
be tremendous need for traditional HPC, mod/sim.
And then there's also this notion that some of the talk you hear, it sounds like
Gen AI is going to solve every problem we've got. It's going to solve cancer. It's going to
predict climate. It's going to do everything. So it's sort of hard to sort it all out.
I mean, it's definitely hard to sort it all out. And part of the problem is it's moving so fast
that it's hard to predict exactly what this will look like 10, 15 years down the road.
I think a great analog is to think about this like the World Wide Web, which had this much hype and promise to change the world going back to the late 1990s.
And it did. But I think in the late 1990s, it was hard to predict what smartphones would look like 10 years after that. By the time we got to, say, 2009 and the iPhone had really taken off, that was the year of "there's an app for that." And look where we are with that even 15 years later, how much we've really gone to apps, which were enabled by the World Wide Web.
How will AI continue to integrate into technology in our lives? I think
it will have a transformative impact on both personal and enterprise computing. Does that
mean we're all going to spend way more money on it? I mean, do people have more money? Where does
it come from and what does it get spent on? Those are real questions.
Can I move the conversation to software? I know that you guys put a lot of effort into tracking all aspects of the market. I seem to recall some commentary on software. What can you
share with the audience on what you see happening there?
I mean, again, from an HPC standpoint, the software is still out there. The question is, how does AI get integrated into HPC?
And we haven't seen that level of software integration yet: HPC with things like pre- and post-processing, into more areas like, could you do adaptive
mesh refinement? Or could you do predictive data fetching, promoting data to a higher tier
using AI? Or really, I think the biggest thing would be something like computational steering,
doing AI in the loop: teach the AI the rules of a different game, help me optimize this airplane wing,
right? I think AI is good at that sort of thing. That goes beyond the obvious things like coding
and porting, which AI is also going to be very good at. As far as whether AI is actually going to solve
the science, I think AI is really good at predictions. And for engineering, that probably becomes good enough in a lot of ways.
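A toy sketch of that computational-steering pattern, assuming nothing beyond NumPy: a cheap surrogate, standing in for a real learned model, decides where to spend the next expensive simulation. The function names and the stand-in drag model are hypothetical, purely for illustration.

```python
# AI-in-the-loop computational steering, in miniature: fit a cheap
# surrogate to the simulations run so far, then let it pick where to
# run the next expensive solve. expensive_simulation() is a stand-in
# for a real CFD run (say, drag on a wing vs. angle of attack).
import numpy as np

def expensive_simulation(angle):
    # Hypothetical stand-in for a costly physics solve.
    return (angle - 3.7) ** 2 + 0.1 * np.sin(5 * angle)

angles = list(np.linspace(0.0, 8.0, 4))       # initial coarse sweep
drags = [expensive_simulation(a) for a in angles]

for _ in range(6):
    # The "AI" here is just a quadratic fit; a real system might use
    # a neural network or Gaussian process as the surrogate.
    coeffs = np.polyfit(angles, drags, deg=2)
    candidates = np.linspace(0.0, 8.0, 400)
    nxt = candidates[np.argmin(np.polyval(coeffs, candidates))]
    angles.append(nxt)
    drags.append(expensive_simulation(nxt))    # solve only where promising

print(f"Surrogate-steered optimum near angle {angles[int(np.argmin(drags))]:.2f}")
```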
But for science, I would like to think we still hold on to proof and peer review and
things like that, knowing how you got to the answer.
Show your work.
AI is still a bit of a black box there.
And we need more than just a leap of faith in terms of proving out
science. Addison, what are your views on the next generation of leadership that's coming out of the
Department of Energy and the National Labs? We see more of this sort of shared, distributed.
Yeah, it's that integrated research infrastructure, Doug, that you were telling me about.
And really, I guess the question is, what is Exascale.next?
I mean, I want to know what you guys think about this too.
But first of all, there's a key phrase there: leadership-class supercomputing.
And there is a difference here with where that is now versus where it was.
Because as much as we all stood up and applauded El Capitan at SC24, and we should, and it's a
beautiful machine, and it's a great achievement, there's also this undercurrent of, wow, you built
a $400 million supercomputer. Good for you, right? It's not a big order anymore. It just isn't.
That's what, a 30, 40 megawatt supercomputer? Hyperscale AI data centers are getting built
out at hundreds of megawatts, even gigawatts, at a time. And these leadership DOE facilities,
they're not leadership anymore. They're leadership in terms of science, but we're used to thinking
of these as setting a direction in enterprise computing and what's going to carry HPC forward.
And that's not the case anymore.
That's changed.
Those DOE labs are not leading the market.
Hyperscale companies are leading the market.
And these leadership-class facilities,
like we did with clusters,
are going to have to respond to what the new future
of HPC architecture is.
What I really like is in the FASST initiative, F-A-S-S-T; I have to look up exactly what that
stands for.
I've got it somewhere.
The FASST initiative out of the DOE, they really are looking at AI for science as being the
area that they're carving out and not AI in general.
Shaheen, what do you think about all that?
I'm highly aligned with you again.
Frontiers in artificial intelligence for science and technology.
Bingo. Nice. Yes, yes.
Science, security, and technology.
That's it: science, security, and technology. Security is the second S.
Go ahead.
Oh, right. Interesting.
Yeah, yeah. I'd forgotten the second S there.
I'm highly aligned with you.
I think that they're going to pursue all the next rev stuff in terms of hardware. They're going to
pursue algorithmic advances, including the inclusion of AI, like you said, for coding and
for optimization and orchestration. I think instruments themselves are becoming mini
supercomputers, if not full-on supercomputers. So they need to get integrated and integration becomes a major theme. I think they're going to
have to look at quantum computing and analog computing as a way of complementing the digital
classical world that we've done. And you put all of that together and that gives you another nice
step forward, if not two steps forward. Does that sound right to you? Yeah, I agree with that as well. It's a question of where is this going to have the biggest impact?
Because when we look at that FASST initiative out of critical and emerging technologies in DOE,
I think that's exactly the point, that the AI designed for hyperscale is for different
applications than what DOE science really wants
most to look at. And therefore, there really is a need for that kind of public sector investment if
we're going to carry AI forward for scientific discovery. I think you're right. I think there
are national security requirements that may very well dictate a more brute force approach,
as long as that is an option.
And that certainly is an option,
because as you mentioned,
all these technologies are advancing so fast
that if you simply replicated El Capitan
with technologies that are going to be available
in 18 months, you are going to get
a better, faster, cheaper system.
Yeah, and what we're talking our way around,
which I think is a relevant topic we're going to have to get into today, is the idea of nationalism or sovereignty in HPC and AI.
We're a few days into the new U.S. administration here. And with his first announcements,
President Trump has made it very clear that he wants to pursue an agenda of American exceptionalism. And he calls
China and other countries competitors in the race for high technology. So it's going to be exciting.
Oh, I think you're absolutely right. That's a whole new world. And even more so than it has
been over the past several years now. As you know, we've gone from, let's all collaborate towards
advancing science, to let's compete and let's fragment the economic spheres,
as has happened.
So I think that has multiple implications,
certainly for advancement of science,
because I think it's chilling collaboration,
but also it is exposing geopolitical requirements
that people seem to agree with and are actual issues.
So how do you manage all of that?
Now, one thing I do want to point out: among the many executive orders that were
nullified this past week, one that was not was the one directing DOE to help industry
and the government lease public land for sustainably powered data centers. That carried
over from literally two weeks before, when it was issued by the previous administration.
Yeah, the good news is the DOE and the Office of Science and Technology Policy in particular
have generally enjoyed bipartisan support, maybe for different reasons. But both parties seem to agree with a lot of what comes out of OSTP
and DOE. I wouldn't say that means that the Trump administration is going to be one that focuses a
lot on sustainability. I think they're going to focus on advancement in whatever form that takes,
right? And when we saw Trump, for example, sort of jump onto the Stargate project announcement from OpenAI and their associated partners, what President Trump had to say about that was how his administration was going to remove barriers as much as possible to enable this $500 billion investment over four years to further AI.
To me, I see a growing dominance, if you will, within advanced computing of the private sector
over the public sector. I mean, Addison, you brought up the $400 million El Capitan, which
compared with some of the spending that companies like Meta, Facebook, are doing,
is really a small amount of money in relative terms.
Right. To be clear, it's off by about a factor of 100.
Yeah.
Two orders of magnitude.
Two orders of magnitude because these companies are spending $40 billion per year.
Yeah.
$40 billion per year. So take 400 million, times 10 and times 10 again, and you're
talking about the annual infrastructure going into these places. It's that order of magnitude.
So Addison, this could impact your market findings here in some way, shape or form. Am I right?
It's already happening. Yeah. It already has. We've added substantially, right? So the question is, how big can AI get? Because for a company like Meta, does it make sense? Let's call it $30 billion and change for AI infrastructure. They've got 3 billion and change unique users worldwide across all of their platforms. Does that mean Meta thinks that it can activate,
use AI to activate those users to the tune of an additional $10 per user per year through the use
of AI? Probably they can, right? That sounds reasonable actually, but it's not that many
companies that have more than a third of their global population as active users.
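The per-user arithmetic behind that Meta example, using the round figures from the conversation:

```python
# Addison's back-of-the-envelope: ~$30B of AI infrastructure spend
# spread over ~3B unique users implies the uplift needed per user.
ai_infra_spend = 30e9   # ~$30B and change, per the discussion
unique_users = 3e9      # ~3B and change users across Meta's platforms

print(f"Needed uplift: ${ai_infra_spend / unique_users:.0f} per user per year")
# -> Needed uplift: $10 per user per year
```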
So I guess this does circle back around. I'll try to pull my thought together: with traditional HPC
workloads within this larger spectrum of HPC-class technologies driving AI, you've really
almost got traditional HPC as a niche within this larger HPC-AI ecosystem or universe. I think that's exactly right.
Yeah. Okay. That would be a good discussion because I'd like to think that HPC is the umbrella,
but within it, there is a massive growth in AI that's dwarfing everything else.
Yeah. This is where I agree with Doug, and Shaheen, I have a different take here, because
that's the way we were looking at it, but AI has sort of zoomed ahead regardless of HPC. And I think we're more in a world now where
enterprise computing looks more and more like the tools designed for AI. And if you want to do HPC
or anything else, you have to figure out how to use these technologies for it. That's the
forward-looking approach. Yeah. So maybe my issue is more the wording and
the labels rather than what is being done. So I agree with what you're saying. If you define HPC
as scientific engineering-focused application-based, if you define it as a workload
rather than as a kind of what I consider like a discipline, skill set, infrastructure, underlying algorithms. If I
define it the way I just did, then it all is HPC. And there is a piece that is scientific
engineering, and there's a piece that is less scientific engineering, albeit enabled by science
and engineering. But I agree with you that the market as a whole doesn't have time to include
HPC as part of its vocabulary. You don't have to agree with me,
Shaheen. That's all right. I don't know what I'm talking about. Yeah, I'm just parsing it out.
Yeah. So this is not a bad segue into DeepSeek, which is deep-sixing a bunch of existing models.
And it's causing a lot of fear, certainly in the investor community on, oh my God,
is this a brand new way of doing things that's going to reduce our reliance on
data centers and massive infrastructure and big capital and power and cooling and all that?
Yeah, I've been traveling. I was in Europe last week, so I haven't really dug into this in depth
yet, but I was reading up on it this morning. The news, to recap for the listeners of your podcast, is that DeepSeek out of China has announced that not only
has it got an AI that's competitive at the same level with ChatGPT and other major
American models, but they've done it with a fraction of the high-end GPUs. And that begs
the question of, does China have the talent to do more AI with less technology,
reducing the need for so many U.S. exports, and just run on talent?
What do you think?
It's quite a story.
Like yourself, I think I would love to drill down a lot more than I've had a chance to,
but on the surface of it, it comes across like a breakthrough. And just like deep learning was
initially, and then large language models were a few years ago with ChatGPT, and both of those
events changed the market in a significant way. So it sounds like it has the makings of being that
kind of a breakthrough. And these breakthroughs are not really predictable until they happen.
But that said, I don't know exactly how many resources they used to train it beyond what
has been clearly described.
I think the paper that they published said the last round of their training took this
many hours on so many GPUs, but it wasn't like the full thing.
And you could always use half of the GPUs and take twice as long to get there.
But the part of it from a technology standpoint that was interesting to me was the presumption
that maybe you don't need the fastest interconnects to do what they're doing. And I think if the
reliance on coherent, low-latency interconnects among thousands of GPUs, if that requirement goes away,
like Tenstorrent's Jim Keller has said that Ethernet will be just fine for what he needs to do.
If that ends up being a revelation for this next phase of AI, I think that would have an impact on
this market. Yeah, I think there's a lot in what you just said, Shaheen. To start with your point
that you could always run it on fewer processors and just take longer.
That's always been the nature of HPC.
That's the whole point, right?
Yeah.
The question is, how much do you speed it up?
You could run it on your laptop and let it take long enough.
That's fine.
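A small sketch of that trade-off: under a fixed training budget in GPU-hours, halving the GPU count roughly doubles the wall-clock time. The budget below is illustrative, and real runs scale worse than this because of communication and parallel-efficiency losses.

```python
# Fixed work, variable parallelism: the same GPU-hour budget spread
# over fewer GPUs just takes proportionally longer (assuming ideal
# scaling, which interconnect overheads erode in practice).
total_gpu_hours = 2_000_000  # illustrative training budget

for gpus in (50_000, 25_000, 10_000, 2_048):
    days = total_gpu_hours / gpus / 24
    print(f"{gpus:>6} GPUs -> ~{days:5.1f} days wall-clock")
```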
So I think that's a really well-founded point. The way this is being discussed and covered is indicative of the fact that we already view the
U.S. and China as competitors, at least, if not enemies in this conversation, where it's a lot
like the U.S.-Soviet space race from the late 50s into the early 70s, that this is the new AI race
that's going to dominate the conversation. The one wrinkle I would
add into that is, what does US leadership mean in this context, or US competitiveness mean in this
context? Because in Europe, I think we really do see a focus on public sector funding of research,
but not so much the big European technology companies outside of that.
China has a unique model of state-sponsored capitalism, where it in essence acts as a
capitalistic market, but the government controls it behind the scenes. Whereas the US really does
have a laissez-faire capitalist market.
So when we talk about American leadership,
we're talking about companies that are headquartered in America,
but they're global companies,
and it's not exactly the same as the government has this leadership.
The leadership belongs to companies like Google and Microsoft and Meta and Amazon
and to an extent, NVIDIA
is the provider of these technologies. But when the Biden administration in its final days put in
export restrictions, that AI diffusion policy, NVIDIA's VP of government relations absolutely
blasted it in a blog. Unusually so, yeah. Talking about how this was... Oh, yeah.
Now, part of that was, I think,
ingratiating themselves with the incoming administration
and showing where they wanted to align themselves.
But it really means that any American administration,
Trump or otherwise, has to thread the needle
of where are we enabling American technologies
versus restricting them? Because these are
for-profit companies. They want to sell their technology everywhere.
You said it; the crux of it to me really is exactly that: how the funding mechanism and how the
governance of these advances take place. And as you pointed out, there's a stark contrast between
how it is done in China, how it is done in the US, and how it is done in
Europe. And in some ways, that really is the battle that is being waged, is which one of these
models is more effective. And of course, we, being in the US, like our model, and we think
ultimately that's how it's going to pan out. And there's a lot of justification for that.
But boy, the competition is heated. Yeah. Several years ago in the pre-exascale days, I gave one of these Capitol Hill testimonies
to the U.S.-China Economic and Security Review Commission.
And during the Q&A, someone asked me, did I really think China had the wherewithal to
lead in supercomputing in the long run?
I said, in the long run?
In the long run, China's been the leader in science, technology, and economics for 45 of the
past 50 centuries, right? They've got a billion people there. In the long run, yeah, they can
lead in just about anything. If they're not leading, it's an anomaly, right? So that's how
China tends to think of long run. You're right. There's definitely a different perspective on
long term. Who was it who was saying like some Japanese company just completed their 200-year plan? So India as well,
we talked about India also as an emerging force in the global scene, and they just got into the
chip manufacturing business. They obviously have the demographics advantage, highly educated,
aligned in terms of language and participation. So that's another big,
big variable in this geopolitical scene. Yeah. You take the world's largest population and you
point them all at a problem and say, we're going to enable it. I don't think you can ignore that.
But this all goes back to this notion of nationalism and sovereignty that we have independent efforts now for HPC and AI leadership
coming from the US, from the EU, from China, from the UK, Japan, India. All of these have
their own independent initiatives now, and it leaves other countries that have notable HPC or AI infrastructure, such as, say, Canada or Australia
or South Korea, right? How are they going to fit in on this global landscape?
Yeah, really interesting times for us to be tracking this.
Addison, looking at GPUs, assuming that the most powerful GPUs will continue to have value,
despite this deep-six, I'm kidding, but despite the DeepSeek story.
And looking at NVIDIA's lead, do you view it as insurmountable?
We did have Karl Freund of Cambrian AI Research on with us late in 2023, who
said, let's look at five years out.
And he said he would be shocked if NVIDIA did not have 80% market share. But he also
allowed that the remaining 20% was still a significant market. But I'm curious how you view
Intel and AMD's efforts to compete in GPUs with NVIDIA. Yeah, I think first of all, we have to
give NVIDIA credit for running a flawless campaign around CUDA to get GPUs adopted into HPC to begin with before they
really took off for hyperscale and AI, which gave NVIDIA such a jump over the competition.
Now, it's not like Intel didn't see this coming or didn't try to react to it. They've been working
on GPUs and accelerators since Larrabee in the early CUDA days; they just haven't been able to do it.
Intel has had a steady stream of disappointments outside of the base x86 CPU architecture, and it's really put NVIDIA way out in front. AMD has a very competitive CPU-GPU combination. And in fact, if you go into that non-hyperscale HPC AI environment,
our users like the price performance of AMD. It's just they're way behind on the software ecosystem.
And AMD really hasn't been able to take a noticeable bite out of NVIDIA from that GPU
perspective. Now, where it's going to get
interesting is on the CPU side, and how do people react to Grace and the Arm processors. AMD was
first to market with the integrated CPU and GPU together. NVIDIA is now following along with that.
From a hyperscale AI perspective, Arm and Grace don't seem to be bothering anybody. So I think really the biggest
threat to NVIDIA, I'll change the conversation a little bit. I think the bigger threat to NVIDIA
is the paradigm shift because NVIDIA is competing with its own customers on two fronts. They've
verticalized into system integration, interconnects, everything else. So they compete with the
traditional OEM server vendors who carry
GPUs to market. But more importantly, they're supporting GPU clouds of their own that compete with AWS and
Azure and Google Cloud and Oracle Cloud. And these major hyperscale companies are all designing their
own processors. So I think the bigger question isn't, how does NVIDIA compete with Intel and AMD? I think by the end of the decade, it could be, how
does NVIDIA compete with Amazon and Microsoft and Google? And that's a very different question. If
we really do cross all the way into everything as a service, and we're not thinking about it from a
technology chip perspective as much anymore as who's providing the AI service.
That's right on. Well said.
How about that?
So who's going to win, Addison? Kidding. I'm kidding. Who knows who's going to win, right?
And also, we're not that kind of analyst. I try to refrain from saying who's going to win because that would affect the market itself.
I'm fine with phrasing the question, but we try not to forecast the success or failure
of individual companies' technologies.
I think that's smart also
because people are not very good at predicting things,
especially, by the way, in this particular industry we're in.
As we were saying in the pre-show,
it's just changing so fast, violently fast.
I mean, the course of events is so unpredictable.
Yeah. And they're all clients already.
Wasn't that a Yogi Berra quote? It's hard to make predictions, especially about the future.
Exactly. God, he was right. One thought I've had is, with all these companies, countries,
seeking to expand chip fab capacity, especially for GPUs: is it possible at some point, I think it's almost
inevitable, I could be wrong, but that we're going to run into a GPU glut? Or is the appetite for
GPU computing, especially as it advances, just going to remain insatiable for as far as we can see?
We had that built into our forecast: having a slowdown, a lag in the growth at some point, specifically around this notion of supply catching up to demand. It has certainly already taken longer than I would have thought. A big key indicator I would like to research more, and Shaheen, maybe you've got some insights here that I don't have: I want to look at some of these GPU-focused clouds,
your CoreWeaves and Lambdas and Denvrs, there's tons of them. And a lot of them are getting
priority GPU shipments from NVIDIA in order to do that. I'd like to get a sense of what their
capacity is from a cloud standpoint. How available are these GPUs? Are they fully subscribed? The number of billboards I see driving up 101, imploring me to use their clouds of GPUs, leads me to think that maybe they've got capacity. And if NVIDIA is providing GPUs to clouds that have excess capacity, then in some sense we're already at a point where the demand shortage is artificial. But if they're fully subscribed, then it's not.
Do you have a good sense of that?
I want to dig into that a little more.
I get basically what you're describing, that many of their big customers are actually the
hyperscalers, the tier zero cloud providers, so to say.
For example, I hear that Azure is a big customer of CoreWeave.
That's what I read somewhere. And that certainly changes the dynamics. But I think in agreement with what you were saying earlier, while the slope of growth may go up or down, and certainly it has been way up in the past couple of years, the long-term slope of growth is unmistakable. Life just happens to have a lot of matrix multiplies. And when it's done with that,
it's got a lot of tensor math. And therefore, the future of GPUs is pretty assured and the future of
quantum computing is assured. But if you're sort of in the investment world, that long-term
perspective isn't really helpful. You're trying to predict what happens in the next 24 hours,
and that becomes a lot more difficult to do.
And like you were saying, that's a different game.
But I think the megatrends are quite strong in favor of GPUs and in favor of the ecosystem that NVIDIA has built, including their software libraries and NIM and AI Cloud and Omniverse and so on.
That's a pretty formidable set of technologies.
Yeah, so smart.
And there's two things in what you said
that I want to follow up on
with regard to the future of GPUs.
One is in the very long term with AI,
to what extent does it really need these high-end GPUs?
From time to time in HPC,
we do find applications
where the application doesn't get more complicated as fast as technology gets more powerful.
Look at things like rendering in the entertainment space.
There was a time, when I was in the industry, when that was a huge Silicon Graphics Onyx supercomputer.
And then it was grids of workstations.
And then it was PCs, right? It just
kept drifting downward. Finding new big prime numbers was another one that was like that;
that went from being a Cray supercomputer to downloading targets on your phone. So what do
we think of that with respect to things like large language models? Because if we really are using human
intelligence as an analog, we do most
of our language acquisition by the age of five. After that, we still keep learning language,
but not at the same rate. So in the long term, does the computation for LLMs continue to need
this kind of really intense computational power? Or do you get to where you say they're really
mostly trained, and there's going to be some incremental learning that keeps happening, but it's not the same rate? The other thing that
you mentioned that I want to double click on is quantum. And here you're more of an expert than I
am, but the amount of the advancements we're seeing on quantum, I think if we weren't all so
busy talking about AI all the time, we would definitely be talking about how excited we are
about quantum computing these days. Good point, yeah. I'm surprised to hear you say that, Addison, given
Jensen's and Zuckerberg's recent skeptical comments about quantum. But expand on that. I
mean, where do you think we are in the bigger picture, moving toward whether it's quantum superiority,
quantum complementing, sitting alongside AI, commercial practicality?
I mean, Shaheen would give a better answer than I will. So I'll go fast and then kick it to Shaheen.
But it feels to me like we've gone from having a 30-year history of quantum being 20 years away
to entering a new era of where it's only 10 years away. And it's really gotten very exciting now.
What do you think, Shaheen?
I think that's right on again. The roadmaps that folks have published start to look really
interesting between 2030 and 2035. And the progress that's being made is quite substantial.
Now, the hype is also quite substantial. So the market continues to be underestimated and
overhyped, as someone put it so aptly.
So I think that's true.
And then referencing your comment, Doug, about the commentary by Jensen and Zuckerberg,
GTC, NVIDIA's major show in March, now has a quantum day that they set up.
I mean, it happened literally a few days after that question was posed to Jensen.
And I don't think he was trying to
have an agenda there. I think he was just responding the way he saw it. And indeed,
it's very reasonable to say that, quote, very useful, i.e. broadly applicable, deployable
quantum computing is some years out. But there is a billion dollar market today. So people are
using it for something. And for those guys, the use is very valid.
So it is just a little bit farther behind AI and farther behind GPUs by maybe a couple
of decades.
But it seems to be making great advances.
Let's also remember their perspective, right?
That NVIDIA and Meta are going to be primarily focused on AI markets right now.
My understanding of quantum is that its best applicability in the near
term is on technical computing workloads that are not tightly aligned with AI. It's not that
it can't do AI eventually, but I think the nearer term advancements we'll see with quantum
are not in AI specifically, but more in other HPC. I think it's quantum chemistry, quantum physics,
which is sort of readily available. And now because...
Material science is another one we hear about.
Right. That essentially is quantum chemistry when you double click on it and electronic
structure of materials and things like that. But in agreement with you: while the quantum physics
of the computer can be formulated in terms of tensor math and matrix algebra, which makes it
aligned with GPUs and QPUs, quantum processing units, there is an IO problem in quantum computing.
How much data can I feed it and how much data can I extract from it? Usually the read is more than
the write, so the input is more important. But right now it is limited to really
small data. Quantum computing is not for big data. Since we're on the topic of a future technology,
Addison, what if we looked at some of the predictions that Intersect 360 is looking for
this year and maybe looking out further the second half of this decade? Are there any major
pieces for this year that you're looking for? I mean, yeah. I think one thing we haven't talked about yet is the effect of AI on the storage ecosystem and the HPC-focused storage companies that have been out there.
Weka, VAST, DDN, Panasas, which is now rebranded as VDURA, Hammerspace.
I'm always going to leave someone out if I start listing them.
These companies are growing like crazy
and it's left the enterprise type storage companies
standing flat footed, right?
We've talked about the computing side,
but where's the growth been for NetApp or EMC
or companies like that?
Have they missed out on some of this growth? I think that's going to have big implications for data management in HPC going forward.
And I really want to follow up on what's the continuation of 64-bit computing and who's focusing on that.
It seems to still be a big part of AMD's roadmap. We see other companies like
NextSilicon, which I think is notable in bucking the trend and continuing to focus on 64-bit.
In terms of our research agenda, the way we're going to approach that this year is through a
series of what we'll call state-of-the-market reports, state-of-the-HPC AI market for different
technology areas. So we'll do a module on
processing elements. We'll do a module on interconnects, on data management, on cloud,
on emerging technologies like quantum. That's how we're going to survey the market and combine
supply and demand side research on what our users say they need, what's in the roadmap,
what does the gap analysis look like? And a lot of this comes from our HPC-AI Leadership Organization, HALO, having an influence on
our research agenda going forward.
We're getting good validation from them that this is what we need in the market right now.
Could I ask, Addison, one of my pet areas of interest is optical IO, silicon photonics. And people in that sector have told me 2025 will be
the year for this technology, that it'll really burst on the scene as commercially ready,
integrated into chiplets, et cetera, et cetera. What's your view of that?
That could be right. I'm a big fan of photonics as a concept. And I even heard at an IDC conference
once, I have to give a shout out to where I heard
it. I forget exactly who said it, but one of the IDC executives said, if you really want to find
out what the next big thing is going to be, look for what was supposed to be the next big thing
15 or 20 years ago. It looks like it's ready now. And I would definitely put optical interconnects
into that bucket. When we had just started Intersect360
Research, as Tabor Research, which is now 15, almost 20 years ago, optical interconnects were
really big. And we were looking at what the future of those was. I wrote a blog about 15
years ago about the future of optical interconnects. It hasn't really blossomed yet.
But yeah, at SC24, we were hearing a lot more again. I would believe that 2025 could be a
breakout year. We'll have to look at it. What do you think, Shaheen?
You're absolutely right. It's been another one of those technologies that's been permanently
a few years away. But in referencing Doug's allusion there, there does seem to be a lot
of good progress now with Ayar Labs and Lightmatter and a few others, and, I mean, Intel themselves, including their glass substrate that looked really pretty cool and optically oriented, it seemed.
So it is coming and it feels like it is imminent.
And if it does, I think it'll make a big difference in the interconnect world for sure.
It's obviously used in telcos for long distance communication. So bringing it for super short distance communication
with chiplets and then all the way down to the cluster, I think that will be a game changer as
well. And it feels like it is really close. Yeah. And from my perspective, it's not just
the speed of moving the data, but the lower heat that is involved, which improves performance.
So I've heard many people say this could be a game changer because of the power consumption and heat dissipation challenges facing server makers and the data center industry.
Yeah, at SC24, I met with a company, LightSolver.
LightSolver, yes.
Yeah, I mean, this is not even a von Neumann architecture.
They're building a computer that, in their own press release,
they call freaking laser beams, right?
So I don't know that we're going to have wide-scale deployment of that this year, but it's still an interesting development.
Like, all right, I'm interested.
Keep talking. I'm listening. Yeah, that was very interesting. I did get a briefing from them and
I really liked them a lot and we'll see what pans out. But it sort of felt like analog computing
and it's all areas that we have to pursue. So I'm glad they're doing it.
Okay. Any other predictions, Addison? No, I think those are the big topics in terms of how we're looking at the
industry right now. It is all moving very fast. I'll say that, especially the conversation around
AI. I've always loved HPC because we refresh the conversation. We're always talking about
new technologies and new workloads going forward. The speed of AI is something new. I'll give one example: at SC23, no one asked me about RAG, retrieval augmented generation, because it wasn't a thing. By 2024, around GTC and some of the big Intel and AMD conferences, everything was about
RAG.
And it was almost like I felt behind for not having written or surveyed anything about
it.
By the time we got to SC24, not one person asked me about RAG.
It was over.
And here in the cycle of one year from SC to SC, this huge topic flared up like an LA
wildfire and then was gone. And it left all
of this in its wake. And it's challenging as an analyst, because people want to know what the
end users say about it. And by the time the requirements come up, and we design a survey and put it out
into the audience, get it back, analyze it, and start delivering it, it's almost like it's
old news. Yeah, which gets back to my comment about where are you making predictions, right?
That's right. Yeah. It's all hard. Predictions are especially hard. We're coming to the top of
the hour. Yeah. But before we go, Addison, I want to ask you about your crossword puzzles
and your presence in New York Times crossword puzzles of all places, which is absolutely fantastic. So
I'd love to hear a little bit of a backstory on how you got into it and how it all went down that
you're featured in the New York Times. Thanks. I appreciate it. People have hobbies and sometimes
they're interesting ones and sometimes they're crossword puzzles, but I do the New York Times crossword puzzle every day.
And going back to grad school even, I tried my hand at writing crossword puzzles, just school newspaper kinds of things.
And I would say it was a pandemic hobby, but it was really before the pandemic.
I started trying to get back into could I write a crossword puzzle again?
And some of the tools for it have gotten a lot better.
AI still is bad at generating a crossword puzzle.
But if you can get your hands on good word lists, there's better puzzle creation software.
It's beyond just doing pencil and paper again.
And I tried my hand at getting a few published. And I think the New York Times, in particular, also had an emphasis on trying to get more new creators in.
And in 2022, I had three crosswords published in the New York Times, including a Sunday crossword that has a bit of an SC, a supercomputing kind of overlap to it, which I really loved.
And that was my personal Everest.
And I've also had two in the
Wall Street Journal that came after that. So yeah, that was a fun hobby for me. Now, I haven't had
any new ideas recently or a lot of time to write them, but I try to keep a few ideas in the back
of my head in case I get one that I want to write up and send in again. And I've been twice to the
American Crossword Puzzle Tournament, where I actually competed in speed-solving crosswords.
Now, I'm never going to win that.
I'm not that fast.
But I had a good time with it and got to meet Will Shortz and a lot of other notables.
Mike Shenk, who is the editor at the Wall Street Journal, a lot of the rest of the New York Times team.
It's a fun hobby.
Way cool.
That is way cool.
That is incredible. Addison, I don't know if you know the story when the British government during World War II was recruiting for solvers of the German military code.
The story I heard was that they ran a little classified ad in the Sunday Times that said, if you can figure out today's crossword puzzle, if you can complete it in 10 minutes or less,
call this phone number. And that's how they recruited for Bletchley Park. But what about the people who wrote the crossword puzzle? That was actually a plot point in the movie,
The Imitation Game with Alan Turing. And if you're a fan of that, then I would absolutely
refer you to my Sunday crossword in the New York Times from October 30th, 2022. You probably need
a subscription to get at it. And if you don't have that, then someone send me an email and I can
probably hook you up with a copy of that particular crossword. Excellent. Pretty good. Good stuff.
Thank you so much, Addison. This has been really wonderful. It's great talking to you guys. And I
know it's been an hour, but this is a fun kind of water cooler conversation I could spend all day doing. This is a great part of my day. Thanks for inviting me.
Thanks so much. Thanks for being with us. All right. Take care.
That's it for this episode of the At HPC podcast. Every episode is featured on insidehpc.com and
posted on orionx.net. Use the comment section or tweet us with any questions or to propose
topics of discussion. If you like the show, rate and review it on Apple Podcasts or wherever you listen.
The At HPC Podcast is a production of OrionX in association with Inside HPC. Thank you for
listening.