In The Arena by TechArena - From Desktops to Data Centers: Zane Ball on Silicon Innovation
Episode Date: October 16, 2024
In this episode – recorded live at the OCP Summit – host Allyson Klein catches up with Intel Corporate VP Zane Ball to discuss silicon innovation, AI evolution, and the future of enterprise adoption.
Transcript
Welcome to the Tech Arena,
featuring authentic discussions between
tech's leading innovators and our host, Allyson Klein.
Now, let's step into the arena.
Welcome to the arena. My name is Allyson Klein, and we are coming to you this week
from the Open Compute Foundation Conference in San Jose. And I'm so delighted to be joined by
Zane Ball. Zane, you are an incredible force in the tech arena and have driven all sorts of
different silicon innovation across the compute landscape. Why don't you just introduce yourself?
This is your first time on the Tech Arena podcast.
Can you share some background on you and your engagement in this industry?
Absolutely.
Thanks and happy to be here, Alison.
It's great to be here at OCP.
I've been a regular here at OCP for several years.
It's a fantastic conference.
OCP team does a great job.
I've been in the tech industry for 28 years and all at Intel. So I
started at Intel in 1996. Intel's been a great place to work for me. My career's veered all over
the place in a really great way. I started doing signal integrity, silicon platform engineering. So
think of it as like everything that goes into and out of the chip, we would engineer. So making
data move in the chip at faster and faster speeds,
delivering power to the chip.
We were there with the first multi-phase voltage regulators
getting 110 amps into the chip.
We thought that was amazing.
And of course today that would be a little bit laughably small, but also
the heat and thermal mechanical management and all of that physics of
everything that goes around the chip was stuff that I worked on at Intel for many years.
From there, I pivoted into the business side. I ran our desktop PC business.
So desktops in the moment of 2007 to 2012 was when the notebook was taking over. People thought desktops were boring. Maybe they were boring, but to some people they weren't. And
in that moment of notebooks disrupting, we found all kinds of cool things people were
doing with desktops and found a way to build a really great business out of that, including embracing gamers and overclockers
in ways we hadn't done before. And that was a wonderful experience for me to get the business
side to go with the technical work. From there, I lived in Taiwan for a couple of years, which is a
big part of the tech industry any way you want to look at it. And there we were enabling
ultra thin and light laptops, getting battery life to 12 hours and enabling touchscreens on laptops for the first time and working with the Taiwan ecosystem to make all that happen.
From there, I came back to the US and led Intel's silicon foundry business along with my colleague, Steve Armelli. That was that earlier foundry effort that we had.
And as they say, you can either win big or learn something.
And we learned a ton. That was actually probably the best job I ever had, being part of this kind of wonderful, fabulous ecosystem and learning all about how that works.
And then the last chapter, I've been in the data center group at Intel, going back to my engineering roots: platform engineering and architecture, developing chiplet-based high-density packaging, developing high-speed interfaces like CXL, PCI Express Gen 5, and DDR5.
We're very proud of DDR5. We have the world's fastest DDR5 memory subsystem, with DDR5-6400 and DDR5-8800 MRDIMMs. A huge amount of engineering went into that.
And then also developing new standards for servers.
And one of the big things we've done with OCP is a thing called Data Center MHS, a modular hardware system that Intel was a key part of working with all kinds of companies here through the OCP foundation.
And the outcomes, I think, have been really outstanding.
And it's just been a hugely fun ride for me.
I want to get to that last one with you.
But as you were talking, I was thinking about,
I've obviously spent a lot of time at Intel as well. And I remember Intel developer forums with
the overclockers and just thinking, what in the heck are they doing? I came from data center. I
was like, that's the most amazing stuff in the world. But, you know, your description just talks
about all of the areas of the compute industry that you've touched.
And it's one of the reasons why I really wanted to talk to you, because we are at the cusp of this massive change, maybe more than anything else we've seen before, affecting every element of
the computing industry with AI and the development of generative AI. How do you see the compute
landscape evolving, given, you can't really call it a workload, but this massive force that's changing computing?
Yeah, it's the biggest technological change, I think, since the internet itself.
And there are two big things that occur to me when I think about where AI is at in this moment, and who knows what the future holds. But the two things that percolate in my mind the most are, number one, we haven't really figured out what the business model is.
I think, well, some people are definitely making money with AI use cases and there's a lot
of talk and dreaming and huge amounts of investment. I think we're like the internet was in 1996.
Everybody's doing it. Everybody knows it's going to be big, but we weren't yet at that point where
we had the Googles, the Amazons and the new business models, much less the Facebooks and
the social media that eventually became these mega cap companies. So what is going to be that business model for AI or business models?
I think that's super important, and there's a huge amount of innovation and experimentation going on.
I think that'll dictate a lot.
So that's one problem statement.
And then I think a technical problem statement is, I think it's fair to say that the AI technology
paradigm that we're on today is plainly not sustainable.
And I don't even mean
sustainable like good for the planet kind of sustainable. I mean, sustainable, like not
physically possible. I think it's true in technology and in science generally, all exponential curves
come to an end eventually. And we're on an exponential curve like no other with models
growing 10 times in size every year. You know, what was trained on 10,000 GPUs last year is trained on 100,000 GPUs this year, maybe a million GPUs next year. Are we going to see billion dollar models? Are we going to want
to see $10 billion models? I think it's plainly not sustainable and it'll be very interesting
how that exponential curve comes to an end. That doesn't mean, by the way, that AI innovation comes
to an end. It just means the current paradigm of just building bigger models with bigger and bigger computers and more and more synthetic data probably has an end. And there are probably other things that are going to wrap
around that technology that carry us forward. And I think we're already seeing some pretty good
signs of what those things will be. But I think those are the two big things. Is this going to swallow all the power in the world, or will something change? And then how are we going to pay for all this? What is the business model going to be? Those are the two big things, I think.
And I think that second one is where I want to go first,
which is we've seen the hyperscalers invest incredible amounts of money in building these massive models in their pursuit of dominance, their pursuit of getting closer to AGI,
whatever they're pursuing, they are investing and developing technology at a rapid pace.
And then the second wave of this is going
to be enterprise adoption and where folks really see those use cases that are going to change
business and introduce new capabilities to industries and transform industries in unexpected
ways. Where do you see us on that second wave curve in terms of enterprise adoption? And is that the monetization that the hyperscalers are going after? Or is it something even different that they're pursuing?
I think there's a bimodal world emerging. There's a very practical enterprise world where you're
measuring return on investment. You're not going to change what business you're in if you're
McDonald's, American Express, or Bank of America, you're just using AI as a
technology to deliver better services to your customers or maybe make things more cost
efficient.
So in a sense, it's bounded by just conducting your business better.
And then I think on the hyperscaler side and on the big model innovators, they're really
just trying to push the frontiers of this technology and see what it's capable of, almost in a leap of faith that there's going to be some big payout as the
transformation occurs with what it's capable of. And I think that world of building just bigger
and more capable models is in contrast to maybe what the enterprise would do. You know, like I was
just at my annual health appointment on Friday at Oregon Health Sciences University, and I was
looking at all the doctors I was going to see through this whole day. And I had a new cardiologist,
my cardiologist retired and a new cardiologist. And I was curious about him. He's actually mostly
a researcher and he just does a little bit. I go to ChatGPT and I'm like, tell me about the research
of this doctor. And sure enough, it summarizes the SPX family of receptors on the heart,
calcium influx channels, all this stuff.
And I was like, make that a little simpler so I can understand. And then within five minutes, I felt like I understood what this guy does. And that was awesome. And I had a nice visit.
My heart's good, by the way. That's good. If I'm a financial services company,
do I need a model so expansive and so capable that it can tell me about the research of my
cardiologist? Probably not. I can probably get by with something smaller, cheaper, more energy
efficient because I'm going to bring my own data to the model and I'm going to be very task specific.
Either I'm going to retrain the model or I'm going to engineer what goes in the context window
of the model in order to accomplish the business result I'm looking for. And so the rest of that
capability is so much wasted energy in a way.
And so I think one technology, retrieval augmented generation, I'm sure many people are familiar with RAG-based approaches.
It's really about bringing your proprietary data to the model.
It's about engineering the context window and then getting a much more accurate result. It's not that hallucinations can't occur, but you can really minimize them.
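To make that concrete, here is a minimal RAG-style sketch in Python. It is an illustration only, not anything from the conversation: embed() and generate() are hypothetical stand-ins for whatever embedding model and (often smaller, task-specific) LLM an enterprise actually deploys; the point is just the shape of the flow, retrieving your own data and engineering it into the context window.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# embed() and generate() are hypothetical placeholders, not any specific product's API;
# the pattern is: retrieve proprietary text, pack it into the context window, ask the model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a fixed-size vector for `text` from some embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    """Placeholder: call whatever task-specific model you actually deploy."""
    return f"[model response to a {len(prompt)}-character prompt]"

# 1. Index the proprietary documents once.
documents = [
    "Refund policy: purchases may be returned within 30 days with a receipt.",
    "Card disputes must be filed within 60 days of the statement date.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def answer(question: str, top_k: int = 1) -> str:
    # 2. Retrieve the most relevant documents by cosine similarity.
    q = embed(question)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])
    # 3. Engineer the context window: ground the model in the retrieved text only.
    prompt = (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)

print(answer("How long do I have to dispute a charge?"))
```

The heavy lifting sits in retrieval and prompt construction, which is why a smaller, cheaper model grounded in your own data can often carry the task.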
Or if I'm trying to interact with the drive-thru at the burger joint, I could probably do a really, really good job.
Right. You can mitigate the risk.
I could do a really good job.
Something not making too important of a decision.
And that feels like where the enterprise would go.
And you're not going to have like racks and racks of supercomputers to do that.
And I do think it's like wide open.
When I talk to people in the industry, where are most enterprises at with generative AI?
Yes, we're using generative AI.
What are you using?
I'm using Microsoft Copilot.
Right.
They're not yet there.
And so there's a fair amount of experimentation, there's a fair amount of talk.
But I think the big enterprise build out of the applications is in front of us, not behind us at all.
So I think that's pretty interesting just as a business.
And I think there's plenty of use cases that are exciting.
And I think we'll see that build out.
But I don't think it's going to be megawatts and gigawatts. I think it's going to be on a smaller scale, because the ROI just has to be there from day one. There will be speculative investments, but not huge ones.
So how do we reach the performance requirements for continued pursuit of AI? And are we at a bifurcation moment? I mean, we have been for a while between hyperscaler computing and what we talk about at OCP and enterprise computing, and are they just going
to keep drifting apart? And if so, it seems like we have the right compute curve for enterprise
needs, but probably not reaching the stars that
the hyperscalers are seeking with today's technology. Is that an accurate assessment?
I think there is a bifurcation in the technology, but in terms of OCP, I would say one thing to
think about is, well, this show has gotten really big, the OCP organization, and you see a lot of
enterprises here, right? You'll see enterprises probably on stage. And just one other thought: a lot of the AI experts that I talk with say that edge use cases are very important. So it isn't like
there's just a model that sits in some hyperscale data center. There's also a distillation of that
model that needs to exist on your phone or in an edge server or somewhere. And I think there's
a need for solutions to span across a lot of different infrastructure types.
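As a rough sketch of the distillation Zane mentions, shrinking a large model into something that can run on a phone or an edge server, the Python/PyTorch fragment below shows one common form: a small student network trained to match a larger teacher's softened outputs. The layer sizes, temperature, and random batch are arbitrary placeholders, not anything discussed in the episode.

```python
# Bare-bones knowledge-distillation step: a small "student" model is trained to
# match the softened output distribution of a larger "teacher" model.
# Sizes, temperature, and data are arbitrary placeholders for illustration only.
import torch
import torch.nn.functional as F

teacher = torch.nn.Sequential(torch.nn.Linear(128, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10))
student = torch.nn.Sequential(torch.nn.Linear(128, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0

x = torch.randn(64, 128)  # a stand-in batch of inputs

with torch.no_grad():
    teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)

student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)

# KL divergence between the soft teacher targets and the student's prediction.
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```

In practice the teacher would be the large frontier model and the student whatever fits the edge device's power and memory budget.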
To get to the more exciting use cases, the solution won't be isolated to just one type of infrastructure or another.
And I don't think we've seen that quite happen yet.
Some of the Apple use cases we saw with the last iPhone launch, I think, point a little bit in that kind of direction. It will be interesting to see how that distillation goes, what form of model takes shape in the edge device, and how it works together with what's at the back office or at the hyperscaler.
When you think about the
fundamental challenge of compute demand and what the industry can deliver, is this a question of
just the silicon players delivering the right logic and the right acceleration? Or is there
something even more fundamental in the computing
platforms that needs to be addressed? One of the things that we've talked about on Tech Arena quite
a bit is that every element of the platform is being innovated at the same time. And you typically
see engineering efforts intense in one or two places at any given time, but it's everything
right now. So what do you think is going to fundamentally shift in order for us to continue
to break through here? First, I think that just the intensity of the change and the rapid increase
in the amount of power consumption and the power density is, I think, amplifying trends that have
already been there. And there have always been, I think, throughout the cloud computing revolution, three major directions in terms of
innovation. One of them is at the data center level, which has been extraordinarily important.
It's how we deliver power to the data center, how we cool it, how we construct it, how we manage it
is a first order bit in the amount of power consumption and unlocking innovation. So that
will continue to be a big thing. Where we find our power source,
the technologies around which the power gets from the transformer to the node to the chip,
a huge amount of power is lost in that journey. And so that's very important. How we cool the
chip is incredibly important. The move to liquid cooling is a huge efficiency or can be a huge
efficiency. It also opens up heat reuse opportunities. There are direct-to-chip approaches, there are immersion approaches, there are multi-phase immersion approaches. The ARPA-E research program COOLERCHIPS comes to mind.
There's a lot of innovation happening on that side of the equation.
So that's one thing that sits out there.
And that has been very important in the last 10 years as well.
There is Moore's law.
You need more efficient silicon.
And we are getting more efficient over time, right?
We're getting more efficient memories.
We're getting more efficient logic.
And then the third piece has always been software.
We got huge innovation and upside when we implemented virtualization in CPUs.
And then virtualization in the cloud architecture was a far more efficient use of resources.
I suspect we will see a lot of innovation in how the resources are utilized.
One of the biggest barriers in AI is just networking technology.
You know, we spend a huge amount of power moving the data around.
And more efficient networking technology is a huge area of investment.
It's easily as important as the processing of the data today.
And I think over time, it will become more and more.
And so those are all areas where we have to engineer very significant
improvements if we want to keep up with the curve that's being laid out.
I had an interview with phoenixNAP recently, they're a cloud provider, and he said something really interesting, which was: a fully configured AI server today, with eight GPUs and all of the accoutrement that comes with that, when that system comes on and sits at idle, he's seeing 50% of the power that is required for that system at peak coming on just to keep everything at idle, given how loaded this platform is. And one of the things that you're talking
about is just all of these different opportunities to drive more efficiency. Do you think we're on a path to address that and bring
systems like that back to the 8% to 10% idle power that we're typically looking at in these systems,
or is this just the new normal? And I know that's probably an oversimplification of a question, but...
No, I can speak a little bit more in terms of CPU systems, maybe, where we've been working
on exactly that problem. We just launched our Xeon 6 processor, and we made some substantial gains there.
And we learned, to that point, that just measuring power at the maximum load
doesn't necessarily address the real world.
The real world is more of a 50% load kind of case, or some people say 40, some people say 60.
But that is really where the power bill gets burned.
Not even at idle, really, because people don't have a lot of idle hardware. But we developed the five golden KPIs of power.
So we really drive those. And so in the latest processor platform, the curve is a lot more optimized at that point where the servers are really used. And so we get some significant
power efficiency at that kind of middleweight point. And I suspect there'll be a
lot more engineering around understanding what the power consumption is on average and pushing
the silicon and the systems into more efficient points. And then with telemetry, managing the
systems to those points to get the energy efficiency overall. I think we've been at such
a breakneck pace in the last 18 months that there's probably a lot of opportunity on the table to dial in that kind of power optimization. Those are going to be linear
opportunities. Those are opportunities, maybe 10% or maybe 20%. They're not 10x kind of opportunities.
Meanwhile, your systems are getting 10x bigger and using more power. So you're trimming around the edges,
even though those are still a substantial amount of power at stake.
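As a back-of-the-envelope illustration of that point, the short Python sketch below compares two hypothetical power curves for a fleet: one tuned mainly at peak, and one flattened around the 40-60% load region where servers actually spend their time. Every number is invented for illustration; none are Xeon 6 or vendor figures.

```python
# Back-of-the-envelope comparison of where the "power bill gets burned".
# All numbers are invented for illustration; they are not measurements of any real system.

# Fraction of operating time a typical server spends at each utilization level.
time_at_load = {0.1: 0.15, 0.5: 0.60, 1.0: 0.25}   # 60% of the time near 50% load

# Node power (watts) at each utilization for two hypothetical tuning strategies.
power_peak_tuned = {0.1: 350, 0.5: 600, 1.0: 800}  # efficient mainly at max load
power_mid_tuned  = {0.1: 320, 0.5: 510, 1.0: 800}  # curve flattened around 50% load

def annual_energy_mwh(power_curve, servers=10_000, hours=8_760):
    """Weighted-average node power times fleet size and hours, converted to MWh."""
    avg_watts = sum(power_curve[load] * share for load, share in time_at_load.items())
    return avg_watts * servers * hours / 1e6  # W*h -> MWh

base = annual_energy_mwh(power_peak_tuned)
opt = annual_energy_mwh(power_mid_tuned)
print(f"peak-tuned fleet:   {base:,.0f} MWh/yr")
print(f"mid-load-tuned:     {opt:,.0f} MWh/yr ({100 * (base - opt) / base:.1f}% saved)")
```

The roughly 10% delta in this toy example is the kind of linear, trim-around-the-edges gain Zane describes: worthwhile, but not a 10x step.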
So we're at OCP Summit, and it's always a great show to see what's coming next.
Last year, I remember at OCP Summit, everybody was talking about the Submer liquid cooling demonstration,
and I think that gained a tremendous amount of attention.
What do you think is going to be the topic this year or the topics this year
that are going to capture the minds of the engineers
gathered here? That's a really good question. We're at the beginning of the show, so I haven't
seen what's on display yet. I think cooling will still be a huge deal. I think you'll see even more.
I think you'll see a lot of focus on rack scale level computing. You described earlier the eight-GPU AI server.
I think you're going to see a lot more focus on rack scale solutions. NVIDIA has donated the NVL-72 rack design. And how we deliver greater-than-100-kilowatt type designs, and how OCP and standards bodies evolve to embrace envelopes that large, is bound to be a big conversation.
Zane, it's been a pleasure talking to you. I just
have one more question for you. I have a million more questions for you, but one more question for
this interview. Where can folks continue the conversation with you and engage with you directly?
Easiest place to reach me is on LinkedIn. I'm always interested in hearing about innovation
going on in the industry. Well, thank you so much for being on the show today. It's
been a real pleasure. Thank you.
Thanks for joining the Tech Arena. Subscribe and engage
at our website, thetecharena.net. All content is copyright by the Tech Arena. Thank you.