In The Arena by TechArena - Ampere Advancing in the Data Center with Jeff Wittich
Episode Date: January 2, 2024. TechArena host Allyson Klein chats with Ampere Chief Product Officer Jeff Wittich about his company's progress in winning data center deployments, advances in performance and sustainability, and pushing the limits on core density.
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now, let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein, and today I am so delighted to have
Jeff Wittich, Chief Product Officer of Ampere, back in the studio with me. Welcome to the program,
Jeff. Thanks, Allyson. Great to be here. So, Jeff, you were one of my first guests on Tech Arena.
Tech Arena just celebrated its one-year anniversary, and we're kicking off Season 2.
So it's so nice to have you back for another go-around to talk about what's going on in the data center.
Why don't we just remind our listeners of your role at Ampere and Ampere's role in the industry. Yeah, so I'm the chief product officer at Ampere Computing.
And at Ampere, what we've set out to do is to build the compute that's required for the future of the cloud.
Compute that's both high performance and sustainable.
We've long been building high performance processors across the industry to meet the growing compute demand in data centers and in the cloud.
But increasingly, our ability to continue to improve compute capacity, to improve performance,
it comes with another consideration,
and that's efficiency and sustainability.
We can't just keep throwing more and more power
at the problem.
We need to actually build more efficient processors.
More efficient processors will inherently allow us to deploy greater compute capacity and achieve higher performance.
So, Jeff, I've been following the Ampere story since you guys launched.
It's such an exciting entry into the data center arena. You know, we've always talked about when will ARM reach that tipping point
and when will a processor architecture
that is efficient and performant
deliver to data center requirements.
And I think that one of the things
that I wanted to start with you on
is that I know Ampere has really built
on the foundation of: we're going to deliver more performance within power envelopes.
Why do you think customers are turning more and more to power as a central design point?
Yeah, well, I think that it's long been a consideration, but it's not really been the main constraint. If you look back 10 or 15 years, taking the United States as an example, we always had a lot of excess power on the grid.
There was no shortage of available power in any of the main regions.
Building out data centers that consume more power did increase your operational costs,
and there was some logistical challenge of getting more power into your data centers,
but it wasn't an inherent limitation. Today, that's just no longer the case. The grid has a
lot more demand on it than it did previously. Things like electric vehicles, obviously a very
good thing. It's getting people from consuming gas in their vehicle
to consuming, hopefully, much more green energy off of the grid.
But it does increase the demand on the grid itself.
And with data centers continuing to need to meet this super-fast-growing demand
for more and more compute,
that's just one more constraint.
And so I think that as we talk to customers and data center operators,
they're finding that it's no longer as simple as just planning out the number of data centers
that they're going to build over the next five or 10 years,
or procuring the additional power they need from the power company
that supplies that area. They're actually getting blocked entirely from being able to
build a new data center.
Maybe it's because of water concerns, could be because of noise concerns, but increasingly
it's because there just isn't any more power to supply that data center.
And so people are needing to do more with their existing data center footprint,
which is really where we come in.
The only option they have over the next couple of years is to create denser
and denser compute solutions within their existing footprint.
And we're able to provide a perfect solution for that.
So I think that's what's changed.
It's become not just an economic consideration. It's become a fundamental constraint to the business.
You know, what's interesting is I just published my sustainability report on Tech Arena and talked
to a bunch of people in the industry about that. You know, one of the things that I took away was there are estimates that the CSPs, your
earliest customers, are looking at doubling their compute capacity over the next few years to fuel
the needs of AI. There are estimates, and there was lots of disagreement, on exactly how much
energy data centers are using. I picked the stat of up to 3.5% of the world's energy.
And I think that one of the things that I wanted to talk to you is when you're talking
to your customers, where do you see computing going?
One thing, you know, I know they're all grappling with the question of AI.
Where do you see Amper having a role in terms of that broader compute landscape as heterogeneity comes into play?
And, you know, how do you see Greenfield versus Brownfield in that equation?
Are they going to be building out more Greenfield data centers?
Are they really just going to be managing most efficiently in Brownfield, or is it a combination of the two? Yeah, lots to unpack there. So I think the AI one is a good one. Obviously, that's been
one of the things that's fueled this conversation over the last six to nine months,
because we've seen similar stats: somewhere between 3% and 4% of the world's power is being used by data centers.
And AI just suddenly created this massive demand for more and more compute.
And unfortunately, today, a lot of that compute that's being used isn't that efficient.
A lot of really, really power-hungry hardware is being used, mostly for the training phase of AI.
Now, the good thing is that as the AI opportunity matures and we move a little further along the hype cycle, from the research and R&D phase into the real deployment-at-scale phase, the training piece will become a smaller
and smaller piece of the problem. Everyone's concerned right now about how they train their
multi-hundred billion or trillion parameter model. And the fastest way to train those models today is
to just throw an amazing amount of compute at them. And it's expensive and consumes a lot of power.
But the good news is you only need to train this model once, or maybe just once every six
to 12 months.
So where Ampere comes in is as customers are looking to move from that training phase over
to the inference phase. And inferencing is inherently a scale workload, one that is being run all around the world,
close to the users.
And it's a different type of compute task, where training is one really, really large
compute task that's run once.
Inferencing is a smaller task that's run maybe even millions of times a second.
And in that case, you're looking
for compute that can be pervasively deployed everywhere, that's economical, and that's also really
efficient. And so we've helped customers move from that big GPU training phase to now
deploy their models on lots and lots of Ampere CPUs that are sitting all around the world,
that are consuming considerably less power, and that are allowing them to bring down their
inferencing costs, in some cases, by as much as 5x or more. We've got a lot of really good examples
of customers that are doing this, from companies like Wallaroo. I was actually just out in Europe recently,
in France at an event,
and we had a customer there named Wampy
that's running on the French cloud service provider Scaleway.
So there's a lot of appetite for this.
And frankly, this is a major global problem
that we need to solve, because with that 3.7% power consumption today, it's clearly not acceptable for that to go up to 5% or 6% over the next couple of years with increased AI demand.
We need to sit at that same level, if not even shrink our power footprint over time.
And obviously, you know, one of the key components is the amount of energy a microprocessor consumes, but there are other topics that are entering the sustainability zone. One is code efficiency. The
other is carbon utilization efficiency.
What is Ampere's take on both of those?
And how are you engaging the industry to work on those challenges?
Yeah, that's a really good question.
Because, yeah, you're right.
At the start, what most people think about first when they think about power consumption
is how much power is actually being consumed by operating the CPU.
And so obviously, if we can bring that down, which we are, that's a big help.
The second piece of it is the amount of power that needs to be consumed to cool that processor within a reasonable thermal envelope. Running processors that only consume 100, 200, or 300 watts,
that's a lot easier to cool
than a processor that's consuming
500, 600, 700, or 800 watts.
And so we can also bring down
the power consumed
for cooling the processor as well.
But you're right,
you bring up two other areas as well.
One of them is code efficiency,
and the other one is utilization of the hardware.
When we look at it from an AI perspective,
that's kind of an easy one to see
where code can be optimized to be more efficient.
When you look at these models,
say multi-hundred billion parameter models, I've
talked to a couple of companies that have trained models of that size, and they said that now that
the model is trained, you can actually determine that maybe 95% of the parameters in the model
don't have any bearing on the accuracy. And you wouldn't have known that a priori. But now that
you've trained the model, it's clear which parameters matter and which ones don't. And so now they can go in and they
can economize those models, make them smaller. Smaller models require less compute, less power
to run. And they can do that with no loss of accuracy. The other thing people are doing is quantizing their models. So instead of running,
say with FP32 precision, you're running in FP16 or BFloat16, or even Int8 or Int16.
And that oftentimes results in no loss in accuracy, but requires a lot less computing power to go in and run those models.
And that's an area where we've also been really focused with our customers.
We have native support for all of those numerical formats on our Ampere processor.
So that's definitely a piece of it, is economizing the models.
And I think you're going to see a lot of this, where those multi-hundred billion parameter
models, they exist, but many of
the new models that come out are actually smaller and smaller models that are much more
economical to run.
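To make that concrete, here is a minimal sketch, assuming PyTorch and a toy two-layer model, of the two economizing moves described above: magnitude pruning, which drops the large share of parameters that turn out not to affect accuracy, and dynamic quantization of FP32 weights down to Int8. The model, the layer sizes, and the 95% sparsity figure are illustrative assumptions, not Ampere's or any customer's actual pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a trained model; a real case would load trained weights.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

# 1) Magnitude pruning: zero out the 95% of weights with the smallest magnitude,
#    mirroring the observation that most parameters may not matter post-training.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.95)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# 2) Dynamic quantization: store Linear weights as Int8 instead of FP32, one of
#    the precision reductions mentioned (FP16 / BFloat16 / Int8 / Int16).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Both steps shrink the compute and power needed per inference, and the result
# runs on plain CPUs.
x = torch.randn(1, 1024)
print(quantized(x).shape)  # torch.Size([1, 10])
```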
That's fantastic.
But also the second one that you mentioned is the utilization one, which I
think is really important. It is a key reason why using more general-purpose hardware like CPUs is really powerful,
especially for AI inferencing. They have more than enough compute power, they're readily available
around the world, they're generally more efficient than GPUs, and you can run them with very, very high utilization. GPUs have two flaws in this space.
One: while you may be able to heavily utilize a GPU
on a training task,
because it's running for many, many days, weeks, or months,
and you can ensure that all the cycles are being well utilized,
inferencing is different.
Inferencing is a much less dense problem,
and you're often not fully utilizing the GPU, even when you're running inferencing all the time.
And then the second aspect of it is that you're probably not running inferencing all the time.
Clearly, there must be some other element of whatever service you're running: a web server,
a database behind it. There's something
else there. And over time, the balance of demand between different workloads does change. CPUs give
you the complete flexibility to run any of those workloads and keep that CPU fully utilized,
whereas a GPU isn't going to be utilized when the web server is what's in demand, for instance.
So those are key parts of the efficiency story.
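As a rough illustration of that flexibility argument, here is a minimal sketch, with stand-in workload functions invented for this example, of one shared CPU thread pool absorbing whatever mix of inference and web traffic arrives, so the cores stay busy as demand shifts between workloads.

```python
import os
import random
from concurrent.futures import ThreadPoolExecutor

def run_inference(request_id: int) -> str:
    # Stand-in for a call into a quantized model running on the CPU.
    return f"inference result for request {request_id}"

def serve_web(request_id: int) -> str:
    # Stand-in for a web server or database handler.
    return f"web response for request {request_id}"

# One shared pool sized to the machine; on a 192-core part, os.cpu_count()
# would report 192, and the same pool could soak up either workload.
pool = ThreadPoolExecutor(max_workers=os.cpu_count())

# Demand shifts between workloads over time; the CPUs run whatever arrives,
# which is what keeps utilization high.
traffic = [random.choice([run_inference, serve_web]) for _ in range(8)]
futures = [pool.submit(handler, i) for i, handler in enumerate(traffic)]
for future in futures:
    print(future.result())
pool.shutdown()
```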
That's a really good point. And I think that that's so pragmatic that we forget about it,
right, Jeff? It's like, yes, you can use it for other things. Yeah, it sounds so simple.
And it is. But yeah, we forget about it because we're looking for some more complicated solution, some shiny object out there.
And we forget that sometimes the best solutions have been sitting there right in front of us the whole time.
Now, you mentioned customer examples, and I'm so excited to hear you talk about them.
You know, Ampere has been making incredible strides in data centers.
And I was at your analyst event earlier where you had Oracle on stage talking about development of optimized code on Ampere solutions.
There's just a tremendous amount of momentum in the marketplace for you guys.
Tell me about what 2023 has been like, and where do you see Ampere headed as we look forward into 2024?
Yeah, Allyson, 2023 was a pretty pivotal year for Ampere in a number of areas.
Number one, it's really the year that sustainability became front and center in the data center space.
So no longer was there any doubt as to whether power efficiency mattered.
And so the strategy that we had employed from day one to build an incredibly efficient processor
mattered more than ever in 2023.
And the rise of compute demand from AI just further accelerated that. The second thing is that 2023 marked a big year in terms of the pervasiveness of Ampere solutions across the market: Google Cloud, Tencent, and regional clouds like Hetzner in Germany, Scaleway in France,
GleSYS in Scandinavia, Leaseweb in the Netherlands, you name it. There are clouds now around the world
that are deploying Ampere CPUs. So it's never been easier for people to gain access to this
technology. And we're seeing rapid adoption in places like enterprise private cloud using solutions from OEMs like HPE and Supermicro.
So that's been really rewarding to see that anyone now can get access to an Ampere server or an Ampere instance in the cloud.
And now we're really seeing that even expand out into the edge.
So 2023 was a really pivotal year in terms of pervasiveness of Ampere
solutions. And then I think the third thing that was really important from an Ampere perspective
and very strategic for us is that 2023 marked the year that Ampere One, our next product family,
launched. And Ampere One is particularly important to us because it's the culmination of years of engineering work
behind the scenes to build our own core
that uniquely addresses the needs of efficiency
and high performance in the cloud.
And now you're seeing Ampere One deployed
at Google Cloud and Oracle Cloud.
And it's that innovation across the whole SoC,
all the way down to the core, that allows us to
provide a unique solution in terms of scaling out to a higher number of cores than anyone else,
192, hitting efficiency points that just aren't possible with other architectures and other
designs, and delivering some really cool, unique features to deliver better manageability, better security, better power efficiency for
cloud users.
Features that some of them have sought for a long time.
Not to go too deep on that, but a great example of this is memory tagging, something that Oracle
had requested from the industry for over a decade. It allows them both to ensure that code
is working properly
and to protect against things like buffer overflow attacks
that could affect users in the cloud.
And it's just something that doesn't exist in any other CPU.
So I think across all those areas,
you know, sustainability being crazy important now,
pervasiveness of our solutions
and the software ecosystem around them.
And third, taking that next
strategic step with our Ampere One product. 2023 was very, very important for Ampere.
That's amazing. I think that you are ahead of where I thought you would be by the end of 2023. And I can't wait to see what you guys deliver in 2024.
You keep coming back to the Tech Arena, and I keep being surprised at the progress.
You mentioned the 192 cores.
It's an impressive feat.
I actually read that you're pushing the limits of Linux.
We are.
I love that.
Yeah, no, we're the first people
to push to 192 cores and even more.
Obviously, we need to upstream changes
to the Linux kernel
a bit before future products come out.
So yeah, if you pay attention
to that community,
you'll see that we were pushing
for the enablement
of even higher core counts in Linux.
So yeah, we're pushing the industry to some places that it just hasn't been.
Really, really exciting stuff.
I guess I know where you're going then.
That's so exciting.
I love to see it, and I love to see the pressure you're putting on the market to think differently.
That's really refreshing.
I only have one more question for you, Jeff, before I let you go.
Where can folks engage your team
and find out more about the Ampere solutions,
and engage with you in a POC or trial?
Yeah, there are a few good places.
Probably the best place to start
is to go to amperecomputing.com.
There's a number of ways to engage
through amperecomputing.com.
You can either contact our sales team or some of the other members of Ampere.
There are also links out to a lot of places where you can go and get trial hardware or get started in the cloud, even for free.
There are free instances out there from various cloud providers.
The third area is our developer program.
There are ways to engage there across our developer forums
to start to learn more about what you can do with Ampere CPUs.
So a number of great places to get engaged.
And we hope to see more and more people engaged
in using Ampere processors out into 2024.
Awesome.
Thank you so much for being on the show today.
It was a real pleasure.
Well, thanks, Allison.
It's always great to be here on Tech Arena.
Keep inviting me back.
It's always a great time.
Thanks for joining the Tech Arena.
Subscribe and engage at our website, thetecharena.net.
All content is copyright by The Tech Arena.