Everyday AI Podcast – An AI and ChatGPT Podcast - EP 406: Boosting Performance - Azure's Proprietary Data Center Chips Unveiled
Episode Date: November 20, 2024
Microsoft unveiled two new custom processors, Azure Integrated HSM for security and Azure Boost DPU for data processing, at their Ignite conference to enhance security and efficiency in Azure data centers, positioning themselves against competitors like Nvidia and AMD. We sit down with Alistair Speirs, Senior Director, Azure Infrastructure at Microsoft to discuss.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan and Alistair questions on Microsoft AI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Topics Covered in This Episode:
1. New AI Tech by Microsoft at Ignite
2. Detailed explanation about DPU
3. Chip specialization in the tech industry
4. Role of the HSM in encryption
5. Future developments and challenges
Timestamps:
00:00 AI era's optimization progress for infrastructure delivery.
03:45 AI infrastructure is becoming more specialized efficiently.
08:08 Sustainable construction: Cross-laminated timber for data centers.
10:29 Optimizing infrastructure for faster, efficient, sustainable services.
15:21 Security and sustainability are our primary focuses.
17:23 Embedded encryption enhances security and performance.
20:12 Azure improves rapidly; visit youreverydayai.com.
Keywords:
Microsoft Ignite, Chicago, DPUs, GPUs, NPUs, Microsoft, Azure Infrastructure, data centers, Intel, AMD, NVIDIA, server architectures, silicon development, AI announcements, Azure Boost DPU, ASIC, storage operations, power usage, CPU, network card, infrastructure optimization, AI acceleration, cloud workloads, Azure services, energy consumption, renewable energy, power sources, recycling rates, server materials, data center construction.
Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)
Transcript
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
As soon as you think you knew what was happening with GPUs, then there's something called NPUs.
And today at Microsoft Ignite here in Chicago, a lot of us got our first taste of DPUs.
It's like you can hardly keep up.
But here's the thing with what Microsoft just announced today at the Microsoft Ignite conference,
I think things are going to be getting probably cheaper, faster, and more secure.
It's like that triangle of three things we all want,
but you can never really get.
Don't worry, I brought in someone much smarter than me today
to help us talk about that.
So please help me welcome Alistair Speirs,
the senior director of Azure infrastructure at Microsoft.
Thank you so much for joining us today.
Thanks, Jordan.
All right, cool.
So can you tell us a little bit first
about what you do at Microsoft?
What is your role?
Yeah.
I work on the global infrastructure team,
and our job is really to manage the data
centers, growing capacity, the worldwide footprint of our cloud infrastructure, from a macro lens
of where to build, the power we need, the market forces that are driving that, like the relationships
with our Intels, AMDs, and NVIDIAs, and building up that infrastructure. And then, of course,
what goes inside the server as well, so that is the server architectures and some of our own
first-party silicon development work that we do as well.
Yeah. Can you tell us a little bit about what was
announced today? Because there was a lot, you know, there were chips, there were HSMs, DPUs,
there was so much announced today. I think they said it was like more than 90 different
AI announcements and I was trying to keep up. But at least on your side, what was actually
announced, you know, with these new DPU chips and the HSM chips?
Yeah, I think, you know,
and Satya said it really interestingly, it's the middle innings of this era of AI, right? And so
it was almost a progress report on all the work that we're doing to optimize this infrastructure
and optimize the capabilities to deliver AI to more and more people.
So if I think of optimization, every layer of the stack,
it's not just what we're doing in software,
it's not just what we're doing in like data center facilities,
but the hardware and then the specific types of hardware
that make up this data center kind of platform.
We talked about optimizing specific ones there as well.
So first, the Azure Boost DPU.
So a DPU is a special type of processor,
or what we call an ASIC, an application-specific integrated circuit.
And this chip is designed to do one thing and one thing well,
and that's storage operations.
So you can think of it as taking the place of a CPU, a network card,
other infrastructure that goes on in a traditional server,
replaces it with one card that just has one job,
which is to kind of read data from hard drives and push it out over the network.
And by doing it and by integrating that into this really single-purpose chip, we're able to get a lot more efficiencies out of that chip, use less power to run that operation as well.
So could you kind of help put the last couple of years into perspective, right?
Because I think when the ChatGPT wave, you know, 2022 came and, you know, then we've had, you know, Copilot now generally available for more than a year.
But was it kind of like you just had chips doing all different jobs and maybe they weren't too efficient?
And that's kind of what's led us to now having these NPUs for edge AI and the DPUs.
Is that kind of what's happening?
Essentially, you're seeing a lot more efficiencies out of this.
You know, the early models would run on, you know, brute force infrastructure with CPUs, GPUs, whatever we could get our hands on.
Like, let's run these models and try and train and infer these AI models on that
infrastructure. Over time, we optimize the models a little bit more and get better at the software.
As new chips become available, new capabilities become available, we start to fine tune the infrastructure
to really focus on this kind of emerging category of AI-type apps. And that's really what you're
seeing here with new GPUs. You know, graphics processing unit, or the full name was GPGPU,
a general-purpose graphics processing unit. Right now they're getting more and more specialized in
just that AI acceleration workload as well.
So they're almost becoming more fit for purpose
rather than these general purpose computing devices.
They're really targeted to particular use cases.
And then if you think about the scale we operate in
in the Microsoft Cloud,
as that infrastructure gets bigger,
as the scale gets bigger,
then there's more opportunity to optimize
for a very targeted workload like AI
or like storage or like security as well.
Okay.
And for the dorks of us out there, I'm a little dorky, but what are some of the specs, like as an example on the DPU?
I think I read, is it like 4x more efficient?
Something like that?
So, you know, this DPU is really looking to run about 100 watts per server.
And you think about that, the traditional data center server with like a traditional CPU, traditional network card, it's probably drawing about 400 watts.
And so, you know, a 4x reduction in power is super important
for us, especially in this world of like GPUs and AI accelerators, you know, doing more
and more power draw as they grow as well.
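The arithmetic behind the 4x figure can be sketched as follows. The 100 W and 400 W values are the approximate figures quoted in the conversation; the fleet size of 1,000 servers and the function names are hypothetical, chosen only to make the example concrete.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def reduction_factor(baseline_watts: float, new_watts: float) -> float:
    """How many times less power the new design draws than the baseline."""
    return baseline_watts / new_watts


def annual_savings_kwh(baseline_watts: float, new_watts: float, servers: int) -> float:
    """Energy saved per year across a fleet, in kilowatt-hours."""
    saved_watts = (baseline_watts - new_watts) * servers
    return saved_watts * HOURS_PER_YEAR / 1000


if __name__ == "__main__":
    # Approximate per-server figures from the conversation.
    print(reduction_factor(400, 100))
    # Hypothetical fleet of 1,000 servers running around the clock.
    print(annual_savings_kwh(400, 100, servers=1_000))
```

Under those assumptions, each server saves about 300 W continuously, so the savings scale linearly with fleet size.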
That's a big thing to think about because I think there's a couple of things.
So you have your, you know, your bigger companies that have to worry about where are they
getting their power for AI, right?
And then there's also the even bigger picture, right, the environmental impact.
So how does, you know, some of what was announced today at Ignite
go to address both of those things?
So how is this going to impact companies having access to DPUs that are four times more energy efficient?
And then what does that mean for
the bigger picture, like the environment?
Yeah, that's a great point.
The cloud's always been designed on this economy of scale efficiencies.
And so working in a cloud environment is generally far more energy efficient than running an on-premises server,
and we're building
fit for purpose buildings
that are really designed
for servers,
for data centers
and for these cloud workflows.
So as you think about
building something like
AI or an Azure service
or M365
or something like that,
it's not just a software problem anymore.
It's a software problem
and the software engineers
need to talk to the hardware engineers,
the hardware engineers need to talk
to the power engineers,
the data engineers,
the construction engineers,
even the sustainability engineers to understand the full footprint of what they need.
And so as we build all this new infrastructure, optimizing the power draw of particular servers
is important.
Optimizing the power sources that we use, we sourced about 34 gigawatts of renewable energy
around the world through power purchase agreements.
So that's essentially securing renewable energy to continue to run these operations in the future.
So kind of forward planning that and targeting those contracts are important as well.
And then, of course, if you zero out your carbon emissions from power, what's left is carbon emissions from construction, building materials, server materials as well.
So by building our own infrastructure and servers, we're able to improve our recycling rates.
We're able to define the types of components that we use in that device, and we
can reduce landfill by reusing these components rather than throwing them out.
And then on the data center construction side, we have opportunities for lower carbon-intensive
building materials as well.
And one of the ones that we talked about today was cross-laminated timber.
And this sounds like a crazy idea, but building a data center out of wood.
And it turns out, like, wood has been used in construction for a long time.
But the innovation with cross-laminated timber is to essentially build these
wider planks of layered timber. It's laminated together and is essentially much lighter,
much easier to work with, and has less embodied carbon than, say, a concrete or steel construction
as well. So having lighter frames that are as strong as a steel or concrete construction
means that you need less concrete for the foundation, and it's easier to move, with less equipment,
less logistics costs.
And so all of those things add into essentially what we call our scope three emissions.
So that's the emissions caused by our broader operations as well.
Y'all are in for a treat because you thought you were tuning in to learn about,
you know, Azure and DPU chips, which we are.
But I think the environmental piece is huge because I think sometimes big tech gets a bad rap, right?
They're like, oh, you know, AI and it's so, you know, costly and inefficient and all these things,
but I mean, here you are just explaining,
even going down to the materials
that you use to build these centers.
That is the system, right?
When we think about the system,
it's not a software system,
it's not a software plus hardware system.
It is an ecosystem that goes down
all the way from the construction,
and then, of course, the water,
the power usage,
and how that infrastructure runs
in its environment,
in the built environment,
in the natural environment as well.
Yeah, okay, so let's maybe take it back
back to the office, back to, yeah.
So, I mean, what is this ultimately going to mean, number one?
Number two, when are customers going to be able to get their hands on things that were announced today, or can they already?
And then, you know, both I want to look at it through the scope of, you know, the more technical people a little bit, right?
Even though that's not always our audience.
But, you know, what does it mean for people in the IT department for big enterprise companies?
And then what does it mean for everyone else?
Are they going to be noticing all of a sudden they're working with, you know, everything in Azure being
much faster, like overnight? Or how is this rollout going to work for those different groups?
Yeah, I mean, the general promise is it just gets better.
That's, I think, what we're really trying to do here.
So as we do these optimizations at kind of a component level and other things,
they're sort of out of sight, right?
They're running in a data center somewhere else,
so you don't necessarily see the change in like architecture or server design or rack design.
But you'll see the benefits.
You'll see the services being faster,
the services being more integrated, the services being more cost-effective to run as well.
On the other side, being able to control our energy draw and being more optimized
means that essentially we're placing less burden on the grids that we operate in as well.
So that has a benefit for every consumer of energy on a particular energy grid as well.
Yeah, the energy thing, yeah, because I think we just shared in our newsletter last week,
something like 40% of data centers are going to be facing
power struggles by 2025, and here we are at the end of 2024.
So this was timely news, right?
Yeah.
And for us, like, long-term sourcing of that power is really important to us as well.
We want to make sure we have stable operations for years to come.
And so providing like a demand signal to our utility providers is really important.
And so these long-term agreements, some of them five years, some of them 10 years,
some of them even longer, help us guarantee that a certain amount of new energy
is joining the grid, so that as we need it, we have the energy there and we don't become
a burden on the overall energy grids themselves.
Yeah. And you know what? This is helpful for me too
as a non-technical person. I was sitting in the keynote today or you know by the time that
everyone hears this yesterday and you know everyone was clapping when this came out and I'm like okay
but now I understand it so much more because of the impact that it makes really just
all across the enterprise.
But, you know, I'm curious because it's almost like,
it seems like one of those things that's not like too good to be true,
but it's faster, it's cheaper, it's more secure.
So what are some of the challenges, right?
Like as you guys continue this work and continue the development.
Yeah.
You know, the thing with the, as we get lower into the stack,
into the infrastructure, being wrong with software is bad.
You have a bug.
You got to fix the bug.
You can generally fix software relatively
fast. Being wrong in hardware is more expensive, right? So you want to catch those bugs earlier
in the cycle, preferably before you've even made the chip. It's very hard to replace the chip
once it's in production and out there as well. And then being wrong in construction is another
problem as well. These are multi-year, complex infrastructure projects, so building in the wrong place
or building in areas that won't have the energy needs or the energy supply that you need in five or
10 years is a challenge as well.
So as we get closer to the infrastructure and the physicality of the cloud, being accurate,
longer term, is really important for us.
And so that's the constant pressure between like what can be fungible.
Software can be fungible, right?
We can swap out one software for another software.
What can be a little bit less fungible?
Hardware.
We can swap out one hardware for another type of hardware.
It's a little harder, but both use electricity.
Then as you get down to like space facilities, square footage, power supply, water usage,
all these other things, there's less flexibility there.
So we spend a lot more time on making sure we're right and we're in the right places
that we expect the demand to be in the future as well.
So you kind of talked about some of the challenges that the rest of us may not see.
So what are you all focused on now?
Because at least to me when I see this, I'm like, oh, you know, this checks off
all the boxes, you guys have made it. But what are the next challenges, you know, from the
Azure side, from the, you know, processing side? What are the next big challenges that you need
to tackle? And maybe not just Microsoft, but the industry as a whole, like what are the next
couple of hurdles that maybe, not the things that keep you up at night, but the things that
you're still actively working on?
Yeah. It's a great question. What keeps me up at night?
A couple of things are really paramount right now.
It's like security.
How do we build security into every layer of this stack as well?
And you can think about software security.
What we announced today is our new HSM.
So that's a new hardware security module that will go in all our servers going forward.
And you think about network security, perimeter security.
Quantum technologies are coming as well.
And one of the scary parts of quantum is just how fast it can work
and how it can cut through traditional cryptography like butter,
which, again, not a good thing.
And so building in, like, quantum cryptography protections
into our platform now is really important to us as well.
So that security layer of making sure that we can run
this trusted, secure platform for all of our customers
is really important for us to focus on.
Then secondly, I'd say, is the environmental impact.
We've set some bold goals around being
carbon negative, being water positive, being zero waste.
And so all of that really translates to how we build and architect
this data center infrastructure and this server infrastructure as well.
And part of that is also working with our community,
working with our broader supply chain,
working with our energy providers and others to integrate into these grids.
And as you think about renewable energy, wind and solar,
when the wind's blowing, the sun's shining, like the power price fluctuates.
Unlike a traditional factory,
the data center is kind of software defined.
So we have fairly granular controls
of when we use energy,
when we sip, when we slurp from the grid, if you like.
And so being, again,
fungible with all of these workloads,
being able to move them around our global infrastructure
is something that's really interesting as well,
to really take advantage of and support
renewable energy
wherever possible as well.
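The idea of shifting flexible workloads toward hours when renewable power is plentiful can be sketched roughly like this. It is only an illustration of the scheduling idea, not how Azure actually does it: the function name, the hourly prices, and the assumption that a job can be split across non-adjacent hours are all hypothetical.

```python
from __future__ import annotations


def pick_cheapest_hours(prices_by_hour: dict[int, float], hours_needed: int) -> list[int]:
    """Return the lowest-priced hours, in chronological order."""
    if hours_needed > len(prices_by_hour):
        raise ValueError("not enough hours available")
    # Sort by price first, then by hour so that ties resolve predictably.
    by_price = sorted(prices_by_hour, key=lambda hour: (prices_by_hour[hour], hour))
    return sorted(by_price[:hours_needed])


# Hypothetical prices in $/MWh: cheaper around midday when solar output is high.
example_prices = {0: 60.0, 6: 55.0, 10: 30.0, 12: 20.0, 14: 25.0, 18: 80.0}

# A deferrable job that needs three hours of compute somewhere in the day.
schedule = pick_cheapest_hours(example_prices, hours_needed=3)
print(schedule)
```

A real scheduler would also weigh latency, data location, and capacity in each region; price alone is used here to keep the sketch short.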
Yes, I feel we focused a lot on the DPU
and maybe skated over the HSM
a little bit, but can you explain a little bit more? How does that work and how does it actually
make operations more secure?
Yeah. Great question. So let me give you the 101 on cloud cryptography.
So an HSM, a hardware security module, has traditionally been kind of an appliance, a separate
rack that does nothing but generate cryptographic keys that are used for encryption and decryption.
And these things are generally really large prime numbers.
If you want to really dumb it down,
you can think of it as a really expensive random number generator
that will give you kind of uncrackable keys
or very hard to crack keys.
And so in that model, you have this separate appliance.
You use it when you have something really important to secure.
So you'll talk to this machine,
you'll get back your cryptography keys,
you'll do the encryption,
you move the data around.
Someone will decrypt it.
They'll talk back to that HSM machine as well.
As you think about a world where you want more security, more defense in depth,
more layers of this, essentially a world where every transaction, every message, every API
call, every database read will be encrypted, you're going to have more and more traffic on that.
So moving that encryption and decryption off a dedicated device, or maybe still keeping it on a dedicated
device for specific use cases, but embedding it in the server allows you to do it much faster.
It allows you to even have scenarios where you may not even completely trust all of the components on the server.
You could have a transaction between two processes on the same server with the traffic between them encrypted, or something like that.
And this is called ephemeral key cryptography, where you may be just encrypting and decrypting transactions.
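The key-lifetime idea can be sketched in a few lines. This is a software-only illustration, not how the Azure Integrated HSM works: in a real HSM the key stays inside the hardware. Because the Python standard library has no encryption cipher, the sketch uses a per-transaction key to authenticate a message with HMAC rather than to encrypt it. All function names here are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def make_tag(message: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with the given key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def check_tag(message: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(message, key), tag)


def run_transaction(message: bytes, tampered: Optional[bytes] = None) -> bool:
    """Protect one message with a key that exists only for this call."""
    ephemeral_key = secrets.token_bytes(32)  # fresh random 256-bit key
    tag = make_tag(message, ephemeral_key)  # sender side
    received = message if tampered is None else tampered
    ok = check_tag(received, tag, ephemeral_key)  # receiver side
    # The key is never stored or reused; it goes out of scope on return.
    # Python cannot guarantee the bytes are wiped from memory.
    return ok


print(run_transaction(b"debit account 42"))
print(run_transaction(b"debit account 42", tampered=b"debit account 43"))
```

Each call generates a new key, so compromising one transaction's key reveals nothing about any other transaction.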
It might just last seconds or milliseconds, but you're essentially creating that secure chain
for the life of that transaction, and it's ephemeral. It's there for that transaction
in memory, secure, and then gone after that. So these sorts of chips support that kind of
next level of security that we're really looking to bring.
I feel so much smarter now. So,
Alistair, I mean, we've covered a lot. We've talked about some of the new announcements
today from Microsoft at the Ignite conference here in Chicago. We've talked about HSMs. We've talked about
DPUs, cryptography. We've talked about water cooling,
the environment, timber, my gosh, everything. But maybe what is the one most important takeaway
that you want the average business leader listening in on what this actually means for them
and their business moving forward? Yeah, I think the key takeaway here is that this cloud technology
and this AI technology is rapidly becoming more and more optimized. And our vision is that every
application, every user, every workload, every business is going to be using these AI capabilities.
And to be kind of democratized or commoditized and just be part of the fabric of how we do business is going to require like a new type of infrastructure, a new type of architecture that we're building out.
We need to build it out in a way that it will scale to every business, every user.
And that's in line with our mission at Microsoft, right?
It's to empower every person and every organization on the planet to achieve more.
Right.
And so that scale-out model is really what we're thinking about with
all of these decisions across software, hardware, all the way down to physical space planning.
Wow.
I think that was an extremely impressive and important way to wrap up today's show.
So, hey, audience, next time you're out there, don't take what's happening under the hood at Azure for granted.
There's a lot going on, and it's only going to get apparently faster, more secure, and cheaper.
So thank you so much for tuning in.
Make sure, if you haven't already, to go to our website at youreverydayai.com.
Alistair just dropped a whole bunch of knowledge on our heads.
We're going to be breaking it down for you in our newsletter.
So make sure you go check that out.
We'll see you back tomorrow.
And every day for more everyday AI.
Thanks, y'all.
Thanks.
And that's a wrap for today's edition of Everyday AI.
Thanks for joining us.
If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going.
For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind.
Go break some barriers and we'll see you next time.
