Everyday AI Podcast – An AI and ChatGPT Podcast - EP 406: Boosting Performance - Azure's Proprietary Data Center Chips Unveiled

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. As soon as you think you knew what was happening with GPUs, then there's something called NPUs.

Starting point is 00:00:51 And today at Microsoft Ignite here in Chicago, a lot of us got our first taste of DPUs. It's like you can hardly keep up. But here's the thing: with what Microsoft just announced today at the Microsoft Ignite conference, I think things are probably going to be getting cheaper, faster, and more secure. It's like that triangle of three things we all want, but you can never really get. Don't worry, I brought in someone much smarter than me today

Starting point is 00:01:20 to help us talk about that. So please help me welcome Alistair Speirs, the senior director of Azure infrastructure at Microsoft. Thank you so much for joining us today. Thanks, Jordan. All right, cool. So can you tell us a little bit first about what you do at Microsoft?

Starting point is 00:01:34 What is your role? Yeah. I work on the global infrastructure team, and our job is really to manage the data centers, growing capacity, the worldwide footprint of our cloud infrastructure, from a macro lens of where to build, the power we need, the market forces that are driving it, like the relationships with our Intels, AMDs, and NVIDIAs, and building up that infrastructure. And then, of course, what goes inside the server as well, so that is the server architectures and some of our own

Starting point is 00:02:03 first-party silicon development work that we do as well. Yeah. Can you tell us a little bit about what was announced today? Because there was a lot, you know, there were chips, there were HSMs, DPUs, there was so much announced today. I think they said it was like more than 90 different AI announcements, and I was trying to keep up. But at least on your side, what was actually announced, you know, with these new DPU chips and the HSM chips? Yeah, I think, you know, Satya said it really interestingly: it's the middle innings of this era of AI, right? And so it was almost a progress report on all the work that we're doing to optimize this infrastructure and optimize the capabilities to deliver AI to more and more people.

Starting point is 00:02:42 So if I think of optimization, every layer of the stack, it's not just what we're doing in software, it's not just what we're doing in like data center facilities, but the hardware and then the specific types of hardware that make up this data center kind of platform. We talked about optimizing specific ones there as well. So first, the Azure Boost DPU. So DPU is a special type of processor

Starting point is 00:03:06 or what we call an ASIC, an application-specific integrated circuit. And this chip is designed to do one thing and one thing well, and that's storage operations. So you can think of it as taking the place of a CPU, a network card, and other infrastructure that goes in a traditional server, and replacing it with one card that just has one job, which is to read data from hard drives and push it out over the network. And by integrating that into this really single-purpose chip, we're able to get a lot more efficiencies out of that chip and use less power to run that operation as well.

Starting point is 00:03:46 So could you kind of help put the last couple of years into perspective, right? Because I think when the ChatGPT wave, you know, 2022 came and, you know, then we've had, you know, Copilot now generally available for more than a year. But was it kind of like you just had chips doing all different jobs and maybe they weren't too efficient? And that's kind of what's led us to now having these NPUs for edge AI and the DPUs. Is that kind of what's happening? Essentially, you're seeing a lot more efficiencies out of this. You know, the early models will run, you know, on brute force infrastructure with CPUs, GPUs, whatever we can get our hands on. Like, let's run these models and try and train and infer these AI models on that

Starting point is 00:04:31 infrastructure. Over time, we optimize the models a little bit more and get better at the software. As new chips become available, new capabilities become available, we start to fine-tune the infrastructure to really focus on this kind of emerging category of AI-type apps. And that's really what you're seeing here with new GPUs, that, you know, graphics processing unit, or the full name was GPGPU, a general-purpose graphics processing unit. Right now they're getting more and more specialized in just that AI acceleration workload as well. So they're almost becoming more fit for purpose rather than these general-purpose computing devices.

Starting point is 00:05:09 They're really targeted to particular use cases. And then if you think about the scale we operate in in the Microsoft Cloud, as that infrastructure gets bigger, as the scale gets bigger, then there's more opportunity to optimize for a very targeted workload like AI or like storage or like security as well.

Starting point is 00:05:29 Okay. And for the dorks of us out there, I'm a little dorky, but what are some of the specs, like as an example on the DPU? I think I read, is it like 4x more efficient? Something like that? So, you know, this DPU is really looking to run about 100 watts per server. And you think about that, the traditional data center server with like a traditional CPU, traditional network card, it's probably drawing about 400 watts. And so, you know, a 4x reduction in power is super important for us, especially in this world of like GPUs and AI accelerators, you know, doing more

Starting point is 00:06:05 and more power draw as they grow as well. That's a big thing to think about because I think there's a couple of things. So you have your, you know, your bigger companies that have to worry about where they are getting their power for AI, right? And then there's also the even bigger picture, right, the environmental impact. So how does, you know, some of what was announced today at Ignite go to address both of those things? So how is this going to impact companies having access to DPUs that are four times more energy efficient?
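As a rough illustration of the power figures quoted above, the arithmetic can be sketched in a few lines of Python. The 100-watt and 400-watt values are the approximate numbers given in the conversation; the fleet size of 1,000 servers and the assumption of year-round operation are hypothetical and only there to make the example concrete.

```python
# Approximate figures quoted in the conversation (spoken estimates, not specs).
DPU_WATTS = 100          # storage server built around the DPU
TRADITIONAL_WATTS = 400  # server with a traditional CPU and network card

HOURS_PER_YEAR = 24 * 365  # assumes the servers run all year


def power_reduction_factor(old_watts: float, new_watts: float) -> float:
    """Return how many times less power the new design draws."""
    return old_watts / new_watts


def annual_kwh_saved(old_watts: float, new_watts: float, servers: int) -> float:
    """Return the energy saved per year, in kilowatt-hours, across a fleet."""
    watts_saved = (old_watts - new_watts) * servers
    return watts_saved * HOURS_PER_YEAR / 1000  # watt-hours to kilowatt-hours


factor = power_reduction_factor(TRADITIONAL_WATTS, DPU_WATTS)

# Hypothetical fleet of 1,000 storage servers, chosen only for illustration.
saved_kwh = annual_kwh_saved(TRADITIONAL_WATTS, DPU_WATTS, servers=1000)

print(f"Reduction factor: {factor:.0f}x")
print(f"Energy saved per year: {saved_kwh:,.0f} kWh")
```

The point of the sketch is that a per-server saving of a few hundred watts only becomes significant when multiplied across a large fleet, which is the scale argument made in the interview.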

Starting point is 00:06:39 And then what does that mean for the bigger picture, like the environment? Yeah, that's a great point. The cloud's always been designed on these economy-of-scale efficiencies. And so working in a cloud environment is generally far more energy efficient than running an on-premises server, and we're building fit-for-purpose buildings that are really designed

Starting point is 00:07:03 for servers, for data centers and for these cloud workflows. So as you think about building something like AI or an Azure service or M365 or something like that,

Starting point is 00:07:14 it's not just a software problem anymore. It's a software problem and the software engineers need to talk to the hardware engineers, the hardware engineers need to talk to the power engineers, the data engineers, the construction engineers,

Starting point is 00:07:26 even the sustainability engineers to understand the full footprint of what they need. And so as we build all this new infrastructure, optimizing the power draw of particular servers is important. Optimizing the power sources that we use is important too: we sourced about 34 gigawatts of renewable energy around the world through power purchase agreements. So that's essentially securing renewable energy to continue to run these operations in the future. So kind of forward planning that and targeting those contracts are important as well. And then, of course, if you zero out your carbon emissions from power, what's left is carbon emissions from construction, building materials, server materials as well.

Starting point is 00:08:13 So by building our own infrastructure and servers, we're able to improve our recycling rates. We're able to define the types of components that we use in that device. We can reduce landfill by reusing these components rather than throwing them out. And then on the data center construction side, we have opportunities for lower carbon-intensive building materials as well. And one of the ones that we talked about today was cross-laminated timber. And this sounds like a crazy idea, but building a data center out of wood. And it turns out, like, wood has been used in construction for a long time.

Starting point is 00:08:50 But the innovation with cross-laminated timber is to essentially build these wider planks of layered timber. It's laminated together and is essentially much lighter, much easier to work with, and has less embodied carbon than, say, a concrete or steel construction as well. So having lighter frames that are as strong as a steel or concrete construction means that you need less concrete in the foundation, and it's easier to move, with less equipment and lower logistics costs. And so all of those things add into essentially what we call our Scope 3 emissions. So that's the emissions caused by our broader operations as well.

Starting point is 00:09:34 Y'all are in for a treat because you thought you were tuning in to learn about, you know, Azure and DPU chips, which we are. But I think the environmental piece is huge because I think sometimes big tech gets a bad rap, right? They're like, oh, you know, AI and it's so, you know, costly and inefficient and all these things, but I mean, here you are just explaining, even going down to the materials that you use to build these centers. That is the system, right?

Starting point is 00:09:58 When we think about the system, it's not a software system, it's not a software plus hardware system. It is an ecosystem that goes all the way down from the construction, and then, of course, the water, the power usage, and how that infrastructure runs

Starting point is 00:10:12 in its environment, in the built environment, in the natural environment as well. Yeah, okay, so let's maybe take it back to the office, back to, yeah. So, I mean, what is this ultimately going to mean, number one? Number two, when are customers going to be able to get their hands on things that were announced today, or did they already? And then, you know, I want to look at it both through the scope of, you know, the more technical people a little bit, right?

Starting point is 00:10:37 Even though that's not always our audience. But, you know, what does it mean for people in the IT department for big enterprise companies? And then what does it mean for everyone else? Are they going to be noticing all of a sudden that, you know, everything in Azure is much faster, like overnight? Or how is this rollout going to work for those different groups? Yeah, I mean, the general promise is it just gets better. That's, I think, what we're really trying to do here. So as we do these optimizations at kind of a component level and other things,

Starting point is 00:11:06 they're sort of out of sight, right? They're running in a data center somewhere else, so you don't necessarily see the change in like architecture or server design or rack design. But you'll see the benefits. You'll see the services being faster, the services being more integrated, the services being more cost-effective to run as well. On the other side, being able to control our energy draw and being more optimized means that essentially we're placing less burden on the grids that we operate in as well.

Starting point is 00:11:37 So that has a benefit for every consumer of energy on a particular energy grid as well. Yeah, the energy thing, yeah, because I think we just shared in our newsletter last week something like 40% of data centers are going to be facing power struggles by 2025, and here we are at the end of 2024. So this was timely news, right? Yeah. And for us, like, long-term sourcing of that power is really important to us as well. We want to make sure we have stable operations for years to come.

Starting point is 00:12:05 And so providing like a demand signal to our utility providers is really important. And so these long-term agreements, some of them five years, some of them 10 years, some of them even longer, help us kind of guarantee that a certain amount of new energy is joining the grid so that, as we need it, we have the energy there and we don't become like a burden on the overall energy grids themselves. Yeah. And you know what? This is helpful for me too as a non-technical person. I was sitting in the keynote today, or, you know, by the time that everyone hears this, yesterday, and, you know, everyone was clapping when this came out, and I'm like, okay, but now I understand it so much more because of the impact that it makes really just

Starting point is 00:12:47 all across the enterprise. But, you know, I'm curious because it's almost like, it seems like one of those things that's not like too good to be true, but it's faster, it's cheaper, it's more secure. So what are some of the challenges, right? Like as you guys continue this work and continue the development. Yeah. You know, the thing is, as we get lower into the stack,

Starting point is 00:13:09 into the infrastructure, being wrong with software is bad. You have a bug. You've got to fix the bug. You can generally fix software relatively fast. Being wrong in hardware is more expensive, right? So you want to catch those bugs earlier in the cycle, preferably before you've even made the chip. It's very hard to replace the chip once it's in production and out there as well. And then being wrong in construction is another problem as well. These are multi-year, complex infrastructure projects. Building in the wrong place

Starting point is 00:13:39 or building in areas that won't have the energy needs or the energy supply that you need in five or 10 years is a challenge as well. So as we get closer to the infrastructure and the physicality of the cloud, being accurate, longer term, is really important for us. And so that's the constant pressure between like what can be fungible. Software can be fungible, right? We can swap out one software for another software. What can be a little bit more fungible?

Starting point is 00:14:07 Hardware. We can swap out one hardware for another type of hardware. It's a little harder, but both use electricity. Then as you get down to like space, facilities, square footage, power supply, water usage, all these other things, there's less flexibility there. So we spend a lot more time on making sure we're right and we're in the right places where we expect the demand to be in the future as well. So you kind of talked about some of the challenges that the rest of us may not see.

Starting point is 00:14:37 So what are you all focused on now? Because at least to me when I see this, I'm like, oh, you know, this checks off all the boxes, you guys have made it. But what are the next challenges, you know, from the Azure side, from the, you know, processing side? What are the next big challenges that you need to tackle? And maybe not just Microsoft, but the industry as a whole, like what are the next couple of hurdles, maybe not the things that keep you up at night, but the things that you're still actively working on? Yeah. It's a great question. What keeps me up at night? A couple of things are really paramount right now.

Starting point is 00:15:14 It's like security. How do we build security into every layer of this stack as well? And you can think about software security. What we announced today is our new HSM. So that's a new hardware security module that will go in all our servers going forward. And you think about network security, perimeter security. Quantum technologies are coming as well. And one of the scary parts of quantum is just how fast it can work

Starting point is 00:15:39 and how it can cut through traditional cryptography like butter, which, again, not a good thing. And so building in, like, quantum cryptography protections into our platform now is really important to us as well. So that security layer of making sure that we can run this trusted, secure platform for all of our customers is really important for us to focus on. Then secondly, I'd say, is the environmental impact.

Starting point is 00:16:04 We've set some bold goals around being carbon negative, being water positive, being zero waste. And so all of that really translates to how we build and architect this data center infrastructure and this server infrastructure as well. And part of that is also working with our community, working with our broader supply chain, working with our energy providers and others to integrate into these grids. And as you think about renewable energy, wind and solar,

Starting point is 00:16:34 when the wind's blowing, the sun's shining, like the power price fluctuates. Unlike a traditional factory, the data center is kind of software defined. So we have fairly granular controls of when we use energy, when we sip, when we slurp from the grid, if you like. And so being, again, fungible with all of these workloads,

Starting point is 00:16:54 being able to move them around our global infrastructure is something that's really interesting as well, to really take advantage of and support renewable energy where it's possible as well. Yes, I feel we focused a lot on the DPU and maybe skipped over the HSM a little bit, but can you explain a little bit more? How does that work and how does it actually

Starting point is 00:17:14 make operations more secure? Yeah. Great question. So let me give you the 101 on cloud cryptography. So an HSM, a hardware security module, has traditionally been kind of an appliance, a separate rack that does nothing but generate cryptographic keys that you use for encryption and decryption. And these things are generally really large prime numbers. If you want to really dumb it down, you can think of it as a really expensive random number generator that will give you kind of uncrackable keys or very hard to crack keys.
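The "expensive random number generator" description can be illustrated with a minimal Python sketch. This is not how an HSM is implemented: it uses the operating system's cryptographically secure random source through the standard `secrets` module as a software stand-in for dedicated hardware, and the function name `generate_key` and the 256-bit key size are assumptions made for the example.

```python
import secrets

KEY_BYTES = 32  # 256-bit key; the size is an assumption for this sketch


def generate_key(num_bytes: int = KEY_BYTES) -> bytes:
    """Return an unpredictable key drawn from the OS secure random source.

    A real HSM generates and stores keys inside tamper-resistant hardware;
    this software stand-in only shows the "hard to guess" property.
    """
    return secrets.token_bytes(num_bytes)


key_a = generate_key()
key_b = generate_key()

print(f"{len(key_a) * 8}-bit key: {key_a.hex()}")
print("Two generated keys differ:", key_a != key_b)
```

With 256 bits there are 2^256 possible keys, which is why guessing one by brute force is considered impractical with conventional computers.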

Starting point is 00:17:52 And so in that model, you have this separate appliance. You use it when you have something really important to secure. So you'll talk to this machine, you'll get back your cryptography keys, you'll do the encryption, you move the data around. Someone will decrypt it. They'll talk back to that HSM machine

Starting point is 00:18:07 as well. As you think about a world where you want more security, more defense in depth, more layers of this, essentially a world where every transaction, every message, every API call, every database read will be encrypted, you can have more and more traffic on that. So moving that encryption and decryption off a dedicated device, or maybe still there on a dedicated device for specific use cases, but embedding it in the server allows you to do it much faster. It allows you to even have scenarios where you may not even completely trust all of the components on the server. You could have a transaction between two processes on the same server with the traffic between them encrypted, or something like that. And this is called ephemeral key cryptography, where you may be just encrypting and decrypting transactions.
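The idea of a key that exists only for the life of one transaction can be sketched in Python. This is a simplified illustration under stated assumptions, not the Azure design: the key comes from the `secrets` module rather than an on-server HSM, and because the Python standard library has no symmetric cipher, the sketch authenticates a message with HMAC-SHA256 instead of encrypting it. The names `ephemeral_key`, `sign`, and `verify` are hypothetical.

```python
import hashlib
import hmac
import secrets
from contextlib import contextmanager


@contextmanager
def ephemeral_key(num_bytes: int = 32):
    """Yield a fresh key and overwrite it when the transaction ends."""
    key = bytearray(secrets.token_bytes(num_bytes))
    try:
        yield key
    finally:
        # Best-effort wipe; Python gives no hard guarantee about other copies.
        for i in range(len(key)):
            key[i] = 0


def sign(key: bytearray, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the message under the given key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytearray, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


# One "transaction": the key is created, used, and then wiped.
with ephemeral_key() as key:
    message = b"transfer 10 units from A to B"
    tag = sign(key, message)
    ok = verify(key, message, tag)
    tampered_ok = verify(key, b"transfer 99 units from A to B", tag)

key_wiped = all(byte == 0 for byte in key)

print("Original message accepted:", ok)
print("Tampered message accepted:", tampered_ok)
print("Key wiped after transaction:", key_wiped)
```

A production system would use an authenticated cipher such as AES-GCM so that the payload is also kept confidential; the sketch only shows the short key lifetime that the word "ephemeral" refers to.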

Starting point is 00:18:58 It might just last seconds or milliseconds, but you're just essentially creating that secure chain for the life of that transaction, but it may be ephemeral. It's there for that transaction in memory, secure, and then gone after that. So these sorts of chips support that kind of next level of security that we're really looking to bring. I feel so much more now. So, Alistair, I mean, we've covered a lot. We've talked about some of the new announcements today from Microsoft at the Ignite conference here in Chicago. We've talked about HSMs. We've talked about DPUs, cryptography. We've talked about water cooling, the environment, timber, my gosh, everything. But maybe what is the one most important takeaway

Starting point is 00:19:42 that you want the average business leader listening in to know about what this actually means for them and their business moving forward? Yeah, I think the key takeaway here is that this cloud technology and this AI technology is rapidly becoming more and more optimized. And our vision is that every application, every user, every workload, every business is going to be using these AI capabilities. And for that to be kind of democratized or commoditized and just be part of the fabric of how we do business is going to require like a new type of infrastructure, a new type of architecture that we're building out. We need to build it out in a way that it will scale to every business, every user. And that's in line with our mission at Microsoft, right? It's to empower every person and every organization on the planet to achieve more.

Starting point is 00:20:30 Right. And so that scale-out model is really what we're thinking about with all of these decisions across software, hardware, all the way down to physical space planning. Wow. I think that was an extremely impressive and important way to wrap up today's show. So, hey, audience, next time you're out there, don't take what's happening under the hood at Azure for granted. There's a lot going on, and it's only going to get apparently faster, more secure, and cheaper. So thank you so much for tuning in.

Starting point is 00:21:01 Make sure, if you haven't already, to go to our website at YourEverydayAI.com. Alistair just dropped a whole bunch of knowledge on our heads. We're going to be breaking it down for you in our newsletter. So make sure you go check that out. Thank you for tuning in. We hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all. Thanks.

Starting point is 00:21:56 And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit YourEverydayAI.com and sign up for our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
