In The Arena by TechArena - Sprinting Towards an AI Future with Microsoft’s Nidhi Chappell
Episode Date: June 11, 2024. TechArena host Allyson Klein chats with Microsoft's Vice President of Azure AI and HPC Infrastructure, Nidhi Chappell, in advance of Microsoft Build 2024. Nidhi shares how her organization is accelerating deployments of critical technology to fuel the insatiable demand for AI around the world, and how Microsoft's AI tools, including Copilot, OpenAI, and more, have been met with overwhelming engagement from developers. She also talks about Microsoft's silicon plans and strategic collaborations with NVIDIA and AMD.
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now let's step into the arena. Welcome to the Tech Arena. My name is Allyson Klein, and I am so excited because
we've got Nidhi Chappell back in studio with us, VP of Azure AI and HPC Infrastructure at Microsoft.
Welcome to the program, Nidhi. How are you doing?
Yeah, I'm doing well. Thank you, Allyson, for having me again. I really enjoyed the last time
and I look forward to a good conversation this time. So Nidhi, the last time we chatted,
we talked about your work in detail on delivering the infrastructure for ChatGPT. And I think it
was one of my favorite episodes of 2023, if I have to tell you the truth.
But a lot has happened since then. And the speed of innovation is just stunning in terms of how
the industry is delivering on generative AI. Can you just bring us up to speed on where Microsoft
is today? Yeah, no, I'm glad to hear that our conversation added value to the community here.
So I think, again, a lot is happening in this community.
Even in the last, you know, I would say year plus, we have also diversified our own portfolio, right?
We have Microsoft AI Studio.
While we do offer Azure OpenAI that has the latest and greatest from OpenAI,
we're also seeing other companies like Meta announcing Llama.
We have Mistral coming on board.
We have our own models with Phi.
And it's been really interesting to bring all of those models in, be able to train them, and then, you know, have them deployed at scale for commercial customers through Azure. So it's just been great to see that there is a wide variety of use cases that are using
these models and a wide variety of models that are coming in.
And ultimately, our job is to make sure that customers can use infra to actually train
these models quickly and then be able to scale these models for any
kind of enterprise use cases. So we are definitely seeing a lot of adoption across all of those. So
that's been very exciting. Now, Microsoft has been walking the talk by integrating the Copilot
features across the suite of applications. What has the response from customers been on
Copilot? Oh, it's been phenomenal. I don't have some of the statistics on hand, but I think
Amy mentioned some of them in our quarterly update. But the last time, what we had said was that
Azure OpenAI, which is the underpinning for a lot of our Copilots and which we also offer on its own,
is actually one of the fastest-growing services in the history of Azure.
And that just tells you how quickly these things are growing.
And, frankly, just from my point of view, I use Copilot all the time now.
And to me, it just seems inevitable that people are going to get used to having Copilot as their co-assistant in some regard.
So we are definitely seeing a lot of great use cases emerge, a lot of enterprises adopting it at scale.
And then we're also seeing Azure OpenAI, which is not just a Copilot, but like, you know, the models as a service being adopted
pretty significantly too.
You know, it's interesting, and what you said is very true.
It's just something that's so integrated at this point into the way that you use the
application.
And it felt effortless in terms of the way that at least my adoption curve went.
And now I use it every day.
And I think that is the right
way of doing it, right? You don't want to have a big learning curve for this, right? This should be
something that is pretty seamless. And it's interesting, because I think it has also made all of us into
prompt engineers. Like, you know, I often do a little bit of prompt engineering on that to say,
oh, I didn't like this.
How about I give you some more context?
So in some regards, it's actually made all of us into, you know,
RAG experts, prompt engineers, which is so silly to believe.
But, you know, with just a little bit of that effort,
we can now do a lot of things far more efficiently.
So, yeah, the integration has been really smooth.
And I think that's one of the reasons we are seeing such a good uptick on this.
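The refinement loop Nidhi describes, telling the model "I didn't like this, here's more context," can be sketched as a plain function that rebuilds the prompt with accumulated feedback and retrieved context on each turn. This is a minimal illustration of the prompt-engineering and RAG-style pattern she mentions; the helper name and prompt format are hypothetical, not any Microsoft API:

```python
def build_prompt(task: str, feedback: list[str], context: list[str]) -> str:
    """Assemble a prompt from the base task, accumulated user feedback,
    and retrieved context snippets (a RAG-style pattern)."""
    parts = [f"Task: {task}"]
    if context:
        parts.append("Context:\n" + "\n".join(f"- {c}" for c in context))
    if feedback:
        parts.append("Feedback on earlier attempts:\n" +
                     "\n".join(f"- {f}" for f in feedback))
    return "\n\n".join(parts)

# Each "I didn't like this, here's more context" turn just extends the lists
# before the prompt is sent to the model again.
prompt = build_prompt(
    "Summarize the Q3 report",
    feedback=["Too long; keep it under 100 words"],
    context=["Q3 revenue grew 12% year over year"],
)
```

The point of the sketch is that the "prompt engineering" end users now do informally is just this: layering extra context and corrections onto the same base request.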
Now, I do have to ask this question. One of the things that we talked about in the last interview,
and I want to keep checking back in on it: all of these AI models that you just mentioned are
hitting mainstream. We've also seen some unintended impacts of models reflecting human bias, maybe making some wrong calls in terms of bringing things out for companies in some unintended ways.
How is Microsoft learning from this and balancing the need to deliver market-leading innovation at the pace that companies are demanding with ensuring solutions are reflective
of the best of human thought?
Yeah, I'm glad you asked that question.
And I'm not an expert in responsible AI,
but I would just say one of the things
that is very strongly woven into the ethos at Microsoft
is security and trust, right?
You know, this reflects in the way we think about products and the way we bring products to market.
So when we started Azure OpenAI as a service, you know, one of the key tenets of that was that it would be responsible.
We'll actually think about the responsible AI part of it. And there is a whole team that, you know, is focusing on responsible AI. Doing that from day one
allows you to actually catch this earlier, versus putting it in the market and then finding out and
then fixing it kind of a thing, right? So that has been really interesting. The other thing I would
just say is, while you didn't ask me about this one, the other thing that a lot of customers care about
is how my data would be secure in all of this.
So there is a responsible part of it,
like, hey, I want it to not do something stupid,
but I also don't want it to get exposed to data
that I didn't want it to
or use my data for unintended consequences.
And that's another big tenant of all of this
is to make sure that the customer is
always in control of what data is being used and how. And I think those are big, big pillars for
the growth of AI, right? How you control it, what outcome it would have and how ethical and
responsible it would be. So that's fantastic. The pressure for more training of large models goes unabated.
And, you know, one question that I have for you, I mean, you've got one of the most enviable
data center complexes in the world.
How do you keep up with the demand, given how much training across all of those models
is required?
Yeah, it's definitely an interesting challenge to be had, right?
You know, we are seeing unprecedented growth.
And a lot of this is coming down to how quickly you can bring power online,
how quickly you can bring data centers online.
Internally, sometimes we joke that, you know, Microsoft has become a big construction company.
Like the number of construction people we have to hire is just phenomenal. But I think this is one of
the privileges of being at the epicenter of all of this: you get to forecast how quickly
this will grow. And so in some regards, you know, we are ahead of the curve in planning for data centers, planning for power, planning for what would come globally. And, you know, we do still find ourselves sometimes behind the eight ball. But in general, a lot of planning goes into what kind of data centers would we need? What is the bare minimum we would need to have in those data centers? Where in the world could we have access to those data centers?
And where in the world do we want to put these data centers, right?
So it's definitely a constantly evolving situation.
And one that I would say is very near and dear to everybody's focus.
So definitely have a lot of focus going on internally on that one too.
Now, one of the things that I think about is the importance of collaborations with ecosystem players.
And one of them that we saw play out and you talked about on stage was at GTC where you announced an enormous collaboration with NVIDIA.
Can you give some context about that and why that's such a strategic collaboration for
you? Yeah, you know, NVIDIA has been a great partner with us for a long period of time,
right? You know, we started off, I would say, as more of a customer for them, like, you know,
traditional, here's your roadmap, let us know what we can buy and stuff. And we very quickly moved
into a space where we were jointly defining what kind of GPUs we need, what kind of memory architecture we need, what kind of memory bandwidth we would need, what kind of backend network we would need for this.
So we've gone from, we'll just buy whatever NVIDIA offers to actually help shape the roadmap, actually help get every ounce of performance at scale with them, right?
And this is where there's a lot of joint design and joint work and joint debug that happens
at scale, right?
I mention this pretty often: especially for training, you have to be able to sustain
performance for a long period of time. And what that really means is you have to be able to have
a very reliable infrastructure.
Like, you know, you cannot fail.
Things fail, but you cannot fail as often.
And that really requires maniacal focus across your stack
to make sure there is no single point of failure.
There's no places where, you know, your performance will take a hit.
And that really requires us to have very, very deep partnership with NVIDIA on this.
And I mentioned this before, too: NVIDIA started off as a vendor to us.
Then we became more of partners.
And now they're a big customer of ours, too.
They actually have a lot of their DGX cloud running on Azure.
So it is a very tight partnership in that regard.
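The reliability requirement Nidhi describes, sustaining performance for a long period even though individual components will fail, usually comes down to frequent checkpointing and resume. A toy sketch of that pattern, under the assumption of a JSON checkpoint file (real training stacks use framework checkpoint APIs; the file name and loop are hypothetical):

```python
import json
import os
import tempfile

def save_checkpoint(path: str, step: int, state: dict) -> None:
    """Write atomically so a crash mid-write never corrupts the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)  # atomic rename on POSIX and Windows

def load_checkpoint(path: str):
    """Resume from the last checkpoint, or start fresh if none exists."""
    if not os.path.exists(path):
        return 0, {"loss": None}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]

# Resumable loop: a failure between checkpoints loses at most 10 steps,
# instead of the whole run.
ckpt_path = os.path.join(tempfile.gettempdir(), "train.ckpt.json")
step, state = load_checkpoint(ckpt_path)
while step < 100:
    step += 1
    state = {"loss": 1.0 / step}   # stand-in for a real training step
    if step % 10 == 0:
        save_checkpoint(ckpt_path, step, state)
```

At datacenter scale the same idea applies across thousands of GPUs: the run must be able to restart from recent state quickly, which is why a single point of failure anywhere in the stack is unacceptable.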
Now we fast forward to, you know, we're in May.
It's time for Microsoft Build.
And, you know, one thing, the tech arena will be at Microsoft Build,
and I'm really looking forward to hearing from Satya
and hearing from, you know, the entire executive suite on exactly what's the direction from Microsoft more holistically.
But one thing that comes to mind is this event has always been about developers.
What is the message for developers in the era of AI, and how has that changed from what it might have been five years ago?
Yeah, I think the biggest thing for developer community
is our focus on models as a service, right?
We want our developers, developers worldwide actually,
to be able to take the latest and greatest models
and be able to integrate them in their workflows.
So you'll see announcements around there
on what kind of upgrades we are bringing in there, how do
we make it easier for them to take the latest models, whether it's in vision, speech, or any other
area, and integrate them. Because ultimately, AI can only scale if we make
it super easy for our developers, enterprise developers or startup developers,
to take APIs from us and integrate them quickly.
So you'll see a lot of that.
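The "take APIs from us and integrate them quickly" model usually means app code calls a hosted chat-completions-style endpoint through one thin function. A minimal sketch of that integration point, with a stub client standing in for a real SDK client (e.g. an Azure OpenAI deployment); the function, deployment name, and stub here are illustrative assumptions, not a real API:

```python
# One thin wrapper so application code never depends on a specific
# provider SDK; any chat-completions-shaped client can be swapped in.
def complete(client, deployment: str, user_text: str) -> str:
    resp = client.chat(model=deployment, messages=[
        {"role": "user", "content": user_text},
    ])
    return resp["choices"][0]["message"]["content"]

class StubClient:
    """Offline stand-in so the integration point can be exercised
    without network access or credentials."""
    def chat(self, model, messages):
        return {"choices": [{"message": {
            "content": f"[{model}] echo: {messages[-1]['content']}"}}]}

reply = complete(StubClient(), "my-gpt4o-deployment", "hello")
```

In production the stub would be replaced by the provider's real client, which is exactly the "models as a service" ease of integration being described: the rest of the application does not change.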
You'll also see some announcements around how across the entire stack, whether it's
silicon, whether it's system, we are trying to make it such that, again, we can eke out
every bit of performance from every component, right? And I think that's
the part the developers don't see directly, but they benefit from it. Right. That makes a lot of sense.
Now, I did want to talk to you a little bit about industry and infrastructure, because obviously
you're looking to eke out every bit of performance. And it starts with that NVIDIA relationship in
terms of the training, but you're doing a lot of different types of infrastructure within Azure.
How do you see the industry delivering improvements?
Is it at the pace that you want?
And how does Microsoft's own silicon development fit into that picture? Yeah, so I think at the end of the day, the workload is evolving so fast that there are lots of pockets where we would want to have silicon diversity for a variety of reasons. And silicon vendors provide different design points and different optimizations, which lend themselves very well to different types of workloads, right?
So we have lots of efforts going on with AMD.
We've been actually working with them for many years now.
I think we're on the third or fourth generation of deploying AMD at scale.
And we'll continue that.
I think that's been a great partnership there,
too. We have a pretty strong roadmap ahead. And then we have our internal efforts where, again,
we are trying to see how we can completely shift the economics of especially inference,
because inference is going to be so ubiquitous that if we can hit a different economic point
for inference, that will actually be a game changer for everyone.
So that's something that we are going to continue to work on.
You'll see some announcements around there too.
And in general, it's not just the silicon provider, right?
We have to look through everything in this stack,
whether it's the system, it's the cooling,
it's the network, backend network, the front-end
network, the NIC design, the optics. So every bit of this infrastructure component, you're always
looking to see, A, how can I bring a second source that may be actually more competitive?
B, replace it with something that may be completely groundbreaking and give us a new design point
or give us a new performance point.
So across all of those, we're always looking for new things and, you know, experimenting,
doing internal proof of concepts, and then, you know, we implement them at scale in production.
So we'll continue to do that.
On your question of the pace of innovation, I think there is so much value that can be unleashed in AI that a lot of vendors are starting to see how they can be part of this. Silicon providers, network providers are really embracing AI and rethinking their place in this ecosystem, right?
So I do think there is a lot of innovation going on around here.
You know, it's funny.
I was at OCP Lisbon a couple weeks ago, and one thing that struck me was that it felt like in the past there would be one trend that everybody was following, right?
But what OCP Lisbon showed me,
and this is a regional conference,
is innovation in cooling, networking, storage,
chiplets, et cetera.
There are so many vectors
where the industry is accelerating the pace of innovation,
and it's such an exciting time
to be following the data center.
I would assume that you sitting at ground zero,
you're having a lot of fun
in terms of talking to various pockets of the industry
about how they can deliver to something
that will help shape AI's future.
Yeah, and I think it's definitely a privilege.
Like you said, when you do something at this scale,
like my constant theme is, oh my gosh, the scale we need. We're going to have to really ramp up the entire supply
chain and make it 2x, 3x, 4x, kind of a thing. And you can replace that with storage, networking.
Like, there are components there where I have to go back and say, no, no, no, no.
We need to go back to the vendors and have them start new factories, because we're not going to be able to keep up.
So it's exciting.
It's definitely exciting.
And I think the biggest bit out of all of that is when a 10-year-old goes and says, oh, look, Mom, I used this, you know,
chat media or something.
The real proof of the innovation comes
because the end customers, ultimately,
not necessarily a 10-year-old,
are adopting it.
And that's the most gratifying part of all of this:
to see end customers, enterprises,
or, you know, just users actually adopt the technology.
Now, I'm really glad that you brought that up because I think last time you were on,
I was asking you about what use cases you were seeing for generative AI, and you didn't
even want to predict them.
But I'm going to keep asking you the question.
What have you seen in terms of working with enterprise customers on
the standout use cases or the surprising use cases where you're like, wow, I didn't even think that
an LLM could do that? Well, I think I mentioned this last time too, and I'll stick with
the same line: you know, it is really hard to say what all it can do.
You asked me what stands out. I would just go and say from
my purview, I had never thought LLMs could be used by a car manufacturer to decide what parts
to manufacture. Just, you know, that was not something I would have thought about. Or I
had another construction company talk about how they are using LLMs to parse through resumes for their recruiting, then decide
what questions to ask at the interview, then actually decide how the candidate performed
during that thing, and then actually decide how the productivity of those employees would be as
they come on board. And to me, that was not a space I would have gone to as, oh, this is where LLMs could be really useful.
But this is the part where I go back, you put something out in the ecosystem and then you nurture it.
But then you see it blossom in different scenarios, different conditions.
And you realize that, you know, different people have different ideas of how you can use them.
And that to me is like the most exciting part of all of this. I think what's cool about what you just said is
it told me that the thing that unleashes AI, despite all of the fears about AI, is the human
creativity about how to apply it and how to use it as a tool. Yeah. Absolutely. Well, I can't wait to keep talking to you about where you're going and how this is evolving.
But thank you so much for being on the show today.
I have only one more question for you.
Where can folks find out the latest from Microsoft on all of the tools that you're delivering,
all of the capabilities on Azure, and keep up with you and your team?
Yeah.
So the Azure website has all of this. So AI Studio has all information on the latest models and everything.
And then, you know, we have our VM offerings for customers
that are just looking for an IaaS.
But all of that is available through the Azure AI Studio online.
Nidhi, always a pleasure.
Great to see you and hope to have you back again soon.
Yes, thank you for having me, Allyson.
As always, good to chat with you.
Thanks for joining the Tech Arena.
Subscribe and engage at our website, thetecharena.net.
All content is copyright by The Tech Arena.