In The Arena by TechArena - Google Cloud’s Amber Huffman on AI, OCP & Future Innovations
Episode Date: October 21, 2024
Live from OCP Summit, Google Cloud's Amber Huffman shares insights on AI's future, open standards, and innovation, discussing her journey, data center advancements, and the role of collaboration at OCP.
Transcript
Welcome to the Tech Arena,
featuring authentic discussions between
tech's leading innovators and our host, Alison Klein.
Now, let's step into the arena.
Welcome in the arena.
My name is Alison Klein, and we are coming to you from OCP
Summit in San Jose, California, and I am delighted to be with Amber Huffman of Google. Welcome to the
program, Amber. Thanks for having me, Alison. Amber, we've wanted to do this interview for a
long time. You're a principal engineer at Google, and you're also on the board of OCP, so it's a
huge week for you. You've also been involved in a lot of industry consortiums over the years, but this is your
first time on Tech Arena, so why don't you first just start and give a little bit about
your background and how you engage the industry.
Thanks for having me, Alison.
I feel like I'm coming home because I think my first podcast was Chip Chat with you, probably
15 years ago.
So I fell into standards.
I started at Intel as an intern,
and by maybe my third internship,
I ended up in Rick Coulson's group at Intel,
which was doing standards for Serial ATA.
And I fell into working with that team doing Serial ATA.
Then in Intel Labs, it was kind of a golden era,
I would say,
of USB, PCI Express, all of these great standards. And I really learned the tools of the trade of
how do you deliver standards? How do you work well with the ecosystem and define the win-win?
And so I feel really just blessed that it fit my personality profile of being an extreme extrovert
and getting to work with a lot of people to find the win-win and do things together and change the industry. So I did Serial ATA, then I went on to the
Open NAND Flash Interface, which is the component interface. If you can believe it,
there was never a standard for the NAND chips that we all use and take for granted in our SSDs today.
So that built out the SSD industry. And then on to NVM Express, which I founded and still lead to this day. And if I roll back
three years, Google came knocking. I was excited to take my 25 years of experience at
Intel to a hyperscaler and really learn from inside the building at Google what they do.
And so it's been an illuminating three years, and I've learned a lot taking on a role
doing data center ecosystem strategy, whether we talk power and cooling at the rack level,
to sustainability with things like green concrete
to networking to storage.
I've just learned so much
and I've been really blessed with amazing Googlers
to work with.
Now, you recently did a killer TEDx talk on compute and AI
that I think everybody should watch.
No matter how technical or non-technical they are,
I think it was just amazing.
Why do you think it's so important to capture what this moment means for computing in terms
of the delivery of AI? I think we're in a really, really interesting time. And the piece that I was
trying to highlight is the heterogeneity that we're in. So it used to be that doing anything
other than a general purpose chip
didn't make sense because Moore's law was going to give you a doubling of performance. So just
wait for that next train. And by the time that you took your custom chip and really delivered it,
that 5x gain or 10x gain would be washed out by what Moore's law would deliver.
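To make that trade-off concrete, here is a quick back-of-the-envelope sketch in Python. The doubling cadence, development time, and 5x speedup are illustrative assumptions, not figures from the conversation:

```python
# Illustrative numbers only: assume general-purpose performance doubles
# every 2 years (the classic Moore's-law cadence) and a custom chip takes
# 4 years to design and ship with a 5x advantage at design time.

DOUBLING_PERIOD_YEARS = 2.0
DEV_TIME_YEARS = 4.0
CUSTOM_SPEEDUP = 5.0

# How much the general-purpose baseline improves while you build:
baseline_gain = 2.0 ** (DEV_TIME_YEARS / DOUBLING_PERIOD_YEARS)  # 4.0x

# Your real advantage at launch:
net_advantage = CUSTOM_SPEEDUP / baseline_gain  # 1.25x -- barely worth it

# If the doubling period stretches to 4 years (Moore's law slowing),
# the same custom chip keeps most of its edge:
slowed_gain = 2.0 ** (DEV_TIME_YEARS / 4.0)          # 2.0x
net_advantage_slowed = CUSTOM_SPEEDUP / slowed_gain  # 2.5x

print(f"net advantage, fast Moore's law:   {net_advantage:.2f}x")
print(f"net advantage, slowed Moore's law: {net_advantage_slowed:.2f}x")
```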
With the slowing of Moore's law, what we're seeing is this heterogeneity and very much a co-design.
And so I look at the hardware-software co-design and what we can do together, optimizing at every layer of the stack.
That's how you're getting these tremendous gains.
And so it's just a fundamental re-looking at how do we look at the whole stack of the system from the chip to the system to the rack. And you're seeing that with things like our TPU pods,
you're seeing that with the GB200 from NVIDIA, where they're now saying, no,
the GPU is no longer a chip, it's now the rack. And so it's a pretty interesting time. You know,
if I go back 20 years, I think you and I both came up in the renaissance of the
internet, of, oh, what's the internet going to do? This is just an even bigger shift.
Now, you're delivering a talk on the same theme in terms of open innovation to fuel AI moving forward.
What does that talk cover, and how are you extending your theme from the TEDx talk to the OCP audience?
Yeah, my theme at the TEDx talk that I delivered was about none of us is as smart as all of us.
And that is definitely what we're continuing to see here at Open Compute Project.
So Google really invests in Open Compute Project, and I invest my personal time, because it's where we can come together to deliver innovation quickly.
One of the things that I've seen over time is when you do the historical industry standards, which I still love,
it takes you 18 months to 24 months to agree on every single detail. AI is moving at a yearly cadence at the hardware level. And so you don't have two years to define and decide on every
single detail and then deliver it to market. So one of the things that we've been doing at
Open Compute Project is coming together with key industry partners, depending on the field and the topic, and super quickly coming to agreement within three months,
six months, on 80% of what we've done. And that 80%, then we can go forward and go deliver together.
And that 80% is often good enough. One common example that I point to is Open Rack v3, something
that OCP is very proud of having
delivered. Meta and Google both do Open Rack v3, but we put bus bars in different locations.
Another thing is that Microsoft and Google and NVIDIA came together to define
GPU specifications for management interfaces, RAS, and firmware update. Those were things that weren't
defined or written down. And one of the things that was happening is each time a new accelerator shows up, each hyperscaler has to go and really
figure out the nuances and that delays time to market. So these are the types of things we're
coming together, writing down, quickly aligning and making sure that we can have impact because
if you delay delivering that next GPU or TPU by a month, it's sad.
Exactly. And I guess one question that I have for you, you might have
already answered, but let's clarify. Does that new approach of that 80% extend across hardware,
foundational software, and model delivery too? Or where does that 80% alignment stop?
I think it can apply in all of those domains. My expertise doesn't extend as much to the model
area, but certainly you see a lot of diversity and a lot of people spearheading, hey, let's try Gemma or Llama, all these
different models, giving ideas and seeing how it diversifies. But it does apply to most areas. Now,
obviously, if you get down to, say, a PCI Express or a UCIe, if you're really trying to do full
compliance at a chip level, it does still matter
to come complete to a standard. A critical one that I would say right now is how do we look at
power delivery? In 2016, we joined Open Compute Project as part of bringing 48 volts to the rack. That
was our first contribution to OCP. With these, you're hearing about 150-kilowatt racks today.
They're going to become one megawatt racks.
When you think about a one megawatt rack, you're going, well, I need to up my game.
And so one of the things we're really excited about is coming together with the industry to work on plus minus 400 volt DC to the rack.
Wow.
And that's crazy.
You're packing in so many GPUs to the rack.
You need to move battery backup and
rectifiers, all of that, out of the rack into a sidecar rack that sits and feeds the rest of the racks. And you end up then
getting more efficient. So plus/minus 400 volts is really great because of the electric
vehicle industry: we can leverage the supply chain that they've developed and really scale
immediately. And so we're super excited about
how all of these industries interact and interconnect so that we can do this sustainably
and efficiently and bring things together in the industry.
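To see why the industry is moving to higher voltage at these power levels, here is a rough Ohm's-law calculation in Python. Treating plus/minus 400 VDC as an 800 V differential, and using the 1 MW rack figure from above, are assumptions for illustration:

```python
# Back-of-the-envelope comparison of rack power delivery at 48 V vs
# +/-400 VDC (treated here as an 800 V differential). Illustrative only.

RACK_POWER_W = 1_000_000  # the 1 MW rack discussed above

for volts in (48, 800):
    amps = RACK_POWER_W / volts  # I = P / V
    print(f"{volts:>4} V -> {amps:>10,.0f} A")

# 48 V needs ~20,833 A; 800 V needs ~1,250 A. Conduction loss in the
# bus bars scales with I^2 * R, so for the same copper that's roughly a
# (20833 / 1250)^2 ~ 278x reduction in resistive loss.
```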
I want to change pace a little bit. Google is one of the only companies on the planet that's been delivering AI at scale for years,
embedding it in everything that you do. How do you see Gen AI evolving in the coming 24
months? I know that that's looking into a crystal ball. Where do you see the top use cases for
deployment growing? Great question. We have six different applications that have more than 2
billion users. And so Gen AI is happening everywhere. And as you point out, when you do a search, when you look up a video on YouTube, all of
those pieces have already been infused with AI for years.
What we see moving forward and is pretty exciting is Gen AI is going to change everything.
If we think about human health, if we think about science, if we think about different
use cases like sports, creativity, and you're already seeing many examples.
So one of the ones I love to think about when we think about human health is AlphaFold.
We're already on AlphaFold 3, and it was pretty cool to see the Nobel Prize awarded for the original AlphaFold work.
You're seeing how that's already altering human health and what we do.
If we look at, I'm a bit of a geek when it comes to sports.
So Moneyball, if you go back to that book and how statistics have changed sports, now you have,
for example, one of the customers we've been working with is Major League Baseball.
And taking a look at how do you compare a live play against your stats
database: has that play occurred before, and how do you surface that dynamically? So how do we bring new interactions to people? So it's just across the spectrum. So
when you say what happens in the next 24 months, I think about all of our cloud customers. Every
single one is looking at how does Gen AI help them? And I think one of the other big
opportunities I think we have to embrace is recently I was back at University of Michigan
for our National Advisory Board meeting for the Computer Science Department.
We were talking about how do we infuse these new Gen AI tools into the education for students at Michigan.
We have 3,500 computer science graduates per year out of the University of Michigan alone.
And those students, they need to walk out knowing how to use Gen AI and the coding tools, because that's really going
to push their productivity forward. Sure. And so I see it as an exciting time of us all figuring out
how to use it to accelerate our capabilities in every domain. I believe I didn't answer your
question very well because everywhere is not really a great answer. But you gave a lot of answers. And I think that gives people some
areas to dig into. I am so excited that the DeepMind team got that recognition because
we're just like starting to scratch the surface of what their work will actually deliver in terms
of health benefits to everybody. And that's so cool. How is OCP shaping this? You talked about DC power as being one of the areas, but
how is OCP shaping the infrastructure that will deliver this next wave of AI? We talk about the
continued demands of performance for these AI workloads. How do you see the OCP community
tackling some of these things? I see OCP as similar to open source software. If we think about open source software,
we go back 30 years. I could not have imagined that the cloud, hyperscale, everything we're
dealing with, a lot of our AI tools, it's all based on open source software and it's brilliant
minds coming together to deliver. At Open Compute Project, that's really what's happening:
brilliant minds coming together and tackling a problem and finding the right experts to go tackle that problem.
So what I love is the open environment that allows people to come together and say,
this is a foundational problem and we're going to go tackle it as a team.
So just an example that I thought was really exciting over the past few years is Caliptra.
So it's an open source hardware root of trust.
So there was a while there where a few companies were arguing over what is the one true root of trust and my root of trust is better than your root of trust.
We all need to trust the cloud. So Microsoft and Google, fierce competitors, came together to figure out, let's write this spec.
We need to have a root of trust that everybody agrees on, because the cloud and ultimately AI are dependent upon trustworthy societal infrastructure.
And now we're actually building, and have delivered, an open source hardware implementation of Caliptra that the industry is now utilizing.
And then we're building further upon it.
It's a flywheel.
I think about it as a snowball running downhill in a good way.
Hopefully it doesn't run us all over.
But in NVMe, now we're picking up Caliptra to utilize as part of what we call OCP LOCK,
Layered Open-source Cryptographic Key management.
What we're doing is working out how to sustainably utilize Caliptra to then do key management
in a way that we trust that if somebody stole an SSD from the data center,
I know that they can't get at that data because I've infused it with some additional entropy from the cloud,
and they can't unlock it.
That all snowballed from this genesis of Caliptra into this OCP LOCK capability.
And what that OCP LOCK capability really gets me is that today,
because I can't trust that my drive is secure if somebody stole it from the data center, I have to shred my SSDs.
And that is a terrible thing for the environment.
It's terrible in so many ways.
So we can get away from that.
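As a rough illustration of the layered-key idea behind OCP LOCK, the Python sketch below derives a media key from both a drive-held secret and a cloud-held secret, so neither half alone can unlock the data. It is a toy model, not the Caliptra or OCP LOCK specification; the names (drive_secret, cloud_secret, media_key) are hypothetical:

```python
# A toy sketch of layered key management: the media key is derived from
# BOTH a secret held by the drive and a secret held by the cloud control
# plane, so a drive removed from the data center cannot reconstruct the
# key on its own. This is NOT the OCP LOCK or Caliptra spec, just the
# underlying idea. Requires the 'cryptography' package.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

drive_secret = os.urandom(32)  # sealed by the drive's root of trust
cloud_secret = os.urandom(32)  # the "additional entropy from the cloud"

def media_key(drive_part: bytes, cloud_part: bytes) -> bytes:
    # Both halves are required; losing either makes the data unreadable.
    return HKDF(
        algorithm=hashes.SHA256(), length=32,
        salt=None, info=b"toy-layered-media-key",
    ).derive(drive_part + cloud_part)

key = media_key(drive_secret, cloud_secret)
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, b"user data block", None)

# A stolen drive holds drive_secret but not cloud_secret: without the
# cloud half, the derived key (and therefore the data) is unrecoverable,
# so the drive can be safely repurposed instead of shredded.
```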
So I just see this way that it builds on each other, where you think about the Linux moment in open source software: Linux seemed like a hobby project for Linus Torvalds.
It's turned into this global phenomenon.
So I think that's what OCP is doing
is bringing exciting people together
to build and to create things.
And Amber, I know that this week
has been a perfect example of that.
OCP Summit, 7,000 people.
This has become the industry's central focus
on what's next for the data center
infrastructure world. I want to take you back to your roots in interconnects. And I think that if
I look at a motherboard, I'm wondering what interconnects you haven't touched. And as
somebody who also started my career in interconnects, that's like a geeky fun thing.
I know that OCP collaborates with a whole lot of standards organizations.
I know that there is a tremendous amount of work going on in trying to fuel the movement of data
around a platform and between platforms. What are you excited about in the interconnect space? And
what do you see the industry really focusing on over the next year?
I see a lot of focus on one of our challenges: how do we get the data from one
AI collective to another AI collective? Like, Google uses TPU pods, where we do a
ton of work within 4,000 chips or 8,000 chips, but then we need to go to the next pod, and that's
Ethernet. And that same paradigm happens with GPUs, where you have a set of GPUs that you're working within,
and then they need to use Ethernet to move to the next one.
So Ultra Ethernet is very exciting,
how that's developing and bringing people together.
I believe in the next six months,
you'll see their 1.0 spec coming out.
One thing Google did is we provided the Falcon protocol.
So what we've already delivered at scale is a modernization of Ethernet.
And so we're providing some of those ideas to Ultra Ethernet.
Congestion signaling, for example, is another thing we're working on deeply, in a collaboration between OCP and Ultra Ethernet, to try to bring about a common congestion signal: whether you utilize Falcon or you utilize Ultra Ethernet or RoCE,
how do you use that congestion signaling so that we can all utilize common switch technology and whatnot?
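As a loose illustration of what a shared congestion signal enables, here is a minimal ECN-style additive-increase/multiplicative-decrease loop in Python. It is a generic sketch, not the Falcon or Ultra Ethernet wire protocol, and adjust_rate and its parameters are invented for the example:

```python
# A minimal sketch of reacting to an ECN-style congestion mark, the kind
# of common signal different transports (Falcon, Ultra Ethernet, RoCE)
# could all consume from the same switches. Purely illustrative.

def adjust_rate(rate_gbps: float, ecn_marked: bool,
                increase_gbps: float = 1.0, decrease_factor: float = 0.5,
                max_rate_gbps: float = 100.0) -> float:
    """Additive increase / multiplicative decrease on a congestion mark."""
    if ecn_marked:  # the switch flagged congestion on this packet
        return max(1.0, rate_gbps * decrease_factor)
    return min(max_rate_gbps, rate_gbps + increase_gbps)

rate = 50.0
for mark in [False, False, True, False, True]:  # marks from the fabric
    rate = adjust_rate(rate, mark)
    print(f"mark={mark!s:5}  rate={rate:.1f} Gbps")
```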
Another thing that's happening is Ultra Accelerator Link has formed.
That's how do you get an open scale-up interconnect.
That's pretty exciting, because you'd like to use load/store, and you want to do as much locally as you can before you go out onto Ethernet.
I find that these pieces are starting to come together.
Another one is CXL memory expansion.
That's another one that
I see very much taking hold. We did a contribution with Meta last year on how do you
build out a CXL memory expansion device so that you can take advantage of, hey, I've only got
eight channels of DRAM, or maybe I've got 12 or 16, but I want to have much more memory capacity,
and build that out. Those are some of the ones that I'm excited about.
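For a sense of how software might see such an expansion device: on Linux, a CXL memory expander typically appears as a CPU-less NUMA node. The sketch below lists memory-only nodes via standard sysfs paths; it is illustrative, not a CXL management API, and memory_only_nodes is a hypothetical helper:

```python
# On Linux, a CXL memory-expansion device usually shows up as a NUMA
# node with no CPUs attached; software can then place capacity-hungry
# data there. This reads standard sysfs paths and is illustrative only.
from pathlib import Path

def memory_only_nodes() -> list[str]:
    nodes = []
    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        cpulist = (node / "cpulist").read_text().strip()
        if not cpulist:  # no CPUs attached -> candidate CXL expander
            nodes.append(node.name)
    return nodes

print("memory-only NUMA nodes (possible CXL expanders):",
      memory_only_nodes())

# An application could then bind allocations to such a node, e.g. with
# `numactl --membind=<node> <cmd>` or libnuma, to use the extra capacity.
```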
But obviously, going all the way down to the chip level itself, I think that chiplets and
UCIe are going to continue to be pretty exciting: how do I figure out that I want an I/O
chiplet versus an HBM chiplet, and not have to build everything myself?
How do I get to composability?
So one of the big steps forward that was announced in August was UCIe 2.0, where what you have as part of that
is the management interfaces and the security. Google was very involved in that, trying to
have that foundation of: it's not just about the interface, it's the manageability and the
telemetry and all of that at scale. Whether it's at the GPU level, thinking about how do I do a
GPU collective, or at the chip level: how do I audit and understand
what's on my chip, know that everything is secure, and debug it all at scale?
That's amazing. And I am waiting for this open chiplet economy to burst out because it's going
to allow for so much acceleration and innovation when companies can come together and provide
their best technology for something that's greater.
I completely agree with you. And I think one of the things that is beautiful about the chiplet
economy and these standard interfaces, you and I both have worked in tech a long time. As an
engineer, there's nothing I don't want to change. And so the standard interfaces allow us to take
advantage of scale versus we all go in and tweak everything. So it's an amazing, beautiful time of
how does all of this diversity and heterogeneity come together. Amber, it's been a pleasure talking
to you. As always, you're one of my favorite people to talk to about tech. You're just one
of my favorite people to talk to. So there's that too. And it was so great to have you on
Tech Arena. I can't believe that it's been 15 years since you were on Chip Chat. Let's make it
less than 15 years before we do the next.
I completely agree. And I'm just so excited to see what Tech Arena is doing. I really think
it's adding so much value to the ecosystem. So thanks for all you do, Alison.
Thanks so much for being on the show.
Thanks for joining the Tech Arena. Subscribe and engage at our website,
thetecharena.net. All content is copyright by the Tech Arena.