Computer Architecture Podcast - Ep 14: System Design for Exascale Computing and Advanced Memory Technologies with Dr. Gabriel Loh, AMD
Episode Date: December 6, 2023. Dr. Gabriel Loh is a Senior Fellow at AMD Research and Advanced Development. Gabe is known for his contributions to 3D die-stacked architectures, memory organization and caching techniques, and chiplet multicore architectures. His ideas have influenced multiple commercial products and industry standards. He is a recipient of ACM SIGARCH's Maurice Wilkes Award, is a Hall of Fame member for MICRO, HPCA, and ISCA, and a recipient of the NSF CAREER Award.
Transcript
Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting-edge work in computer architecture and the remarkable people behind it.
We are your hosts.
I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Our guest on this episode was Dr. Gabriel Loh, who's a Senior Fellow at AMD Research
and Advanced Development.
Gabe has had roles on both sides of the industry academic divide, having also been a tenured
associate professor in the College of Computing at Georgia Tech.
Dr. Loh is known for his contributions to 3D die-stacked architectures, memory organization
and caching techniques, and chiplet multi-core architectures.
His ideas on these topics have influenced multiple commercial products and industry
standards.
Additionally, he is a recipient of ACM SIGARCH's
Maurice Wilkes Award. He is a Hall of Fame member for MICRO, HPCA, and ISCA, and he's a recipient
of the NSF CAREER Award. And to pile on top of that, he's a co-inventor on over 100 U.S. patents.
We were lucky enough to snag him for a conversation about system design for large
and complicated systems like the Exascale project at AMD. We also discussed memory technologies, navigating
Amdahl's law and the age of accelerators, and he dropped some unconventional wisdom about how it
can be great to be wrong and when imposter syndrome can be a good thing. Before we get to the interview,
a quick disclaimer that all views shared on this show are the opinions of individuals and do not reflect the views of the organizations they work for.
And with that, let's get right to it.
Gabe, welcome to the podcast.
Hey, thanks for having me.
We're so glad to have you.
So listeners to the podcast know that our first question is almost always,
what's getting you up in the morning these days?
Well, there's the technical and non-technical answers.
You know, the non-technical is just the kids and getting one off to school.
On the technical side these days, I've actually come a little bit full circle.
I'm working
on CPU microarchitecture, which is where I started my PhD career before going off on a
smattering of all kinds of other topics. But it's kind of fun. We're back to where we started. And
while for, I think, a lot of the academic circles, traditional CPU microarchitecture may not have the
same flair and it's not quite the hot topic for some folks,
it's still a critically important aspect of everything we do. There's a CPU, it's still
the central processing unit, even in our larger heterogeneous systems. And Amdahl's law doesn't
go away even in the world of highly accelerated heterogeneous systems. And so we still need to keep pushing that,
you know, CPU performance further and further
in tighter power constraints.
So there's a lot of exciting challenges there.
It's fun to get back to it.
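To make the Amdahl's law point above concrete, here is a minimal back-of-the-envelope sketch (an editorial illustration with made-up fractions, not numbers from the conversation) of why the un-accelerated CPU portion caps overall speedup no matter how fast the accelerators get:

```python
# Amdahl's law: overall speedup = 1 / ((1 - p) + p / s),
# where p is the fraction of runtime that can be accelerated
# and s is the speedup achieved on that fraction.
def amdahl_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# Illustrative numbers only: with 90% of the work accelerated,
# even an "infinitely" fast accelerator leaves you near 10x overall,
# so the remaining CPU-bound 10% still has to get faster.
for s in (10, 100, 1_000_000):
    print(f"accelerator speedup {s:>9,}x -> overall {amdahl_speedup(0.9, s):.2f}x")
```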
That sounds super fun.
Yeah, I mean, microarchitecture is usually
what makes all of us computer architects
attracted to the field in the first place.
And so there are some people who would argue, though,
that, you know, at this point, we shouldn't be focusing on,
I mean, the age-old debate, right?
We don't need to focus on microarchitecture.
We should be focusing on things like, you know,
the network or whatever.
And so what kind of problems are you guys at AMD
trying to solve that sort of necessitates
digging really deep into the microarchitecture
at this point?
Yeah, to be fair, or, you know, complete, CPU microarchitecture is not the only thing that AMD is focusing on, right?
This is just the area of, you know, what gets me up in the morning,
what I'm currently working on.
But, you know, we've entered a world where there's no more low-hanging fruit,
no single bottleneck that needs to be solved.
You know, all of the easy problems, and none of them have ever been easy, to be honest, but the easy-ish problems have been picked clean, and we need efforts on all fronts.
And so there are accelerators, and then there's the software.
It's a gigantic mass of both applications down to the runtimes, compilers, tools, firmware,
et cetera.
And modern computer systems are incredibly complex, and you need all the parts working
together efficiently.
If one part isn't really doing its job or is out of sync with the rest of the design,
you can give up efficiency gains, whether it's performance, power, cost, et cetera.
I think all of these areas are critically
important. We have to focus on the microarchitecture. We have to focus on accelerators and new domain
specific approaches. Accelerators themselves, they're not really general purpose. The whole
reason why you can accelerate things is that you're doing them in an application specific
manner. There's something about the domain, about the workload that you can take advantage of to do something,
you know, more efficiently, more effectively.
And that's where you get the performance power enhancements.
It's sort of almost an oxymoron, right?
A general-purpose, application-specific accelerator.
All right, but, you know, what goes along with that
is that no accelerator design
is going to solve the world, right?
It's not going to solve all of our problems, right?
It's going to solve a subset of our problems.
It's going to provide some great enhancements in the subset of problems.
What do you do for all of the other problems that we still have computationally or more broadly, socially, whatever?
And so we need, I think we're headed in a direction already of increasing heterogeneity.
You look at the number of accelerators in mobile phones and things of that sort already.
They're incredibly heterogeneous with many different functions specialized for different tasks.
And, you know, you get fantastic gains in efficiency in that fashion.
But what you also look at is the corresponding software ecosystem that goes with the phones and across, again,
all the different layers. It's not just writing apps for your app store or whatever, but all the
way down to how the manufacturers have to put together all this IP and the overall methodologies.
There's a lot of infrastructure, and all of this has to come together. And so the microarchitecture,
given my background and the things I've done, is certainly something very interesting, as you said,
I think it's a very interesting topic for many folks that start off on the
journey of computer architecture, you know, basic, you know,
how does a pipeline work? How does a cache work?
And those things are so important, right?
There have been these old jokes about how everything
in computer architecture comes down to,
let's say, three, four, five concepts.
It's like pipelining, caching, speculation.
I think there's a couple more, right?
But those things still matter,
and there's still opportunities to continue
to improve and refine them.
I think what's also really exciting is that,
you know, we used to traditionally work a little bit more in silos, or kind of within our own layers of the abstraction stack.
Right. Everyone who's taken an introductory computer organization course has seen one of these figures, starting from the devices all the way up to the application, with a bunch of horizontal lines in between. Right, and I think, you know, these days, while people have been talking about working across the stack
and interdisciplinary research, you know,
the buzzwords have been tossed around for many years,
but I think we're seeing a lot more of that,
you know, put into practice.
A lot more people are talking and thinking across the layers
and that's opening up a lot of new opportunities,
you know, for the microarchitecture,
but for the architecture, the ISA, and everything above.
And so, yeah, our problems, our challenges are getting tougher.
But at the same time, we're seeing, across these efforts from academia, from industry,
from government labs, all manner of folks contributing great ideas, and we need that, because
all these problems are just getting harder. Great, thanks a lot for that really good
context for our listeners. While we do currently live in the age of
accelerators and the shift towards heterogeneous computing, it's important to realize that
system balance matters. And as you mentioned, there's no one silver bullet solution
that's going to fix all of your performance bottlenecks, especially if you care about
end-to-end performance. I want to pick up on a couple of themes that you mentioned.
So you talked about how you're no longer sort of working in silos and there's a greater
need and greater demand to work
across different layers of the stack. So I wanted to pick your brain on what are some of the themes
that have piqued your curiosity in this particular realm? We have a confluence of multiple trends,
like emerging workloads, emerging technologies, emerging architectural themes like heterogeneity,
different software stacks and so on. So what are some themes here that have
piqued your curiosity?
How do you think about working cross-stack
in the current age?
I think one of the big challenges, as well as,
you know, I think one of the big opportunities, is, you know,
where to rethink interfaces,
as well as where not to mess with things that work, perhaps,
right? What is the right way,
especially as we get into high levels of heterogeneity, and there's heterogeneity at different layers of the stack?
The risk of dealing with a combinatorial explosion of how these different things interact
with each other becomes, you know, a potential issue that we're going to have to
collectively deal with. If you've got all these different combinations of hardware, how are they all going to communicate with each other? How do you design those interfaces
that remain efficient for those different needs, while you're not becoming... You don't want to
be overburdened, right? Because if you want to have a master interface that could talk to everything
and everyone, it's probably going to end up not being as efficient, right? And so what's that right balance, right? And that, I think there's a lot of work to be done
in understanding the trade-offs, you know, there's going to be some science and probably a little
art to it as well, you know, drawing upon past lessons of, you know, where things have worked
well, different standards and approaches, but there's also new needs that have to be addressed.
And so I think that that's going to be a very interesting area to go through.
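As a rough, editorial illustration of the combinatorial concern Gabe raises here (the component counts are hypothetical, not from the episode): dedicated point-to-point bridges between every pair of component types grow quadratically, while a single shared interface needs only one adapter per component, which is part of why the temptation toward a do-everything "master" interface exists despite its efficiency cost:

```python
# Illustrative only: how many translation paths you need as heterogeneity grows.
from math import comb

for n in (4, 8, 16, 32):
    pairwise = comb(n, 2)  # a dedicated bridge for every pair of component types
    shared = n             # one adapter per component onto a common interface
    print(f"{n:>2} component types: {pairwise:>3} pairwise bridges vs {shared:>2} shared-interface adapters")
```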
And then another aspect of that beyond just the interfaces is once you have all these
different components, hardware and software, you put them together.
And, you know, if things don't go entirely as planned, there are some unexpected behaviors and such, how do you test or anticipate or specify the expected
behaviors? System validation, design verification at this scale, particularly if you have multiple different parties involved.
I take an accelerator from one company and another component from a university, et cetera,
put them all together.
Sometimes even with a well-defined specification, people still read the specification in different
ways.
And how do you guarantee that? Or guarantee is probably too strong of a word, but what can we do collectively from a methodology and a science standpoint to make these systems robust and as
successful as possible? So I think there's, I don't have the answers to all of those questions,
but these are the types of things that I think many of us
do need to think about going forward.
All this makes me sort of think about like the general value
of understanding like large system design, right?
Because earlier in your answer,
you mentioned something about cell phones.
So I know when I was at Qualcomm that we had,
these are small systems,
but they had lots and lots of different accelerators across them and lots of different
interfaces, lots of different ways for things to talk to each other. And then there was a lot of discussion
internally about, you know, how are we going to model this? And how do we, we've got these
siloed systems. What about the full system end-to-end design? And I imagine, you know,
in all the work that you've done with Exascale, you know, that's very similar. Exascale systems
are enormous. You've got lots of different parts
and probably spanning lots of different, as you said,
even organizations that are coming together,
you know, some IP from here, some IP from there,
contributors from here and there, everywhere.
And so across all of these,
it's a large system with lots of interconnecting components.
I know you just said you didn't have all the answers,
but maybe you can speak a little bit
to particularly your experience with Exascale, which is one of the things we want to focus on today, is like how you bring something,
you know, that's so large scale that the word exa is in the name, and make it all work together.
Yeah, I mean, that was really interesting, you know, a ton of fun on that journey, right? Because
when we started off on this exascale endeavor,
it was, at this point, well over 10 years ago, I guess, getting close to 12 years at this point, where, you know, we're trying to predict what this machine a decade later
should look like, with whatever our crystal balls would tell us back in the early 2010s. And that spanned, you know,
a lot of different aspects, you know, the evaluation type of components that you mentioned
is just one piece of it, right? But it's a critical part, right? Because, you know, one can
dream up all manner of different things. And if you don't have a way to evaluate it and to cull down the massive design space,
you're just gonna flounder and spin in circles
of chasing the latest shiny thing.
And so what we did was it was kind of a multi-pronged,
kind of multi-resolution type of approach.
Early on, we looked at some macroscopic trends, macroscopic projections, just where
did we think silicon technology would be 10 years into the future, given various projections
of Moore's law slowing down, things of that sort. Where do we project memory technologies
to be, both in terms of capacity and bandwidth and energy efficiency, so on and so forth. So we looked at multiple different axes
and made our best guesses and then came up with a vision
for, OK, well, how could we conceivably get to these exascale
requirements?
And exascale requirements is more than just raw compute,
more than just raw flops.
I think one of the aspects that the U.S. Department of Energy
really put into sharp focus from the get-go was that this is not just a benchmark machine.
Certainly, achieving a top position on the top 500 list was an important milestone that I think
everyone certainly wanted to see. But at the end of the day,
these machines are being used for real science. The vast, vast, vast majority of the scientists
working across the U.S. national labs, as well as many international partners that get
access to compute time on these supercomputers, they're not sitting there doing matrix multiplications all day
for benchmark scores. They're running computational experiments to try to discover new things about the universe, new chemicals, all manner of science. And so delivering a machine that could
actually address performance for real workloads was a very, you know, it was a common theme
throughout the entire program from day one.
And I think that was also very useful
from the AMD perspective of, you know,
thinking about a machine and not just the speeds and feeds
and, you know, how many instructions per cycle
you can crunch through your bandwidths,
but really thinking about, again, as Stephen
had mentioned, the end-to-end thinking of what
are we trying to accomplish?
And then so having set an initial vision for, OK,
how do we get there?
What are the important aspects of this machine?
Again, it's not just in terms of the top level numbers,
but what are other characteristics
that we want to ideally see from it? Can it handle irregular access patterns of different
sorts? We have to think about the programming model for the
scientists, for many of whom programming is actually an annoyance.
They want to do science, they don't want to sit there debugging code. And so
you think about all these aspects, and how you put it all together. And then we sort of went through a kind of multi-resolution, different tiers of evaluation,
right?
Some of the evaluation was almost really more kind of spreadsheet level. Like, you know,
idealistically, you put together all the numbers: do we get to, you know, the types
of memory capacities or bandwidths that we want to get to?
Do we get anywhere close to an exaflop of compute? Does this thing require, you know, 16 nuclear power
plants to turn this machine on, or can you actually get this to work within a reasonable power budget?
So some of that stuff, order of magnitude, you could really do kind of just pencil and paper,
practically. But then at different levels, for different aspects of the design, we use a wide
variety of different tools. In some cases, you're dealing with program analysis and going through
kind of more traditional profiling of application behaviors, both on CPUs and GPUs. In other cases,
you're going down another layer to cycle level simulators or system emulation.
In other cases, you're doing network simulation.
And so for the different aspects,
just necessarily given the complexity of the problem,
we have to decompose and focus on the different parts.
And that helps you divide and conquer the problem
from an evaluation standpoint.
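To give a flavor of the spreadsheet-level, order-of-magnitude checks Gabe describes above, here is a minimal sketch (the energy-per-flop values are hypothetical placeholders, not AMD's or the DOE's actual projections) of how a power-budget sanity check for an exaflop target might look:

```python
# Back-of-the-envelope power check for an exascale target,
# using hypothetical energy-efficiency assumptions.
TARGET_FLOPS = 1e18  # one exaflop: 10^18 floating-point operations per second

for pj_per_flop in (100.0, 20.0, 5.0):          # assumed picojoules per flop
    watts = TARGET_FLOPS * pj_per_flop * 1e-12  # pJ/flop * flop/s -> watts
    print(f"{pj_per_flop:>5.0f} pJ/flop -> about {watts / 1e6:,.0f} MW to sustain 1 exaflop")
```

At 100 pJ per flop such a machine would need on the order of 100 MW, which is why the energy-efficiency projections mattered as much as the raw compute ones.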
On the other hand, it does introduce some challenges
as you try to roll these things back up together
because ultimately none of these parts are standalone.
They all have to come together into a full system.
And so then you again sort of like start
bringing it back together through a mix of methodologies.
And some of our earlier Exascale papers
that we published in the
computer architecture conferences, we indeed showed some initial projections where
those projections are coming from a very diverse mix of inputs. Indeed, some of these
things are application analysis, some of these are from profile data, some bits from cycle level
simulation. And that's where you end up having to rely a bit on kind of the experience and intuition
of some of the more senior folks that have been designing and building supercomputers
for a long time where they have a sense of, okay, if you put these things together, yeah,
these kind of projections do make sense.
Or, yeah, that seems really odd if we're seeing these kinds of results.
We don't expect things to scale that way
in this particular fashion,
and that may require further analysis,
digging in to what's going on.
Sometimes you find out,
oh, okay, we had misaligned assumptions about something,
and sometimes you discover something,
and like, hey, actually, this is different,
and indeed, it may perform better
in this fashion or whatever. Wow, that's a really great, detailed answer on how to sort of build up
a large system, starting small and at the lower layers, and kind of building up to a
big, you know, a big picture of how everything goes. So one of the things that I thought you said that
was really interesting, Gabe, was about relying on the sort of intuition and the experience of people who've been there for a long time.
My recollection is, when I first joined AMD, you know, many years ago, even before the Exascale stuff started, just being sort of in awe of the people who were able to imagine projecting, you know, roadmap things four, five, six, seven, eight years ahead, and just
being able to anticipate what they felt was coming down the pipe. And so, you know, in putting
together that kind of a team at AMD, you know, it's been an incredibly successful, long-running
project. I'm curious to hear about, A, how you sort of project forward, you know, presuming you're
one of those people who was doing the projections, you and, like, Mike Schulte and other people that I remember
looking forward. And then also, you know, how and when you decide you're going to, say, pivot,
like, oh, we might've been a little bit wrong about this one, or something has changed,
you know, sort of keep the ship moving forward. Because that's the hard part, right? You want to,
you want to be able to look far enough ahead to do something meaningful that's in advance, but at the same time, you have to be able to adjust.
That's a great question. A couple of key aspects. It sounds very Spanish-Inquisitional-like.
First is that you have to be in communication with other folks. As researchers, we cannot live in an ivory tower
and just come up with our own big thoughts
and drink coffee and eat donuts all day.
And a critical aspect of our Exascale research program
was that we were sharing our ideas,
getting feedback from the folks on the product teams,
the folks who were planning our roadmaps,
getting good critical
input on things that they thought might pan out, things that they thought were risky,
things that they thought we were way off in left field on. And what also goes with that is
you need to have a team that can put ego aside, that we don't get too married or emotionally wedded to our ideas,
right? These are, you know, kind of ideas we came up with, but if we get good feedback,
if we get new input, hey, you have to be willing to throw things out and take that input and shift
and pivot, as you said, right? And, you know, the fact that, yeah, I thought I had a good idea.
It just seemed really cool to me. And then now I found out that, okay, wait, that doesn't make sense anymore.
I have to be very objective and just take that information in and just, okay, that's
great input.
Let's go figure out what's the next thing we should do.
How do we adjust this?
How do we address those concerns?
And so part of the success, I believe, for our program is that we had a phenomenal team,
not just from a technical standpoint, but in terms of the personalities and the lack of ego, and really
this shared vision of how do we collectively build something successful, right? And that
objective was shared not just with the research team, but with the product folks as well, that
we all collectively wanted to see this happen. We all wanted AMD to be successful, and so there was not really, like, research versus the product team, us trying
to sell them on something, them trying to, you know, pull us back or whatever. This really is about
coming together as, you know, one company, as one effort, to figure out, you know, what's the best
way forward for this, right? And so that also, I think, made it really fun, right?
Because we just, at that,
once you kind of establish that kind of culture,
you no longer have that fear, really, I think,
of throwing out what may be crazy ideas
that other people in some other context
might laugh you out of your room for, right?
And I think perhaps it's a, you know, speaking from a position of, you know,
privilege of seniority and such.
But, you know, I've hit that point in my career where I really don't care, right?
I mean, the sense that I can say, you know, I can put out ideas.
And if it's not a good idea, if someone has a good reason why what I suggested
doesn't work, like, that's okay. It's okay, I'll move on. I'll move on to the next thing, right? In
general, what I feel like is that being wrong is actually a good thing, right? Because if you're
wrong, like, why were you wrong? You had some intuition, some thinking, that led you down a
particular path.
And when you find out you're wrong,
that meant that there was something in your intuition,
something about your assumptions that wasn't correct,
and it's a learning opportunity, right?
So if I'm wrong, I'm going to learn.
And in some sense, if I'm not wrong on occasion,
that means I'm not learning enough.
And also, especially as researchers, that may be indicative that I'm not
pushing aggressively enough either, right? You know, part of the role of a research lab is to, you know,
push the envelope, to try to push on the frontiers, and if you're being too safe, you're actually not
really doing your job, I think, at least in the industrial research context, right? And it's
a careful balance, right? You can't go too crazy, otherwise, you know, you come up with ideas that no one will ever use.
I think there's a, you know, there's a natural
pull factor that brings you back to reality, right?
Because you, you know, so long as you're learning
and taking in the new information,
your ideas are going to get refined,
they're going to get better, and they're not going to go,
you know, they're not going to continue diverging out into some area that's not going to be useful.
Right, that's a very germane point on the importance of culture and being able to sort of fearlessly explore ideas, recognize when things are, when your assumptions are wrong, and then pivot and, you know, take your learnings forward and move to a different problem. I think that's a very, very important point in large teams, especially. I wanted to expand on
some of the, I think a lot of the retrospectives that you've written both on the Exascale
computing journey and some of the other things that you've done have been pretty refreshing
about the lessons that you've learned, what were the initial assumptions, how the field overall has
evolved. Because when you start out, you don't know how things are going to evolve, both on the
technical front and perhaps on other fronts like the commercial
realities as well for various technologies.
So I wanted to dig a little deeper on maybe memory technologies in particular, since you've
had experience with multiple kinds of memory technologies and have seen different trajectories
for them.
So ranging from 3D die-stacked DRAM that eventually went into, like, the Hybrid Memory Cube or HBM and things
like that, to other technologies that have maybe not panned out to a commercially successful
point yet, like NVRAM, phase change memory, or resistive memories and so on.
Can you maybe sort of compare and contrast the journey and considerations with each of
these, maybe lend some flavor on how you were thinking about the problems, what are some
ground realities and how that sort of shaped your perspective on memory
technologies and other technologies in general,
how they intersect with both research and product roadmaps?
Sure.
I think there's a couple general trends
we could probably draw out from this, just thinking back.
Much of my 3D-related research started back
when I was still academic faculty.
I think especially in academia, you have
the license, perhaps the charter really to push even more aggressively than one can in industry.
And at the same time, frankly speaking, your levers for direct commercial impact are far
weaker from academia, or you publish papers and try to influence the thinking of other people.
You know, from that perspective, early on,
I didn't worry too much about the commercial side of it.
It's not the commercial viability
of any of these technologies,
but it's more about the timeline of it, right?
So you talk to the folks in the industry,
you get the best information you can
to have the best assumptions for the research.
And then from there, I would encourage, you know, especially academic researchers to not worry too much about,
OK, you know, is this going to impact industry in two years, five years, seven years, et cetera?
Because there's just too many factors that go into it. I think what's important is that the kind of the way I approach a lot of this is to come up with ideas that, you know, assuming this technology hits a point of maturity, then these ideas would allow one to make better use, hopefully, of that technology in some fashion.
And, you know, it may be a little bit of wishful thinking as an academic, right? But I think part of the rationale
was that if one can demonstrate additional ideas,
additional benefit of these new technologies,
that may motivate industry to try to accelerate
the development of that technology as well, right?
I think from an industrial standpoint,
there's still a similar theme, a similar approach, in that with any kind of new technology that we may be looking at, it's not like a one-way street, because if you come up with that
big new idea, indeed that may motivate, you know, the powers that be to try to accelerate
the availability of that capability, right? On the other hand, if, you know, people aren't doing
the research aggressively enough, if everyone is waiting for, you know, a memory technology to hit a certain point or whatever,
then, you know, the natural kind of cadence of that progression, or the investment made
into that new technology, you know, may not be as great, because they don't have that big
motivating factor of, oh, if we can get this to market, there's this, you know, 3x, 10x, or whatever
benefit, right? And so there's just a little bit of a give and take.
And as a researcher, if you go too far and the technology is still 25 years away,
that's also a challenging sell as well.
So there is a bit of a balance to be had there.
My overall thinking or approach,
it's been very interesting watching
these different technologies develop over time
because from the perspective of a technologist and things you've thought about and worked on,
like you want to see this stuff become reality, right? You want to see, you know, whether it's
a new memory device, 3D stacking, et cetera. You know, these are all things that, you know,
you've worked on, you've thought about, and it's just like, wouldn't it be cool to see that in a
real product, right? And some of the 3D stuff, I mean, it's taken nearly, you know, two decades
for us to really see these things come into kind of full commercial capability. But on the other
hand, as, you know, someone who works for AMD, as a shareholder, you know, and I think
this is true for most of the companies out there, if we can
make use of our existing capabilities and we can delay the adoption of some risky new
technology for another generation, it's I think always a worthwhile question to consider.
Should we actually take on that risk of bringing on that new technology now, today, right? And so there's that timing question of, you know, these new technologies may in fact have,
you know, great merits, but we might not need to do it, you know, right today, at this moment, right? And
that's, I think, just the types of questions that are, you know, being asked by, you know,
everyone in the industry at any point in time. When do you adopt, when do you introduce a new capability,
a new feature, is the market ready for it?
Is the ecosystem ready for it?
And it's a tricky thing.
I think that's more in the realm perhaps of folks
with the MBAs and whatnot.
If you jump into the water first,
you have the first mover advantage. You get a head start over everyone else.
But if you jump in the water and the water is shark infested, guess what?
You're also the first one to get chewed up. So what's the right strategy for that?
It's going to be different in every case. But bringing it back to the research side.
Right. A lot of those things are kind of well outside the realm of what your typical, you know, technical researcher is really
going to think too deeply about. But you should have some awareness of these things, right? It's
not just, oh, I have a great idea, and then everyone's going to come to me and everything's going to be
great, everyone's going to adopt my idea and run with it, right? That's not, you know, typically how
new technologies get developed. But at the same time, especially as an academic researcher, I think if you try to
read the tea leaves too much, it's going to be a kind of a distraction. And many of those things
are kind of outside your control. And you really should, in my opinion, focus on the technical
innovation. What are the new ideas? What are the new capabilities? What new can we collectively do
with this? And that, if anything, I think would help
accelerate the adoption of new technologies, faster than, you know, the
business concerns. And those concerns obviously remain, but if there's
sufficient motivation, if there's sufficient benefit or value, I think, you know,
industry will find a way to capitalize on that. That was a really
interesting discussion,
because it sounds like, you know,
you're basically discussing the chicken and egg issue
that we have with new technology adoption, right?
Is it sort of the push-pull
where the researchers are saying,
hey, look, if you do this, it can be 10x.
And then of course, industry will come by and say like,
well, after you put in some reality,
it'll become 1.8x or whatever.
But then there's that part where there's like,
Hey, if you do this, it'll be this great new thing. And on the other hand, there's the whole,
you know, Henry Ford's quote, you know, are we going to build better horses? Are we going to
start building cars? And so there's that question of like, when are, when do you make the jump to
making cars and then stop like, I don't know, feeding your horses better food or whatever.
So that whole, that whole discussion there, that was like so interesting because from a
researcher's perspective, particularly a younger one, you know, there's
only sort of like the immediate problem around you. And as you get more senior, as you presumably,
or as you are now, you begin to have to think about market forces potentially more and more,
because then your impact is sort of dependent on whether or not things really get adopted, right.
And so this kind of ties into the next question, which is, you're at AMD Research, which is actually a relatively new lab,
and which has been phenomenally successful in going from, you know, a handful of people,
12, 13 years ago, to being this like sort of very effective research organization that has
close ties to its product team, as well as it had a lot of
research impact. And so I guess what I wonder, in an era where research labs sort of ebb and flow
and kind of struggle between being those researchers out there who can't get any product
people to listen to them, or being too in bed with product people and not really doing anything
particularly far reaching and out there, you know, how do you feel like AMD research labs in particular has sort of gone from this
tiny little group to this really sort of research powerhouse that's been really
effectual from both the research as well as the product impact standpoint?
I think a big part of that is always being clear with what the purpose of the
lab is, right? You know, from the get-go,
the research organization was focused on the fact,
and we were very honest and truthful about it, right?
We are part of a for-profit company and what we do ultimately in some form or another needs
to deliver value back to the company, right?
Is, you know, each individual, and the organization as a whole, are we
providing more value to the company than it's costing to keep us employed? I mean, that's kind
of what it all comes down to. But that honesty about your role in the organization, I think, helped to
really provide the focus. You know, a lot of research, it's almost
cliche, you know, you tell this to a lot of PhD students
and others, that defining the problem is often the hard part. You know,
how do you select the right research problem?
Once you've defined the problem, you know, people are smart, they can come up with solutions
that can innovate, right?
And a lot of the times coming up with the right problems is at the first almost unspoken step zero.
Having clear direction for the organization provides us a framework for prioritization of all the different ideas that we could pursue,
all the different research projects that one could imagine.
We don't have enough bandwidth to go after every possible thing,
right? I mean, that's true whether in academia or in industry, right? No one has, or most of the
time, most professors don't want to have a lab of, you know, 30 students. It's, you know, we all have
enough meetings as it is. And so, you know, how do you prioritize? How do you go about, you know,
choosing where to focus your effort, where to make your investments?
Clear direction in terms of having to do research that will provide value, will have impact
to the company, I think is central to why the lab has been successful.
And that said, as you mentioned, it's a balance where you don't want to be in a position where
you are effectively just additional engineering resources for the product team.
As a research organization, you do need to be looking further down the field.
And you do need to have that autonomy to take some risks and to push it further out. And so I think, you know, part of it is
having good leadership that has been able to walk that line and strike that balance, right? That,
you know, we have to have work that's not just ivory tower, pie in the sky, you know,
interesting, you know, intellectual exploration. But at the same time, you can't be really just
doing almost, you know almost just advanced development for
some features that the product team perhaps didn't have the resources to pursue.
I think it ebbs and flows over time as well.
I think that's where the leadership, both in terms of the management as well as the
technical leaders and them having a good relationship where you know
management can get the right technical input, they can get the right business input, everyone can talk
to each other, again, kind of in a very ego-free way. Just, you know, what's the lay of the land, what are
the, you know, best decisions for the organization, and to be able to come up with those decisions of,
you know, where do we place our bets? Because at the end of the day, we are still speculating. It's the future.
We're doing research.
But you have to put the chips down somewhere.
And hopefully, we've collectively
brought in as much information together to maximize
our expected return.
That was a good perspective on how
you think about research labs within the context
of an industrial setting and having a very clear sense of purpose and vision
coupled with a healthy culture
that's ego-free and transparent
makes a lot of difference.
Maybe this is a good time to sort of wind the clocks back.
Since you were talking about AMD research,
can you tell our listeners,
how did you get to AMD research?
What was your journey like starting from the early days?
Because you've both seen academia and industry.
Maybe you can talk about the entire journey. Sure. I've been very fortunate in that I've had, you know,
just a lot of different opportunities along the way. And I think one of the interesting aspects,
and a lesson for some of the younger folks, is to not be afraid. Or, you will be afraid, how you feel is how you feel,
but despite perhaps being afraid, being unsure, still taking some of those chances on, you know,
unexpected paths and routes. Because I think my career has taken a couple of weird turns
over the years, really starting off from the beginning. Initially, when I was finishing up
my PhD, I kind of finished off a little bit weird on the job market cycle. And I'd been
originally wanting to pursue an academic position, which is what did happen. But my first time around
on the job cycle, I actually ended up with one interview for a position that might or might not appear.
It was a speculative interview from a department that was hoping to get a slot created that year,
and in the end, they didn't get the slot. And so I had basically zero academic job offers at the
end of that first cycle. Thankfully, or the way it worked out was that since my PhD
graduation or completion was a little bit off cycle,
that I was able to just stay on as a student for a while
and maintain my health insurance and things of that sort,
and then was able to reapply the following year.
And I had some other papers and things in the pipeline
at the time.
And by the second time around, things were more successful.
And so one thing is that certainly we all face rejections and failures of sorts, or as people like to say, deferred success.
And if it's something that you believe in, you want to go for, stick with it. Go for it. Again, maybe a little bit of survivor bias in these comments here,
because I was lucky enough to get an offer on the second time around.
But in terms of kind of unexpected routes,
one of the things that came up was I had already received my offer
and actually for going to Georgia Tech. And at ISCA in 2002,
this was up in Anchorage, Alaska. I just, you know, happened to meet this guy, Brian Black,
who, you know, those who are familiar with die stacking kind of know him as, you know,
the godfather of 3D die stacking. And, you know, he was working at Intel at the time. And he
said, hey, you just come down, work for me for a year, you know, learn a little bit more about things
on the industry side or whatever. And, you know, basically, that sounds interesting. So I was like,
okay, that wasn't what I was planning at all, but, you know, I went back to Georgia Tech and asked
them, like, hey, is this something we can do? Can we defer the start of my academic
career? And it was a little bit complicated
in terms of the paperwork and everything else.
But we basically managed to come up with an arrangement where
I basically joined Georgia Tech and went on a leave
of absence on the very first day.
It was kind of funny because I had already then started.
Once we made the arrangements, I had already started at Intel.
And I had to fly back to, you know, Atlanta to, like, sit through orientations on, like, 401k programs and things like that. Sorry.
But the thing was that, like, it wasn't, like, a normal path at the time. I think it's become a
little bit more normal now, but the lesson that I learned was, like, don't be afraid to ask, right?
Because if you don't ask, you're not going to get it. And so I asked and we found a way
to, you know, work through the situation and we got to work. And that was pivotal, right? Because
it's, you know, at Intel was actually where I first became exposed to 3D die stacking, right?
And honestly, just dumb luck. I was at the right place at the right time, having to be exposed to
right people and, you know, these new ideas. And new ideas. And part of it is that when you have the opportunity, you got to take
advantage of it as well. So subsequently, after going back to Georgia Tech, that became a
cornerstone of a lot of the research program, research agenda I put together. But I honestly
got incredibly lucky that I just happened
to be in the circumstance. But that circumstance also happened because I was willing to, you know,
take this weird detour on my, you know, on my career. This is, you know, year one of my career,
post PhD, and already kind of going off the planned path. But that can provide new opportunities,
new things that you learn about.
Another example of that,
after several years as Georgia Tech,
there was a new program that was just being started.
It was a dual master's program
between Georgia Tech and Korea University in Seoul.
And they needed a couple of faculty
to go to Korea for one semester to teach in person. And so,
you know, they were kind of looking around, a little bit desperate to figure out, like,
you know, who to send, and they came and asked if I'd be willing. And, you know, I had some,
you know, hesitations, right? You know, my partner would still be working from Atlanta, back at home. I'd be over in another country for
and different. And so at a certain point I said, well, why not? When is this kind of
opportunity going to come up again? And so I did it. That then opened up, again, just sort of by luck, circumstance,
whatever you want to call it, a variety of new opportunities as well. I think on one hand,
from the academic side of things, that basically was the start, the first step that eventually led
me to actually being co-general chair for ISCA 2016 in Seoul, Korea. Like, that just would
never have happened had I not spent those four months in Korea. I've just got to jump in
here, Gabe. And so being general chair of an ISCA sounds like a tough job. So I'm going to assume
that you regret going to Korea. Because if you hadn't, you wouldn't have been general chair
and had to go through all that stress, right?
It's, I think different people, you know, everyone's different, right? And so like,
yeah, general chair was a ton of work, but for me, it was also fairly rewarding.
This computer architecture committee has been very good to me over the years, And it felt good to be able to contribute back
and give something back.
It was definitely a lot of work,
but it was very different.
So, I mean, I know some folks,
especially a lot of academics,
look upon general chair as something that they would
not touch with a 10 foot pole
because there's practically zero that's technical about it.
To some extent- On top of that, you did it not in your home locale,
like most general chairs do it in their home locale.
You did it on the other side of the world,
which probably adds a whole other level of complexity to it,
which I'm just like in awe that you did.
Well, I mean, as with everything,
and this is, I think, a common theme, right,
is that you have to have a good team, right?
Everything we do, whether it's research, whether it's putting on a conference, having a good team, having the right relationships and trust among the team members is absolutely critical, right?
That could not have happened without having had a phenomenal co-chair. My co-chair, Professor Sang Lyul Min, who unfortunately passed away
a couple years ago from cancer. But he was just a phenomenal partner. And we were able
to, again, have very open discussions. He actually came not traditionally from the computer
architecture community. It was more from the advanced systems side.
And so we had a lot of good discussions in terms of conference expectations
and what types of events and venue styles and things of that sort.
And he was very open to listen about how we did things on the ISCA side.
But there's a lot of things that were local about how they do things in Korea
that I could not come in and assume I knew how things should be done.
Right. And both of us being able to put egos aside and really, again, thinking about, like, how do we... We both had a shared vision of how to put together the best ISCA ever.
Right. You know, I don't know if we did or not, I'll let the attendees vote on that perhaps, but, you know, again, you want to
start off with a vision for what you want to accomplish, right? And once you have that vision,
then you can kind of drill down and work through, you know, what we want to do from a technical
standpoint, what we want to do in terms of the events, in terms of venue, about the experience,
where a lot of attendees, this is the first time they'd ever been to Korea.
What do we want to showcase and highlight about the country? And we just kind of just drill down through all of the different details. And for me, the way I looked at it and why it wasn't just
some horrible burden, but it was actually, I think, somewhat fun and rewarding was that
I viewed it as, hey, this, you know,
the computer architecture community is, you know, at least from an academic perspective,
you know, my home community, I'm now in a position where I could throw a giant party
for, you know, 700 of my best friends, and I don't have to pay for it. You know,
what can we do, right? And, you know, that's kind of the way I viewed and
approached it. And it's like, you know, how do we, you know, make sure that this is a great event
technically, but also that it's memorable? You know, part of it is the community building,
right? We want people, you know, especially a lot of the students, because maybe this is their first
time coming to a conference, right? Maybe they went to another conference that they didn't enjoy as much for some reason or whatever.
And they were maybe thinking about changing
to another area or something.
How do we give everyone a great experience so that, yeah,
this is where I want to be.
I want to be a computer architect.
If I have to bribe them with some good food or something
like that, hey, that's okay.
Human resource development, if it goes through the stomach, that works.
I've heard that's how they turned around MICRO in the early days, right?
They made sure they had good food.
I mean, you go to any college recruiting event or anything else, right? Food features prominently, right?
The students tend to respond to free food. And then once you've trained them, that habit doesn't seem to go away.
Well, I wanted to ask you a little more, Gabe, because, you know, so I've known you a long time.
And one of the things that I've always noticed about you, that's kind of remarkable. It's like,
I've never seen you perturbed. Like I've never seen you angry or perturbed
or anything like that.
And one of the enduring themes through you talking
about with us today has been about like being able
to put ego aside, you know, having a shared vision,
a clear vision for what is going ahead.
So one of the things that I've always thought about you
as you've sort of progressed through your career is just
like you make it look very easy, right?
You've always made everything look very easy. That's one of the things that I remember when I worked with you,
like a long time ago at AMD, is just that you have this talent for being able to just say like,
okay, this is what we're trying to do. This is how we frame the problem. We need to answer this,
this and this, and then you just go do it. And then like, you know, you just kind of,
it's, it seems very self evident, as if somebody is watching you do it. And so it seems like for
the progression of your career,
which you've been very successful,
is being able to put ego aside,
being able to construct good teams with good culture,
having clear visions.
And this thing, which just I'm observing about you,
this kind of unflappableness.
Can you talk a little bit about what you think it is
that has led you to
this? Because I also remember when I first met you, you were already faculty at Georgia Tech. And
let's just say that your behavior, I thought you were still a grad student because you were still
like fun and easygoing and all this stuff. I did. I had no idea you were already a professor,
which meant like you'd already done your time with Brian Black too. Like you were a couple
years out from graduation. And I was like, this guy is definitely a grad student. And then
you were a professor and I was like, oops, I made a wrong assumption. So maybe, you know, I don't
know, just going back over your career from like a, how you manage yourself over the course of
career to get to the point where you are, it sounds very clear that you haven't chased career
accolades. You've just sort of chased what interests you
and then the career accolades have come.
Can you maybe talk a little bit about that?
Sure, and I think, you know, in terms of, you know,
do I have my stuff together and, you know,
unflappable or whatever, there's,
yeah, I think it's really important for everyone,
especially for the younger folks,
but this doesn't go away with time, I don't think,
is to always recognize that what you see of other people is very different from what they are experiencing, right? Each and
every one of us has a mask, has a filter, right? You don't, you know, whatever things that
didn't work out for me, whatever challenges I had, you know, it's typical human nature that you,
you know, shield and you filter these things out from everyone else around you.
And so were there challenges along the way?
Absolutely.
Do I go necessarily advertising all of them?
Probably not, as most people tend not to.
There's this old saying that when you look at everyone else, what you see is the movie
trailer.
You see all the highlights, the big, you know, Michael Bay explosions or whatever.
When you look at yourself, you see the blooper reel, right?
You see all the outtakes, right?
And it's actually one of those things that I recognized this at some point.
And that recognition alone, I think, was very useful in terms of,
you know, not getting too shook up about, you know,
challenges and things of that sort.
Because when you look at everyone around you, everyone else looks like they're so successful
and everything else, right? But then when I realized that, okay, well, that's because they're
not showing me their outtakes either. I'm not showing them theirs. And so why should they show
me? You know, it's only fair, right? But once you realize that there's that kind of asymmetric
information access about what you know about yourself versus what you know about others.
It's easier to be easier on yourself really, right?
And so I think that helped me in terms of,
you know, some level of that,
you call it unflappableness,
I don't think that's quite accurate,
but you know, some emotional stability perhaps, right?
In terms of just worrying about, you know,
what's going on in one's job, in one's career, one's research, whatever it may be.
Certainly everything that we do is also viewed through a lens of survivor bias.
Right. For every paper that I've published, for every project that's been successful, you know, there is a gigantic heaping pile of rejected papers, of ideas that crashed and burned, and everything else.
And, you know, I don't put
them up front and center, and neither do most other people, right? But again, you know, we all
see the stuff that makes it. We don't see all the stuff that didn't make it, but those are
still important stepping stones to get to that point, right? You know, how do you refine and
adapt your ideas? What are you learning along the way? And another kind of thing that I've learned
over time, and it's one thing to,
you know, intellectually understand it, it's another thing to be able to emotionally accept it,
is being able to recognize what things do you have control over, what things are outside of
your control, right? Because basically to get upset, get worked up, anxious, whatever, about
the things that are outside of your control
is pretty much by definition, a waste of energy, right? Because they're outside of your control.
Anything you do is not going to affect that. And learning to recognize those situations,
those circumstances, it's taken years and obviously, you know, no one perfects this.
And it's something I still work on, you know, learning to recognize the situations, and then being able to take a step back and sort of emotionally defuse yourself a bit, and then
re-channel your energy on the things that you can do something about, that you can be productive
about. You know, that has also helped, but, you know, this is something that's taken, you know,
a decade's worth of practice, and obviously, you know, I'm still not entirely there. It's, you know, I
don't think it's something that anyone can perfectly achieve in a lifetime,
but one can continually improve.
So I think that's also been just kind of a practice
that I've kept in mind over time.
Another, I think, really important aspect of this,
and again, I think it's important for some of the younger folk.
Again, it's related to that whole movie analogy, you know, outtakes versus the highlight
reels. But what often comes up in these, you know, types of discussions is a whole imposter
syndrome type of thing as well, right? You look around, you look at all the senior people,
and it's like, oh my god, you know, you never see any of them having imposter syndrome,
right? But the fact of the matter is, everyone does. Again, we just don't show it, right? That's buried in our outtakes reel. And initially, it's one of those
things that, earlier on in your career, I think, you know, kind of hits you a bit harder, right? You're
thrust into new situations and you're just like, oh my god, why did they ask me to do this? I have
no clue what's going on, I'm going to fail miserably, et cetera, et cetera. All the standard imposter syndrome thoughts creep in. But one thing that I've learned over time,
and it's helped with these types of situations, is that one of the reasons you have imposter syndrome,
one of the things that makes you struggle, is that you're
actually being put into new situations, right? It's because you're being challenged.
People trust you and are giving you opportunities to grow, right?
And so it's kind of the flip side of it.
If you never feel imposter syndrome, you may be stagnating, right?
You may be in a place where you're not getting the opportunities, right?
And so to me, imposter syndrome has now become a somewhat positive thing, or at least
I tell myself that, right? Because it means, okay, I'm in a position where I'm going to do something new,
I'm going to learn, right? That someone trusts me with something new, right? So, like, it's actually
a sign that things are going well if you, you know, have imposter syndrome, right? And again, I think
a little bit of this may be just me trying to convince myself, to make the situation, you know, a little bit easier to handle. But I think there's
some truth to that, right? Like, you know, if you're just always comfortable, that means you
probably are not growing along the trajectory that you potentially could be. So I mean,
a lot of it also comes back to your original question, you know, how do you, you know,
keep it all in check and whatever. It's like, you don't always. You don't always, but I think most of us have learned to
filter and at least hide it a little bit. That's just human nature. But that said, there
are ways to improve how one handles it over time, and how to focus your energy on the things that will be most productive, that
you do have some control over. And I think that's worked reasonably well-ish for myself over the years.
Yeah, that's great. Thanks a lot for that perspective, Gabe, and also very valuable words of wisdom,
I'm sure, for young researchers and people further along in their career.
Speaking of focusing your energy, and looking into the future and so on,
maybe I can end on what's on the horizon for you?
What's your vision, both on the technical side
and maybe also in terms of how do you build up our community further?
Because that's one of the things that you've been really good at,
sort of mentoring and community building.
So both on the technical side and the community side,
what's on the horizon for you?
What is exciting for you? What would you like to see us do more of? Yeah, I mean, one of
the things from the technical side, and I think the computer architecture community has been, you know,
pretty good at this, and, you know, I really want to encourage, especially those in
positions of, whether it's, you know, program committees or funding, research grant proposals, things of that sort,
is to continue to keep a broad mind,
a broad definition of what computer architecture is.
I think we've had everything from the traditional
CPU microarchitecture all the way up to the data centers
and now machine learning systems
and all manner of different things.
And keeping that very broad perspective
and a very broad or loose definition
on what is a computer, what is architecture,
has been very good for this community.
It's allowed us to grow, to adapt.
The technology keeps changing at an incredibly fast pace.
And if we get overly narrow, overly prescriptive in our definition of what is
and isn't computer architecture, I think we risk becoming irrelevant over time, right? If that
definition misses something, you may miss the boat on some really impactful areas that other communities
may take and run with. And so I think from a technical perspective, we want to keep our view very broad.
And as we enter the twilight of Moore's law,
it's in some sense kind of exciting in that it now forces
our hand into looking at a more diverse set of solutions.
And we are working across these different layers,
as we talked about earlier on,
working across the stack, being more interdisciplinary.
And part of it is like, abstractions are not bad, right?
Like they're good, they provide productivity
and under traditional Moore's law,
it was actually, I think, better to have cleaner layers
between everything, where everyone could kind of work
more focused in their own layer of the stack.
But we're now in a world where, you know, we cannot continue to scale the traditional, you know, performance and capabilities of our systems with that approach, right? So we just have no choice,
really; we have to work in a broader, more collaborative fashion. But that's, again, I think,
very exciting. So many new things to learn, right? I mean, when I joined AMD, you know, the bad dad joke was that I couldn't even spell HPC, right?
And during the time that I had with the Exascale program, you know, I learned a ton through that process.
And a lot of what I learned was not perhaps what you would call traditional computer architecture, right?
But that helped make my computer architecture,
you know, thinking better.
You know, for myself personally,
with all of the new stuff coming along and, you know,
of course, machine learning is, you know,
taking up a lot of the attention,
but there's all sorts of other fascinating topics that,
you know, as you learn more about them,
it expands your mind and can help you think of, you know,
new solutions, new approaches,
even for the more traditional problems that many people may be working on.
So I think from a technical perspective, I do see increasing heterogeneity, increasing end-to-end thinking of design.
Even if my day job is still looking at CPU microarchitecture, I'm still thinking about these things in a much broader context than I perhaps would have 20 years ago, when everything was just how do we optimize SPEC95 or whatever.
It's a different world, but for the better.
From a community standpoint, I think it echoes a lot of that.
We're going to see our community continue to grow. I do think it's vitally important for the computer architecture community to continue to increase access to everything we do to more and more people.
It's just very competitive in terms of trying to attract and recruit talent into our master's programs, our PhD programs, etc. There's a lot
of people out there that are looking at many different other competing topics. There's a huge
amount of competition with the software world, even as software and hardware become more intertwined.
And to me, that says we have to maximize accessibility to everyone.
And that's across all different types of factors.
So there's all the dimensions that get the greatest attention in terms of gender and ethnic makeup.
But there's also socioeconomic.
Are we really making the best use of all the different universities we have across the planet? There are
talented people everywhere. Are we getting everyone into positions where they can succeed
and contribute to what's just becoming harder and harder technical challenges and problems?
It kind of goes back to the ego comments, right? Like, you know, none of us should think so highly of ourselves that, you know, some small cadre of us are
the ones who are going to come up with all the answers for, you know, the next 10 to 20 years
of computing, right? The next big idea or the handful of, you know, medium ideas that, you know,
often actually make things move at a faster rate than the big ones, which sometimes could be more
controversial. You know, those could be coming from all over the place, right? From all over the
planet, from all types of people. And so really making sure that we can get that next
generation of computer architects, you know, identified and excited, fed if need be, you know,
I think that's a really important aspect of the community. The community
is the people, right? We are the community. And last I checked, I keep getting older, so we need to recruit.
You don't look it!
That's only because I shaved my head. When I started shaving, I discovered it stopped my aging, right?
Because my hairline keeps receding, the hair keeps turning gray, but no
one can tell.
Yeah, you look the same. And it's been over a decade. And
you're indistinguishable from your early photos. Yeah, early
photos and how I remember you from the early days when I
thought you were a grad student.
I mean, this has been a really, really wonderful conversation,
Gabe. I think we've spanned, you know, from technical to career to really sort of deep thoughts about how to think about ourselves and the community.
And so we really appreciate you coming to talk with us today.
And we're so glad to have had you.
Great. Thanks a lot. This was kind of fun.
Yeah, thank you, Gabe. It was a real pleasure talking to you.
And to our listeners, thank you for being with us
on the Computer Architecture Podcast.
Till next time, it's goodbye from us.