Computer Architecture Podcast - Ep 4: Cross-layer Optimizations and Impactful Collaborations with Dr. Mark D. Hill, University of Wisconsin-Madison / Microsoft
Episode Date: March 11, 2021. Dr. Mark D. Hill is a professor emeritus of computer sciences at the University of Wisconsin-Madison, and currently a Partner Hardware Architect with Microsoft Azure. He has made numerous contributions to parallel computer system design, memory system design, computer simulation, and more. He is well known for his advice and collaborative work style, having published papers with 160 different co-authors. He talks to us about cross-layer optimizations, impactful collaborations, and visioning for computer architecture research.
Transcript
Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting-edge work in computer architecture and the remarkable people behind it.
We are your hosts.
I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Today, we have with us Professor Mark Hill, who is a Professor Emeritus of Computer Science
at the University of Wisconsin-Madison, where he taught for 32 years.
He has made numerous contributions to parallel computer system design,
memory system design, computer simulation, and more.
He's well known for his advice and collaborative work style,
having published papers with 160 different co-authors.
He is the inventor of the widely used three Cs
model of cache behavior,
co-inventor of the SC for DRF memory consistency model,
and a recipient of the Eckert-Mauchly Award in 2019.
He's currently completing his term as a member of the Computing Community Consortium, where he was
chair from 2018 to 2020. He's here today to talk to us about industry-academia synergy and his
vision for computer architecture research in the years to come. Since taping this episode, Mark has
joined Microsoft as a partner hardware architect
with Azure, reporting to corporate vice president Steve Scott. A quick disclaimer that all views
shared on the show are the opinions of individuals and do not reflect the views of their organizations.
Mark, thanks so much for joining us today. Welcome to the podcast.
Thank you.
So in broad strokes, tell us what gets you up in the morning these days.
These days, I am particularly excited about how really for the first time in my career,
cross-layer optimizations are both necessary and possible. With the end of Dennard scaling and the slowing of Moore's law, at least
in two dimensions, you know, if we're going to give the illusion or the reality of performance
energy improvements, some of that is going to have to come from getting the fat out
and doing cross-layer optimization.
So, you know, many years ago, 20 years ago,
if you wanted to propose a non-binding prefetch instruction,
people would say, oh, no, you can't.
You can't change the interface to do that.
And now much more is on the table.
So I find that very exciting because I like the kind of work that
connects multiple things together.
Do you find that that's more possible these days because there is a bit of a trend towards verticalization? Meaning, it's not just that there's this necessity because of all these things slowing down, since changing the interface across layers and across companies is really, really difficult. It seems like these days, you know, Google is building their own silicon.
Do you think that's part of it as well?
I actually think the re-vertical integration of industry is a consequence of the end of Dennard scaling and the slowing of Moore's law, right? In the 20th
century, we were moving so fast and so well conventionally that it made sense to put blinders
on and concentrate on your own layer. And now there's just so many opportunities that are,
you know, from the technology requiring you to cross layers, and I guess also to some extent from top down.
And so you're absolutely right.
What you said, Lisa, is that there was a time when there were equipment manufacturers
and semiconductor manufacturers, and there were the Googles, the Intels, and the AMDs,
and then there were the Dells, and there were the Microsofts, and there were the SAPs,
and everybody was in their layer.
And now we're seeing software companies doing hardware and hardware companies doing software.
In many ways, the software companies going down have almost been more successful at doing that
than hardware companies trying to get and really deeply understand software.
But that's a longer story.
So picking up on the theme of cross-stack work that needs to be done in today's age,
tell us a little bit about how you think about doing this kind of cross-stack work
when you need people from, I guess, different expertise from different domains
to maybe come together and work on problems.
Yeah, so it's a difficult challenge.
I mean, it depends how far you cross.
If you're crossing just, you know, sort of one layer, it's easier.
It does take time to get people to talk to each other.
You know, one of the reasons, as I mentioned, I had 160 co-authors so far, is that, you
know, I want to work with people who have other expertises.
So, you know, at Wisconsin, I worked a lot with David Wood, who's at the same layer, but also with, you know, Jim Larus and Ras Bodik and Mike Swift, who are at different layers.
You know, I think what's key is identifying sort of what are the opportunities across the layers.
So let me give you the example of direct segments.
Now, this actually hasn't been adopted, so you could argue that it's a failure.
But I thought it's a really interesting research project.
So the issue here was that we saw that large servers running things like key value stores,
et cetera, were losing a ton of performance to TLB, Translation Look-Aside Buffer, misses.
And yet they weren't really benefiting from much of what virtual memory has to offer.
Like, for example, they never swapped because the key value store was sized to the physical memory available.
And so you paid all this overhead for no benefit.
And so what direct segments did, and we won't have to go into the details, but it said,
here's a way to get rid of the overhead for the main data structures, but yet keep backward compatibility all the way. And that required, you know, students that could cross layers, and supervision from Mike Swift on the operating system side and me on the hardware side, to sort of do the work.
And it started with analysis, right? Analysis precedes synthesis.
Right. So we analyze that there was this problem and that there was also this opportunity.
You didn't have to solve all problems if you solve this really important special case.
Although you weren't doing much swapping of the main data structure.
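As a rough sketch of the mechanism, the translation path can be imagined like this. This is a toy simplification, assuming hypothetical segment registers (`seg_base`, `seg_limit`, `seg_offset`) and a flat dict standing in for the real TLB and page walker; it is not the actual hardware design from the paper.

```python
# Toy sketch of direct-segment translation: one large contiguous region
# (e.g., a key-value store's heap) bypasses the TLB entirely, while all
# other addresses keep ordinary, backward-compatible paging.

PAGE_SIZE = 4096

def translate(va, seg_base, seg_limit, seg_offset, page_table):
    """Return the physical address for virtual address `va`."""
    if seg_base <= va < seg_limit:
        # Direct segment: simple base-and-offset arithmetic, so a TLB
        # miss on this region is impossible by construction.
        return va + seg_offset
    # Fallback: conventional paging for everything else.
    vpn, offset = divmod(va, PAGE_SIZE)
    ppn = page_table[vpn]  # this lookup is what misses in a real TLB
    return ppn * PAGE_SIZE + offset
```

The point of the design is visible in the first branch: for the workload's dominant data structure, translation is pure arithmetic, so the TLB-miss overhead Mark describes simply disappears.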
I remember that work. What do you think was the main reason why it didn't get widely adopted? Because I think there is a lot of code running,
particularly in the cloud, that doesn't swap.
You think it'd be useful?
It's a non-trivial change to the virtual memory hardware.
And that's a difficult thing.
We did a later version that didn't mess with the level one TLB.
It was outside the level one TLB in order to make it easier to adopt because you weren't messing with the super critical timing paths of the L1 data TLB and L1 cache.
I think it was just too radical. And also there was, you know, pretty good progress on trying to make sort of big pages work and
Abhishek Bhattacharjee's, you know, fine work on some ways of automagically creating bigger pages and fusion in the TLB.
And that was just sort of more incremental.
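The page-merging idea can be sketched as follows. This is a toy illustration only, not Bhattacharjee's actual mechanism: it scans a page table for runs of mappings that are contiguous in both virtual and physical space, which is exactly what lets a coalesced TLB entry cover several small pages with one tag.

```python
def coalesce(page_table):
    """Collapse runs of 4 KB mappings that are contiguous in both the
    virtual and physical address spaces into (vpn, ppn, length) entries,
    the way a coalescing TLB covers several pages with one entry."""
    entries = []
    for vpn in sorted(page_table):
        ppn = page_table[vpn]
        if entries:
            base_vpn, base_ppn, length = entries[-1]
            # Extend the run only if this page continues it on both sides.
            if vpn == base_vpn + length and ppn == base_ppn + length:
                entries[-1] = (base_vpn, base_ppn, length + 1)
                continue
        entries.append((vpn, ppn, 1))
    return entries
```

For example, three contiguous pages plus one stray page collapse to two entries instead of four, which is the "more incremental" win over full direct segments.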
Yeah. So that kind of begs the question then, you know,
this cross-layer optimization, we are approaching that point, if not already there, where it's really, really needed and necessary. But as you point out
in a lot of your sort of expository and written work, that's really hard, right? Because you kind
of have to get seven different pieces to move at once. You seem to have been able to do that
successfully a number of times in your career by kind of having broad impact even at places like companies.
So what kind of advice do you have with respect to exactly that, you know, moving all seven dwarves at once?
Well, I think part of it is becoming more possible with this re-vertical integration of industry, right? You know,
so companies like Apple have no trouble saying, hey, we want to incorporate accelerators in
our iPhone operating system. And so we're going to do it, right? Whereas if, you know, the hardware manufacturer is AMD or Intel and the operating system manufacturer is a Microsoft, you know, Microsoft may not see the value in supporting something in their operating system for some benefit that some third party gets.
And so I think it's going to be easier when companies are
controlling the sort of whole value chain. And the other thing is, remember, it's so obvious in
retrospect, but when Moore's Law and Dennard scaling were cranking forward, doubling every two years,
that just was like this tsunami that swept away many, many good ideas. So you had
this crazy idea to build this thingy, you know, by the time you got it deployed, you know, the chips
were four times faster. And so, you know, I think now as things slow down, we're going to, people
are going to have to consider more alternatives, and the vertical integration is going to make that possible.
Google controls their cloud infrastructure.
Amazon does it quite a bit.
Microsoft does quite a bit.
If they want to do something in that space, it's going to be more possible.
Do you think there are any drawbacks to that kind of verticalization model? Because, I mean, like you were saying, in the old days, you know, 20 years ago, every layer could go hyperspeed in
its own silo and it would all kind of work together because everybody had their agreed upon
interfaces that were kind of standardized across the industry. And now with this verticalization,
it seems possible that now, like, because Apple does control its entire stack, it's very hard to
do anything to impact Apple except what they decide to do for themselves.
And similarly, all these major cloud providers that you just named, Amazon, Facebook, Google,
Microsoft. And so what kind of drawbacks do you potentially see from this shift?
I mean, we've heard the potential gains that you just said.
Now everybody can kind of slow down and do cross-layer optimization,
but what should we be watching out for?
Sure, there's obvious drawbacks.
I mean, the great thing about the layers was it was a divide and conquering of complexity. Everybody could work almost in parallel on their items. And that
facilitated many things. By the way, when I say verticalization, I don't mean like
throw everything out. I mean much more selective punch-throughs for important things.
In terms of companies, you know, the advantage of the layers and companies operating in layers is
that you had sort of more competition.
You would buy your CPU chips from more than one vendor and things like that.
And so if you're integrated in one company, you know, you're getting this from, you know, your company's group that's doing this work.
And if the group is not facing, you know, tremendous competitive pressure, maybe they won't advance as quickly.
And, you know, sometimes you see in organizations, by the way, they're so big that even though they theoretically should be able to do cross-layer optimization, you know, it's sufficiently far apart in the org that it doesn't happen as much.
So there's no panacea.
But I'm actually old enough to remember when the industry was vertically integrated before.
And there was IBM and DEC and some other companies that were vertically integrated.
So I suspect that this is a circle of reinvention going on.
Right.
So do you see some echoes from the past
and learnings that we could apply to the current age?
Like when people are trying to do cross-stack optimizations,
for example, you talked about selective punch-throughs, right?
Like how do you go about identifying the right avenues
where you can make these changes?
How do you go about identifying the right partners to collaborate with and move things forward?
Well, let me give you two answers.
First of all, you know, there's a model that it's just going in a circle. I think it's much more
spiral. So if you look in a certain way, it looks like it's a circle, but it doesn't actually go
back to the same place. It advances.
What was the other part of your question? How do you look through for punch-throughs? I mean,
first of all, what's really important in research is to identify problems and to do analysis first.
So the most important thing, whether it's cross-layer or otherwise, is to identify a problem that if you can solve it, people will care about it.
And you have some reason to believe that you can solve at least part of it.
And, you know, by focusing on problems first and not solutions first, don't play Jeopardy.
I've written this somewhat, you know, this somewhat. People have this tendency to say,
I got a new mechanism. Let me find some use for it. I wouldn't do that. I would look to see where
there was pain. So in that direct segments work, we saw that there was pain in TLB miss rates that were ridiculously high. And so then you start thinking hard about
how to do that. Can I tell the RAID story? I mean, this is not my work, but I think it's a
really good example where if you frame a problem correctly, the solution becomes a puzzle,
almost a simple puzzle. So Dave Patterson, Garth Gibson, and Randy Katz were observing that classically,
this is in the 1980s, the cheapest place to store a bit was on a washing-machine-sized
disk drive that the super duper mainframes had. Okay? But then personal computers
came along and eventually had enough demand for small disk drives that the small disk drive volume
went through the roof. Okay? And it actually became more cost effective to store a bit on a PC drive than on one of these big drives.
So couldn't you store your big data on PC drives?
That was the question they asked.
Because that was this opportunity that was caused by this differential change, and they were looking for a problem. And the answer was no, because the PC disks didn't meet a very high
reliability standard individually, because they were going to users, not enterprises. And if
you put a whole array of them together, the mean time to failure just killed you. It just didn't work.
so could you do something to use these disks and make the mean time to failure be really excellent?
Arbitrarily high.
And so once you start thinking that way, you think, well, we need some kind of error correction codes.
And since the disks actually, when they fail, they let you know they're not there.
Then you turn to erasure codes.
And it really wasn't brilliant to do the erasure codes, right?
What was brilliant was to be tracking this trend where it's a different place to store data.
And then the problems fell out.
You know, the opportunities fell out.
And so, you know, and not all my research projects are that clean,
but I think that's a perfect example of, you know, when you're looking for the research to do,
you're looking for the problems and the trends and the inflection points. You're not starting with,
boy, I know how to do erasure codes. What should I use it for?
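The insight Mark describes, that failed disks announce themselves, so a simple XOR erasure code recovers any one lost block, can be sketched like this. A toy illustration, not any specific RAID implementation:

```python
from functools import reduce

def parity(blocks):
    """XOR parity across equal-sized data blocks (RAID-4/5 style)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def reconstruct(surviving, parity_block):
    """Rebuild the one missing block. Because a dead disk identifies
    itself (an erasure, not silent corruption), XOR of all surviving
    blocks plus the parity yields exactly the lost data."""
    return parity(surviving + [parity_block])
```

Any single lost block XORs back out, so data loss now requires a second failure during the repair window, which is what pushes the array's mean time to data loss far above the unprotected figure of one disk's MTTF divided by the number of disks.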
Right. So let's expand on, you know, picking those opportunities or looking for the
right problems. So I'll start from the academic side. So what can academics do? You know, how do
they find out avenues where they can get access to these problems or, you know, understand that
there are these problems that are there? I mean, you've talked about giving talks and things like
that, but how do you think about how academics should go about scoping out these problems?
So a couple of things.
One is you should watch trends and try to reflect on whether, you know,
you're going to get an inflection point, right?
Like an example of those PC disks becoming cheaper per bit than the big disks.
Or there was a time when people talked about killer microprocessors because in the old days, obviously, a big iron machine was going to be better than this wimpy calculator based microprocessor.
But that flipped.
And so you can, a good way to do that is to have your pulse on industry, which you can do by holding industrial affiliates meetings, by sending students on internships and by doing sabbaticals in industry.
The last is, you know, basically the high reward, high cost method.
But, you know, I have found it pretty effective.
And, you know, when your students go to industry, remind them, and the same goes for sabbaticals: you cannot come back from industry with their solutions.
That is their intellectual property.
Right.
You solve some problems for them because they're paying you and they may want to hire you.
But you also look for problems.
Right. Look for problems that
are just emerging. Look for problems that are being band-aided right now because they're not
yet important, but your gut tells you they're going to become more important. And I just think if you do
that, you're just more likely to load the die in your favor to pick important problems.
So just flipping the script on the other side, so you talked about how academics can engage
with industry and look for these problems. From an industry practitioner's perspective,
how should they think about engaging with academics? Why is it important for them also
to do that in this particular era? Okay, well, I'm not completely qualified
to answer that question because I haven't spent a great deal of time in industry. I mean, Lisa perhaps can say more. But if you engage with academics, you can influence their work; they might do work more relevant to your
product. The other thing, and Cliff Young sort of really advocates this, is that industry sometimes
has solutions and they don't even necessarily completely understand like why it works and what's possible.
And academics take it back and generalize it and study it and, you know, look at the possibilities and then bring it back.
And then that can be used by industry to sort of do even better, that we can really play complementary roles.
And I think that's the reason for engaging. The problem with engaging,
as with most things, is time, right? You only have so much time. But I think it's important
to remember whatever your job is, you can't spend all your time running in the direction
you're running. You also have to spend some time asking, are you running in the right direction?
So are there elements that capture a good collaboration,
things that you look for?
Well, I mean, you look for somebody that complements you, right?
So many people other than David Wood
complement me because of those expertise areas.
David Wood and I complement each other based on our working styles.
The caricature of that is a tortoise and a hare.
I'm a tortoise and he's a hare.
And so I would plod along and get things done and get a framework, and he would make it better at the last minute, which was not my temperament.
And he's very creative, too.
So you look for people that complement you.
I think what's important also is that there's a natural tendency to overestimate your own
contribution, right? If you asked a bunch of people who collaborated to honestly,
without bias, estimate their contribution, and you summed them up, you'd get way more than 100%.
And so I think it's important to always
try to correct for that. You know, assume your contribution is smaller, and, you know,
be humble in the credit, because the goal really is to have renewed collaboration,
right? And if somebody is not a good person to collaborate with, they take all the credit,
they have sharp elbows. You finish the collaboration or maybe you abort it.
But then, you know, it's no thank you to more collaboration.
Conversely, if you ever see evidence of somebody doing recurring collaboration, you can always be sure from the outside that both were contributing
because nobody is going to do recurring collaboration carrying the other person.
Yeah, that's a great way to picture it, Mark. I wanted to follow up really quickly on something
you were saying before about trusting your gut and deciding that there's going to be an inflection
point and this is going to become important. I think one of the things that I've observed among people who tend to do most well
in our field, or any field rather, is that they have the confidence to trust their gut and not
be like, oh, look, everybody's running this way. Let's run this way too. And so some of the skill
is, A, doing exactly that,
like, am I running in the right direction, spending some time figuring that out, and, B,
having sort of the strength to run elsewhere. And then finally, having the wherewithal
and ability to say, some people should be running with me this way instead.
And you seem to be particularly good at that kind of really meta stuff. So can you talk a little bit about that and how you maybe developed your nose?
Well, remember, I mean, I think Doug Burger has a nice quote.
I don't know if I have it at hand, but, you know,
basically the most profound stuff is when you have a contrarian prediction
that turns out to be correct.
As a professor, I like to think that you really want to develop a
research portfolio, right? Some of the things are shorter term and safer, and some of the things are
sort of longer term and more radical. And that allows you to do some contrarian stuff,
because you also have something, you know, sort of more straightforward coming.
I think the biggest thing you want to resist is jumping into something because you just saw a couple of good papers written on it.
I mean, that guarantees that you're going to be, at best, a fast follower.
You know, Guri Sohi, who went on to get an Eckert-Mauchly Award and get into the National Academy of Engineering.
I mean, he was working on processors in the late 80s and 90s when everyone else was saying processors are fast enough.
It's all about memory and storage and interconnect. And he said, no, no, no, processors are not fast enough.
And, you know, that turned out to be right. I'm not sure I have perfect operational advice for
you, Lisa. But, you know, it does come sometimes from some watching these trends. And the key
thing with watching trends, especially as an academic, is you don't have to get the timing
right. Right. I mean, you have to just, you know, like I worked on multiprocessors
because I thought they were important.
And maybe they were important in niches, but I was wrong for like 20 years.
Okay?
And then suddenly multicore makes them really important.
And it's not like I anticipated, you know,
Dennard scaling
forcing us to multi-core. And so there's no magic there. But if you just follow others,
there's just a limit. So, you know, think about can you make some of your work, you know,
not following others. Yes. So this seems like a good transition time to go into discussing some
of your visioning work, because, you know, the ability to do this sort of visioning is highly
dependent on your ability to kind of detect these trends, whether, you know, just in time or very
early on and say, hey, no, this is, I think, the direction broadly that the field should go.
Yeah. So you're talking about the Computing Community Consortium,
which is essentially an NSF-funded think tank.
I did not found it. It predated me.
So what happened was early in their existence, in 2012,
I somehow got roped into leading a white paper called
21st Century Computer Architecture.
And we worked with about 10 people who were like the leaders of SIGARCH and TCCA and a few other organizations.
And we put together a white paper because NSF was itching to do something,
you know, in what they were already seeing as a post-Moore's Law era. It wasn't there yet, but it was coming.
And so we did a white paper really at a perfect time because they wanted to do something.
So it's like there were clouds that just needed to be seeded and then it would rain. And this resulted in NSF programs starting with XPS and they keep changing the order
of the letters and things.
You know, $16 million a year and still going, which, you know, for industry types, that's
like nothing, but that's a big deal for academics.
And so since no good deed goes unpunished, people thought, wow, maybe we
should put him in this organization because, you know, he herded these cats to get this
white paper done. Actually, we did it in like two months, which is lightning speed.
And so then, you know, I was on the council and then the executive committee, and then eventually
kept getting promoted, if you will.
And, you know, I was department chair.
And then I decided after being department chair that this was a really good thing to
do.
And it's fascinating because I did things well beyond computer architecture.
So in terms of this herding of cats, I remember when we were at AMD together, it often came
up.
It's just like herding cats.
We're all herding cats all the time. And I find, the more senior I get, the older I get, that
there is a lot of this technical work aspect, and then there's just the cat herding aspect.
And then maybe this plays into what you're saying before about, you know, there's this cost of time.
There's reward, but you have to kind of invest your time because you want to be able to draw the ideas out of,
draw ideas and understanding and communication
and like shared vision across this large group of disparate people.
And so when it came to that white paper,
you know, to do something like that in two months
with all these leading lights,
what are some of the skills that you think would be useful?
Because I think what we, as a field,
we teach our grad students a lot about technical stuff,
but we don't necessarily teach them about cat herding.
There's no Cat Herding 101, right?
So what kind of advice do you have to give about that sort of a task where not
only do you have to herd 10 people,
but 10 people who are sort of leading lights in the field,
which probably just makes it that much harder.
Right. So I'm not sure I have simple rules.
I mean, one thing is that everyone wants to be treated with respect and listened to.
And so, you know, I try to listen to people. My wife says I don't, but
everyone else says I do. You know, you consider what they have to say and you
sincerely make everyone believe, because it's true, you know, that they're going to have an
investment in this product and the product is going to be good.
And that motivates people to sort of make their contribution and want to be part of it.
By the way, the thing I didn't say to the previous answer with visioning is that
so often the vision is not actually brand new. In fact, that report was something that like all the architects knew,
just the story hadn't been told and it wasn't known elsewhere. And so the goal was really,
in some sense, it was to discuss what we saw as the present to everyone else who considered it
the future. And so William Gibson, the great
science fiction writer says, you know, the future is here. It's just unevenly distributed. And,
you know, that's my experience.
Interesting. I like that quote. I think one other thing that we were curious about: through your long, 32-year career at UW, you've taught a lot of
students. What kind of skills do you try to make sure to impart upon them? I sometimes get
questions from professors like, what are we not teaching our students? And, you know, like we
kind of just mentioned before, there's usually pretty good technical education. But what kind
of meta skills do you find are important to impart?
And before I answer that, let me answer a question about selecting students.
I find the most important thing to look for, and Lisa, it'll be interesting when you think
about this for selecting employees, is fire in the belly.
People who really want to do work.
And you can give them little tests: go off and read this paper and come back and discuss it. And you can see if they have fire in the belly.
You want people who can program pretty well. You want people that are, you know, pretty
intelligent. But, you know, there's all kinds of standardized things that sort of filter for that by
the time they get to you. It's fire in the belly that you really want. So I think the mistake,
to come back to your question,
is advisors who think their job is to teach technical knowledge, right?
Once you've got that technical knowledge, you're done.
You really want to teach people a lot about the process,
thinking about how you think about problems,
thinking about how you design experiments, that sort of stuff.
I can't emphasize enough how important written and oral communication are.
Our job is not just to discover things, but it is to communicate it.
And if you can communicate it better, people will want to work with you more.
People will think more highly of your work.
It's a total win-win.
But once again, communication takes effort.
We can talk about that if you want.
The other thing I really emphasize is that you have to learn to take criticism, even criticism that you think is wrong.
You have to listen to it and you have to say, is what this person said true?
If it's not true, why did they come to this misunderstanding?
How did I miscommunicate that led them to do this?
And you want them to be able to take criticism from their friends and colleagues because they're certainly going to get it from the referees.
Especially referee number two. You have to be able to separate criticism of the work and not internalize it as criticism of self, right? Because if you take it as a criticism of you personally,
then there's this natural defense mechanism that kicks in.
And by the way, the flip side of that,
you were asking about working with people and cats of all levels.
You never, ever criticize a person. You only criticize actions.
And you can even, it's often helpful to complement that with a personal compliment. Let's say there was a grad student who was sloppy with some data analysis and showed you results that were bogus, and you found some flaws in them.
You know, you say that they really need to double-check things.
For someone as gifted as you, I expect more.
Okay.
And so if you criticize the action and pair it with a personal compliment, that works well.
If you do a personal attack, 'you are an idiot,'
they're going to say inside, no, you're an idiot.
Even if they don't feel comfortable saying that to you, it just doesn't work.
Yeah, yeah.
Those are wise words.
And I find the longer I work, the more those kinds of skills become important;
the balance of the pie, in terms of what you can accomplish, tilts more towards being able to do that
than just being able to program a bunch of stuff.
You've spent like several decades in the architecture community
and seen like various inflection points and so on.
So what do you think the architectures community is doing well today?
And where do you think we need to focus more or what we can do better?
So first of all, let me say that I think, you know, on balance, our community functions
quite well.
We get a lot done.
We are much more connected to industry than many other practical fields, and I think that's a good thing.
And we really advance things and we change topics, right?
You used to see microarchitecture dominating things.
Now you see neural network accelerators dominating things.
Now maybe we respond too much to trends,
but I still think it's important to respond to trends.
I'm going to say what we do less well,
especially now that we have so many conferences
and people are writing so many papers,
people are writing too many papers in my judgment,
with the belief that the more papers you have,
the more impact you'll have.
I mean, there are plenty of people, actually less well-known than me, that have twice as many papers.
I've always tried to write fewer papers and make each one good.
Okay, not complete success. And you can't really take that to the extreme, because your students need some amount of papers to get jobs.
But don't think in terms of writing a lot of papers.
And because we're writing so many papers, our reviewing process is under tremendous strain.
And when it's under strain, then a couple of things happen.
Randomness increases a little bit, which allows people to think they should just submit and hope for the best, which is unfortunate.
The other thing is I think we take too many of the incremental safe, flawless papers.
And we're very good at rejecting a paper which the program committee disagreed on whether or not it was important.
I'm convinced that if you took one or two sessions worth of the safest papers in a conference and replaced them with one or two sessions of out-there papers at that conference,
that it would be a better conference.
And some of those out-there papers will end up being seminal,
and some of those out-there papers will be forgotten.
But probably all of those incremental papers will be forgotten. And we fear accepting a paper that
somebody later finds flawed and unimportant much more than we fear moving too slowly by accepting
really safe papers. Yeah, I think in our second episode, when we talked to Bill Dally, he was
saying, because he's had experience in both academia and industry,
that on the spectrum from getting an idea to getting something into
production, writing the paper only gets you about 10% of the way there.
production, the paper is only like 10% there. I can't remember the number exactly, so I'm going to just apologize to Bill in advance.
But, you know, it's very early.
There's a whole lot of work to do thereafter.
So if the focus is just on doing that first 10% of work
and if the ultimate goal is industrial impact,
there should be a longer follow-through, I feel.
Absolutely, there should be a longer follow-through.
I mean, that's one of the reasons why I pushed,
you know, internships and sabbaticals and also graduating students to industry.
I think the other thing about the follow-through is that the reality of anything in computer architecture is way more complicated than the paper.
And therefore, if your idea is pretty complicated in the paper, it has much less chance of impact.
And so many of the ideas that I'm most proud of are sort of ridiculously simple, really, like the three C's.
It's so obvious in retrospect that you can't even believe it needed to be invented.
And so, you know, striving for simplicity is a good thing.
Of course, it can be hard because referees are not always fond of simplicity.
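Since Mark mentions the three Cs, here is a minimal sketch of what the model classifies (a toy illustration of my own, not from the episode; the addresses and cache sizes are made up): a miss is compulsory if even an infinite cache would miss, capacity if an equal-sized fully associative LRU cache would also miss, and conflict otherwise.

```python
from collections import OrderedDict

def three_cs(block_addresses, num_blocks):
    """Classify the misses of a toy direct-mapped cache into the three Cs.

    compulsory: even an infinite cache would miss (first reference).
    capacity:   an equal-sized fully associative LRU cache would also miss.
    conflict:   everything else (misses caused only by the mapping).
    """
    seen = set()          # models an infinite cache
    lru = OrderedDict()   # fully associative LRU cache of the same size
    direct = {}           # index -> block address for the direct-mapped cache
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in block_addresses:
        first_ref = addr not in seen
        seen.add(addr)
        fa_miss = addr not in lru
        lru[addr] = True
        lru.move_to_end(addr)
        if len(lru) > num_blocks:
            lru.popitem(last=False)   # evict the least recently used block
        idx = addr % num_blocks       # direct-mapped index
        dm_miss = direct.get(idx) != addr
        direct[idx] = addr
        if dm_miss:
            if first_ref:
                counts["compulsory"] += 1
            elif fa_miss:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
    return counts

# Blocks 0 and 4 collide in a 4-block direct-mapped cache, so the repeated
# references miss even though the cache has plenty of room: conflict misses.
print(three_cs([0, 4, 0, 4], num_blocks=4))
```

This is only a sketch of the classification idea; the model as published is stated in terms of miss ratios and block-level behavior, with more care than this toy takes.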
Just picking up one of the themes there.
So you talked about how, once you have an idea in academia, the time it takes before it actually gets into a real product
is pretty long. So how do you think about the time constant for engagements or
collaborations that you have, especially if the end goal is getting to some impact?
Are there cutoff points? Do you engage more with students, things like that?
You know, I try hard to have impact.
I write papers.
I give talks in industry and put out students.
You can't completely control what industry does,
and sometimes ideas are not adopted for reasons that are completely non-technical.
Also, there's a whole patent thing going on so that industry doesn't necessarily want to acknowledge that the idea came from you.
There's an incentive to sort of reinvent it to protect themselves.
So I think you want to put those ideas out there and control what you can control.
And so what you control is sort of doing good work that people find interesting.
And I personally believe it's a contribution, even if somebody,
you know, reads some of my papers and it makes them think hard and they do something different.
You know, one of the questions is, what is success, right? And there are two extremes. One is completely external recognition.
And the other is internal satisfaction.
And, you know, it's hard to live completely on one of those extremes.
But I think more people would be better off striving for internal satisfaction and some external recognition.
That's what I've tried to do.
And it's both more what you can control
and often by keeping the focus on doing good stuff,
some of that external recognition may come.
And if it doesn't, you know, so be it.
Maybe this is a good point to wind the clocks back a little bit.
Tell us about your path to University of Wisconsin, how you got interested in computer architecture,
how you got to where you are today, any interesting episodes or inflection points in your life
leading up to this point. Yeah, so when I was in ninth grade, in middle school, I somehow designed a mechanical binary adder,
which involved a bunch of address labels rotating on a spool with an arm coming down to do the carry propagation.
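The carry propagation that mechanical adder performed is the same idea as a ripple-carry adder. As a hedged software sketch of that idea (my own illustration; the bit vectors are made up):

```python
def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length binary numbers, least significant bit first,
    propagating the carry bit by bit, as a ripple-carry adder does."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)            # sum bit for this position
        carry = (a & b) | (carry & (a ^ b))  # carry out of this position
    out.append(carry)                        # final carry becomes the top bit
    return out

# 3 (binary 11) + 1 (binary 01), LSB first: result is 4 (binary 100).
print(ripple_carry_add([1, 1], [1, 0]))
```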
And, you know, I was just fascinated by computers.
And this was totally radical at the time. This was before personal computers, so the general public had no idea what a computer was. I was told, correctly, that I couldn't earn a living
as a mathematician.
And so I went into engineering and computer science.
I think I got turned on by computer architecture because we're in the illusion business.
I mean, we create this machine that, you know, all it's doing is
comparing some numbers, swapping this and that, and it does magic.
We do a lot of work in memory hierarchies. We build a memory hierarchy
whose average latency, for example,
is better than any single technology that we build it out of, right, with caches and virtual memory.
And I just had an irrational love for that, right?
It wasn't, you know, some devious Machiavellian plan that this would be a good place to work.
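The memory-hierarchy "illusion" Mark describes is often quantified with the average memory access time formula. A minimal sketch, with made-up latency numbers purely for illustration:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single-level cache:
    every access pays the hit time, and misses additionally pay the penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative, made-up numbers: a 1 ns cache in front of 100 ns memory.
# With a 2% miss rate the blended latency is about 3 ns, far better than
# the 100 ns technology most of the data actually lives in.
print(amat(hit_time=1.0, miss_rate=0.02, miss_penalty=100.0))
```

The same recurrence applies level by level (the miss penalty of one level is the AMAT of the next), which is why the hierarchy as a whole can beat any one of its component technologies.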
As it turns out, you know, we have been riding this exponential curve of innovation, a lot of it supported
by Moore's Law, but, you know, everything keeps changing.
It's fascinating.
And you look back and say, yeah, changes are pretty big, but you know what?
The future changes are even bigger.
And, you know, that observation has been true for my entire career.
And boy, what a blessing.
And people say, how did you decide to stay at Wisconsin for 32 years?
Well, of course, I didn't decide to stay for 32 years.
There were several points where I decided not to leave. But I didn't really stay in the same
job, because things keep changing, right? Jim Goodman, my former
colleague, used to joke that in the PhD qualifying exam, we don't change the questions, because we change the
answers. And, you know, that's what's been just totally fascinating these many years.
And I also see this about architecture.
One of the things I notice is that in the hardware-software stack,
people tend to move up.
Like many of my students, for example: some work in architecture,
but some work in very low-level software and on up.
It's very hard to move down. And so architects are in a very good position to
be part of architecting big software systems, whereas pure software people have a real hard
time contributing close to or at the hardware. So it's been just fascinating.
And, you know, computing has gone from a niche thing.
You know, I went to college when I used a typewriter to write most of my papers.
And, you know, and the World Wide Web didn't exist.
And it's just hard to imagine the impact that we've had.
And so it's just been totally fun.
Thanks for that story, Mark.
I didn't know that.
I don't know how widely known it is, that whole mechanical adder story.
I always enjoy hearing stories like that. But at the same time, I find that we often have leading lights in
various fields and industries give talks, and maybe origin stories about how they got where
they were.
And a lot of times it exposes a deep, early-detected affinity for
the topic at hand. And sometimes I find that that can potentially discourage people if they're like,
oh, well, I wasn't doing that until I was 18.
It's too late.
I can never be like Mark Hill.
Do you have anything to say to people like that?
Well, yeah, I mean, many people take many different paths. And, you know, I think you should never put your heroes on pedestals, because people are not perfect.
They have lots of flaws.
You can do it, too.
And, you know, I mean, Susan Eggers, another Eckert-Mauchly winner, was my office mate many years ago.
She was then working as a secretary at Yale University.
And she moved on to being an Eckert-Mauchly winner in computer architecture, partially because, as the story she told goes, her boss, who was a professor, had her look at some Fortran or something like that over the weekend.
People often take circuitous paths.
Mine, I think, is a relatively boring path.
I think the thing that's unusual about my path is that I'm the first in my family to get a bachelor's degree.
To me, that's a testament to the United States, despite all of the United States' obvious failures to completely live up to its ideals.
But, you know, some things work okay. And so you could have argued, you know, my parents really pushed us to go to college,
but, you know, why is it that I never stopped and spent, you know, almost 40 years at universities?
That's unexpected in some ways.
Yeah, I didn't know that either.
Thanks for sharing that.
And I guess for someone where, like, you're the first person to get a bachelor's in your family,
what was the allure of academia and university environment for you?
Well, first of all, when I went to college the first time, at the University of Michigan, I just loved it.
I loved the classes and the extracurricular activities, you know, from classical music to football.
And, you know, I loved the environment.
I ended up, I guess, in retrospect, not wanting to leave for a long time.
But it wasn't like a long, long-term plan.
I mean, when I got into Berkeley for graduate school, you know what my father
said?
He said, "We're not paying anymore." And so,
you know, I went to Berkeley for a master's degree. And then, this is kind of
fascinating stuff, I got involved in some research, so I did a PhD. And then when I was finishing,
you know, I said, well, I don't know if I want to go to academia or industry, but, you know,
the academic interviewing season is
sort of rigid and you can interview with companies afterwards. So I'll try my hand at that.
When I was an assistant professor, actually, my daughter was born
a couple months after I started as an assistant professor. This is a really bad example of
pipelining: I got my PhD;
two months later, I moved to Madison;
two months later, my wife gave birth to our daughter.
You want more sequentiality in life.
But that forced me to be very disciplined about my time.
And I said, you know, I'm going to work to get tenure.
And if I don't get it, working limited hours and seeing my family, I'm going to quit and get a pay raise.
And I was fortunate that year. I was blessed to have that attitude, because with either outcome, I think I would have been better off.
And it turns out that I was able to get tenure, and I had pretty good success, and sort of never looked back.
Mark, thanks so much for being here with us today. It's always an absolute pleasure to talk to you
and hear your words of wisdom as you pontificate about various topics,
both about computer architecture, the practice of computer architecture, and beyond. Well, thank you.
And it's a pleasure seeing you again.
It's too bad we're not working together as closely
as we did a decade ago.
And to our listeners,
thank you for being with us
on the Computer Architecture Podcast.
Till next time, it's goodbye from us.