Computer Architecture Podcast - Ep 4: Cross-layer Optimizations and Impactful Collaborations with Dr. Mark D. Hill, University of Wisconsin-Madison / Microsoft
Episode Date: March 11, 2021. Dr. Mark D. Hill is a professor emeritus of computer sciences at the University of Wisconsin-Madison, and currently a Partner Hardware Architect with Microsoft Azure. He has made numerous contributions to parallel computer system design, memory system design, computer simulation, and more. He is well known for his advice and collaborative work style, having published papers with 160 different co-authors. He talks to us about cross-layer optimizations, impactful collaborations, and visioning for computer architecture research.
Transcript
Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting-edge work in computer architecture and the remarkable people behind it.
We are your hosts.
I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Today, we have with us Professor Mark Hill, who is a Professor Emeritus of Computer Science
at the University of Wisconsin-Madison, where he taught for 32 years.
He has made numerous contributions to parallel computer system design,
memory system design, computer simulation, and more.
He's well known for his advice and collaborative work style,
having published papers with 160 different co-authors.
He is the inventor of the widely used three Cs
model of cache behavior,
co-inventor of the SC for DRF memory consistency model,
and a recipient of the Eckert-Mauchly Award in 2019.
He's currently completing his term as a member of the Computing Community Consortium, where he was
chair from 2018 to 2020. He's here today to talk to us about industry-academia synergy and his
vision for computer architecture research in the years to come. Since taping this episode, Mark has
joined Microsoft as a partner hardware architect
with Azure, reporting to corporate vice president Steve Scott. A quick disclaimer that all views
shared on the show are the opinions of individuals and do not reflect the views of their organizations.
Mark, thanks so much for joining us today. Welcome to the podcast.
Thank you.
So in broad strokes, tell us what gets you up in the morning these days.
These days, I am particularly excited about how really for the first time in my career,
cross-layer optimizations are both necessary and possible. With the end of Dennard scaling and the slowing of Moore's law, at least
in two dimensions, you know, if we're going to give the illusion or the reality of performance
energy improvements, some of that is going to have to come from getting the fat out
and doing cross-layer optimization.
So, you know, many years ago, 20 years ago,
if you wanted to propose a non-binding prefetch instruction,
people would say, oh, no, you can't.
You can't change the interface to do that.
And now much more is on the table.
So I find that very exciting because I like the kind of work that
connects multiple things together.
Do you find that that's more possible these days because there is a bit of a trend towards verticalization? Meaning, it's not just that there's this necessity because of all these things slowing down, since changing the interface across layers and across companies is really, really difficult. It seems like these days, you know, Google is building their own silicon.
Do you think that's part of it as well?
I actually think the re-vertical integration of industry is a consequence of the end of Dennard scaling and the slowing of Moore's law, right? In the 20th
century, we were moving so fast and so well conventionally that it made sense to put blinders
on and concentrate on your own layer. And now there's just so many opportunities that are,
you know, from the technology requiring you to cross layers, and I guess also to some extent from top down.
And so you're absolutely right.
What you said, Lisa, is that there was a time when there were equipment manufacturers
and semiconductor manufacturers, and there were the Googles, the Intels, and the AMDs,
and then there were the Dells, and there were the Microsofts, and there were the SAPs,
and everybody was in their layer.
And now we're seeing software companies doing hardware and hardware companies doing software.
In many ways, the software companies going down have almost been more successful at doing that
than hardware companies trying to get and really deeply understand software.
But that's a longer story.
So picking up on the theme of cross-stack work that needs to be done in today's age,
tell us a little bit about how you think about doing this kind of cross-stack work
when you need people from, I guess, different expertise from different domains
to maybe come together and work on problems.
Yeah, so it's a difficult challenge.
I mean, it depends how far you cross.
If you're crossing just, you know, sort of one layer, it's easier.
It does take time to get people to talk to each other.
You know, one of the reasons, as I mentioned, I had 160 co-authors so far, is that, you
know, I want to work with people who have other expertises.
So, you know, at Wisconsin, I worked a lot with David Wood, who's at the same layer, but also with, you know, Jim Larus and Ras Bodik and Mike Swift, who are at different layers.
You know, I think what's key is identifying sort of what are the opportunities across the layers.
So let me give you the example of direct segments.
Now, this actually hasn't been adopted, so you could argue that it's a failure.
But I thought it's a really interesting research project.
So the issue here was that we saw that large servers running things like key value stores,
et cetera, were losing a ton of performance to TLB, Translation Look-Aside Buffer, misses.
And yet they weren't really benefiting from much of what virtual memory has to offer.
Like, for example, they never swapped because the key value store was sized to the physical memory available.
And so you paid all this overhead for no benefit.
And so what direct segments did, and we won't have to go into the details, but it said,
here's a way to get rid of the overhead for the main data structures, but yet keep backward compatibility all the way. And that required, you know, students that could cross layers, and supervision from Mike Swift on the operating system side and me on the hardware side, to sort of do the work.
And it started with analysis, right? Analysis precedes synthesis.
Right. So we analyze that there was this problem and that there was also this opportunity.
You didn't have to solve all problems if you solve this really important special case.
Although you weren't doing much swapping of the main data structure.
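As a rough sketch of the mechanism, the translation path can be imagined like this. This is a toy simplification, assuming hypothetical segment registers (`seg_base`, `seg_limit`, `seg_offset`) and a flat dict standing in for the real TLB and page walker; it is not the actual hardware design from the paper.

```python
# Toy sketch of direct-segment translation: one large contiguous region
# (e.g., a key-value store's heap) bypasses the TLB entirely, while all
# other addresses keep ordinary, backward-compatible paging.

PAGE_SIZE = 4096

def translate(va, seg_base, seg_limit, seg_offset, page_table):
    """Return the physical address for virtual address `va`."""
    if seg_base <= va < seg_limit:
        # Direct segment: simple base-and-offset arithmetic, so a TLB
        # miss on this region is impossible by construction.
        return va + seg_offset
    # Fallback: conventional paging for everything else.
    vpn, offset = divmod(va, PAGE_SIZE)
    ppn = page_table[vpn]  # this lookup is what misses in a real TLB
    return ppn * PAGE_SIZE + offset
```

The point of the design is visible in the first branch: for the workload's dominant data structure, translation is pure arithmetic, so the TLB-miss overhead Mark describes simply disappears.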
I remember that work. What do you think was the main reason why it didn't get widely adopted? Because I think there is a lot of code running,
particularly in the cloud, that doesn't swap.
You think it'd be useful?
It's a non-trivial change to the virtual memory hardware.
And that's a difficult thing.
We did a later version that didn't mess with the level one TLB.
It was outside the level one TLB in order to make it easier to adopt because you weren't messing with the super critical timing paths of the L1 data TLB and L1 cache.
I think it was just too radical. And also there was, you know, pretty good progress on trying to make sort of big pages work and
Abhishek Bhattacharjee's, you know, fine work on some ways of automagically creating bigger pages and fusion in the TLB.
And that was just sort of more incremental.
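The page-merging idea can be sketched as follows. This is a toy illustration only, not Bhattacharjee's actual mechanism: it scans a page table for runs of mappings that are contiguous in both virtual and physical space, which is exactly what lets a coalesced TLB entry cover several small pages with one tag.

```python
def coalesce(page_table):
    """Collapse runs of 4 KB mappings that are contiguous in both the
    virtual and physical address spaces into (vpn, ppn, length) entries,
    the way a coalescing TLB covers several pages with one entry."""
    entries = []
    for vpn in sorted(page_table):
        ppn = page_table[vpn]
        if entries:
            base_vpn, base_ppn, length = entries[-1]
            # Extend the run only if this page continues it on both sides.
            if vpn == base_vpn + length and ppn == base_ppn + length:
                entries[-1] = (base_vpn, base_ppn, length + 1)
                continue
        entries.append((vpn, ppn, 1))
    return entries
```

For example, three contiguous pages plus one stray page collapse to two entries instead of four, which is the "more incremental" win over full direct segments.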
Yeah. So that kind of begs the question then, you know,
this cross-layer optimization, we are approaching that point, if not already there, where it's really, really needed and necessary. But as you point out
in a lot of your sort of expository and written work, that's really hard, right? Because you kind
of have to get seven different pieces to move at once. You seem to have been able to do that
successfully a number of times in your career by kind of having broad impact even at places like companies.
So what kind of advice do you have with respect to exactly that, you know, moving all seven dwarves at once?
Well, I think part of it is becoming more possible with this re-vertical integration of industry, right? You know,
so companies like Apple have no trouble saying, hey, we want to incorporate accelerators in
our iPhone operating system. And so we're going to do it, right? Whereas if, you know, the hardware manufacturer is AMD or Intel and the operating system manufacturer is a Microsoft, you know, Microsoft may not see the value in supporting something in their operating system for some benefit that some third party gets.
And so I think it's going to be easier when companies are
controlling the sort of whole value chain. And the other thing is, remember, it's so obvious in
retrospect, but when Moore's Law and Dennard scaling were cranking forward, doubling every two years,
that just was like this tsunami that swept away many, many good ideas. So you had
this crazy idea to build this thingy, you know, by the time you got it deployed, you know, the chips
were four times faster. And so, you know, I think now as things slow down, we're going to, people
are going to have to consider more alternatives, and the vertical integration is going to make that possible.
Google controls their cloud infrastructure.
Amazon does it quite a bit.
Microsoft does quite a bit.
If they want to do something in that space, it's going to be more possible.
Do you think there are any drawbacks to that kind of verticalization model? Because, I mean, like you were saying, in the old days, you know, 20 years ago, every layer could go hyperspeed in
its own silo and it would all kind of work together because everybody had their agreed upon
interfaces that were kind of standardized across the industry. And now with this verticalization,
it seems possible that now, like, because Apple does control its entire stack, it's very hard to
do anything to impact Apple except what they decide to do for themselves.
And similarly, all these major cloud providers that you just named, Amazon, Facebook, Google,
Microsoft. And so what kind of drawbacks do you potentially see from this shift?
I mean, we've heard the potential gains that you just said.
Now everybody can kind of slow down and do cross-layer optimization,
but what should we be watching out for?
Sure, there's obvious drawbacks.
I mean, the great thing about the layers was it was a divide and conquering of complexity. Everybody could work almost in parallel on their items. And that
facilitated many things. By the way, when I say verticalization, I don't mean like
throw everything out. I mean much more selective punch-throughs for important things.
In terms of companies, you know, the advantage of the layers and companies operating in layers is
that you had sort of more competition.
You would buy your CPU chips from more than one vendor and things like that.
And so if you're integrated in one company, you know, you're getting this from, you know, your company's group that's doing this work.
And if the group is not facing, you know, tremendous competitive pressure, maybe they won't advance as quickly.
And, you know, sometimes you see in organizations, by the way, they're so big that even though they theoretically should be able to do cross-layer optimization, you know, it's sufficiently far apart in the org that it doesn't happen as much.
So there's no panacea.
But I'm actually old enough to remember when the industry was vertically integrated before.
And there was IBM and DEC and some other companies that were vertically integrated.
So I suspect that this is a circle of reinvention going on.
Right.
So do you see some echoes from the past
and learnings that we could apply to the current age?
Like when people are trying to do cross-stack optimizations,
for example, you talked about selective punch-throughs, right?
Like how do you go about identifying the right avenues
where you can make these changes?
How do you go about identifying the right partners to collaborate with and move things forward?
Well, let me give you two answers.
First of all, you know, there's a model that it's just going in a circle. I think it's much more
spiral. So if you look in a certain way, it looks like it's a circle, but it doesn't actually go
back to the same place. It advances.
What was the other part of your question? How do you look through for punch-throughs? I mean,
first of all, what's really important in research is to identify problems and to do analysis first.
So the most important thing, whether it's cross-layer or otherwise, is to identify a problem that if you can solve it, people will care about it.
And you have some reason to believe that you can solve at least part of it.
And, you know, by focusing on problems first and not solutions first, don't play Jeopardy.
I've written this somewhat, you know, this somewhat. People have this tendency to say,
I got a new mechanism. Let me find some use for it. I wouldn't do that. I would look to see where
there was pain. So in that direct segments work, we saw that there was pain in TLB miss rates that were ridiculously high. And so then you start thinking hard about
how to do that. Can I tell the RAID story? I mean, this is not my work, but I think it's a
really good example where if you frame a problem correctly, the solution becomes a puzzle,
almost a simple puzzle. So Dave Patterson, Garth Gibson, and Randy Katz were observing that classically,
this is in the 1980s, the cheapest place to store a bit was on a washing-machine-sized
disk drive that the super duper mainframes had. Okay? But then personal computers
came along and eventually had enough demand for small disk drives that the small disk drive volume
went through the roof. Okay? And it actually became more cost effective to store a bit on a PC drive than on one of these big drives.
So couldn't you store your big data on PC drives?
That was the question they asked.
Because that was this opportunity that was caused by this differential change, and they were looking for a problem. And the answer was no, because the PC disks didn't meet a very high
reliability standard individually, because they were going to users, not enterprises. And if
you put a whole array of them together, the mean time to failure just killed you. It just didn't work.
so could you do something to use these disks and make the mean time to failure be really excellent?
Arbitrarily high.
And so once you start thinking that way, you think, well, we need some kind of error correction codes.
And since the disks actually, when they fail, they let you know they're not there.
Then you turn to erasure codes.
And it really wasn't brilliant to do the erasure codes, right?
What was brilliant was to be tracking this trend where it's a different place to store data.
And then the problems fell out.
You know, the opportunities fell out.
And so, you know, and not all my research projects are that clean,
but I think that's a perfect example of, you know, when you're looking for the research to do,
you're looking for the problems and the trends and the inflection points. You're not starting with,
boy, I know how to do erasure codes. What should I use it for?
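The insight Mark describes, that failed disks announce themselves, so a simple XOR erasure code recovers any one lost block, can be sketched like this. A toy illustration, not any specific RAID implementation:

```python
from functools import reduce

def parity(blocks):
    """XOR parity across equal-sized data blocks (RAID-4/5 style)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def reconstruct(surviving, parity_block):
    """Rebuild the one missing block. Because a dead disk identifies
    itself (an erasure, not silent corruption), XOR of all surviving
    blocks plus the parity yields exactly the lost data."""
    return parity(surviving + [parity_block])
```

Any single lost block XORs back out, so data loss now requires a second failure during the repair window, which is what pushes the array's mean time to data loss far above the unprotected figure of one disk's MTTF divided by the number of disks.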
Right. So let's expand on, you know, picking those opportunities or looking for the
right problems. So I'll start from the academic side. So what can academics do? You know, how do
they find out avenues where they can get access to these problems or, you know, understand that
there are these problems that are there? I mean, you've talked about giving talks and things like
that, but how do you think about how academics should go about scoping out these problems?
So a couple of things.
One is you should watch trends and try to reflect on whether, you know,
you're going to get an inflection point, right?
Like an example of those PC disks becoming cheaper per bit than the big disks.
Or there was a time when people talked about killer microprocessors because in the old days, obviously, a big iron machine was going to be better than this wimpy calculator based microprocessor.
But that flipped.
And so you can, a good way to do that is to have your pulse on industry, which you can do by holding industrial affiliates meetings, by sending students on internships and by doing sabbaticals in industry.
The last is, you know, basically the high reward, high cost method.
But, you know, I have found it pretty effective.
And, you know, when your students go to industry, remind them, and the same goes for sabbaticals: you cannot come back from industry with their solutions.
That is their intellectual property.
Right.
You solve some problems for them because they're paying you and they may want to hire you.
But you also look for problems.
Right. Look for problems that
are just emerging. Look for problems that are being band-aided right now because they're not
yet important, but your gut tells you they're going to become more important. And I just think if you do
that, you're just more likely to load the die in your favor to pick important problems.
So just flipping the script on the other side, so you talked about how academics can engage
with industry and look for these problems. From an industry practitioner's perspective,
how should they think about engaging with academics? Why is it important for them also
to do that in this particular era? Okay, well, I'm not completely qualified
to answer that question because I haven't spent a great deal of time in industry. I mean, Lisa perhaps can say more. But if you engage with academics, you can influence their work; they might do work more relevant to your
product. The other thing, and Cliff Young sort of really advocates this, is that industry sometimes
has solutions and they don't even necessarily completely understand like why it works and what's possible.
And academics take it back and generalize it and study it and, you know, look at the possibilities and then bring it back.
And then that can be used by industry to sort of do even better, that we can really play complementary roles.
And I think that's the reason for engaging. The problem with engaging,
as with most things, is time, right? You only have so much time. But I think it's important
to remember whatever your job is, you can't spend all your time running in the direction
you're running. You also have to spend some time asking, are you running in the right direction?
So are there elements that capture a good collaboration,
things that you look for?
Well, I mean, you look for somebody that complements you, right?
So many people other than David Wood
complement me because of those expertise areas.
David Wood and I complement each other based on our working styles.
The caricature of that is a tortoise and a hare.
I'm a tortoise and he's a hare.
And so I would plod along and get things done and get a framework, and he would make it better at the last minute, which was not my temperament.
And he's very creative, too.
So you look for people that complement you.
I think what's important also is that there's a natural tendency to overestimate your own
contribution, right? If you asked a bunch of people who collaborated to honestly,
without bias, estimate their contribution, and you summed them up, you'd get way more than 100%.
And so I think it's important to always
try to correct for that. You know, assume your contribution is smaller, and, you know,
be humble in the credit, because the goal really is to have renewed collaboration,
right? And if somebody is not a good person to collaborate with, they take all the credit,
they have sharp elbows. You finish the collaboration or maybe you abort it.
But then, you know, it's no thank you to more collaboration.
Conversely, if you ever see evidence of somebody doing recurring collaboration, you can always be sure from the outside that both were contributing
because nobody is going to do recurring collaboration carrying the other person.
Yeah, that's a great way to picture it, Mark. I wanted to follow up really quickly on something
you were saying before about trusting your gut and deciding that there's going to be an inflection
point and this is going to become important. I think one of the things that I've observed among people who tend to do most well
in our field, or any field rather, is that they have the confidence to trust their gut and not
be like, oh, look, everybody's running this way. Let's run this way too. And so some of the skill
is, A, doing exactly that,
like, am I running in the right direction, spending some time figuring that out, and, B,
having sort of the strength to run elsewhere. And then finally, having the wherewithal
and ability to say, some people should be running with me this way instead.
And you seem to be particularly good at that kind of really meta stuff. So can you talk a little bit about that and how you maybe developed your nose?
Well, remember, I mean, I think Doug Burger has a nice quote.
I don't know if I have it at hand, but, you know,
basically the most profound stuff is when you have a contrarian prediction
that turns out to be correct.
As a professor, I like to think that you really want to develop a
research portfolio, right? Some of the things are shorter term and safer, and some of the things are
sort of longer term and more radical. And that allows you to do some contrarian stuff,
because you also have something, you know, sort of more straightforward coming.
I think the biggest thing you want to resist is jumping into something because you just saw a couple of good papers written on it.
I mean, that guarantees that you're going to be, at best, a fast follower.
You know, Guri Sohi, who went on to get an Eckert-Mauchly Award and get into the National Academy of Engineering.
I mean, he was working on processors in the late 80s and 90s when everyone else was saying processors are fast enough.
It's all about memory and storage and interconnect. And he said, no, no, no, processors are not fast enough.
And, you know, that turned out to be right. I'm not sure I have perfect operational advice for
you, Lisa. But, you know, it does come sometimes from some watching these trends. And the key
thing with watching trends, especially as an academic, is you don't have to get the timing
right. Right. I mean, you have to just, you know, like I worked on multiprocessors
because I thought they were important.
And maybe they were important in niches, but I was wrong for like 20 years.
Okay?
And then suddenly multicore makes them really important.
And it's not like I anticipated, you know,
Dennard scaling
forcing us to multi-core. And so there's no magic there. But if you just follow others,
there's just a limit. So, you know, think about can you make some of your work, you know,
not following others. Yes. So this seems like a good transition time to go into discussing some
of your visioning work, because, you know, the ability to do this sort of visioning is highly
dependent on your ability to kind of detect these trends, whether, you know, just in time or very
early on and say, hey, no, this is, I think, the direction broadly that the field should go.
Yeah. So you're talking about the Computing Community Consortium,
which is essentially an NSF-funded think tank.
I did not found it. It predated me.
So what happened was early in their existence, in 2012,
I somehow got roped into leading a white paper called
21st Century Computer Architecture.
And we worked with about 10 people who were like the leaders of SIGARCH and TCCA and a few other organizations.
And we put together a white paper because NSF was itching to do something,
you know, in what they were already seeing as a post-Moore's Law era. It wasn't there yet, but it was coming.
And so we did a white paper really at a perfect time because they wanted to do something.
So it's like there were clouds that just needed to be seeded and then it would rain. And this resulted in NSF programs starting with XPS and they keep changing the order
of the letters and things.
You know, $16 million a year and still going, which, you know, for industry types, that's
like nothing, but that's a big deal for academics.
And so since no good deed goes unpunished, people thought, wow, maybe we
should put him in this organization because, you know, he herded these cats to get this
white paper done. Actually, we did it in like two months, which is lightning speed.
And so then, you know, I was on the council and then the executive committee, and then eventually
kept getting promoted, if you will.
And, you know, I was department chair.
And then I decided after being department chair that this was a really good thing to
do.
And it's fascinating because I did things well beyond computer architecture.
So in terms of this herding of cats, I remember when we were at AMD together, it often came
up.
It's just like herding cats.
We're all herding cats all the time. And I find, the more senior I get, the older I get, that
there is a lot of this technical work aspect, and then there's just the cat herding aspect.
And then maybe this plays into what you're saying before about, you know, there's this cost of time.
There's reward, but you have to kind of invest your time because you want to be able to draw the ideas out of,
draw ideas and understanding and communication
and like shared vision across this large group of disparate people.
And so when it came to that white paper,
you know, to do something like that in two months
with all these leading lights,
what are some of the skills that you think would be useful?
Because I think what we, as a field,
we teach our grad students a lot about technical stuff,
but we don't necessarily teach them about cat herding.
There's no Cat Herding 101, right?
So what kind of advice do you have to give about that sort of a task where not
only do you have to herd 10 people,
but 10 people who are sort of leading lights in the field,
which probably just makes it that much harder.
Right. So I'm not sure I have simple rules.
I mean, one thing is that everyone wants to be treated with respect and listened to.
And so, you know, I try to listen to people. My wife says I don't, but
everyone else says I do. You know, you consider what they have to say and you
sincerely make everyone believe, because it's true, you know, that they're going to have an
investment in this product and the product is going to be good.
And that motivates people to sort of make their contribution and want to be part of it.
By the way, the thing I didn't say to the previous answer with visioning is that
so often the vision is not actually brand new. In fact, that report was something that like all the architects knew,
just the story hadn't been told and it wasn't known elsewhere. And so the goal was really,
in some sense, it was to discuss what we saw as the present to everyone else who considered it
the future. And so William Gibson, the great
science fiction writer says, you know, the future is here. It's just unevenly distributed. And,
you know, that's my experience.
Interesting. I like that quote. I think one other thing that we were curious about: through your long, 32-year career at UW, you've taught a lot of
students. What kind of skills do you try to make sure to impart upon them? I sometimes get
questions from professors like, what are we not teaching our students? And, you know, like we
kind of just mentioned before, there's usually pretty good technical education. But what kind
of meta skills do you find are important to impart?
And before I answer that, let me answer a question about selecting students.
I find the most important thing to look for, and Lisa, it'll be interesting when you think
about this for selecting employees, is fire in the belly.
People who really want to do work.
And you can give them little tests: go off and read this paper and come back and discuss it. And you can see if they have fire in the belly.
You want people who can program pretty well. You want people that are, you know, pretty
intelligent. But, you know, there's all kinds of standardized things that sort of filter for that by
the time they get to you. It's fire in the belly that you really want. So I think the mistake,
to come back to your question,
is advisors who think their job is to teach technical knowledge, right?
Once you've got that technical knowledge, you're done.
You really want to teach people a lot about the process,
thinking about how you think about problems,
thinking about how you design experiments, that sort of stuff.
I can't emphasize enough how important written and oral communication are.
Our job is not just to discover things, but it is to communicate it.
And if you can communicate it better, people will want to work with you more.
People will think more highly of your work.
It's a total win-win.
But once again, communication takes effort.
We can talk about that if you want.
The other thing I really emphasize is that you have to learn to take criticism, even criticism that you think is wrong.
You have to listen to it and you have to say, is what this person said true?
If it's not true, why did they come to this misunderstanding?
How did I miscommunicate that led them to do this?
And you want them to be able to take criticism from their friends and colleagues because they're certainly going to get it from the referees.
Especially referee number two. You have to be able to separate criticism of the work and not internalize it as criticism of self, right? Because if you take it as a criticism of you personally,
then there's this natural defense mechanism that kicks in.
And by the way, the flip side of that,
you were asking about working with people and cats of all levels.
You never, ever criticize a person. You only criticize actions.
And you can even, it's often helpful to complement that with a personal compliment. Let's say there was a grad student who was sloppy with some data analysis and showed you results that were bogus, and you found some flaws in them.
You know, you say that they really need to double-check things.
For someone as gifted as you, I expect more.
Okay.
And so if you criticize the action and pair it with a personal compliment, that works well.
If you do a personal attack, 'you are an idiot,'
they're going to say inside, no, you're an idiot.
Even if they don't feel comfortable saying that to you, it just doesn't work.
Yeah, yeah.
Those are wise words.
And I find the longer I work, the more those kinds of skills become important;
the balance of the pie, in terms of what you can accomplish, tilts more towards being able to do that
than just being able to program a bunch of stuff.
You've spent like several decades in the architecture community
and seen like various inflection points and so on.
So what do you think the architectures community is doing well today?
And where do you think we need to focus more or what we can do better?
So first of all, let me say that I think, you know, on balance, our community functions
quite well.
We get a lot done.
We are much more connected to industry than many other practical fields, and I think that's a good thing.
And we really advance things and we change topics, right?
You used to see microarchitecture dominating things.
Now you see neural network accelerators dominating things.
Now maybe we respond too much to trends,
but I still think it's important to respond to trends.
I'm going to say what we do less well,
especially now that we have so many conferences
and people are writing so many papers,
people are writing too many papers in my judgment,
with the belief that the more papers you have,
the more impact you'll have.
I mean, there are plenty of people, actually less well-known than me, that have twice as many papers.
I've always tried to write fewer papers and make each one good.
Okay, not complete success. And you can't really take that to the extreme, because your students need some amount of papers to get jobs.
But don't think in terms of writing a lot of papers.
And because we're writing so many papers, our reviewing process is under tremendous strain.
And when it's under strain, then a couple of things happen.
Randomness increases a little bit, which allows people to think they should just submit and hope for the best, which is unfortunate.
The other thing is I think we take too many of the incremental safe, flawless papers.
And we're very good at rejecting a paper which the program committee disagreed on whether or not it was important.
I'm convinced that if you took one or two sessions worth of the safest papers in a conference and replaced them with one or two sessions of out-there papers at that conference,
that it would be a better conference.
And some of those out-there papers will end up being seminal,
and some of those out-there papers will be forgotten.
But probably all of those incremental papers will be forgotten. And we fear accepting a paper that
somebody later finds flawed and unimportant much more than we fear moving too slowly by accepting
really safe papers. Yeah, I think in our second episode, when we talked to Bill Dally, he was
saying, because he's had experience in both academia and industry,
that on the spectrum from getting an idea to getting something into
production, writing the paper only gets you about 10% of the way there.
production, the paper is only like 10% there. I can't remember the number exactly, so I'm going to just apologize to Bill in advance.
But, you know, it's very early.
There's a whole lot of work to do thereafter.
So if the focus is just on doing that first 10% of work
and if the ultimate goal is industrial impact,
there should be a longer follow-through, I feel.
Absolutely, there should be a longer follow-through.
I mean, that's one of the reasons why I pushed,
you know, internships and sabbaticals and also graduating students to industry.
I think the other thing about the follow-through is that the reality of anything in computer architecture is way more complicated than the paper.
And therefore, if your idea is pretty complicated in the paper, it has much less chance of impact.
And so many of the ideas that I'm most proud of are sort of ridiculously simple, really, like the three C's.
It's so obvious in retrospect that you can't even believe it needed to be invented.
And so, you know, striving for simplicity is a good thing.
Of course, it can be hard because referees are not always fond of simplicity.
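Since Mark mentions the three Cs, here is a minimal sketch of what the model classifies (a toy illustration of my own, not from the episode; the addresses and cache sizes are made up): a miss is compulsory if even an infinite cache would miss, capacity if an equal-sized fully associative LRU cache would also miss, and conflict otherwise.

```python
from collections import OrderedDict

def three_cs(block_addresses, num_blocks):
    """Classify the misses of a toy direct-mapped cache into the three Cs.

    compulsory: even an infinite cache would miss (first reference).
    capacity:   an equal-sized fully associative LRU cache would also miss.
    conflict:   everything else (misses caused only by the mapping).
    """
    seen = set()          # models an infinite cache
    lru = OrderedDict()   # fully associative LRU cache of the same size
    direct = {}           # index -> block address for the direct-mapped cache
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in block_addresses:
        first_ref = addr not in seen
        seen.add(addr)
        fa_miss = addr not in lru
        lru[addr] = True
        lru.move_to_end(addr)
        if len(lru) > num_blocks:
            lru.popitem(last=False)   # evict the least recently used block
        idx = addr % num_blocks       # direct-mapped index
        dm_miss = direct.get(idx) != addr
        direct[idx] = addr
        if dm_miss:
            if first_ref:
                counts["compulsory"] += 1
            elif fa_miss:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
    return counts

# Blocks 0 and 4 collide in a 4-block direct-mapped cache, so the repeated
# references miss even though the cache has plenty of room: conflict misses.
print(three_cs([0, 4, 0, 4], num_blocks=4))
```

This is only a sketch of the classification idea; the model as published is stated in terms of miss ratios and block-level behavior, with more care than this toy takes.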
Just picking up one of the themes there.
So you talked about how, once you have an idea in academia, the time it takes before it actually gets into a real product
is pretty long. So how do you think about the time constant for engagements or
collaborations that you have, especially if the end goal is getting to some impact?
Are there cutoff points? Do you engage more with students, things like that?
You know, I try hard to have impact.
I write papers.
I give talks in industry and put out students.
You can't completely control what industry does,
and sometimes ideas are not adopted for reasons that are completely non-technical.
Also, there's a whole patent thing going on so that industry doesn't necessarily want to acknowledge that the idea came from you.
There's an incentive to sort of reinvent it to protect themselves.
So I think you want to put those ideas out there and control what you can control.
And so what you control is sort of doing good work that people find interesting.
And I personally believe it's a contribution, even if somebody,
you know, reads some of my papers and it makes them think hard and they do something different.
You know, one of the questions is, what is success, right? And there are two extremes. One is completely external recognition.
And the other is internal satisfaction.
And, you know, it's hard to live completely on one of those extremes.
But I think more people would be better off striving for internal satisfaction and some external recognition.
That's what I've tried to do.
And it's both more what you can control
and often by keeping the focus on doing good stuff,
some of that external recognition may come.
And if it doesn't, you know, so be it.
Maybe this is a good point to wind the clocks back a little bit.
Tell us about your path to University of Wisconsin, how you got interested in computer architecture,
how you got to where you are today, any interesting episodes or inflection points in your life
leading up to this point. Yeah, so when I was in ninth grade, in middle school, I somehow designed a mechanical binary adder,
which involved a bunch of address labels rotating on a spool with an arm coming down to do the carry propagation.
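The carry propagation that mechanical adder performed is the same idea as a ripple-carry adder. As a hedged software sketch of that idea (my own illustration; the bit vectors are made up):

```python
def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length binary numbers, least significant bit first,
    propagating the carry bit by bit, as a ripple-carry adder does."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)            # sum bit for this position
        carry = (a & b) | (carry & (a ^ b))  # carry out of this position
    out.append(carry)                        # final carry becomes the top bit
    return out

# 3 (binary 11) + 1 (binary 01), LSB first: result is 4 (binary 100).
print(ripple_carry_add([1, 1], [1, 0]))
```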
And, you know, I was just fascinated by computers.
And this was totally radical at the time. This was before personal computers, so the general public had no idea what a computer was. I was told, correctly, that I couldn't earn a living
as a mathematician.
And so I went into engineering and computer science.
I think I got turned on by computer architecture because we're in the illusion business.
I mean, we create this machine that, you know, all it's doing is
comparing some numbers, swapping this and that, and it does magic.
We do a lot of work in memory hierarchies. We build a memory hierarchy
whose average latency, for example,
is better than any single technology that we build it out of, right, with caches and virtual memory.
And I just had an irrational love for that, right?
It wasn't, you know, some devious Machiavellian plan that this would be a good place to work.
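The memory-hierarchy "illusion" Mark describes is often quantified with the average memory access time formula. A minimal sketch, with made-up latency numbers purely for illustration:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single-level cache:
    every access pays the hit time, and misses additionally pay the penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative, made-up numbers: a 1 ns cache in front of 100 ns memory.
# With a 2% miss rate the blended latency is about 3 ns, far better than
# the 100 ns technology most of the data actually lives in.
print(amat(hit_time=1.0, miss_rate=0.02, miss_penalty=100.0))
```

The same recurrence applies level by level (the miss penalty of one level is the AMAT of the next), which is why the hierarchy as a whole can beat any one of its component technologies.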
As it turns out, you know, we have been riding this exponential curve of innovation, a lot of it supported
by Moore's Law, but, you know, everything keeps changing.
It's fascinating.
And you look back and say, yeah, changes are pretty big, but you know what?
The future changes are even bigger.
And, you know, that observation has been true for my entire career.
And boy, what a blessing.
And people say, how did you decide to stay at Wisconsin for 32 years?
Well, of course, I didn't decide to stay for 32 years.
There were several points where I decided not to leave. But I didn't really stay in the same
job, because things keep changing, right? Jim Goodman, my former
colleague, used to joke that in the PhD qualifying exam, we don't change the questions, because we change the
answers. And, you know, that's what's been just totally fascinating these many years.
And I also see this about architecture.
One of the things I notice is that in the hardware-software stack,
people tend to move up.
Like many of my students, for example: some work in architecture,
but some work in very low-level software and on up.
It's very hard to move down. And so architects are in a very good position to
be part of architecting big software systems, whereas pure software people have a real hard
time contributing close to or at the hardware. So it's been just fascinating.
And, you know, computing has gone from a niche thing.
You know, I went to college when I used a typewriter to write most of my papers.
And, you know, and the World Wide Web didn't exist.
And it's just hard to imagine the impact that we've had.
And so it's just been totally fun.
Thanks for that story, Mark.
I didn't know that.
I don't know how widely known it is, that whole mechanical adder story.
I always enjoy hearing stories like that. But at the same time, I find that we often have leading lights in
various fields and industries give talks, and maybe origin stories about how they got where
they were.
And a lot of times it exposes a deep, early-detected affinity for
the topic at hand. And sometimes I find that that can potentially discourage people if they're like,
oh, well, I wasn't doing that until I was 18.
It's too late.
I can never be like Mark Hill.
Do you have anything to say to people like that?
Well, yeah, I mean, many people take many different paths. And, you know, I think you should never put your heroes on pedestals, because people are not perfect.
They have lots of flaws.
You can do it, too.
And, you know, I mean, Susan Eggers, another Eckert-Mauchly winner, was my office mate many years ago.
She was then working as a secretary at Yale University.
And she moved on to being an Eckert-Mauchly winner in computer architecture, partially because, as the story she told goes, her boss, who was a professor, had her look at some Fortran or something like that over the weekend.
People often take circuitous paths.
Mine, I think, is a relatively boring path.
I think the thing that's unusual about my path is that I'm the first in my family to get a bachelor's degree.
To me, that's a testament to the United States, despite all of the United States' obvious failures to completely live up to its ideals.
But, you know, some things work okay. And so you could have argued, you know, my parents really pushed us to go to college,
but, you know, why is it that I never stopped and spent, you know, almost 40 years at universities?
That's unexpected in some ways.
Yeah, I didn't know that either.
Thanks for sharing that.
And I guess for someone where, like, you're the first person to get a bachelor's in your family,
what was the allure of academia and university environment for you?
Well, first of all, when I went to college the first time, at the University of Michigan, I just loved it.
I loved the classes and the extracurricular activities, you know, from classical music to football.
And, you know, I loved the environment.
I ended up, I guess, in retrospect, not wanting to leave for a long time.
But it wasn't like a long, long-term plan.
I mean, when I got into Berkeley for graduate school, you know what my father
said?
He said, "We're not paying anymore." And so,
you know, I went to Berkeley for a master's degree. And then, this is kind of
fascinating stuff, I got involved in some research, so I did a PhD. And then when I was finishing,
you know, I said, well, I don't know if I want to go to academia or industry, but, you know,
the academic interviewing season is
sort of rigid and you can interview with companies afterwards. So I'll try my hand at that.
When I was an assistant professor, actually, my daughter was born
a couple months after I started as an assistant professor. This is a really bad example of
pipelining: I got my PhD;
two months later, I moved to Madison;
two months later, my wife gave birth to our daughter.
You want more sequentiality in life.
But that forced me to be very disciplined about my time.
And I said, you know, I'm going to work to get tenure.
And if I don't get it, working limited hours and seeing my family, I'm going to quit and get a pay raise.
And I was fortunate that year. I was blessed to have that attitude, because with either outcome, I think I would have been better off.
And it turns out that I was able to get tenure, and I had pretty good success, and sort of never looked back.
Mark, thanks so much for being here with us today. It's always an absolute pleasure to talk to you
and hear your words of wisdom as you pontificate about various topics,
both about computer architecture, the practice of computer architecture, and beyond. Well, thank you.
And it's a pleasure seeing you again.
It's too bad we're not working together as closely
as we did a decade ago.
And to our listeners,
thank you for being with us
on the Computer Architecture Podcast.
Till next time, it's goodbye from us.