Computer Architecture Podcast - Ep 15: The Hardware Startup Experience from Business Case to Software with Dr. Karu Sankaralingam, University of Wisconsin-Madison/Nvidia

Episode Date: March 28, 2024

Dr. Karu Sankaralingam is a Professor at the University of Wisconsin-Madison, an entrepreneur, and an inventor, as well as a Principal Research Scientist at NVIDIA. His work has been featured in industry forums of Mentor and Synopsys, and has been covered by the New York Times, Wired, and IEEE Spectrum. He founded the hardware startup SimpleMachines in 2017, which developed chip designs applying dataflow computing to push the limits of AI generality in hardware and built the Mozart chip. In his career, he has led three chip projects: Mozart (a 16nm, HBM2-based design), MIAOW (an open-source GPU on FPGA), and the TRIPS chip as a student during his PhD. In his research he has pioneered the principles of dataflow computing, focusing on the role of architecture, microarchitecture, and the compiler. He has published over 100 research papers, including 9 award papers, has graduated 9 PhD students, and is an inventor on 21 patents. He is a Fellow of the IEEE.

Transcript
Starting point is 00:00:00 Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts. I'm Suvinay Subramanian. And I'm Lisa Hsu. Our guest for this episode was Karu Sankaralingam, who is a professor at the University of Wisconsin-Madison, an entrepreneur, an inventor, and also a principal research scientist at NVIDIA.
Starting point is 00:00:27 His work has been featured in industry forums of Mentor and Synopsys and has been covered by the New York Times, Wired, and IEEE Spectrum. He founded the startup Simple Machines in 2017, which developed chip designs applying dataflow computing to push the limits of AI generality in hardware and built the Mozart chip. In his career, he has led three chip projects: Mozart, a 16-nanometer, HBM2-based design; MIAOW, an open-source GPU on FPGA; and the TRIPS chip as a student during his PhD. In his research, he has pioneered the principles of dataflow computing,
Starting point is 00:01:09 focusing on the role of architecture, microarchitecture, and the compiler. He has published over 100 research papers, has graduated nine PhD students, is an inventor on 21 patents, and has published nine award papers. He's a fellow of the IEEE. Now, Simple Machines was founded before the huge boom in AI chip startups and has since folded. And so we really wanted to pick Kari's brain about his thoughts on running a chip startup. And he was kind enough to join us and share his thoughts on building a business case, understanding the user experience, and how much software is necessary to build a hardware startup. Now, before we get to the interview, a quick disclaimer that all views shared on this show are the opinions of individuals and do not reflect the views of the organizations they work for. With that, let's get right to it. Karu, welcome to the podcast. I'm thrilled to be here. Thank you so much for talking to me today.
Starting point is 00:02:04 We're so excited to have you. And as listeners know, our first question is always, what's getting you up in the morning these days? Well, everybody's life is different. So mine is I have a young family. I have an 11-year-old and seven-year-old. So what gets me up is making sure they both go to school and our houses just until they have reached their school is getting them out the door and in the past few years I have become a all-season biker so summer winter I don't have a parking permit anymore what gets me up is getting them there and man my very very short-minute bike ride to school. And I'm not even a real biker.
Starting point is 00:02:49 I go on an e-bike. So real bikers don't be offended by me calling myself a biker. So that's what gets me up, getting my kids out the door and my very enjoyable all-weather bike ride to work. That's really amazing i think you bike to work in wisconsin and there was another guy philip wells i remember philip white his bike every day to work and i was like you guys i mean you live in wisconsin like what are you talking about you work every day that's my yeah my inspiration actually ben, Ben lived there. I got here in 2007. Ben grew up, I think, in Pennsylvania or something.
Starting point is 00:03:31 Then he went to grad school in Stanford, and then he moved here. So he would bike. I never got it until I myself started biking. I'm like, yeah, actually, like with all the gear, it doesn't really matter what the temperature is. It's very enjoyable, that 15 minutes of crisp air in your face. Oh, that's amazing. That's amazing.
Starting point is 00:03:53 When I was at Michigan, I remember walking to the office and some days it was so cold that I wouldn't have described it as crisp air. I remember telling people it felt like someone was slapping me the whole way. And so I don't find that fun. But yeah, okay. So that gets you in the morning. And now you're in your office at Wisconsin. So Simple Machines, that's sort of the lead here. What's going on with Simple Machines now? For those of our listeners who don't know much about it, maybe you tell us a little bit about it. Yeah, so SuperMachine was a startup I founded with work done as part of my research and a handful of my graduate students. What we were trying to do, it's kind of becoming more relevant now, a product market fit was maybe a couple of years too ahead.
Starting point is 00:04:43 We were trying to go after a small batch inference and trying to build a chip that would run at extremely high utilization, 60 to 70% of hardware utilization, even when you didn't have a lot of batch parallelism. The second angle we were going after was this observation that algorithms were changing at a rate much faster than the chip design cycle. So our goal was how do I build something general? And more recently, we've coined this term called saying we are in the era of efficient generalization, not really the era of specialization.
Starting point is 00:05:29 Now specialization implies I have a thing, I'm going to build silicon for that thing. That thing is changing at a rate that is really, really fast. And if it takes you 18 months to build silicon, by the time you build your specialized thing, the thing has changed. So we were putting behavior principles that with software could really be modified to run very, very efficiently. So that was our value prop. And we built out, we'll get into this more, the hardest part. I mean, many hard parts. The business part was probably absolutely the hardest part. I mean, many hard parts. The business part was probably
Starting point is 00:06:05 absolutely the hardest. In my opinion, until you do it, nothing prepares you for it. Doing it the first time prepares you for doing it the second time. So what do you mean specifically by the business part? Do you mean coming up with a value proposition
Starting point is 00:06:21 or do you mean the actual running of the business? What do you mean by that? It's, people sometimes use this term, it's a little cliche, the whole concept of a product market fit. Eventually you have a technology, it does well on benchmarks and so on and so forth. Ultimately, a customer has to feel,
Starting point is 00:06:43 I'm going to pay you money to buy this thing. I'm going to pay you instead of somebody else. And the value you bring is so much better than something else that's already mainstream. And especially in deep learning, obviously, NVIDIA had the absolute has, had absolute dominance there. And the barrier to entry was, this was our biggest learning, and it was what drove the company. And even as we had more and more conversations, we kept realizing how important it was. Basically, the user experience expectation. And Nvidia has raised the bar so high, which is you show up with a model
Starting point is 00:07:30 written in a high level language and it will run. That's it. You write your 50 lines of PyTorch, you run it on the hardware, that's it. That's all you need. You don't need to do anything else. And the user expectation for every customer was pretty much that, which is, yes a great compiler a great technology unless that level
Starting point is 00:07:51 of user experience is there it becomes very very hard for them to get convinced that the 5x 10x speed ups all the other metrics kind of become secondary after the user experience and that that's number one and second is the user experience need to then match with some thing that the customer is not able to do with whatever is available currently. And that was a very intriguing, under retrospect, somewhat obvious lesson, which is when we started, which was 2017 and we kind of had our first product 2019 timeframe, it was still the beginning of AI adoption. So everybody would be coming up with a use case that generated value. That's the only reason why any company, pick a company, right? Home Depot, Bank of America, insurance company, Geico, whoever, right?
Starting point is 00:08:54 They would be internally building a product internally or a feature that was providing value from adopting AI because it actually worked. And their internal platform would be a GPU because those are the only things they have access to. So there was a very kind of intriguing, what is it called? Searching under the streetlight is not the right metaphor, but it's more of
Starting point is 00:09:26 they were building nails because what was available was a hammer. Which totally makes sense because why would you do anything else? And those of you who are into woodwork know screws are better. You shot up with a screw, they like i don't know i don't
Starting point is 00:09:47 have a screwdriver what am i going to do with the screws better it can hold more strength every which way i look at it this is a better thing but i have built a nail because available to me was a hammer right and so that was kind of one of the things that, and then from kind of like people talk about this in political conversations and so on, the framing becomes very hard, which is you built a value prop, which is going after something the state of art is not good at,
Starting point is 00:10:22 but the customer is designing products optimized for what the state of art is not good at, but the customer is designing products optimized for what the state of art is good at. So the framing takes time to help. You may have more and more conversations to say, here are the things where things can go, and when things go there, it could bring value, right? So that was the most difficult part, just kind of getting customers to, and as a startup, you need to start generating revenue, having conversations, help you raise money at some point, you have, if you, if you pop all the way up, there's a lot of discussion about how modern companies these days where we're all very short term cited, right? Because you need to hit your quarterly estimates or whatever. And so whether you're a startup or whether you're a humongous company, I would hear this discussion at various companies that I've worked at as well,
Starting point is 00:11:20 where you're talking about how well, you know, but if we do this long-term investment, the value that it could create in 10 years is really, really long. But if we keep doing what we're doing with this, you know, I don't want to say the word cash cow exactly, because that's never a term that I heard, but like my interpretation is we've got this thing that works. To do that migration, that's a huge investment. And then particularly if the vendor is a small, unproven startup it's like well what happens if they go away what happens so that becomes a very very difficult conversation i can imagine that being the thing and then you know to our students who are students and professors in the field you know we're we're all trying to publish papers right and the published paper
Starting point is 00:12:01 you want to have the graph that shows the 10% and you're like, that should be enough, right? 10%. I mean, back in the heyday, 10% IPC increase. Let's just do it. I know. I know. And there's another aspect to it, which is, which is later about this, which is fundamentally, how do you build a, which is something I've learned a lot and I
Starting point is 00:12:23 think it's super, super important for the chip industry and semiconductor industry as a whole, which is something I've learned a lot and I think it's super, super important for the chip industry and semiconductor industry as a whole, which is how do we think about completely scalable business models? Because the easiest thing as a chip startup or whatever is to say, I'm going to build a thing, I'm going to sell it to somebody and then they'll buy it. And then I want to build another thing three years later, sell it to my customers. In general, the software world kind of has moved on from that. There is no software startup and most VCs are not used to this type of conversation anymore or where your revenue is
Starting point is 00:13:05 selling things there's even a term annual recurring revenue and so on where fundamentally what they want is subscription-based services where you don't have to go through this cyclical thing and you have every year you're locking in newer customers and i don't know if your older customers are going away. Right. So most fundamentally, everybody wants to be in some kind of services based business model. And that takes a lot of kind of like technology, product, business model conversations to try and figure out how do you move there. And that was one of the directions we were taking the product and trying to get it into market, which is essentially show up as a cloud service.
Starting point is 00:13:58 And everything else kind of becomes secondary, which has its own challenges. Right. So you touched upon a few different themes there. In particular, you talked about user expectations on the seamless adoption for whatever workload they have on a new technology. And I want to expand on a couple of themes. One is, you were talking about how do you connect the value proposition of an underlying technology, let's say in a chip or a new architecture to the user's context. And specifically, you talked about the software stack that you need in order to enable that and make that value connection, right? So can you talk a little bit about your experiences and how do you think about bootstrapping the software architecture when you have a chip technology or an architecture technology where it makes sense for a user who's trying to adopt this
Starting point is 00:14:48 or play around with it and see what value it can actually deliver? Yeah, I think to me, there were two aspects to it. One is how do you create something that matches enough customers so there is enough people who can who can use it and our default at the time was anyways that will support tensorflow and pytorch and then we had some uh some mechanisms by which if we didn't support a set of operators we would have a relatively clean fallback so we wouldn't crash and burn we would still be able to run the application to completion so that was a very kind of engineering uh solution so you were able to run stuff as opposed to like okay i can't run this anymore.
Starting point is 00:15:46 So from a, what does that mean from a practical standpoint, right? It just means a huge, like a massive amount of money you need to spend as a company to have a software team. We're just writing, writing like a boatload of software. And if we take remove forget papers forget ideas forget technology right it's just like a boatload of software engineering and you just need and then like with all these studies right there was like 50 60 lines of production software can be written in a day whatever so it's and for a startup it's time and money.
Starting point is 00:16:29 Just need to hire the people and build up that stack. And this was, we were doing all this four, five years ago where many of the things that are there even in TARS 2.0 today didn't quite exist in terms of easy way to get the operator graph and so on. So we were doing kind of a concrete example we actually had to do like a basic memory allocator so we have tensors that show up on the host side every tensor needs memory allocated on the device set right and the simplistic way to do this is a one-on-one allocation so when you have very large networks,
Starting point is 00:17:05 you end up sometimes with extremely high demands on memory on the device side, because operator one needed a tensor or produced an activation. By the time operator 10 came along, that activation is dead from a data liveness standpoint so you need a way to reclaim that memory so you can allocate it to some other tensor when operator 10 came along so these are all things like we used to write like in the 80s and 90s right basic memory allocators and so on
Starting point is 00:17:43 so these are all like i I mean, we're a chip company. We just have right software doing stuff like this, right? And it's time and money, right? And unless you do those things, you don't have a turnkey user experience compliance
Starting point is 00:18:00 thing that you can put in the hands of a customer and they can try it because you can show up with kernels and so on, but it doesn't quite move the needle. And I think this is kind of in my, at least as far as I can tell, whether big companies, small company, everybody's experience, right. Which is just the user experience has to be flawless and which is this other paper kind of we have upcoming or as plus work which also shows that how diverse that operator stack is
Starting point is 00:18:38 just the sheer number of operators across many applications is just It's just too numerous. It's just a lot of engineering work. And there doesn't yet exist an auto compiler. They'll just produce all of that when you push about. Yeah, that's very interesting. And you mentioned a few themes here, a couple of things. The first one was, you know, kernels and the sheer diversity of the operator graph. And you briefly talked about like compilers as well in that in that in that exchange.
Starting point is 00:19:09 So how do you think about the trade offs about hand optimized or human optimized kernels that a lot of people write versus building out a compiler stack. And obviously, there are trade offs here a human written compiler library, you can quickly write something, someone can analyze it for that specific operator or specific operating point of an application. But a compiler stack is more general, but it also takes time to build out those abstractions and make it robust enough. So how do you think about the trade-offs between these two things? Do you think there's one right investment for a company that's building out a new architecture?
Starting point is 00:19:45 So I don't know. So I think I'll be super candid in my view here, right? Which is there are, let's say there are at least two pieces or three pieces to a compiler. One is the front-end parsing that takes a set of operators and decides here are the operators that can be fused and once fused here's the back-end code i need to generate right and then there's the actual generation of that back-end code when we started simple machines our dream was that back- backend would be automatic, semi-automatic, things would all just work, right? And as we made more and more progress toward it,
Starting point is 00:20:35 it dawned on us that building that completely automatic compiler would take more time, would not be as robust as a intermediate approach, which is build a library of the most common thing and then have a kind of like a application, application specific compiler or dynamic programming search to figure out which is the best library to call which is kind of basically what the cutlass and kudian and
Starting point is 00:21:11 approaches whereas a set of specialized libraries implementations rather for particular gem shapes which are generated partially automatically partially by humans tuning those libraries so we took a similar approach and for us we felt at the time it was the right approach and in terms of time and money required to build out a massive compiler team that was a more productive approach. And fast forwarding to now, something really fascinating is happening, which is why I keep saying this efficient generalization. To get better performance, the microarchitecture is fundamentally getting, let's say, more and more complex. And as we all know, inevitably for performance reasons, this is not getting properly exposed in the architecture. The architecture is very opaque,
Starting point is 00:22:14 fundamentally obfuscating what is needed to extract performance. And tools like performance counters and so on are good, but not quite at the level that even a phd level programmer let alone average programmer right like a phd computer architect it's very hard to go and fine-tune and extract the most out of the underlying object i have a weird metaphor for this it's not a great metaphor i view it as kind of where coherence and consistency protocols were in the mid-80s,
Starting point is 00:22:50 where people in industry understood them, they were implementing them. But I think academics took a step back and came up with really rich formalism that helped kind of nail some of the correctness and performance issues. So I think this performance portability or performance exposure is a really fundamental thing, which it's not a compiler thing, it's not an architecture thing, it's like, how do we deal with better ways for programmers to understand what the hardware is doing without expecting everyone to just become a ninja programmer and ninja architect, right? Yeah, so it sounds like essentially what you're saying is there's a lot of
Starting point is 00:23:35 different ways to write software right now in the world, right? You can have the ninjas that are super low-level understand and extract every last bit of performance, and then there's sort of like the the mass market case which is we have to make it easy we have to make it easy and it's just not a work you know forget about uh again extracting an extra 50 percent of performance like it just has to work and even the ninja level stuff with this kind of new world of stuff that we're building there you you can't there's not enough information and there's not enough broad concepts of how to expose the information so that we may produce more ninjas because i think we all have accepted through the our careers like we were never going to make
Starting point is 00:24:17 everybody a ninja that's never going to happen you're always going to have some, but the way the world is right now, we're cruising toward a world of none. Yeah. And I think plus there's the other aspect. I don't know how much of this is fundamental to the way computer hardware is because of pattern protection and IP and so on. We don't want to expose the micropotential. Yeah. Yeah.
Starting point is 00:24:45 Right. Yeah. Yeah. Right? Yeah. And if you, and your fundamental thing is, I don't want anybody to know the secret sauce I put in, then you're kind of left with libraries are the way, and the libraries will be closed source. And if somebody wanted to write, and this is where the AI stuff comes in, right?
Starting point is 00:25:06 If somebody came up with a new operator that's very different from a transformer then who's gonna write its bare metal code the deal scientist simply doesn't have the skill and if we want micro architecture to be let's say more and more uh less exposed which yeah we want we want to be less and less exposed. How do we create that co-evolution, I think, is a huge tension. Yeah, I think I completely understand the tension that you're talking about, because in the era of specialization or even, you know, efficient generalization, as you called it, the microarchitecture is a key part of the secret sauce for a lot of companies.
Starting point is 00:25:46 And but there's a huge software stack that runs on top of that there's a very close to bare metal, you know, intrinsics, libraries, and so on. But then once you write that there's an entire stack that sits on top as well. Like in the ecosystem that we have for software today, what are places that you think are places where we can make reasonable progress as a community, right? Where we don't have to expose the lower innards, but you still think there are gaps in the ecosystem that could be filled with sufficiently general frameworks and so on. For example, in the recent past, we've had frameworks like MLIR that are taking off to help
Starting point is 00:26:22 make code generation for domain specific accelerators a lot more easier. I'm sure like, you know, wind the clock back 20 years back, the compiler stack was also equally opaque and then we got LLVM come up and sort of going through a similar transition. So in the entire stack, like where are places that you think we can make meaningful progress and it would be great if people actually took a step and try to build out this ecosystem so that you have a way to abstract away the innards. So you have the lower level that is still under the control of a particular company or a particular startup, but the upper layers of the stack are still relatively amenable. You can bring in things from the overall ecosystem without having to re the wheel or write things all the way from scratch so i have a little bit of a controversial view on this right
Starting point is 00:27:10 which is i think as a community at least with deep learning we may have unnecessarily caught ourselves as academic community caught ourselves in a little bit of quicksand and again this is my own prejudices and things i've learned from doing my startup and now being at InMedia Research as well. So I view fundamentally what is academic research best at? It's taking an idea and in my view,
Starting point is 00:27:38 doing the absolute bare minimum to demonstrate the idea is actually meaningful. Building out full products and the whole ecosystem is great if you want to do it, but I actually think it's not necessary. So, and if I go back to this whole compiler thing and software stack thing, what exists right now is not super composable. So I come up with a clever technique to do fusion i have an academic i should not be responsible for showing it works with tf it works with touch
Starting point is 00:28:12 works with jacks work with this network that might work this whole plethora of operators and all of that stuff because if it was that all encompassing, right, that thing is probably be there already, right? It almost will be, and kind of looking at the generative AI hype now, right? And if the stuff doesn't work for it, doesn't applicable for LLMs, so what? LLMs have been special case optimized because they have to be, they're such an important workload. So I think I'm almost going to make a case for identify the one thing and then without having to boil the ocean
Starting point is 00:28:53 for composability and end-to-end in every way, can we demonstrate that technical value prop in a way that companies and research labs who are arguably more equipped to go and push it into their turnkey composable flow will then pay attention. So to give you a simple example, TensorRT is great. It's a bit of a challenge to use, depending on the workload you're using. And if I come up with a fusion technique, I don't know if I completely agree. It's my responsibility to have a stack bar that shows this technique does X with tensor
Starting point is 00:29:35 RT and my technique is better or worse. How do we come up with a way that I can take my insight and then the widget or the artifact I build demonstrates the intellectual contribution of insight without doing all of the composing. Again, that's kind of where I landed. I can see how others have a completely different view, which is unless you do the whole thing your stuff is useless right yeah sometimes it seems like it depends on always depends on what reviewer you get right yeah some reviewers want you to boil the ocean and some are like this is insightful like who cares like we've got to get this information out there so subani's original question was like where in the software stack can we maybe make some grounds and I think
Starting point is 00:30:26 what you just said was not quite an answer to that but also very very interesting which is you know like as an academic community you know let's not necessarily think about the whole software stack end to end because that shouldn't be anybody's one job. The job of the academic community is to figure out where is there something interesting so that the companies or people who are actually looking for customers and actually looking for users can go. I think the thing with the end-to-end stack that you were talking about before that I find really interesting is there's a lot of chip startups out there right now. And from a chip person and a chip startup's perspective, you always think that you have to have some sort of special sauce in the
Starting point is 00:31:11 chip and that's, you know, whatever the microarchitectures or some sort of selling point. But as you point out, like the reality is of trying to get someone to pay you money for your stuff is that all of these chip startups probably have to end up being way more vertical than they had intended to be right because what you the idea starts at the in some ways the bottom of the value stack it's the very first step and in the end you have to take it all the way to the top so i guess what i wonder is from your experience of like running simple machines, like, do you, do you think anybody has a real chance of displacing NVIDIA? Because if you have to take it from some sort of special sauce thing without the resources of a now $2 trillion company and without, you know, a historical stack building that has gone on for a very, very long time. And now you have to ask,
Starting point is 00:32:01 you know, something that has maybe just gotten series A or just gotten series B and like a few people whose primary reason that they're working on this startup is a hardware idea to say, okay, and now we've got to do the whole step. You know, whether it is libraries or a hybrid or whatever, like that whole middle section up to the top has to be great. So, you know, what are the chances of the, I think a couple of years ago, there was a New York Times article that said there were 55 AI chip startups, and this was a couple of years ago. So what are the chances? One flippant view of this, I think many companies wanted to be like, so NVIDIA was the NVIDIA of deep learning.
Starting point is 00:32:42 They wanted to be the, or AMD was the second to Intel. They wanted to be the AMD of deep learning, but I mean, guess what? AMD is now the AMD of deep learning. So I mean, MI300X is an amazing silicon. It is awesome. I think the Rackham stack is great. So I actually think the answer to your question is, I kind of go back to,
Starting point is 00:33:10 I think the opportunity comes from taking a step back and looking at business value. Like what would, let's take a use case like automotive, for example, right? Or maybe healthcare is another example, which is what are places where there is business value experiment first. You don't even need to build your chip, right? You have business value. Can you value test the business value
Starting point is 00:33:52 with existing hardware, existing software, and so on? And that should be the first kind of intellectual, internal trial run, right? Instead of saying, I'm 10x better, 5x better than somebody. Can you prove that business value? Because the whole saying i'm 10x better 5x better than somebody can you prove that business rather because the whole 5x 10x better the problem with that is it'll take you some time to get your product out the door and in that time the competitors also get better and the gap will close and all of that stuff right so i think in a weird way, I actually believe because Moonslaw has ended, the opportunities are potentially higher now than even five years ago. The business model is I am going to sell a better product to a hyperscaler compared to what NVIDIA or AMD can. I'm very skeptical about that type of business model
Starting point is 00:34:48 and how it could lead to a viable product. I think that type of business model is looking for an exit very quickly at some point because you have something, but now all the hyperscalers have bootstrapped their own hardware teams. The exit strategy doesn't seem viable either. Yeah, that
Starting point is 00:35:08 makes sense. So I think there was a book, The Innovator's Dilemma. Have you read this? Yeah. And so what he talks about is in order to be truly disruptive, you sort of have to come in from the bottom on the side of something. I mean, I'm
Starting point is 00:35:24 totally paraphrasing the idea, but like how Arm got started is like, they're doing tiny little chips somewhere on the side and then suddenly, boom, wow, they're selling, you know, billion chips a year. And then they move up from there, right? And so it sounds like what you're saying is like, somebody should figure out, okay,
Starting point is 00:35:39 there's some side market that's not the stuff that's in the center of tech. It's not, you know, selling Xeons to Microsoft or whatever, but the stuff that's going into the medical devices that are scanning whether or not your spinal cord is good or your knee is good, or laparoscopic surgery type stuff. And then build something from there, because there's more opportunity, and then from there you, you know, build up. I mean, I think the model in The Innovator's Dilemma is like,
Starting point is 00:36:08 you gain market share there, you build up your revenue, you use that revenue to actually fund moving up. And there's another thing, which was, I remember having this conversation. What do they call it? They call it the business model canvas. It's a pretty standard template where they say, what's your unfair advantage?
Starting point is 00:36:26 What are your channels? And then what's your revenue model and so on. And this totally applies for software startups. What they say is you should at least fill out 50 versions of a business model canvas. Or like a very large number before you decide, this is what my company is going to do. Right.
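The business model canvas Karu mentions is a standard one-page template; the Lean Canvas variant has nine boxes, including the unfair advantage, channels, and revenue streams he names. As a toy sketch of the "fill out 50 versions" exercise (the field names follow the standard template, but the example entries are invented for illustration):

```python
# The nine standard Lean Canvas boxes; the conversation explicitly names
# three of them: unfair advantage, channels, and the revenue model.
CANVAS_FIELDS = (
    "problem", "customer_segments", "unique_value_proposition",
    "solution", "channels", "revenue_streams",
    "cost_structure", "key_metrics", "unfair_advantage",
)

def make_canvas(**entries):
    """Build one canvas version; any box left unspecified stays 'TBD'."""
    unknown = set(entries) - set(CANVAS_FIELDS)
    if unknown:
        raise ValueError(f"not a canvas box: {sorted(unknown)}")
    return {field: entries.get(field, "TBD") for field in CANVAS_FIELDS}

# The advice: iterate many versions before committing to one.
versions = [
    make_canvas(unfair_advantage="stream-dataflow IP", channels="direct sales"),
    make_canvas(unfair_advantage="compiler stack", revenue_streams="licensing"),
]
assert all(len(v) == len(CANVAS_FIELDS) for v in versions)
```

The point of forcing every box to exist, even as "TBD", is that each iteration makes visible which parts of the business you have not thought through yet.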
Starting point is 00:36:50 And then, yeah, so that leads to, you know, how did you decide to start Simple Machines? I've been curious about that. Like, how does one decide to start a company? I think, honestly, it was a little bit of luck, timing, and overconfidence about where my lab and my students were. So at the time, this was 2015, 2016, some of the ideas of
Starting point is 00:37:20 specializing for CNNs had just come out, and people had had papers on production trees and so on. We had just gotten our stream-dataflow working, which is a dataflow array, and then we put a stream engine on it, and we were like, oh, holy crap, we solved everything. And I became convinced that this is a solution, and our biggest advantage is nobody will believe how good this is. So we should go build it. And then, at the time we were trying to raise money, Google had just made public their TPU chip. So I remember I would go give these talks to people, and they would initially be
Starting point is 00:38:04 quizzical: what are you talking about? I don't understand. Why do we need new chips? What do you mean we don't have enough compute? And then I pointed to the Google thing, and they were like, holy crap, what this guy is saying makes sense. And so that kind of helped us raise some money initially to get stuff going. But it was really that we had this belief that you needed a way to, and this sounds a little bit cliched, but in a programmatic way, move data in and out, and simple loads and stores were not enough. And once you did that, you needed a dataflow array that could morph into whatever the operators were. So we built those two, and we were able to show some kernel execution and so on.
Starting point is 00:38:54 And then, this was a little bit flippant and serious: because I'm a tenured professor, I call myself the entrepreneur without any risk, which is a total oxymoron. It's like, worst case scenario, go back and become a professor, right? Right, right. My students were very, very excited. So I wanted to circle back to another thing that you talked about: what is academia really good at? It is taking a step back, you know, figuring out what are some common patterns and how to formalize them into useful concepts and idioms. So from your vantage point right now, are there certain gaps that you see in this formalisms landscape, or in this principles landscape? I think I really enjoyed reading your papers that, you
Starting point is 00:39:40 know, distill the principles around specialization, you know, starting from your HPCA 2016 paper and so on. In a similar vein, fast forward a few years since that paper: are there places in the overall ecosystem where you see some formalism gaps? You talked about the analogy with coherence and consistency, you know, maybe 20 years back. Are there similar flavors of problems that you see today? Things where you feel people know sort of the mechanics of certain solutions, but there still isn't a very elegant set of idioms that people can understand, build on top of, and boil ideas into: okay, here are the key principles that we're building on.
Starting point is 00:40:18 I mean, this is completely my own intellectual prejudice and my own biases on where I think industry is, from whatever leadership vantage point I had into industry, and where academia can help. The thing I feel we can and should do more of in academic research is, and I'm not saying this because I've done a lot of clean-slate work in my research, right? I don't want this to come off as don't do clean-slate research. For better or worse, DL architecture seems to be coalescing toward a scalar core, a vector processor, a matrix engine, and some kind of thing I'm going to call a stream engine, since that's the name we used in our research. Other people have different names for it, right? MI300X, Ampere, Hopper, whatever has been disclosed about MTIA, TPU: at some level, they all look like this. Different
Starting point is 00:41:20 companies have different trade-offs on, should I have a big matrix engine or a small matrix engine, and so on. So where I'm getting to is, this seems pretty good. And there has been lots of other work I've looked at: if I just take the efficiency of an ALU and everything else went away, the TOPS per watt of just that thing, compared to silicon you can buy today, the gap's not huge. It's about 5x or so. Which teaches us something, which is that 5x is simultaneously very large and simultaneously very small. If I don't change the algorithm, I'm not going to be able to build some new silicon that's 20x better than what NVIDIA is shipping today or AMD is shipping today. So going back
Starting point is 00:42:12 to your question, then: I think going back and looking at principles that say, if I do this, I can improve stuff by 20%, 30%, is something that I would love to see more of, because everybody is trying to write the 10x, 20x better papers, which are all, I mean, they're great. I'm not criticizing
Starting point is 00:42:37 any of them. But many of them are bringing a sledgehammer, where the insights are very hard to extract. And even if there's an insight, I have to do so much work as a reader to figure out what is there, once I set aside all of your clean-slate stuff, that I can then extract and put back into what is looking like a pretty robust thing people have converged on, right? So I think that's kind of what I'd love to see more of: whether you do clean slate or do something that's applicable, I'd love to see every paper almost have a subsection that says, how might I take these insights and apply them to silicon as it exists today? It sounds kind of boring, but I think in a weird way, that might be one of the most impactful things we can do, particularly because Moore's Law has ended and ideas matter like crazy, right? If you have a good idea, people will adopt it.
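Karu's ALU-limit comparison above is a back-of-envelope efficiency bound: take the TOPS per watt of the arithmetic units alone, pretend everything else (SRAM, interconnect, control) is free, and divide by the TOPS per watt of a shipping product. A minimal sketch of that arithmetic; note that only the roughly 5x ratio comes from the conversation, and the absolute TOPS/W figures below are invented placeholders:

```python
def efficiency_headroom(alu_only_tops_per_watt, product_tops_per_watt):
    """Upper bound on improvement if the algorithm is fixed and only
    overheads are stripped: ratio of the bare-ALU limit to the product."""
    return alu_only_tops_per_watt / product_tops_per_watt

# Placeholder figures chosen so the ratio matches the ~5x quoted in the episode.
headroom = efficiency_headroom(alu_only_tops_per_watt=20.0,
                               product_tops_per_watt=4.0)
assert headroom == 5.0
# Consequence: without changing the algorithm, new silicon cannot be 20x
# better than today's products; ~5x is the ceiling of this estimate.
```

This is why he frames 5x as simultaneously large and small: it bounds every architecture-only idea, no matter how clever, well below the 20x headlines.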
Starting point is 00:43:43 Yeah, I think, Suvinay, is this similar to what Norm was saying in our interview with him? So we did this special episode for the 50th anniversary of SIGARCH and ISCA and all that a while back, and one of our guests was Norm Jouppi. And I think we talked a little bit about what academics versus industry should be looking at, and I'm almost certain he said something very similar to this, which is that academics should take, sort of, almost, I don't think he used the word small ideas, and that's not the intention, but more like something very well contained, and just figure the heck out of it. Because who is more equipped to actually figure out what's going on end to end and make a full product? It's industry, that's for sure. You know, academics don't
Starting point is 00:44:33 have access to reams of data. They don't have access to, especially when you're talking about hyperscaler-type research, the same number of machines. They don't have access to, you know, proprietary stuff. And in the end, as experience tells anybody who works in this industry, there's so much that has to go into actually making something work: things that are behind the scenes, or special sauce inside the company that's not exposed. And so. No, I think the sentiment has been echoed by multiple people. Although we have seen about five years of development of different accelerators,
Starting point is 00:45:03 plus a plethora of papers in this particular space, I think it's a good time to sit back and as a community maybe develop a pedagogy around this. Like you said, for example, today most of these accelerators for deep learning in particular have converged around having a vector engine, a matrix engine, and a few different data flow primitives as well. So can we distill those things and say clearly, here are the kind of ideas that could move the needle and here's your upper bound on what you could achieve over a reasonably well-defined
Starting point is 00:45:33 or well-engineered architecture, right? So that we understand where are the rooms for improvement and where are gaps where we don't know how to tackle certain kinds of problems. There are certain problems that you already know how to tackle reasonably well, and you might be within 2x, maybe 5x of a well-engineered solution that's there in industry.
Starting point is 00:45:51 But it would be good to have that crystallized and say, here are the set of ideas. Here's where you can move the needle. Here's the maximum upper bound of these things. And I think that would be incredibly valuable now that we have the benefit of hindsight and looking back on all of these papers. I think a good survey paper, not just a survey, but a survey combined with some quantitative
Starting point is 00:46:11 analysis, or a lens through which to view all of these different ideas, I think would be very valuable. Yeah, I mean, we've been thinking about this. My student and I have been writing something up; we don't know what to do with it yet. It's currently called the top 10 myths of deep learning hardware, and it's intentionally written that way. The first myth is: GPUs are inefficient for deep learning. I mean, that's a catchy title, that's for sure.
Starting point is 00:46:42 I mean, it sounds like you should keep going with it. I mean, those are the kinds of things that people remember, right? I'll quickly say that I've generally enjoyed your unrestrained and colorful critiques of various ideas, you know, starting from your favorite simulator considered harmful, or the power struggles between RISC and CISC, and so on. So those are very catchy titles, but there's a kernel of truth in them as well. Yeah. Please continue.
Starting point is 00:47:11 Yeah, no, no, I was just saying, I think one of the things you were trying to distill there is, I'm glad that at least from both of you, I'm hearing some validation for that thread. It originated a little bit from my disgruntlement with, not even the reviews I was seeing, just with how people were approaching papers in reviews generally. I was like, I think we need to do something else.
Starting point is 00:47:36 So that was kind of what triggered this. And yeah, so we'll see what shape it'll take, and try to figure out what exactly to do with it. So it sounds cool. Maybe a SIGARCH blog post? Yeah, that's probably the best venue, because I think it gets a lot of readership, even more than an actual paper. Right, right, exactly. The end goal is for people to read it, not necessarily for me to have a paper published, right? So maybe this is a good time to wind the clocks back and talk a little bit about your journey. So what got you interested in computer architecture? How did you end up at the University of Wisconsin-Madison?
Starting point is 00:48:25 Maybe any interesting inflection points along the way during that journey? So my journey is probably a slightly odd one, and some of you may know this, some may not. So in the early 80s, among the original wave of personal computers, there was something called the BBC Micro. And this was announced, I think, in the mid-80s or so. And I grew up in India. So in my school, I remember very, very vividly that when I was in fourth grade, nobody knew how to teach computer science.
Starting point is 00:49:03 They had this whole other organization come and teach computer science with the BBC Micro. And I remember it just had BASIC, and, I think, what else, yeah, I think it was just BASIC at the time. I was just hooked. At that time, I was like, oh wow, I just love this, right? And that's how it started. And then I was incredibly lucky to get to work with Steve Keckler for my PhD, who is amazing. And then the timing worked out: they were starting the TRIPS project at the time, so I got to work on that, which was great, to be able to build a chip as a PhD student. And then after that, I got to Wisconsin. It's a great place to be.
Starting point is 00:49:52 And I actually even started enjoying all the four seasons there. After I had kids, they got into skiing, then I got into skiing. So that's my journey. And then after about 10 years at Wisconsin is when my Simple Machines journey came along, to say, well, these ideas look pretty interesting, I should do a startup and see what I learn. So, I know Lisa asked about this. The company itself: at a certain point, we had gotten our first product done and had to raise money for the second one. We didn't quite see customer revenue and customer uptake. So we were like, this is not quite going anywhere.
Starting point is 00:50:39 So we decided to fold the company. And, I mean, like I said, I think it was four years. That's all I did. I went on full-time leave. I wasn't really publishing. Every minute of it, I loved. It was such an incredible journey, very different from being an academic.
Starting point is 00:50:58 Lots of gray hair, lots of pressure that I didn't have as an academic. But now you're responsible for people's paychecks, right? So do you think you would ever start another company? I'm actually going to say yes, because I absolutely love the journey, even though there's a lot of heartbreak and pain. My wife sometimes jokes, after Simple Machines, and I don't want this to come off as
Starting point is 00:51:35 kind of rude, but she was like, dude, the amount of highs and lows you had in a day running Simple Machines, you're not going to have in a year as a professor. Yeah. I think the thing that's very different, and I've tried to recreate it as best as I can in my group, is just this intensity of 10 to 30 incredibly talented people all working on this one thing, and they all completely believe in it, absolutely, right? It's almost like endorphins are in your brain all the time, right? You just love going to work. You forget how long you've been working.
Starting point is 00:52:25 There's all this stress, but it's very different. It's not quite us against them, but you completely believe, and there's no finger pointing. Everybody's just working on this thing, and you completely believe you've got the answer. And I think that's the reason why I'd do it again. That sounds cool. It does sound cool.
Starting point is 00:52:54 Yeah, it's hard to get that sort of feeling sometimes in a large company, because you need massive levels of alignment, right? Yeah, yeah, yeah. And also because I was very, very lucky with the investors. I remember, after we shut down, I was having ice cream with my kids somewhere here, and one of my investors was walking by. I was like, oh God, what's he going to say? And it was really funny. He was like, how are you doing? Good, good. I'm glad you're with your family. Next startup, call me. I'm in.
Starting point is 00:53:28 Well, that's an endorsement. That's great. Yeah. I recently read a book about venture capital in the United States. I forget what it was called. Oh, shoot. It was really, oh, The Power Law. It's called The Power Law.
Starting point is 00:53:43 Oh, yeah, yeah, yeah. Have you read it? Yeah, yeah. And I mean, one of the main premises of it is that they fully expect many of their ventures to not work out, right? That's the name of the game. But some will. And as they look for that some,
Starting point is 00:54:00 they're also just looking for talent, right? Yeah. People who they would bet on again. So it sounds like you're one of them. Yeah. Well, we'll see. Well, cool. So I think maybe one of our final questions would be like,
Starting point is 00:54:14 you know, what's coming up next for you? You know, it sounds like you would start another company, but there's nothing on the horizon, or maybe there is, or what's exciting you these days? You know, maybe you're not having the same kind of highs and lows as a professor, but, you know, presumably something is getting you to take your bike through the Wisconsin
Starting point is 00:54:34 winters to get to the office. I mean, I'm working on a couple of things. One, I'm also at NVIDIA Research part-time. There are many aspects of that I can't talk about, but the work is super, super exciting, and I'm very thrilled to have that opportunity. So I'll talk more broadly about the thing I'm working on at the university.
Starting point is 00:54:57 So I'm beginning to look mostly at the intersection of, really, economics. What happens when Moore's Law ends? And it has ended. How should architects think about it? And what should the chip design cycle fundamentally look like? And we have this new project. We're basically trying, inspired by other fields
Starting point is 00:55:20 that are hugely data-driven. We don't have a lot of data on how we design an architecture. So we're trying to, and this is based on some of the learnings from my startup and other things, figure out how we react to this new economics
Starting point is 00:55:37 and how we kind of extract more data, and how we do it in a privacy-preserving way, and so on. So those are the things I'm spending a lot of my time on: to see what, almost, post-Moore architecture is, right? What should we as a field be thinking about? Just expanding on that: like many academics, I know that you're also very passionate about teaching. So for the coming crop of students, especially in the changing landscape with the demise of Moore's Law, as well as exciting new architectures and new application paradigms and so on, what are things that might be different that you would want to teach the students, compared to our standard pedagogy of the computer architecture curriculum? No, I mean, that's a great question. And that's actually something we've been working on.
Starting point is 00:56:27 What I would like to do is to find a way to, in a weird way, teach more and also teach more efficiently. So without students having to take three courses, how do I teach them enough about the basic in-order processor and also expose them to something beyond that, right? And some of this is inspired by things that have happened in software engineering and programming languages, where the productivity of
Starting point is 00:57:01 programming languages has just exploded, right? What you can do with 50 lines of Python is crazy. So can we find a way to, not quite introduce new languages, but just get them quickly to something turnkey? One of the things we've been looking at is how to deploy a turnkey RISC-V processor they can build, put it on an FPGA, run a program like full AlexNet on it, and do all that in about two-thirds of a semester. And so they kind of build out an entire RISC-V processor that's running real stuff and running a very large
Starting point is 00:57:41 regression, and then add more to it that exposes them to ideas beyond that, right? So it's been a thing I've been thinking about a lot: for undergrads who take maybe one or two architecture courses, what's the maximum amount of things we can teach them, and can we also teach it more efficiently so they can get to more depth, right? The grad one, honestly, is the harder one, because what I've struggled with is, it's unclear to me, for the classic papers from the 90s and so on, on processor design and so on, what principles exactly they are teaching, and how do we motivate
Starting point is 00:58:26 students to extract the right principles from those papers. And the reaction to that is, just read more recent papers. So I tried out an experiment doing something in between, where everybody would be required to pick one topic, and I gave them five or ten papers on that topic. And the goal of the course is they have to write a one-pager that identifies a new sub-problem in that topic.
Starting point is 00:59:00 They don't need to do any experiments, no simulation, no nothing. All they need to do is to identify a new research problem. They don't even need to have a solution, right? This is a problem that exists, right? In my mind, that was a way of kind of teaching them the skills of becoming a researcher in a different way than reading the canon of what is architecture. So I'm still figuring out what I learned from
Starting point is 00:59:27 that experiment. So students seem to love it, but I'm still trying to figure out how to improve it. Yeah, that sounds like a fascinating direction of inquiry, both into pedagogy and also into how you teach students. I like that. I know one other professor, at Stanford, who conducts a similar exercise with the students, where the essay or one-page paper that you write is not describing a solution, but just describing a problem. You don't need to have a solution, just point out that, hey, here's an interesting problem that I've noticed, and I don't think I've seen any good solutions for it. Or, here's what I've observed so far. And students really enjoy that entire process. So I'll give a plus one to that particular approach. Yeah, it was very interesting, because they all said, universally, that this was much, much harder than anything they've done,
Starting point is 01:00:15 but less time consuming. That sounds efficient. Efficient generalization of hardware architecture, efficient generalization for students. Yeah, that's wonderful. So maybe before we close, can you share any words of wisdom with our listeners, any advice that you would give to students, researchers, and professionals in this particular space?
Starting point is 01:00:45 Oh, wow. Words of wisdom. I'll tell you something flippant. Be contrarian and try to be right. That's very efficient as well. Be contrarian and try to be right. I mean, those are the two key sides, right? Like, you could be contrarian and just be a jerk
Starting point is 01:01:12 and just say no. Yeah, you need to be contrarian, right? Don't work on deep learning. Cool. Well, thanks so much, Karu. We're so glad to have had you on the show today. Awesome. It has been a wonderful conversation. I think, given that our audience is pretty much all computer architects, and we all love hardware, we love hardware architecture, thinking about what it takes to translate some of the ideas that are published in the academic world and make it into a hardware startup has been really, really fun.
Starting point is 01:01:46 And to talk about some of these business aspects, too, has been really fun. And in the end, you know, I remember once I went on vacation in Costa Rica, and we were walking around these coffee farms. Because a lot of coffee beans are sourced in Costa Rica, and the farms were like the bottom of the value-add chain. So they would say, you know, here's this giant bushel of beans, which somebody has grown and picked and everything, and it was like two dollars for twenty
Starting point is 01:02:17 pounds of beans, or something like that. And I was like, wow, it totally stinks to be at the bottom of the value-add chain. You can't even buy a cup of coffee at Starbucks for $2. And yet, all these beans, so much work, so much infrastructure, so much effort, and somehow the value gets added up and up and up the chain. And I remember thinking that hardware was similar, right? So much investment, so much infrastructure, and then in the end, in some ways, it's a commodity. And then the final value-add is this user experience that you talked about. And I don't
Starting point is 01:02:50 think we necessarily think about that as a community. Yeah. So thanks. Awesome. This is great. Thank you. That's a very nice metaphor. I'm going to steal it next time. Coffee beans and hardware. Yeah, sounds good. Sounds good. Thank you so much for joining us today. Great. Thank you, Lisa. Yeah, I like Lisa's sentiment. It was wonderful talking to you. Thank you for your candid thoughts as always. And to our listeners, thank you for being
Starting point is 01:03:16 with us on the Computer Architecture Podcast. Until next time, it's goodbye from us.
