Lex Fridman Podcast - Jim Keller: Moore’s Law, Microprocessors, Abstractions, and First Principles

Episode Date: February 6, 2020

Jim Keller is a legendary microprocessor engineer, having worked at AMD, Apple, Tesla, and now Intel. He's known for his work on the AMD K7, K8, K12 and Zen microarchitectures, the Apple A4 and A5 processors, and as co-author of the specifications for the x86-64 instruction set and HyperTransport interconnect. This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts, follow on Spotify, or support it on Patreon. This episode is presented by Cash App. Download it (App Store, Google Play), use code "LexPodcast".

Here's the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.

00:00 - Introduction
02:12 - Difference between a computer and a human brain
03:43 - Computer abstraction layers and parallelism
17:53 - If you run a program multiple times, do you always get the same answer?
20:43 - Building computers and teams of people
22:41 - Start from scratch every 5 years
30:05 - Moore's law is not dead
55:47 - Is superintelligence the next layer of abstraction?
1:00:02 - Is the universe a computer?
1:03:00 - Ray Kurzweil and exponential improvement in technology
1:04:33 - Elon Musk and Tesla Autopilot
1:20:51 - Lessons from working with Elon Musk
1:28:33 - Existential threats from AI
1:32:38 - Happiness and the meaning of life

Transcript
Starting point is 00:00:00 The following is a conversation with Jim Keller, legendary microprocessor engineer who has worked at AMD, Apple, Tesla, and now Intel. He's known for his work on the AMD K7, K8, K12, and Zen microarchitectures, the Apple A4 and A5 processors, and as co-author of the specification for the x86-64 instruction set and HyperTransport interconnect. He's a brilliant first-principles engineer and out-of-the-box thinker, and just an interesting and fun human being to talk
Starting point is 00:00:32 to. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it 5 stars on Apple Podcasts, follow on Spotify, support it on Patreon, or simply connect with me on Twitter, at Lex Fridman, spelled F-R-I-D-M-A-N. I recently started doing ads at the end of the introduction. I'll do one or two minutes after introducing the episode, and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by Cash App.
Starting point is 00:01:06 The number one finance app in the App Store. I personally use Cash App to send money to friends, but you can also use it to buy, sell, and deposit Bitcoin in just seconds. Cash App also has a new investing feature. You can buy fractions of a stock, say $1 worth, no matter what the stock price is. Brokerage services are provided by Cash App Investing, a subsidiary of Square and member SIPC. I'm excited to be working with Cash App
Starting point is 00:01:32 to support one of my favorite organizations called FIRST, best known for their FIRST Robotics and LEGO competitions. They educate and inspire hundreds of thousands of students in over 110 countries, and have a perfect rating on Charity Navigator, which means that donated money is used to maximum effectiveness. When you get Cash App from the App Store or Google Play and use code LEXPODCAST, you'll get $10, and Cash App will also donate $10 to FIRST, which again is an organization that
Starting point is 00:02:02 I've personally seen inspire girls and boys to dream of engineering a better world. And now, here's my conversation with Jim Keller. What are the differences and similarities between the human brain and a computer, with the microprocessor at its core? Let's start with the philosophical question, perhaps. Well, since people don't actually understand how human brains work... I think that's true. So it's hard to compare them. Computers are, you know, there's really two things.
Starting point is 00:02:54 There's memory and there's computation, right? And to date, almost all computer architectures are global memory, which is a thing, right? And then computation where you pull data in, you do relatively simple operations on it and write data back. So it's decoupled in modern, in modern computers, and you're you think in the human brain, everything's a mesh, a mesh that's combined together. Well, what people observe is there's you know, some number of layers of neurons, which have local and global connections and information is stored in some distributed fashion and people build things called
Starting point is 00:03:32 neural networks in computers where the information is distributed in some kind of fashion you know there's a mathematics behind it. I don't know that the understanding that is super deep or some mathematics behind it. I don't know that the understanding that is super deep. The computations we run on those are straightforward computations. I don't believe anybody has said a neuron does this computation. So today it's hard to compare them. I would think. So let's get into the basics before we zoom back out.
Starting point is 00:04:05 How do you build the computer from scratch? What is a microprocessor? What is the micorearchitecture? What's an instruction set architecture? Maybe even as far back as what is a transistor? So the special charm of computer engineering is there is a relatively good understanding of abstraction layers. So down to bottom you have atoms and atoms got put together in materials like silicon or
Starting point is 00:04:32 gulp silicon or metal and we build transistors on top of that. We build logic gates right and then functional units like an ader of subtractor or instruction parsing unit and then functional units, like an adder, a sub tractor, an instruction parsing unit, and then we assemble those into processing elements, modern computers, or built out of, you know, probably 10 to 20 locally, you know, organic processing elements, or coherent processing elements, and then that runs computer programs. Right, so abstraction layers and software, there's an instruction set you run, and then there's assembly language, C++, Java, JavaScript, there's abstraction layers essentially from the atom to the data center. So when you build a computer, first there's a target like what's it for,
Starting point is 00:05:25 like how fast does it have to be, which today there's a whole bunch of metrics about what that is. And then in an organization of 1,000 people who build a computer, there's lots of different disciplines that you have to operate on. Does that make sense? operate on. Does that make sense? And so there's a bunch of levels of abstraction in an organization I can tell and in your own vision, there's a lot of brilliance that comes in at every one of those layers. Some of it is science, some of it is engineering, some of his art. What's the most, if you could pick favorites, what's the most important your favorite layer on these layers of obstructions? Where does the magic enter this hierarchy? I don't really care
Starting point is 00:06:13 That's the fun, you know, I'm somewhat agnostic to that so I would say For relatively long periods of time instruction sets are stable So the x86 instruction set, the ARM instruction set, what's the instruction set? So it says, how do you encode the basic operations? Load store, multiply ads, track, conditional branch, there aren't that many interesting instructions. Look, if you look at a program and it runs, you know, 90% of the execution is on 25 opcodes, you know, 25 instructions.
Starting point is 00:06:48 And those are stable, right? What does it mean stable? Intel architecture has been around for 25 years. It works. It works. And that's because the basics, you know, are defined a long time ago, right? Now, the way an old computer ran is you fetched instructions and you executed them in order. Do the load, do the add, do the compare. The way a modern computer
Starting point is 00:07:15 works is you fetch large numbers of instructions, say 500. And then you find the dependency graph between the instructions. And then you execute in independent units those little micrographs. So a modern computer, like people like to say, computers should be simple and clean. But the turns out the market for simple, complete, clean slow computers is zero. Right? We don't sell any simple clean computers. Now you can, there's how you build it can be clean, but the computer people want to buy. That's say in a phone or a data center, fetches a large number of instructions, computes the dependency graph, and then executes it in a way that gets the right answers.
Starting point is 00:08:05 And optimize that graph somehow. Yeah, it's a cute thing. They run deeply out of order and then there's semantics around how memory ordering works and other things work. So the computer sort of has a bunch of bookkeeping tables that says what order cities operations finish in or appear to finish in. But to go fast, you have to fetch a lot of instructions and find all the parallelism. Now, there's a second kind of computer,
Starting point is 00:08:32 which we call GPUs today. And I call the difference. There's found parallelism, like you have a program with a lot of dependent instructions. You fetch a bunch and then you go figure out the dependency graph and you issues instructions out order. That's because you have one serial narrative to execute, which in fact is and can be done out of order. You call it a narrative? Yeah. Well. So yeah, so humans think of serial narrative. So read a book, right? There's a, you know, there's a sentence after sentence after sentence and there's paragraphs. Now you could diagram that. Imagine you could diagram it properly, and you said, which sentences could be read in
Starting point is 00:09:11 anti-order, any order without changing the meaning, right? That's a fascinating question to ask of a book. Yeah, you could do that. So paragraphs could be reordered, some sentences can be reordered. You could say, he is tall and smart and x. And it doesn't matter the order of tall and smart. But if you say that tall man is wearing a red shirt, what colors, you know, like you can create dependencies, right? Right? And so GPUs on other hand run simple programs on pixels, but you're given a million of them. And the first order, the screen you're looking at doesn't care which order
Starting point is 00:09:57 you do it in. So I call that given parallelism, simple narratives around the large numbers of things where you can just say, it's parallel because you told me it was. So found parallelism where the narrative is sequential, but you discover like little pockets of parallelism versus... Turns out large pockets of parallelism. Large. So how hard is it to... Well, how hard is it? That's just transistor count, right? So once you crack the problem, you say, here's how you fetch 10 instructions at a time. Here's how you calculate the dependencies between them. Here's how you describe the dependencies.
Starting point is 00:10:35 Here's, you know, these are pieces, right? So I know, once you describe the dependencies, then it's just a graph. Sort of, it's an algorithm that finds what is that? I'm sure there's a graph here, the theoretical answer here that's solved. But in general, programs, modern programs, that human beings write, how much fond parallelism is there in the 10X? What does 10X mean? Thon parallelism is there in the 10x. What does 10x mean? You executed in order versus yeah, you would get what's called cycles per instruction and it would be about
Starting point is 00:11:18 you know three instructions three cycles per instruction because of the latency of the operations and stuff and in a modern computer excuse it but 0.2, 0.25 cycles per instruction. So it's about, we today find 10x. And there's two things. One is the found parallelism in the narrative, right? And the other is to predictability of the narrative, right? So certain operations say, do a bunch of calculations, and if greater than one do this else do that. That decision is predicted in modern computers to high 90% accuracy. So branches happen a lot.
Starting point is 00:11:55 So imagine you have a decision to make every six instructions, which is about the average, right, but you want to fetch 500 instructions, figure out the graph and execute them all parallel. Right, but you want to fetch 500 instructions figure out the graph and execute them all parallel that means You have let's say if you fetch 600 instructions if it's every six You have to fetch you have to predict 99 out of a hundred branches correctly For that window to be effective Okay, so Parallelism you can't paralyze branches or you can. No, it can pretty, you can predict the branch mean.
Starting point is 00:12:27 What's predicted? So imagine you do a computation over and over. You're in a loop. So while N is greater than one, do. And you go through that loop a million times. So every time you look at the branch, you say, it's probably still greater than one. And you're saying you could do that accurately? Very accurately. My mind is blown.
Starting point is 00:12:47 How the heck did you do that? Wait a minute. Well, you want to know, this is really sad. 20 years ago, you simply recorded which way the branch went last time and predicted the same thing. Right. Okay. What's the accuracy of that? 85%.
Starting point is 00:13:04 So then somebody said, hey, let's keep a couple of bits and Have a little counter so when it predicts one way we count up and then pins so say you have a three-bit counter so you count up and Then you count down and if it's you know, you can use the top bit as a sign bit So you have a sign two-bit number so if it's greater than one you predict predict taking and then less than one you predict not taking right or lessons or whatever the thing is. And that got us to 92%. Okay, you know it's better. This branch depends on how you got there. So if you came down the code one way, you're talking about Bob and Jane. So if you came down the code one way, you're talking about Bob and Jane, right? And then sad is just Bob like Jane, it went one way, but if you're talking about Bob and Jill, just Bob like Jane, you go a different way, right?
Starting point is 00:13:52 So that's called history. So you take the history and a counter. That's cool, but that's not how anything works today. They use something that looks a little like a neural network. So, modern, you take all the execution flows, and then you do basically deep pattern recognition of how the program is executing. And you do that multiple different ways, and you have something that chooses what the best result is.
Starting point is 00:14:24 There's a little supercomputer inside the computer. That's trying to project branching. That calculates which way branches go. So the effective window that it's worth finding grass and gets bigger. Well, why was that going to make me sad? Because that's amazing. It's amazingly complicated.
Starting point is 00:14:41 Oh, well, here's the funny thing. So to get to 85%, took 1,000 bets. To get to 99% takes tens of megabits. So this is one of those. To get the result, to get from a window of say, 50 instructions to 500, it took three orders of magnitude or four orders of magnitude or bits. Now if you get the prediction of a branch wrong,
Starting point is 00:15:12 what happens then? Flush the pipe. If flush the pipe says just the performance cost, but it gets even better. Yeah. So we're starting to look at stuff that says, so executed down this path, and then you had two ways to go, but far, far away, there's
Starting point is 00:15:28 something that doesn't matter which path you went. So you missed, you took the wrong path, you executed a bunch of stuff. Then you had the mispredicting, you backed it up, but you remembered all the results you already calculated. Some of those are just fine. Like, if you read a book and you misunderstand a paragraph, your understanding of the next paragraph, sometimes is invariant to that understanding. Sometimes it depends on it. And you can kind of anticipate that in variance. Yeah, well, you can keep track of whether the data changed. And so when you come back to a piece of code, should you calculate it again or do the same thing? Okay, how much of this is art and how much of it is science?
Starting point is 00:16:12 Because it sounds pretty complicated. So how do you describe a situation? So imagine you come to a point in the road, you have to make a decision, right? And you have a bunch of knowledge about which way to go. Maybe you have a map. So you want to go the shortest way, or do you want to go the fastest way, or you want to take the nicest roads. So it's just some set of data. So imagine you're doing something complicated, like building a computer. And there's hundreds of decision points, all with hundreds of possible ways to go. And the ways you pick interact in a complicated way. Right. And then you have to pick the right spot.
Starting point is 00:16:51 Right, so that's an order of science, I don't know. You avoided the question. You just described the Robin Frost poem, or let's take it. I described the Robin Frost problem. Which... That's what we do as computer designers, it's all poetry. Okay, great.
Starting point is 00:17:08 Yeah, I don't know how to describe that because some people are very good at making those intuitive leaps. It seems like the combinations of things. Some people are less good at it, but they're really good at evaluating the alternatives. Right. And everybody has a different way to do it. And some people can't make those leaps, but they're really good at analyzing it.
Starting point is 00:17:31 So when you see computers are designed by teams of people who have very different skill sets, and a good team has lots of different kinds of people. And I suspect you would describe some of them as artistic. But not very many. Unfortunately. Unfortunately. Well, you know, computer design is hard. It's 99% perspiration. The 1% inspiration is really important. But you still need the 99. Yeah, you got to do a lot of work. And then there's, there are interesting things to do at every level that stack. At the end of the day, if you run the same program multiple times, does it always produce
Starting point is 00:18:16 the same result? Is there some room for fuzziness there? That's a math problem. So if you run a correct C program, the definition is every time you run it, you get the same answer. Yeah, well, that's a math statement. Well, that's a language definitional statement. So for years, when people did, when we first did 3D acceleration of graphics, you could
Starting point is 00:18:41 run the same scene multiple times and get different answers. Right. Right. And then some people thought that was okay and some people thought it was a bad idea. And then when the HPC world used GPUs for calculations, they thought it was a really bad idea. Okay. Now, in modern AI stuff, people are looking at networks, whereas the precision of the data is low
Starting point is 00:19:07 enough that the data is somewhat noisy. And the observation is the input data is unbelievably noisy. So why should the calculation be not noisy? And people have experimented with algorithms that say, can get faster answers by being noisy. Like as the network starts to converge, if you look at the Like as the network starts to converge, if you look at the computation graph, it starts out really wide and it gets narrower. And you can say, is that last little bit that important? Or should I start to graph on the next rep, rev,
Starting point is 00:19:34 before we would have all the way down to the answer? Right? So you can create algorithms that are noisy. Now, if you're developing something and every time you run it, you get a different answer. It's really annoying. And so most people think even today, every time you run the program, you get the same answer. No, I know, but the question is, that's the formal definition of a programming language.
Starting point is 00:19:59 There is a definition of languages that don't get the same answer, but people who use those, you always want something because you get the same answer, but people who use those, you always want something because you get a bad answer and then you're wondering is it because of something in the algorithm or because of this. And so everybody wants a little switch that says no matter what, do it deterministically. And it's really weird because almost everything going into modern calculations is noisy. So the answers have to be so clear. It's why so what do you stand?
Starting point is 00:20:26 I design computer for people who run programs. So somebody says I want a deterministic answer. Like most people want that. Can you deliver a deterministic answer? I guess this is the question. Like when you- Yeah, hopefully, sure. What people don't realize is you get a deterministic answer even though the execution flow is very on deterministic. So you run this program a hundred times, it never runs the same way twice ever. And the answer it arises at the same answer, but it gets the same answer every time. It's just, it's just amazing. Okay.
Starting point is 00:20:59 You've achieved in the eyes of many people, legend status as a chip art architect. What design creation are you most proud of? Perhaps because of challenging, because of its impact, or because of the set of brilliant ideas that were involved in the entire life. I find that description odd, and I have two small children
Starting point is 00:21:28 and I promise you they think it's hilarious. This question. Yeah, so I do it for them. So I am really interested in building computers. And I've worked with really, really smart people. I'm not unbelievable, be smart. I'm fascinated by how they go together, both as a thing to do and as endeavor that people do.
Starting point is 00:21:54 How people and computers go together? Yeah. Like how people think and build a computer. And I find sometimes at the best computer architects aren't that interested in people or the best people managers aren't that good at designing computers. So the whole stack of human beings is fascinating. So the managers, the individual engineers. Yeah, so I said I realized after a lot of years of building computers where you sort
Starting point is 00:22:20 of build them out of transistors, larger gates, functioning units, computational elements, that you could think of people the same way. So people are functional units. And then you could think of organizational design as a computer architecture problem. And then it was like, oh, that's super cool, because the people are all different,
Starting point is 00:22:37 just like the computational elements are all different. And they like to do different things. And so I had a lot of fun, like reframing how I think about organizations. Just like with computers who were saying execution paths, you can have a lot of different paths that end up at the same good destination. So what have you learned about the human abstractions from individual functional human units to the broad organization. What does it take to create something special? Well, most people don't think simple enough.
Starting point is 00:23:16 All right, so you know the difference between a recipe and the understanding. There's probably a philosophical description of this. So imagine you're going to make a loaf of bread. The recipe says get some flour, add some water, add some yeast, mix it up, let it rise, put it in a pan, put it in the oven. It's a recipe. Right, understanding bread. You can understand biology, supply chains, you know, rain grinders, yeast, physics, you know, thermodynamics. Like there are so many levels of understanding there. And then when people build and design things, they frequently are executing some stack of recipes. Right. And the problem with that is the recipes all have limited scope? Like,
Starting point is 00:24:05 if you have a really good recipe book for making bread, it won't tell you anything about how to make an omelette. Right. But if you have a deep understanding of cooking, right, then bread, omelettes, you know, sandwich, you know, there's, there's a different, you know, way of viewing everything. Most people, when you get to be an expert at something, you're hoping to achieve deeper understanding, not just a large set of recipes to go execute. And it's interesting to walk groups of people because executing reps to pieces is unbelievably efficient if it's what you want to do, if it's not what you want to do, you're really stuck.
Starting point is 00:24:52 And that difference is crucial. And everybody has a balance of, let's say, deeper understanding in recipes. And some people are really good at recognizing when the problem is to understand something deeply. Does that makes sense. Atomic sense, does it every stage of development, deep understanding on the team needed? This goes back to the heart versus the science question. Sure.
Starting point is 00:25:16 If you constantly unpacked everything for deeper understanding, you never get anything done. If you don't unpack understanding when you need to, you'll do the wrong thing. And then at every juncture, like human beings are these really weird things because everything you tell them has a million possible outputs, right? And then they all interact in a hilarious way. And then having some intuition about what do you tell them, what do you do, when do you intervene, when do you not? It's complicated. Right. So it's, it's, you know, essentially
Starting point is 00:25:48 computationally unsolvable. Yeah, it's the intractable problem, sure. Humans are mess. But deep understanding, do you mean also sort of fundamental questions of things like, what is a computer? Like, or why, like, the why questions, why are we even building this, like, of purpose? Or do you mean more, like, going towards the fundamental limits of physics, sort of really getting into the core of the science. Well, in terms of building a computer, think a little simpler. So common practice is you build a computer, and then when somebody says I want to make it 10% faster, you'll go in and say, all right, I need to make this buffer bigger, and
Starting point is 00:26:37 maybe I'll add an ad unit, or, you know, I have this thing that's three instructions wide, I'm going to make it four instructions wide. And what you see is each piece gets incrementally more complicated, right? And then at some point you hit this limit, like adding another feature or buffer doesn't seem to make it any faster. And then people say, well, that's because it's a fundamental limit.
Starting point is 00:27:02 And then somebody else to look at it and say, well, actually the way you divide the problem up and the way the different features are interacting is limiting you. And it has to be rethought rewritten, right? So then you refactor and rewrite it and what people commonly find is the rewrite is not only faster, but half is complicated from scratch. Yes. So how often in your career, but just have you seen as needed, maybe more generally, to just throw the whole out the whole thing out. This is where I'm on one end of it. Every three to five years. Which end are you on? Wait. rewrite more often. Right. And three to five years is. If you want to really make a lot of progress on computer architecture,
Starting point is 00:27:45 every five years, you should do one from scratch. So where does the X86, 64 standard come in? Or how often do you? I wrote the, I was the co-author of that spec in 98. That's 20 years ago. Yeah, so that's still around. The instruction set itself has been extended quite a few times. And instruction sets are less interesting than the implementation underneath.
Starting point is 00:28:11 There's been on X86 architecture, Intel's designed a few. I am designed a few very different architectures. And I don't want to go into too much of the detail about how often, but there's a tendency to rewrite it every 10 years and it really should be every five. So you're saying you're an outlier in that sense in the... Really more often. Very, very more often. Isn't that scary?
Starting point is 00:28:39 Yeah, of course. Well, scary to who? To everybody involved, because like you said, repeating the recipe is efficient. Companies want to make money. No, individual engineers want to succeed, so you want to incrementally improve, increase the buffer from 3 to 4. Well, this is where you get into the diminishing return curves. I think Steve Jobs said this, right?
Starting point is 00:29:03 So every, you have a project and you start here and it goes up and they have a diminishing return curves. I think Steve Jobs said this, right? So every, you have a project and you start here and it goes up and they have diminishing return. And to get to the next level, you have to do a new one and the initial starting point will be lower than the old optimization point, but it'll get higher. So now you have two kinds of fear, short term disaster and long term disaster.
Starting point is 00:29:24 And your conscious. Right. People with a quarter by quarter business objective are terrified about changing everything. And people who are trying to run a business or build a computer for a long-term objective know that the short-term limitations block them from the long-term success.
Starting point is 00:29:46 So if you look at leaders of companies that had really good long-term success, every time they saw that they had to redo something, they did. And so somebody has to speak up? Or you do multiple projects in parallel. Like you optimize the old one while you build a new one. But the marketing guys are always like, make promise me that the new computer is faster on every single thing.
Starting point is 00:30:09 And the computer architect says, well, the new computer will be faster on the average. But there's a distribution of results and performance and you'll have some outliers that are slower. And that's very hard because they have one customer who cares about that one. So speaking of the long term term for over 50 years now, Moore's law has served the for me and millions of others as an inspiring
Starting point is 00:30:31 beacon of what kind of amazing future brilliant engineers can build. Yeah. I'm just making your kids laugh all of today. That's great. So first in your eyes, what is Moore's law? If you could define for people who don't know. Well, the simple statement was, from Gordon Moore, was, doubled the number of transistors every two years, something like that. And then my operational model is, we increased the performance of computers by 2X every two or three years.
Starting point is 00:31:05 And it's wiggled around substantially over time. And also in how we deliver performances changed. But the foundational idea was 2x, the transistors every 2 years. The current cadence is something like they call it a shrink factor, like 0.6 every two years, which is not 0.5. But that's referring strictly to the original definition of just a transistor count. And shrink factor is just getting them smaller, smaller, smaller.
Starting point is 00:31:36 Well, as you use for a constant chip area, if you make the transistor smaller by 0.6, then you get one over 0.6 more transistors. So can you linger a little longer? What's a broader, what do you think should be the broader definition of Moore's law? When you mentioned how you think of performance, just broadly, what's a good way to think about Moore's law? Well, first of all,
Starting point is 00:32:00 so I've been aware of Moore's law for 30 years. In what sense? Well, I've been aware of Moore's Law for 30 years. In which sense? Well, I've been designing computers for 40. You're just watching it before your eyes kind of. Well, and somewhere away I became aware of it. I was also informed that Moore's Law was going to die in 10 to 15 years. And so I thought that was true at first, but then after 10 years it was going to die in 10 to 15 years. And then at one point it was going to die in 10 to 15 years. And then at one point
Starting point is 00:32:25 it was going to die in five years. And then it went back up to 10 years. And at some point I decided not to worry about that particular product gnostication for the rest of my life, which is fun. And then I joined Intel and everybody said Moore's Law is dead. And I thought that's sad because it's the Moore's Law company. And it's not not dead and it's always been going to die and you know humans like these apocryphal kind of statements like we'll run out of food or run out of air or run out of room or run out of you know something. Right but it's still incredible that it's lived for as long as it has and yes there's many people who believe now that more's lost is dead. You know, they can join the last 50 years of people had the same tradition.
Starting point is 00:33:12 But why do you think if you can and try to understand, why do you think it's not dead? Well, apparently, let's just think people think more is lost one thing. Trends,isters get smaller, but actually under the sheets there's literally thousands of innovations. And almost all those innovations have their own diminishing return curves. So if you graph it, it looks like a cascade
Starting point is 00:33:36 of diminishing return curves. I don't know what to call that, but the result is an exponential curve, but at least it has been. So, and we keep inventing new things. so if you're an expert in one of the things on a machine return curve, and you can see it's plateau, you will probably tell people, well, this is done, meanwhile some other pilot people are doing something different. So that's just normal.
Starting point is 00:34:05 So then there's the observation of how small could a switching device be. So a modern transistor is something like 1,000 by 1,000 by 1,000 atoms, right? And you get quantum effects down around 2 to 10 atoms. So you can imagine a transistor as small as 10 by 10 by 10. So that's a million times smaller. And then the quantum computational people are working away at how to use quantum effects. So... A thousand by a thousand by a thousand.
Starting point is 00:34:38 Atoms. That's a really a clean way of putting it. Well a fan, like a modern transistor, if you look at the fan, it's like 120 atoms wide, but we can make that thinner. And then there's a gate wrapped around it and then there's spacing. There's a whole bunch of geometry. And, you know, a competent transistor designer
Starting point is 00:34:58 could count both atoms in every single direction. Like, there's techniques now to already put down atoms in a single atomic layer. And you can place atoms if you want to. It's just, you know, from a manufacturing process, if placing an atom takes 10 minutes and you need to put, you know, 10 to the 23rd atoms together to make a computer, it would take a long time. So the methods are, you things and then coming up with effective ways to control what's happening.
Starting point is 00:35:34 Manufacture is stable and cheaply. Yeah. So the innovation stack is pretty broad. There's equipment, there's optics, there's chemistry, there's physics, there's material science, there's metallurgy, there's lots of ideas about when you put different materials together, how they interact, how they stable, how they stable, over temperature, like are the repeatable, there's literally thousands of technologies involved. But just for the shrinking, you don't think we're quite yet close to the fundamental
Starting point is 00:36:06 women's physics. I did a talk on Mars law and I asked for a roadmap to a path about 100 and after two weeks, they said, we only got to 50. 100 what say 100 X rank 100 X shrink. We only got to 50. And I said, once it gives in another two weeks. Well, here's the thing about Moore's Law, right? So I believe that the next 10 or 20 years
Starting point is 00:36:31 of shrinking is gonna happen, right? Now, as a computer designer, you have two stances. You think it's going to shrink, in which case you're designing and thinking about architecture in a way that you'll use more transistors or conversely not be swamped by the complexity of all the transistors you get right you have to have a strategy you know so you're you're open to the possibility and waiting for the possibility of a whole new army of transistors ready to work I'm'm expecting more transistors every two or three years by a number large enough that
Starting point is 00:37:09 how you think about design, how you think about architecture has to change. Like imagine you build buildings out of bricks and every year the bricks are half the size or every two years. Well if you kept building bricks the same way, you know, so many bricks per person per day, the amount of time to build a building would go up exponentially. Right.
Starting point is 00:37:33 Right. But if you said, I know that's coming. So now I'm gonna design equipment. I'm gonna move bricks faster, uses them better, cause maybe you're getting something out of the smaller bricks, more strengths than our walls, you know, less material or efficiency out of that. So once you have a roadmap with what's gonna happen, transistors, they're gonna get, we're gonna get more of them.
Starting point is 00:37:53 Then you design all this collateral around it to take advantage of it and also to cope with it. Like that's the thing people don't understand. It's like if I didn't believe in Moore's law and then Moore's law transistors showed up, my design teams were all drowned. So what's the, what's the hardest part of this influx of new transistors? I mean, even if you just look historically throughout your career, what's, what's the thing? What fundamental changes when you add more transistors in the task of designing an architecture? There's two constants, right? One is people don't get smarter. By the way, there's some signs showing that we do get smarter because of nutrition, whatever. Sorry, bring that. One effect. Yes. Yeah, for millions. Nobody understands it. Nobody knows if it's still going on. So that's on. Or whether it's real or not. But yeah, that's a, I sort of, anyway, but not
Starting point is 00:38:48 expected. I would believe for the most part, people aren't getting much smarter. The evidence doesn't support it. That's right. And then teams can't grow that much. Right. All right. So human beings, you know, we're really good in teams of 10, you know, up to teams of 100, they can know each other beyond that. You have to have organizational boundaries. So you're kind of, you have, those are pretty hard constraints. Right. So then you have to divide and conquer like, as the designs get bigger, you have to divide it into pieces, you know, the power of abstraction layers is really high. We used to build computers out of transistors. Now we have a team that turns transistors in logic cells
Starting point is 00:39:25 and another team that turns them in the functional units, another one that turns them in computers. So we have abstraction layers in there. And you have to think about when do you shift gears on that? We also use faster computers to build faster computers. So some algorithms run twice as fast on new computers. But a lot of algorithms are in squared. So, you know, a computer with twice as many transistors and it might take
Starting point is 00:39:50 four times as long to run. So you have to refactor this off or like simply using faster computers to build bigger computers doesn't work. So you have to think about all these things. So in terms of computing performance and the exciting possibility that more powerful computers bring is shrinking the thing we've just been talking about. One of the for you, one of the biggest exciting possibilities of advancement in performance or is there other directions that you're interested in? Like in the direction of sort of enforcing given parallelism or like doing massive parallelism in terms of many, many CPUs, stacking CPUs
Starting point is 00:40:33 on top of each other, that kind of parallelism or any kind of parallelism. Well, think about it in a different way. So, all computers, slow computers, you said, A equals B plus C times D. Pretty simple. Then we made faster computers with vector units and you can do proper equations and matrices. Then modern AI computations or convolutional neural networks, we can evolve one large data set against another. And so there's sort of this hierarchy of mathematics, you know, from simple equations to linear equations
Starting point is 00:41:11 to matrix equations to deeper kind of computation. And the data sets are good and so big that people are thinking of data as a topology problem, you know, data is organized in some immense shape. And then the computation, which sort of wants to be get data from a man's shape and do some computation on it. So the computers of a lot of people to do is have algorithms go much, much further.
Starting point is 00:41:39 So that paper you reference the certain paper, they talked about, you know, like in an AI started, it was apply rules that's to something. That's a very simple computational situation. And then when they did first chest thing, they they solved deep searches. So have a huge database of moves and results deep search, but it's still just a search. Right. Now, we take large numbers of images and we use it to train these weight sets that we convolve across to completely different kind of phenomena.
Starting point is 00:42:15 We call that AI. Now, they're doing the next generation. If you look at it, they're going up this mathematical graph. Right. they're going up just mathematical graph, right? And then computations, both computation and data sets support going up that graph. Yeah, the kind of computation that might, I mean, I would argue that all of it is still a search, right? Just like you said, topology problem of data sets
Starting point is 00:42:39 is searching the data sets for valuable data and also the actual optimization of neural networks is a kind of search for the... I don't know. If you looked at the interlayers of finding a cat, it's not a search. It's a set of endless projections. So a projection has, here's a shadow of this phone. Then you can have a shadow of that on the something, a shadow on that or something. And if you look in the layers, you'll
Starting point is 00:43:07 see this layer actually describes pointy years and round eyeness and fuzziness and but the the computation to tease out the attributes is not search. Right. I mean, well, the inference part might be search, but the training is not search. Okay. Well, 10 and then in deep networks, they look at layers and they don't even know it's represented. And yeah, if you take the layers out, it doesn't work. Okay. So if I don't take it search, all right. Well, but you have to talk to my mathematician about what that actually is. Well, we could disagree, but the, the, it's just semantics, I think it's not, but it's certainly not, I would say it's absolutely not semantics, but okay. All right. Well, if you want to go there,
Starting point is 00:43:53 so optimization to me is search and we're trying to optimize the ability of a new you'll not work to detect cat ears. And the difference between chess and the space, the incredibly multi-dimensional, 100,000-dimensional space that, you know, and that works to try to optimize over, is nothing like the chess board database. So it's a totally different kind of thing. Okay, in that sense, you can say that it loses the meaning. I can see how you might say, if you, the funny thing is, it's the difference between given search space and found search space. Right, exactly.
Starting point is 00:44:34 Yeah, maybe that's the different way that's right. That's a beautiful way to put it. Okay, but you're saying, what's your sense in terms of the basic mathematical operations and the architectures computer hardware that enables those operations. Do you see the CPUs of today still being a really core part of executing those mathematical operations? Yes. Well, the operations, you know, continue to be at
Starting point is 00:44:58 some track, loads, store, repair, and branch. It's remarkable. So it's interesting that the building blocks of computers are transistors and under that atoms. So you got atoms, transistors, logicates, computers, functioning units and computers. The building blocks are mathematics. At some level are things like ads and subtracts and multiplies. But the space mathematics can describe is, I think, essentially infinite. But the computers that run the algorithms are still doing the same things. Now a given algorithm may say, I need sparse data, or I need 32-bit data, or I need, you know, like a convolution operation that naturally takes 8-bit data, multiplies it
Starting point is 00:45:46 and sums it up a certain way. So the like the data types and tensor flow imply an optimization set, but when you go right down and look at the computers, it's an and or gates doing that, it's a multiplies. Like that hasn't changed much. Now the quantum researchers think they're going to change that radically, and then there's people who think about analog computing because you look in the brain, and it seems to be more analog-ish, you know, that makes us a way to do that more efficiently. But we have a million X on computation, and I don't know the reference, the relationship between computational, let's say, intensity
Starting point is 00:46:27 and ability to hit mathematical abstractions. I don't know, anyways, describe that, but just like you saw in AI, you went from rule sets, to simple search, to complex search, to say found search. Like, those are orders of magnitude more computation to do. And as we get the next two orders of magnitude, like a friend Roger Godori said, like every order of magnitude changes the computation. Fundamentally changes what the computation is doing. Yeah. Oh, you know, the expression of the difference in quantity is a difference in
Starting point is 00:47:04 kind. You know, the difference of the difference in quantity is a difference in kind. You know, the difference between ant and ant hill, right? Or neuron and brain. You know, there's, there's, there's, there's indefinable place where the, the quantity changed the quality, right? And we've seen that happen in mathematics multiple times. And, you know, my, my guess is it's going to keep happening. So in your senses, yeah, if you focus had down
Starting point is 00:47:29 and shrinking the transistor, well, not just had down, we're aware about the software stacks that are running in the computational loads. And we're kind of pondering what do you do with a petabyte of memory that wants to be accessed in a sparse way and have, you know, the kind of calculations AI programmers want. So there's a dialogue interaction, but when you go into the computer chip, you know, you find Adderson, as you mentioned, resutton, the idea that most of the development
Starting point is 00:48:06 in the last many decades in AI research came from just leveraging computation and just simple algorithms waiting for the computation to improve. Well, suffer guys have a thing that they call the problem of early optimization. So you're right, a big software stack, and if you start optimizing the first thing you write, the odds of that being the performance limit is low. But when you get the whole thing working, can you make it too X faster by optimizing the right things?
Starting point is 00:48:36 Sure. While you're optimizing that, could you've written a new software stack, which would have been a better choice? Maybe. Now you have creative tension. But the whole whole time as you're doing the writing, that's the software we're talking about, the hardware underneath gets faster. Let's go back to the Moore's Law. If Moore's Law is going to continue, then your AI research should expect that to show up. And then you make a slightly difference that it choices and we've hit the wall. Nothing's
Starting point is 00:49:07 going to happen. And from here, it's just us rewriting algorithms. Like that seems like a failed strategy for the last 30 years of Moore's Law of Death. So can you just linger on it? I think you've answered it, but it just has the same dumb question over and over. So what, why do you think Moore's law is not going to die? Which is the most promising, exciting possibility of why it won't die in the next five, ten years? So is it the continued shrinking of transistor or is it another S curve that steps in and
Starting point is 00:49:41 it totally sort of... Well, shrinking to transistor is literally thousands of innovations. Right. So there's a whole bunch of S curves just kind of running their course and being reinvented and new things. The semiconductor fabricators and technologists have all announced what's called nanowire. So they took a fan which had a gate around it and turned it into a little wire, so you have better control that, and they're smaller. And then from there, there are some obvious steps about how to shrink that.
Starting point is 00:50:16 The metallurgy around wire stacks and stuff has very obvious abilities to shrink. And there's a whole combination of things there to do. Your sense is that we're going to get a lot, if this innovation from just that shrinking. Yeah, like a factor of 100, it's a lot. Yeah, I would say that's incredible. And it's totally, it's only 10 or 15 years. Now, you're smarter, you might know, but to me, it's totally unpredictable of what
Starting point is 00:50:45 that 100x would bring in terms of the nature of the computation that people would be familiar with. Bell's law. So for a long time, those mainframes, many's workstation, PC, mobile, Moore's Law draw a faster, smaller computers. Right. And then when we were thinking about Moore's Law, Roger Godori said, every 10X generates a new computation.
Starting point is 00:51:10 So scale, or vector, matrix, topological computation. And if you go look at the industry trends, there was mainframes in many computers and NPCs, and then the internet took off. then we got mobile devices and our building 5G wireless with one millisecond latency. And people are starting to think about the smart world where everything knows you recognizes you. Like, like the transformations are going gonna be like unpredictable. How does it make you feel that you're one of the key architects of this kind of future? So you're not when I was talking about the architects
Starting point is 00:51:54 of the high level people who build angry bird apps and snap. I was saying angry bird apps. Who knows? I'm gonna take a stand. That's the whole point of the universe. I'm gonna take a stand at that and the attention distracting nature of mobile phones.
Starting point is 00:52:09 I'll take a stand. But anyway, in terms of the... It doesn't matter so much. The side effects of smartphones or the attention distraction, which part? Well, who knows, where this is all leading? It's changing so fast. Wait, so back to the... My parents used to yell at my sisters for hiding in the closet with a
Starting point is 00:52:27 wired phone with a dial on it. Stop talking to your friends all day. Right. Now my wife feels my kids for talking to their friends all day on text. It looks the same to me. It's always, it's echoes of the same thing. Okay, but you are the one of the key people are detecting the hardware of this future. How does that make you feel? Do you feel responsible? Do you feel excited? So we're in a social contact.
Starting point is 00:52:55 So there's billions of people on this planet. There are literally millions of people working on technology. I feel lucky to be, you know, doing what I do and getting paid for it. And there's an interest in it. But there's so many things going on in parallel. The actions are so unpredictable. If I wasn't here, somebody else would do it. The vectors of all these different things are happening all the time. I'm, there's a, I'm sure some philosopher or meta-philosopher is wondering about how we transform our world. So you can't deny the fact that these tools,
Starting point is 00:53:36 whether these tools are changing our world. That's right. So do you think it's changing for the better? Somebody I read this thing recently, it said the two disciplines with the highest year e-scores in college are physics and philosophy. Right. And they're both sort of trying to answer in the question, why is there anything? Right. And the philosophers, you know, we're on the kind of theological side and the physicists are obviously on the, you know, the material side. And there's a hundred billion galaxies with a hundred billion stars. It seems well repetitive at best.
Starting point is 00:54:17 So, you know, on our way to ten billion people, I mean, it's hard to say what it's all for if that's what you're asking. Yeah, I guess I can say. Things do tend to are significantly increases in complexity. And I'm curious about how computation, like our world, our physical world inherently generates mathematics. It's kind of obvious, right? So we have XYZ coordinates. You take a sphere, you make it bigger,
Starting point is 00:54:47 you get a surface that falls, you know, grows by R squared. Like it generally generates mathematics and the mathematicians and the physicists have been having a lot of fun talking to each other for years. And computation has been, let's say relatively pedestrian. Like computation in terms of mathematics has been doing
Starting point is 00:55:06 binary algebra while those guys have been gallivanting through the other realms of possibility. Now, recently, the computation lets you do mathematical computations that are sophisticated enough that nobody understands how the answers came out. Machine learning. Machine learning. Yeah. It used to be you get data set, you guess at a function, the function is considered physics as if it's predictive of new functions, new data sets. Modern, you can take a large data set with no intuition about what it is and use machine learning to find a pattern that has no function. Right?
Starting point is 00:55:51 And it can arrive at results that I don't know if they're completely mathematically describable. So computation has kind of done something interesting compared to A plus equal B plus C. There's something reminiscent of that step from the basic operations of addition to taking a step towards neural networks. That's reminiscent of what life on earth at its origins was doing. Do you think we're creating sort of the next step in our evolution in creating artificial intelligence systems that will? I don't know. I mean, you know, there's so much in the iterace already. It's hard to say. Well, we stand in this whole thing. Are human beings working on additional abstraction layers
Starting point is 00:56:34 and possibilities? Yeah, it appears so. Does that mean that human beings don't need dogs? You know, no, like, like there's so many things that are all simultaneously interesting and useful. You've seen, throughout your career, you've seen greater and greater level abstractions built in artificial machines. Right? Do you think, when you look at humans, do you think that the look of all life on earth is a single organism building this thing, this machine with greater and greater levels
Starting point is 00:57:05 of abstraction. Do you think humans are the peak, the top of the food chain in this long arc of history on earth? Or do you think we're just somewhere in the middle? Are we, are we the basic functional operations of a CPU? Are we the C++ program, the Python program, or we didn't know that work. Like, somebody's, you know, people have calculated, like, how many operations does the brain do? And something, you know, I've seen the number 10 to the 18th of a bunch of times arrive different ways.
Starting point is 00:57:37 So could you make a computer that did 10 to the 20th operations? Yes. Sure. So you think we're going to do that now. Is there something magical about how brains compute things? I don't know. My personal experience is interesting because you think, you know, how you think and then you have all these ideas and you can't figure out how they happened. And if you meditate, you know, the like what would you can be aware of is interesting.
Starting point is 00:58:04 You know, like what would you can be aware of is interesting? So I don't know if brains are magical or not. You know the physical evidence says no lots of people's personal experience says yes. So what would be funny is if brains are magical and yet we can make brains with more computation? You know, I don't know what to say about that, but What do you think magic is an immersion phenomena? What would be high f? I don't know. I think Yeah, I'm a young killer of what what what in your view is consciousness?
Starting point is 00:58:35 With consciousness? Yeah, like, you know, consciousness, love, things that are these deeply human things that seem to emerge from our brain. Is that something that we'll be able to encode in chips that get faster and faster and faster? It's like a 10-hour conversation. Nobody really knows. Can you summarize it in a couple of sentences?
Starting point is 00:59:02 Many people have observed that organisms run at lots of different levels. If you had two neurons, somebody said you'd have one sensory neuron and one motor neuron. So we move towards things and away from things, and we have physical integrity and safety or not. And then if you look at the animal kingdom, you can see brains that are a little more complicated. And at some point there's a planning system and then there's an emotional system that's happy about being safe or unhappy about being threatened. And then our brains have massive numbers of structures, like planning and movement and thinking and feeling and drives and emotions. And we seem to have multiple layers of thinking systems.
Starting point is 00:59:47 And we have a dream system that nobody understands whatsoever, which I find completely hilarious. And you can think, in a way, that those systems are more independent, and the different parts of yourself can observe them. I don't know which one's magical. I don't know which one's not computational. So is it possible that it's all computation? Probably. Is there a limit to computation? I don't think so. Do you think the universe is a computer? It seems to be. It's a weird kind of computer, because if it was a computer,
Starting point is 01:00:29 right? Like, when they do calculations on how much computation it takes to describe quantum effects, it's unbelievably high. So if it was a computer, you'd have built it out of something that was easier to compute, right? It's a funny system. But then the simulation guys pointed out that the rules are kind of interesting. Like, when you look really close, it's uncertain. And the speed of light says you can only look so far, and things can't be simultaneous, except for the odd entanglement problem where they seem to be.
Starting point is 01:00:59 Like, the rules are all kind of weird. And somebody said physics is like having 50 equations with 50 variables to define 50 variables. Like, you know, physics itself has been a shit show for thousands of years. It seems odd that when you get to the corners of everything, it's either uncomputable or undefinable or uncertain. It's almost like the designers of the simulation are trying to prevent us from understanding it perfectly. But also the things that require calculation require so much of it that our idea of the universe as a computer is absurd, because every single little bit of it takes all the computation in the universe to figure out.
Starting point is 01:01:43 So that's a weird kind of computer. You know, you say the simulation is running on a computer which has, by definition, infinite computation. Not infinite. Oh, you mean if the universe is infinite? Yeah. Well, every piece of our universe seems to take infinite computation to figure out. Infinite? No, just a lot. Well, a lot. Some pretty big number.
Starting point is 01:02:02 To compute this little teeny spot takes all the mass in the local one-light-year-by-one-light-year space that's close enough to influence it. Oh, it's a heck of a computer if it is one. I know, it's a weird description, because the simulation description seems to break when you look closely at it. But the rules of the universe seem to imply something's up. It seems a little arbitrary. The universe, the whole thing, the laws of physics. It just seems like, how did it come out to be the way it is? Well, lots of people talk about that. It's, you know, like I said, the two smartest groups of humans are working on the same problem from different aspects, and they're both complete failures. So that's kind of cool. They might succeed eventually. Well, after 2,000 years, the trend isn't good.
Starting point is 01:02:54 Oh, 2,000 years is nothing in the span of the history of the universe. So we have some time. But the next thousand years doesn't look good either. So that's what everybody says at every stage. But with Moore's Law, as you've described, not being dead, the exponential growth of technology, the future seems pretty incredible. Well, it'll be interesting. That's for sure.
Starting point is 01:03:16 That's right. So what are your thoughts on Ray Kurzweil's sense that exponential improvement in technology will continue indefinitely? Is that how you see Moore's Law? Do you see Moore's Law more broadly, in the sense that technology of all kinds has a way of stacking S-curves on top of each other, where it'll be exponential and then we'll see all kinds of...
Starting point is 01:03:41 What does an exponential of a million mean? That's a pretty amazing number. And that's just for a local little piece of silicon. Now let's imagine you decided to get a thousand tons of silicon to collaborate in one computer at a million times the density. Now you're talking, I don't know, 10 to the 20th more computation power than our current, already unbelievably fast computers. Nobody knows what that's going to mean. The sci-fi guys call it, you know, computronium.
Starting point is 01:04:16 Like when a local civilization turns the nearby star into a computer. I don't know if that's true, but... So even when you shrink the transistor, that's only one dimension. There are the ripple effects of that. Like, people tend to think about computers as a cost problem, right? Computers are made out of silicon and minor amounts of metals, and, you know, this and that. None of those things cost any money. Like, there's plenty of sand. You could just turn the beach and a little bit of ocean water into computers.
Starting point is 01:04:49 So all the cost is in the equipment to do it, and the trend on equipment is, once you figure out how to build the equipment, the cost goes down. Elon said, first you figure out what configuration you want the atoms in, and then how to put them there. Right? Because, well, his great insight is people are how-constrained. I have this thing, I know how it works, and little tweaks to that will generate something,
Starting point is 01:05:19 as opposed to, what do I actually want, and then figure out how to build it? It's a very different mindset. And almost nobody has it, obviously. Well, let me ask on that topic. You were one of the key early people in the development of autopilot, at least on the hardware side.
Starting point is 01:05:39 Elon Musk believes that autopilot and vehicle autonomy, if you just look at that problem, can follow this kind of exponential improvement, in terms of the "how" question that we're talking about; there's no reason why it can't. What are your thoughts on this particular space of vehicle autonomy, your part in it, and Elon Musk's and Tesla's vision for it? The computer you need to build was straightforward. And you could argue, well, does it need to be two times faster or five times or ten times? But that's just a matter of time or price in the short run.
Starting point is 01:06:15 So that's not a big deal. You don't have to be especially smart to drive a car. So it's not like a super hard problem. I mean, the big problem with safety is attention, which computers are really good at, not skills. Well, let me push back on that. You see, everything you said is correct, but we as humans tend to take for granted how incredible our vision system is. So you can drive a car with 20/50 vision, and you can train a neural network to extract
Starting point is 01:06:49 the distance of any object and the shape of any surface from video data. Yeah, but that's really simple. No, it's not simple. That's a simple data problem. It's not simple, because it's not just detecting objects, it's understanding the scene, and it's being able to do it in a way that doesn't make errors. So the beautiful thing about the human vision system, and our entire brain around the whole thing, is we're able to fill in the gaps. It's not just about perfectly detecting cars. It's inferring the occluded cars.
Starting point is 01:07:26 It's trying to, it's understanding the... I think that's mostly a data problem. So you think, with data, with improvement in computation, with improvement in collection... Well, when you're driving a car and somebody cuts you off, your brain has theories about why they did it. They're a bad person, they're distracted, they're dumb.
Starting point is 01:07:47 You can listen to yourself. If you think that narrative is important to be able to successfully drive a car, then current autopilot systems can't do it. But if cars are ballistic things, with tracks and probabilistic changes of speed and direction, and roads are fixed and given (by the way, they don't change dynamically), then you can map the world really thoroughly. You can place every object really thoroughly. You can calculate trajectories of things really thoroughly.
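As an aside, that "ballistic" view can be made concrete with a toy predictor. This is a hypothetical illustration, not Tesla's actual system: extrapolate a tracked car at constant velocity and grow the position uncertainty to stand in for "probabilistic changes of speed and direction":

```python
# Toy "cars as ballistic things" predictor: constant-velocity extrapolation
# with an error bound that grows for unmodeled maneuvering. Illustrative only.
import numpy as np

def predict_track(pos, vel, accel_sigma, horizon, dt=0.1):
    """Extrapolate a 2D position; return predicted positions and 1-sigma errors."""
    times = dt * np.arange(1, int(horizon / dt) + 1)
    positions = pos + np.outer(times, vel)       # straight-line (ballistic) path
    sigmas = 0.5 * accel_sigma * times ** 2      # error from unmodeled acceleration
    return positions, sigmas

# A car 20 m ahead, closing at 2 m/s, with ~1 m/s^2 of possible maneuvering:
positions, sigmas = predict_track(np.array([20.0, 0.0]), np.array([-2.0, 0.0]),
                                  accel_sigma=1.0, horizon=2.0)
print(positions[-1], "+/-", sigmas[-1])          # ~[16, 0] m, +/- 2 m at 2 s out
```

The disagreement later in this conversation is exactly about whether this kind of model is enough once other drivers' intentions matter.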
Starting point is 01:08:22 But everything you said about "really thoroughly" has a different degree of difficulty. So you could say, at some point, computerized autonomous systems will be way better at the things that humans are lousy at. Like, they'll be better at attention. They'll always remember there was a pothole in the road that humans keep forgetting about. They'll remember that this set of roads has these weirdo lines on it that the computers figured out once, and especially if they get updates. So many things become givens. Like, the key to robots and stuff, somebody said, is to maximize the givens.
Starting point is 01:09:01 Right. So to have a robot pick up this bottle cap, it's easier if you put a red dot on the top, because then you don't have to figure out, you know... And if you want to do a certain thing with it, maximizing the givens is the thing. And autonomous systems are happily maximizing the givens. Like, humans, when you drive someplace new, you remember it because you're processing it the whole time. And after the 50th time you drove to work, you get to work.
Starting point is 01:09:26 You don't know how you got there, right? You're on autopilot, right? Autonomous cars are always on autopilot, but the cars have no theories about why they got cut off or why they're in traffic. So they never stop paying attention. Right. So I tend to believe you do need the theories, the mental models of other people, especially with pedestrians and cyclists,
Starting point is 01:09:48 but also with other cars. So everything you said is actually essential to driving. Driving is a lot more complicated than people realize, I think, so to push back slightly. But to merge into traffic, right, you can't just wait for a gap. You have to be somewhat aggressive. You'll be surprised how simple a
Starting point is 01:10:08 calculation for that is. I may be, on that particular point, but... yeah, I'll just say where I stand: I would be very surprised. But I think you might be surprised how complicated it is. I tell people, it's like, progress disappoints in the short run and
Starting point is 01:10:29 surprises in the long run. It's very possible. Yeah, I suspect in 10 years it'll be just, like, taken for granted. Yeah, probably. But it doesn't look like it right now. It's going to be a $50 solution that nobody cares about.
Starting point is 01:10:42 It's like GPS. Like, wow, GPS: we have satellites in space that tell you where your location is. It was a really big deal. And now everything has a GPS. Yeah, sure. But I do think that systems that involve human behavior are more complicated than we give them credit for.
Starting point is 01:10:57 So we can do incredible things with technology that don't involve humans. But when you... Humans are less complicated than people frequently ascribe. Maybe... We operate out of large numbers of patterns and just keep doing it over and over. But I can't trust you, because you're a human.
Starting point is 01:11:14 That's something a human would say. But my hope, on the point you've made, is that no matter who's right, there are a lot of things that humans aren't good at that machines are definitely good at. Like you said, attention, things like that. Well, they'll be so much better that in the overall picture of safety and autonomy, obviously, cars will be safer, even if they're not as good at everything.
Starting point is 01:11:42 I'm a big believer in safety. I mean, there are already the current safety systems, like cruise control that doesn't let you run into people, and lane keeping. There are so many features that, if you just look at the Pareto of accidents, knocking off like 80% of them is, you know, super doable. Just to linger on the autopilot team and the efforts there: there seems to be very intense scrutiny by the media and the public in terms of safety, the pressure, the bar put before autonomous vehicles. As a person there working on the hardware, trying to build a system that makes a safe vehicle, what was your sense about that pressure?
Starting point is 01:12:25 Is it unfair? Is it expected of new technology? Yeah, it seems reasonable. I was interested; I talked to American and European regulators. And I was worried that the regulations would write technology solutions into the rules, like, modern brake systems imply hydraulic brakes. So if you read the regulations, to meet the letter of the law for brakes, it sort of has to be hydraulic, right? And the regulators said they're interested in the use cases, like a head-on crash, an offset crash, don't hit pedestrians, don't run into people, don't leave the road, don't run a red light or a stop light. They were very much into the scenarios.
Starting point is 01:13:09 And they had all the data about which scenarios injured or killed the most people. And for the most part, those conversations were like, what's the right thing to do to take the next step? Now, Elon's also very interested in the benefits of autonomous driving: freeing people's time and attention, as well as safety. And I think that's also an interesting thing. But building autonomous systems so they're safe and safer than people seemed... since the goal is to be 10x safer than people, having the bar be safer than people and scrutinizing accidents seems
Starting point is 01:13:52 philosophically correct. So I think that's a good thing. How is it different from the things you worked on at Intel, AMD, Apple? With the autopilot chip design and hardware design, what are the interesting or challenging aspects of building this specialized kind of computing system in the automotive space? I mean, there are two tricks to building, like, an automotive computer. One is, the software team, the machine learning team, is developing algorithms that are changing fast.
Starting point is 01:14:27 So as you're building the accelerator, you have this worry, or intuition, that the algorithms will change enough that the accelerator will be the wrong one, right? And there's the generic thing, which is, if you build a really good general-purpose computer, say its performance is one. Then the GPU guys will deliver about 5x the performance for the same amount of silicon, because instead of discovering parallelism, you're given parallelism.
Starting point is 01:14:56 And then special accelerators get another 2 to 5x on top of a GPU, because you say, I know the math is always eight-bit integers into 32-bit accumulators, and the operations are a subset of the mathematical possibilities. So, you know, AI accelerators have a claimed performance benefit over GPUs, because in the narrow math space, you're nailing the algorithm. Now, you still try to make it programmable, but the AI field is changing really fast. So there's a little creative tension there: I want the acceleration afforded by specialization without being so over-specialized that a new algorithm is so much more effective that you would have been better off on a GPU. So there is a tension there.
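That "eight-bit integers into 32-bit accumulators" math is simple to show directly. A minimal numpy sketch of the arithmetic such an accelerator fixes in hardware (an illustration only, not any real accelerator's interface):

```python
# int8 dot product with an int32 accumulator: each product fits in 15 bits
# (|a*w| <= 128*128 = 16384), so even ~131,000 of them cannot overflow int32.
import numpy as np

a = np.random.randint(-128, 128, size=1024, dtype=np.int8)  # quantized activations
w = np.random.randint(-128, 128, size=1024, dtype=np.int8)  # quantized weights

# Widen to int32 before multiplying so the partial products and the running
# sum both live in the 32-bit accumulator, as in the hardware described above.
acc = np.sum(a.astype(np.int32) * w.astype(np.int32), dtype=np.int32)
print(acc)
```

Committing to this one narrow operation is where the claimed 2 to 5x over a GPU comes from; the risk is that tomorrow's algorithm wants different math.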
Starting point is 01:15:46 To build a good computer for an application like automotive, there's all kinds of sensor inputs and safety processors and a bunch of stuff. So one of Elon's goals was to make it super affordable, so every car gets an autopilot computer. Some of the recent startups you look at have a server in the trunk, because they're saying, I'm going to build this autopilot computer that replaces the driver, so their cost budget's $10,000 or $20,000. And Elon's constraint was, I'm
Starting point is 01:16:14 going to put one in every car, whether people buy the feature or not. So the cost constraint he had in mind was great. Right. And to hit that, you had to think about the system design. That's complicated. It's fun. You know, it's craftsman's work. Like a violin maker, right? You can say, a Stradivarius is an incredible thing, and the musicians are incredible. But the guy making the violin picked wood and sanded it, and then he cut it, and he glued it, and he waited for the right day, so that when he put the finish on it, it didn't, you know, do something dumb.
Starting point is 01:16:48 That's craftsman's work, right? You may be a genius craftsman, because you have the best techniques and you discover a new one, but most engineering is craftsman's work. And humans really like to do that. Smart humans? No, everybody. All humans. I don't know. I used to... I dug
Starting point is 01:17:05 ditches when I was in college. I got really good at it. Satisfying? Yeah. So digging ditches is also craftsman's work. Yeah, of course. So there's an expression called complex mastery behavior. When you're learning something, that's fine, because you're learning something. When you do something that's rote and simple, it's not that satisfying. But if the steps that you have to do are complicated and you're good at them, it's satisfying to do them. And then if you're intrigued by it at all, as you're doing them you sometimes learn new things so you can raise your game. But craftsman's work is good. And engineers... engineering is complicated enough that you have to learn a lot of skills, and then a lot of what you do is craftsman's work, which is fun.
Starting point is 01:17:50 And with Elon driving, building a very resource-constrained computer, a computer that has to be cheap enough to put in every single car, that essentially boils down to craftsman's work? It's engineering, it's... You know, there are thoughtful decisions and problems to solve, and trade-offs to make. Do you need 10 camera inputs or eight? Are you building for the current car or the next one? How do you do the safety stuff? There's a whole bunch of details. But it's fun. But it's not like I'm building a new type of neural network, which has a new mathematics and a new computer to work;
Starting point is 01:18:14 You know, there's a whole bunch of details. But it's fun, but it's not like I'm building a new type of neural network, which has a new mathematics and a new computer to work. There's more invention in them. But the rejection of practice once you pick the architecture, you look inside and what do you see? Adders and multipliers and memories and the basics. So, computers is always this weird set of abstraction layers of ideas and thinking that reduction to practice is transistors and wires and pretty basic stuff. And that's an interesting phenomenon. By the way, like factory work, lots of people saying factory work is road assembly stuff.
Starting point is 01:18:59 I've been on the assembly line. The people who work there really like it. It's a really great job. It's really complicated. Putting cars together is hard, right? The car's moving, the parts are moving, and sometimes the parts are damaged, and you have to coordinate putting all the stuff together, and people are good at it. They're really good at it. And I remember one day I went to work and the line was shut down for some reason, and some of the guys sitting around were really bummed, because they had reorganized a bunch
Starting point is 01:19:25 of stuff and they were going to hit a new record for the number of cars built that day, and they were all gung-ho to do it. And these were big, tough buggers. But, you know, what they did was complicated, and you couldn't do it. Yeah, and I mean... Well, after a while you could,
Starting point is 01:19:39 but you'd have to work your way up, because, you know, putting the bright, what's called the bright, the trim, on a car on a moving assembly line, where it has to be attached in 25 places in a minute and a half, is unbelievably complicated, and human beings can do it. They're really good at it. I think that's harder than driving a car, by the way. Putting together... the work of working in a factory? Two smart people can disagree. Yeah. I think driving a car... Well, we'll get you in the factory someday and then we'll see how you do. No, for us humans, driving a car is easy. I'm saying building a machine that drives a car is not easy. Okay. Driving a car is easy for humans, because we've been evolving for billions
Starting point is 01:20:27 of years. To drive cars? Yeah, the Paleolithic cars were super cool. Now you join the rest of the internet in mocking me. I just... I'm intrigued by your, you know, your anthropology. Yeah, I'll have to go dig into that. There are some inaccuracies there, yes. Okay, but in general, what have you learned, in terms of, you know, passion, craftsmanship, tension, chaos, the whole mess of it? What have you learned, taken away from your time
Starting point is 01:21:11 working with Elon Musk, working at Tesla, which is known to be a place of chaos, innovation, craftsmanship, and all that? I really liked the way he thought. Like, you think you have an understanding about what the first principles of something are, and then you talk to Elon about it, and you didn't even scratch the surface. You know, he has a deep belief that no matter what you do, it's a local maximum. Right? And I had a friend who invented a better electric motor,
Starting point is 01:21:41 and it was like a lot better than what we were using. And one day he came by and said, you know, I'm a little disappointed, because this is really great and you didn't seem that impressed. And I said, you know, when the superintelligent aliens come, are they going to be looking for you? Like, where is he, the guy who built the motor? Probably not, you know. But doing interesting work that's both innovative and, let's say, craftsman's work on the current thing is really satisfying, and it's good. And that's cool. And then Elon was good at taking everything apart.
Starting point is 01:22:15 And, like, what's the deep first principle? Oh, no, what's really, really the deep first principle? You know, that ability to look at it without assumptions and without the how-constraints is super wild. You know, we built rocket ships and cars and, you know, everything. And that's super fun, and he's into it too. Like, when they first landed two SpaceX rockets, at Tesla we had a video projector in the big room, and like 500 people came down, and when they landed, everybody cheered and some people cried. It was so cool. All right, but how did you do that? Well, it was super hard. And then people say,
Starting point is 01:22:58 well, it's chaotic there, really. To get out of all your assumptions, you think that's not going to be unbelievably painful? And was Elon tough? Yeah, probably. But people look back on it and say, boy, I'm really happy I had that experience, to go take apart that many layers of assumptions. Sometimes super fun, sometimes painful. So it can be emotionally and intellectually painful? That whole process of just stripping away assumptions. Yeah. Imagine 99% of your thought process is protecting your self-conception, and 98% of that's wrong. Now do the math, right?
Starting point is 01:23:39 How do you think you're feeling when you get back to that one bit that's useful, and now you're open, and you have the ability to do something different? I don't know if I got the math right. It might be 99.9, but it ain't 50. Imagining it at 50% is hard enough. Yeah. Now, for a long time I've suspected you could get better. Like, you can think better, you can think more clearly, you can take things apart. And there are lots of examples of that, people who do that.
Starting point is 01:24:12 People who do that. So. And Neelon is an example of that. Apparently. You are an example. So, I don't know if I am. I'm fun to talk to. Certainly.
Starting point is 01:24:24 I've learned a lot of stuff. Right. Well, here's the other thing. I joke, like, I read books, and people think, oh, you read books? Well, no, I've read a couple of books a week for 55 years. Wow. Well, maybe 50, because I didn't learn to read until I was eight or something. And it turns out when people write books, they often take 20 years
Starting point is 01:24:47 of their life where they passionately did something and reduce it to 200 pages. That's kind of fun. And then you go online and you can find out who wrote the best books and who liked what, you know. That's kind of wild. So there's this wild selection process. And then you can read it and, for the most part, understand it, and then you can go apply it. Like, I went to one company and I thought, I haven't managed much before, so I read 20 management books. And when I started talking to them, basically, compared to all the VPs running around, I'd read 19 more management books than anybody else. It wasn't even that hard.
Starting point is 01:25:25 And half the stuff worked, like, the first time. It wasn't even rocket science. But at the core of that is questioning the assumptions, or sort of entering into first-principles thinking, sort of looking at the reality of the situation and applying that knowledge. So I would say my brain has this idea that you can question first assumptions, but I can go days at a time and forget that, and you have to kind of circle back to that observation. Because it is emotionally challenging.
Starting point is 01:26:02 Well, it's hard to just keep it front and center, because you operate on so many levels all the time, and getting this done takes priority, or being happy takes priority, or screwing around takes priority. Like, how you go through life is complicated. And then you remember, oh yeah, I can really think from first principles, and, oh shit, that's tiring.
Starting point is 01:26:25 But you do it for a while, and that's kind of cool. So, just as a last question: in your sense, from the big picture, from first principles, do you think... you kind of answered it already, but do you think autonomous driving is something we can solve on a timeline of years, so one, two, three, five, ten years, as opposed to a century? Yeah, definitely. Just to linger on it a little longer,
Starting point is 01:26:54 where's the confidence coming from? Is it the fundamentals of the problem, the fundamentals of building the hardware and the software? As a computational problem, understanding ballistics, roads, topography, it seems pretty solvable. And you can see this. Like, speech recognition: for a long time, people were doing frequency-domain analysis and all kinds of stuff,
Starting point is 01:27:21 and that didn't work at all. And then they did deep learning about it, and it worked great. And it took multiple iterations. And, you know, autonomous driving is way past the frequency-analysis point. You know, use radar, don't run into things. And the data gathering is going up,
Starting point is 01:27:42 and the computation is going up, and the algorithmic understanding is going up, and there's a whole bunch of problems getting solved like that. The data side is really powerful, but I disagree with both you and Elon. I'll tell you once again, as I did before, that when you add human beings into the picture, it's no longer a ballistics problem. It's something more complicated. But I could very well be proven wrong. And cars are highly damped in terms of rate of change. Like, the steering system's really slow compared to a computer. The acceleration's really slow.
Starting point is 01:28:17 Yeah, on a certain timescale, on a ballistics timescale. But human behavior, I don't know. Yeah, I don't know. I shouldn't say. Human beings are really slow too. Weirdly, we operate, you know, half a second behind reality. Nobody really understands that one either. It's pretty funny. Yeah. So we could very well be surprised. And I think with the rate of improvement in all aspects, on both the compute and the software and the hardware, there are going to be pleasant surprises all over the place. Speaking of unpleasant surprises: many people have worries about a singularity in the development of AI. Forgive me for such questions. Yeah.
Starting point is 01:29:00 When AI improves exponentially and reaches a point of superhuman-level general intelligence, beyond which there's no looking back. Do you share this worry of existential threats from artificial intelligence, from computers becoming superhuman-level intelligent? No, not really. We already have a very stratified society. And then if you look at the whole animal kingdom of capabilities and abilities and interests: you know, smart people have their niche, normal people have their niche, craftsmen have their niche, animals have their niche. I suspect
Starting point is 01:29:40 that the domains of interest for things whose capabilities are, you know, astronomically different... like the whole idea that something that got 10 times smarter than us would want to track us all down because we like to have coffee at Starbucks? It doesn't seem plausible. Now, is there an existential problem of, how do you live in a world where there's something way smarter than you, and you based your self-esteem on being the smartest local person? Well, there's, what, 0.1% of the population who thinks that? Because the rest
Starting point is 01:30:10 of the population has been dealing with it since they were born. So the breadth of possible experience that can be interesting is really big. And, you know, superintelligence seems likely, although we still don't know if we're magical, but I suspect we're not. And it seems likely that the possibilities that are interesting for us will also be interesting for it, whatever it is. It's not obvious why its interests would somehow want to fight over some square foot of dirt, or, you know, whatever the usual fears are about. So you don't think it will inherit some of the darker aspects of human nature? Depends on how you think reality is
Starting point is 01:31:00 constructed. So, for whatever reason, human beings are in, let's say, creative tension and opposition with both our good and bad forces. Like, there's lots of philosophical understanding of that, right? I don't know why that would be different. So you think the evil is necessary for the good? I mean, the tension? I don't know about evil, but, like, we live in a competitive world where your good is somebody else's, you know, evil. There's the malignant part of it, but that seems to be self-limiting, although occasionally it's super horrible. But yes, there's a debate over ideas, and some people have different beliefs, and that debate itself is a process.
Starting point is 01:31:51 Of arriving at something? Yeah. And why wouldn't that continue? Yeah. But you don't think that whole process will leave humans behind in a way that's painful? Emotionally painful? Yes, for the 0.1%, it will be.
Starting point is 01:32:07 And why isn't it already painful for a large percentage of the population? And it is. I mean, society does have a lot of stress in it, about the 1%, and about this, and about that. But everybody has a lot of stress in their life about what they find satisfying. And "know yourself" seems to be the proper
Starting point is 01:32:26 dictum, and pursuing something that makes your life meaningful seems proper. And there are so many avenues on that. Like, there's so much unexplored space at every single level. You know, my nephew called me a jaded optimist. Yeah, so it's... There's a beautiful tension in that label. But if you were to look back at your life and could relive a moment, a set of moments, because they were the happiest times of your life, outside of family,
Starting point is 01:33:08 what would that be? I don't want to relive any moments. I like that situation where you have some amount of optimism and then the anxiety of the unknown. So you love the unknown, the mystery of it? I don't know about the mystery, but it sure gets your blood pumping. What do you think is the meaning of this whole thing? Of life on this pale blue dot? It seems to be what it does.
Starting point is 01:33:42 Like, the universe, for whatever reason, makes atoms, which make us, and we do stuff. And we figure out things and we explore things. And that's just what it is. It's not "just". Yeah, it is. Jim, I don't think there's a better place to end it. It's been a huge honor. Well, that was super fun. Thank you so much for talking today. All right, great. Thanks for listening to this conversation, and thank you to our presenting sponsor, Cash App.
Starting point is 01:34:16 Download it, use code LexPodcast. You'll get $10, and $10 will go to FIRST, a STEM education nonprofit that inspires hundreds of thousands of young minds to become future leaders and innovators. If you enjoy this podcast, subscribe on YouTube, give it five stars on Apple Podcasts, follow on Spotify, support it on Patreon, or simply connect with me on Twitter. And now, let me leave you with some words of wisdom from Gordon Moore: if everything you try works, you aren't trying hard enough.
Starting point is 01:34:48 Thank you.
