Lex Fridman Podcast - George Hotz: Comma.ai, OpenPilot, and Autonomous Vehicles

Episode Date: August 5, 2019

George Hotz is the founder of Comma.ai, a machine learning based vehicle automation company. He is an outspoken personality in the field of AI and technology in general. He first gained recognition for being the first person to carrier-unlock an iPhone, and since then has done quite a few interesting things at the intersection of hardware and software. This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on iTunes or support it on Patreon.

Transcript
Starting point is 00:00:00 The following is a conversation with George Hotz. He's the founder of Comma.ai, a machine learning-based vehicle automation company. He is most certainly an outspoken personality in the field of AI and technology in general. He first gained recognition for being the first person to carrier-unlock an iPhone. And since then, he has done quite a few interesting things at the intersection of hardware and software. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it 5 stars on iTunes, support it on Patreon, or simply connect with me on Twitter.
Starting point is 00:00:35 Lex Fridman, spelled F-R-I-D-M-A-N. And I'd like to give a special thank you to Jennifer from Canada for her support of the podcast on Patreon. Merci beaucoup, Jennifer. She's been a friend and an engineering colleague for many years since I was in grad school. Your support means a lot and inspires me to keep this series going. And now, here's my conversation with George Hotz. Do you think we're living in a simulation? Yes, but it may be unfalsifiable. What do you mean by unfalsifiable? So if the simulation is designed in such a way that they did, like, a formal proof to show that no information can get in and out, and if their hardware is designed for anything in the simulation to always keep the hardware in spec, it may be impossible to prove whether we're in a simulation or not.
Starting point is 00:01:48 So they've designed it such that it's a closed system. You can't get outside the system. Well, maybe it's one of three worlds. We're either in a simulation which can be exploited, or we're in a simulation which can't be exploited. Like, the same thing is true about VMs. A really well designed VM, you can't even detect if you're in a VM or not. That's brilliant. So the simulation is running on a virtual machine. But now, in reality, all VMs have ways to be detected.
Starting point is 00:02:16 That's the point. I mean, you've done quite a bit of hacking yourself. So you should know that really any complicated system will have ways in and out. So this isn't necessarily true going forward. In the time I spent away from comma, I learned Coq. It's a language for writing math proofs, and if you write code that compiles in a language like that, it is correct by definition. The types check its correctness. So it's possible that the simulation is written in a language like this, in which case, yeah.
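The Coq property he's describing is worth making concrete: in a proof assistant, a statement is a type and a proof is a program of that type, so if the code compiles, the theorem is true. A minimal sketch in Lean 4, which works the same way as Coq on this point:

```lean
-- The statement "a + b = b + a" is a type; the term below it is a proof.
-- If this file compiles, the theorem holds: "the types check its correctness."
theorem addCommExample (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

`Nat.add_comm` here is a library lemma; Coq's standard library provides an equivalent. The point is that there is no separate testing step: type checking is the verification.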
Starting point is 00:02:55 Yeah, but can a language like that be sufficiently expressive? Oh, it can. It can be. Yeah. Okay. Well, so yeah. All right. So the simulation doesn't have to be Turing complete if it has a scheduled end date. Looks like it does, actually, with entropy. I mean, I don't think that a simulation that results in something as complicated as the universe
Starting point is 00:03:19 would have a formal proof of correctness, right? It's possible, of course. We have no idea how good their tooling is, and we have no idea how complicated the universe computer really is. It may be quite simple. It's just very large, right?
Starting point is 00:03:36 It's very, it's definitely very large. But the fundamental rules might be super simple. Yeah, Conway's Game of Life kind of stuff. Right. So if you could hack, so imagine a simulation that is hackable, if you could hack it, what would you change? Like, how would you approach hacking a simulation? The reason I gave that talk... By the way, I'm not familiar with the talk you gave. I just read that you talked about
Starting point is 00:04:05 escaping the simulation or something like that. So maybe you can tell me a little bit about the theme and the message there too. It wasn't a very practical talk about how to actually escape a simulation. It was more about a way of restructuring an us versus them narrative. If we continue on the path we're going with technology, I think we're in big trouble, like, as a species, and not just as a species, but even as me as an individual member of the species. So if we could change the rhetoric to be more like, to think upwards, like, to think about that we're in a simulation and how we could get out, already we'd be on the right path. What you actually do once you do that,
Starting point is 00:04:51 well, I assume I would have acquired way more intelligence in the process of doing that. So I'll just ask that. So the thinking upwards, what kind of ideas, what kind of breakthrough ideas do you think thinking in that way could inspire? And when you say upwards, upwards into space? Are you thinking sort of exploration in all forms? The space narrative that held for the modernist generation doesn't hold as well for the postmodern generation. What's the space narrative? Meaning space, the three-dimensional space? No, space, like going into space.
Starting point is 00:05:25 Literally, like building, like Elon Musk, like we're going to build rockets, we're going to go to Mars, we're going to colonize the universe. And the narrative you're referring to, the race to space. The race to space, yeah. Explore, okay. That was a great modernist narrative. Yeah. It doesn't seem to hold the same weight in today's culture.
Starting point is 00:05:43 I'm hoping for good postmodern narratives that replace it. So think, let's think. So you work a lot with AI. So AI is one formulation of that narrative. There could be also, I don't know how much you do in VR and AR. That's another. I know less about it, but every time I play with it in our research, it's fascinating, the virtual world. Are you interested in the virtual world? I would like to move to virtual reality.
Starting point is 00:06:11 In terms of your work? No, I would like to physically move there. The apartment I can rent in the cloud is way better than the apartment I can rent in the real world. Well, it's all relative, isn't it? Because others will have very nice apartments too, so you'll be inferior in the virtual world as well. But that's not how I view the world, right? I don't view the world... I mean, it's a very, like, almost zero-sumish way to view the world, saying, like, my great apartment isn't great
Starting point is 00:06:35 because my neighbor has one too. No, my great apartment is great because, like, look at this dishwasher, man. You just touch the dish and it's washed, right? And that is great in and of itself, whether I have the only apartment or everybody has the apartment. I don't care. So you have fundamental gratitude. The world first learned of George Hotz in August 2007, maybe before then,
Starting point is 00:06:59 but certainly in August 2007, when you were the first person to carrier-unlock an iPhone. How did you get into hacking? What was the first system you discovered vulnerabilities for and broke into? So that was really kind of the first thing. I had a book in 2006 called Gray Hat Hacking. And I guess I realized that if you acquired these sorts of powers, you could control the world. But I didn't really know that much about computers back then. I started with electronics.
Starting point is 00:07:38 The first iPhone hack was physical. Hardware. You had to open it up and pull an address line high. And it was because I didn't really know about software exploitation. I learned that all in the next few years, and I got very good at it. But back then, I knew about, like, how memory chips are connected to processors and stuff. So you knew about hardware, but software and programming you didn't know? No. Oh, really? So your view of the world and computers was physical, was hardware.
Starting point is 00:08:05 Actually, if you read the code that I released with that in August 2007, it's atrocious. What language was it in? C. C, and in a broken sort of state machine C. I didn't know how to program. Yeah. So how did you learn to program? What was your journey?
Starting point is 00:08:24 Because, I mean, we'll talk about it. You've live streamed some of your programming. It's a chaotic, beautiful mess. How did you arrive at that? Years and years of practice. I interned at Google the summer after the iPhone unlock.
Starting point is 00:08:41 And I did a contract for them where I built hardware for Street View, and I wrote a software library to interact with it. And it was terrible code. And for the first time, I got feedback from people who I respected saying, no, like, don't write code like this. Now, of course, just getting that feedback is not enough. The way that I really got good was, I wanted to write this thing that could emulate and then visualize, like, ARM binaries, because I wanted to hack the iPhone better. And I didn't like that I couldn't, like, see... I couldn't single-step through the processor, because I had no debugger on there, especially for the low-level things like the boot ROM and the bootloader. So I tried to build this tool to do it.
Starting point is 00:09:27 And I built the tool once, and it was terrible. I built the tool a second time, and it was terrible. I built the tool a third time. This was by the time I was at Facebook. It was kind of okay. And then I built the tool a fourth time, when I was a Google intern again in 2014. And that was the first time I was like, this is finally usable.
Starting point is 00:09:42 How do you pronounce this, QIRA? QIRA, yeah. So it's essentially the most efficient way to visualize the change of state of the computer as the program is running. That's what you mean by debugger. Yeah, it's a timeless debugger. So you can rewind just as easily as going forward.
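The timeless-debugger idea he's describing (record every state change up front, then query or rewind after the fact) can be sketched in a few lines of Python. This is only an illustration of the concept; QIRA itself works very differently, instrumenting real binaries rather than tracing Python:

```python
import sys

# Record (line_number, variable, value) snapshots on every traced line,
# so the whole run can be replayed or "rewound" afterwards.
history = []

def tracer(frame, event, arg):
    if event == "line":
        for name, value in frame.f_locals.items():
            history.append((frame.f_lineno, name, value))
    return tracer  # keep tracing inside called functions

def demo():
    x = 1
    x = x + 1
    y = x * 3
    return y

sys.settrace(tracer)
demo()
sys.settrace(None)

# Every value x ever held is already in the log; no watchpoint was
# set up front, you just filter the history after the run.
changes_to_x = [(line, v) for line, name, v in history if name == "x"]
print(changes_to_x)
```

Because the full run log is kept, "rewinding" is just reading the list backwards, in contrast to GDB's set-a-watchpoint-then-rerun workflow.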
Starting point is 00:10:01 Think about it: if you're using GDB, you have to put a watch on a variable if you want to see if that variable changes. In QIRA, you can just click on that variable, and then it shows every single time that variable was changed or accessed. Think about it like Git for your computer's run log. So there's, like, a deep log of the state of the computer
Starting point is 00:10:22 as the program runs, and you can rewind. Why isn't that, maybe it is, maybe you can educate me, why isn't that kind of debugging used more often? Oh, because the tooling's bad. Well, two things. One, if you're trying to debug Chrome, Chrome is a 200-megabyte binary that runs slowly on desktops. So that's going to be really hard to use it for that. But it's really good to use for,
Starting point is 00:10:45 like, CTFs, and for boot ROMs, and for small parts of code. So it's hard if you're trying to debug, like, massive systems. What's a CTF, and what's a boot ROM? The boot ROM is the first code that executes the minute you give power to your iPhone. Okay. And CTFs were these competitions that I played, capture the flag. Capture the flag. I was going to ask you about that. I watched a couple videos on YouTube. Those look fascinating. What have you learned, maybe at a high level, about the vulnerability of systems from these competitions? I feel like, in the heyday of CTFs, you had all of the best security people
Starting point is 00:11:23 in the world challenging each other and coming up with new toy exploitable things over here, and then everybody, okay, who can break it? And when you break it, there's, like, a file on the server called flag, and there's a program running listening on a socket that's vulnerable. So you write an exploit, you get a shell, and then you cat flag, and then you type the flag into, like, a web-based scoreboard and you get points. So the goal is essentially to find an exploit in a system that allows you to run shell, to run arbitrary code on that system. That's one of the categories.
Starting point is 00:11:56 That's like the pwnable category. Pwnable? Yeah, pwnable. It's like, you know, you pwn the program. It's a program. Yeah. Pwn. Forgive my pronunciation, I apologize. I'm going to say it's because I'm Russian, but maybe you can help educate me. Some video game, like, misspelled "own" way back in the day. Yeah, and there's just,
Starting point is 00:12:21 I wonder if there's a definition. I'd have to go to Urban Dictionary for it. It'd be interesting just to hear how it came about. Okay, so what was the heyday of CTFs, by the way? What decade are we talking about? I think, like, I mean, maybe I'm biased because it's the era that I played, but, like, 2011 to 2015,
Starting point is 00:12:43 because the modern CTF scene is similar to the modern competitive programming scene. You have people who do, like, drills; you have people who practice. And then once you've done that, you've turned it less into a game of generic computer skill and more into a game of, okay, you drill on these five categories. And before that, it didn't have as much attention. I don't know, I won $30,000 in Korea for one of these competitions.
Starting point is 00:13:12 Oh, wow. Crap. Yeah. So that means, I mean, money's money, but that means there were probably good people there. Exactly, yeah. Are the challenges human-constructed, or are they grounded in some real flaws in real systems? Usually they're human-constructed, but they're usually inspired by real flaws.
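The pwnable workflow George described a moment ago (vulnerable service on a socket, exploit, shell, `cat flag`, scoreboard) looks roughly like this from the player's side. Everything here is invented for illustration: the host, port, overflow offset, and return address would come from reversing the actual challenge binary:

```python
import socket
import struct

# Hypothetical CTF target; host and port are made up for this sketch.
HOST, PORT = "challenge.example.ctf", 31337

def build_payload(offset: int, ret_addr: int) -> bytes:
    """Classic stack smash: pad up to the saved return address, then
    overwrite it with a little-endian 32-bit address (toy values here)."""
    return b"A" * offset + struct.pack("<I", ret_addr)

def solve():
    payload = build_payload(72, 0xDEADBEEF)  # offset/address are assumptions
    with socket.create_connection((HOST, PORT), timeout=5) as s:
        s.sendall(payload + b"\n")
        s.sendall(b"cat flag\n")  # with a shell, print the flag file
        return s.recv(4096)       # the flag then goes into the scoreboard
```

In practice players usually reach for a library like pwntools rather than raw sockets, but the shape of the exploit script is the same.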
Starting point is 00:13:32 What kind of systems, I imagine it's really focused on mobile, what has vulnerabilities these days? Is it primarily mobile systems like Android? Oh, everything does. Of course. The price has kind of gone up, because fewer and fewer people can find them. And what's happened in security is, now, if you want to, like, jailbreak an iPhone, you don't need one exploit anymore. You need nine. Nine? Chained together? What do you mean? Yeah. Wow. Okay. So what's the benefit of speaking at a higher level,
Starting point is 00:14:02 philosophically, about hacking? I mean, it sounds, from everything I've seen about you, like you just love the challenge, and you don't want to do anything malicious with it. You don't want to bring that exploit out into the world and let it actually run wild. You just want to solve it, and then you go on to the next thing. Oh, yeah. I mean, doing criminal stuff is not really worth it. And I'll actually use the same argument for why I don't do defense as for why I don't do crime. If you want to defend a system,
Starting point is 00:14:33 say the system has 10 holes, right? If you find nine of those holes as a defender, you still lose, because the attacker gets in through the last one. If you're an attacker, you only have to find one out of the 10. But if you're a criminal, if you log on with a VPN nine out of the 10 times, but one time you forget, you're done. Because you only have to mess up once to be caught
Starting point is 00:14:58 as a criminal. That's why I'm not a criminal. But, okay, let me ask, because I had a discussion with somebody, just at a higher level, about nuclear weapons actually, about why we haven't blown ourselves up yet. And my feeling is, all the smart people in the world, if you look at the distribution of smart people, smart people are generally good. And then some other person, I was talking to Sean Carroll, the physicist, and he was saying, no, good and bad people are
Starting point is 00:15:27 evenly distributed amongst everybody. My sense was, good hackers are in general good people, and they don't want to mess with the world. What's your sense? I'm not even sure about that. Like, I have a nice life. Crime wouldn't get me anything. But if you're good and you have these skills, you probably have a nice life too, right? Right. You can use them for other things. But is there an ethical, is there some, is there a little voice in your head that says, well, yeah, if you could hack something to where you could hurt people
Starting point is 00:16:09 and you could earn a lot of money doing it, though, not hurt physically perhaps, but disrupt their life in some kind of way, isn't there a little voice that says... Well, two things. One, I don't really care about money. So, like, the money wouldn't be an incentive. The thrill might be an incentive. But when I was 19, I read Crime and Punishment. Right. Good. That was another great one that talked me out of ever really doing
Starting point is 00:16:34 crime. Because it's like, that's going to be me. Even if I get away with it, it's just going to be running in my head. And if you do crime for long enough, you'll never get away with it. That's right. In the end, that's a good reason to be good. I wouldn't say I'm good. I would just say I'm not bad. You're a talented programmer and a hacker in a good, positive sense of the word. You've played around, found vulnerabilities in various systems.
Starting point is 00:17:01 What have you learned broadly about the design of systems and so on from that whole process? You learn to not take things for what people say they are, but you look at things for what they actually are. I understand that's what you tell me it is, but what does it do? Right. And you have nice visualization tools to really know what it's really doing. Oh, I wish.
Starting point is 00:17:34 I'm a better programmer now than I was in 2014. I said QIRA was the first tool that I wrote that was usable. I wouldn't say the code was great. I still wouldn't say my code is great. So how was your evolution as a programmer, besides practice? You started with C. At which point did you pick up Python?
Starting point is 00:17:52 Because you're pretty big in Python now. Now, yeah. In college, I went to Carnegie Mellon when I was 22. I went back. I'm like, all right, I'm going to take all your hardest CS courses. We'll see how I do, right? Like, did I miss anything by not having a real undergraduate education? I took operating systems, compilers, AI, and their freshman weed-out math course.
Starting point is 00:18:15 And some of those classes you mentioned are pretty tough, actually. They were great. At least circa 2012, operating systems and compilers were two of the best classes I've ever taken in my life, because you write an operating system and you write a compiler. I wrote my operating system in C, and I wrote my compiler in Haskell. But somehow I picked up Python that semester as well. I started using it for the CTFs, actually. That's when I really started to get into CTFs, and in CTFs, you're always racing against the clock.
Starting point is 00:18:49 So I can't write things in C. Oh, there's a clock component. So you really want to use the programming language that you can be fastest in. 48 hours. Pwn as many of these challenges as you can. Pwn. Yeah.
Starting point is 00:18:59 You get, like, 100 points a challenge. Whatever team gets the most wins. You were at both Facebook and Google for brief stints. Yeah, with Project Zero actually at Google for five months, where you developed QIRA. What was Project Zero about in general? I'm just curious about the security efforts in these companies. Well, Project Zero started the same time I went there.
Starting point is 00:19:25 What year were you there? 2015. 2015, so that was right at the beginning of Project Zero. It's small. It's Google's offensive security team. I'll try to give the best public-facing explanation that I can. So the idea is, basically, these vulnerabilities exist in the world. Nation states have them. Some high-powered bad actors have them. Sometimes people will find these vulnerabilities and submit them in bug bounties to the companies. But a lot of the companies don't really care. They don't even fix the bug. It just didn't hurt them for there to be a vulnerability. So Project Zero is like, we're going to do it different. We're going to
Starting point is 00:20:12 announce a vulnerability, and we're going to give them 90 days to fix it. And then, whether they fix it or not, we're going to drop the zero-day. Oh, wow. You're going to drop the weapon. That's so cool. That is so cool. I love that deadline. Give them real deadlines. Yeah. And I think it's done a lot for moving the industry forward. I watched your coding sessions that you streamed online. You code things up, basic projects, usually from scratch. I would say, sort of as a programmer myself, just watching you, that you type really fast, and your brain works in both brilliant and chaotic ways. I don't know if that's always true, but certainly for the live streams.
Starting point is 00:20:54 So it's interesting to me, because I'm much slower and systematic and careful, and you just move, I mean, probably an order of magnitude faster. So I'm curious, is there a method to your madness? Is this just who you are? There's pros and cons to my programming style, and I'm aware of them. Like, if you ask me to get something up and working quickly with, like, an API that's kind of undocumented, I will do this super fast, because I will throw things at it until it works. If you ask me to take a vector and rotate it 90 degrees and then flip it over the x-y plane, I'll spam out programs for two hours and won't get it.
Starting point is 00:21:38 Oh, because it's something that you could do with a sheet of paper, think through the design, and then just... You really just throw stuff at the wall, and you get so good at it that it usually works. I should become better at the other kind as well. Sometimes I'll do things methodically. It's nowhere near as entertaining on the Twitch streams. I do exaggerate it a bit on the Twitch streams as well. The Twitch streams, I mean, what do you want to see? A gamer. You want to see actions per minute, right? I'll show you APM for programming too.
Starting point is 00:22:02 Yes. I recommend people go watch it. I watched probably several hours of you. Like, I've actually left you programming in the background while I was programming, because it was like watching a really good gamer.
Starting point is 00:22:19 It, like, energizes you, because you're moving so fast. It's awesome. It's inspiring. And it made me jealous, because my own programming is inadequate in terms of speed. So I'm twice as frantic on the live streams as I am when I code without them. Oh, it's super entertaining. So I wasn't even paying attention to what you were coding, which is great. It's just watching you switch windows in vim, I guess, mostly.
Starting point is 00:22:47 Yeah, vim and screen. I developed that workflow at Facebook. We can talk about that. How do you learn new programming tools, ideas, techniques these days? What's your, like, methodology for learning new things? So, what I wrote for comma... The distributed file systems out in the world are extremely complex. Like, if you want to install something like Ceph... Ceph is, I think, the, like, open infrastructure distributed file system. Or there's, like, newer ones like SeaweedFS. But these are all, like, 10,000-plus-line projects.
Starting point is 00:23:23 I think some of them are even 100,000 lines. And just configuring them was a nightmare. So I wrote one. It's 200 lines. And it uses, like, nginx for the volume servers, and has a little master server that I wrote in Go. And if I would say that I'm proud per line of any code I wrote,
Starting point is 00:23:43 maybe there's some exploits that I think are beautiful, and then this. This is 200 lines, and just the way that I thought about it, I think was very good. And the reason it's very good is because that was the fourth version of it that I wrote, and I had three versions that I threw away. You mentioned, you said Go? I wrote it in Go, yeah. In Go. Is that a functional language? I forget what Go is. Go is Google's language.
Starting point is 00:24:03 Right. It's not functional. It's, like, in a way, C++, but easier. It's strongly typed. It has a nice ecosystem around it. When I first looked at it, I was like, this is like Python, but it takes twice as long to do anything. Yeah.
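The 200-line file system he describes a few exchanges back (nginx volume servers plus a little master server) is comma's open-source minikeyvalue; its core redirect idea can be sketched in Python. This is a toy sketch of the concept, not the real implementation, and the volume addresses are made up:

```python
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch of the "little master server" idea: the master stores no data.
# It only maps each key to a volume server and redirects; the volume
# servers (plain nginx in the design he describes) move the actual bytes.
VOLUMES = ["http://volume1:3001", "http://volume2:3001"]  # made-up addresses

def volume_for(key: str) -> str:
    # Deterministic key -> volume mapping via a hash of the key.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return VOLUMES[h % len(VOLUMES)]

class Master(BaseHTTPRequestHandler):
    def do_GET(self):
        # Redirect the client to the volume that owns this key.
        self.send_response(302)
        self.send_header("Location", volume_for(self.path) + self.path)
        self.end_headers()

# HTTPServer(("", 3000), Master).serve_forever()  # not started in this sketch
```

The design choice is that the master holds only the key-to-volume mapping and issues redirects, which is why the whole thing can stay tiny: all the hard storage work is delegated to stock nginx.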
Starting point is 00:24:22 Now that openpilot is migrating to C, but it still has large Python components, I now understand why Python doesn't work for large code bases, and why you want something like Go. Interesting. So why doesn't Python work? So, even, speaking for myself at least, we do a lot of stuff, basically demo-level work, with autonomous vehicles, and most of the work is Python. Yeah. Why doesn't Python work for large code bases? Because, well, lack of type checking is a big... So errors creep in.
Starting point is 00:24:55 Yeah, and, like, you don't know... the compiler can tell you, like, nothing, right? So everything is either, you know, like... syntax errors, fine, but if you misspell a variable in Python, the compiler won't catch that. There are linters that can catch it some of the time. There are no types. That is really the biggest downside. And then, well, Python's slow, but that's not related to it. Well, maybe it's kind of related to it. So what's in your toolbox these days? There's Python, what else? Go. I need to move to something else.
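The misspelled-variable failure mode he just described is easy to demonstrate; the functions below are invented for the example. CPython compiles `buggy_total` without complaint and only fails when that line actually runs, while a static checker such as mypy flags the undefined name without executing anything:

```python
# In plain Python, a misspelled variable is a runtime NameError,
# not a compile-time error.
def total_cost(prices: list[float], tax: float) -> float:
    subtotal = sum(prices)
    return subtotal * (1 + tax)

def buggy_total(prices: list[float], tax: float) -> float:
    subtotal = sum(prices)
    return subtotel * (1 + tax)  # misspelled; CPython happily accepts this

# Defining buggy_total raises nothing. Only calling it blows up, which is
# why a checker like mypy (or a linter) matters on a large code base.
print(total_cost([1.0, 2.0], 0.1))
```

In a statically typed language like Go, the equivalent misspelling is an "undefined" compile error, which is exactly the difference being discussed here.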
Starting point is 00:25:26 My adventure into dependently typed languages. I love these languages. They just have, like, syntax from the 80s. What do you think about JavaScript? ES6, like the modern TypeScript? JavaScript, the whole ecosystem, is unbelievably confusing. npm updates a package from 0.2.2 to 0.2.5, and that breaks your Babel linter, which translates your ES6 into ES5, which doesn't run on...
Starting point is 00:25:55 So why do I have to compile my JavaScript again? Huh? Maybe it's the future, though. You think about, I mean, I've embraced JavaScript recently, just because, just like I've continued to embrace PHP, it seems that these worst-possible languages live on forever. Like cockroaches, they never die. Yeah.
Starting point is 00:26:15 Well, it's in the browser and it's fast. It's fast. Yeah. It's in the browser, and compute might become, you know, the browser. It's unclear what the role of the browser is in terms of distributed computation in the future. So... JavaScript is definitely here to stay.
Starting point is 00:26:31 Yeah. It's interesting if autonomous vehicles will run on JavaScript one day. I mean, you have to consider these possibilities. Well, all our debug tools are JavaScript. We actually just open-sourced them. We have a tool, Explorer, where you can annotate your disengagements,
Starting point is 00:26:45 and we have a tool, Cabana, which lets you analyze the CAN traffic from the car. So basically, any time you're visualizing something about the log, you're using JavaScript. Well, the web is the best UI toolkit by far. So, and then, you're coding in JavaScript. We have a React guy. He's good. React is nice. Let's get into it. So let's talk autonomous vehicles. You founded comma.ai. At a high level, how did you get into the world of vehicle automation? Can you also, just for people who don't know, tell the story of comma.ai?
Starting point is 00:27:19 So I was working at this AI startup, and a friend approached me, and he's like, dude, I don't know where this is going, but the coolest applied AI problem today is self-driving cars. I'm like, well, absolutely. Do you want to meet with Elon Musk? And he's looking for somebody to build a vision system for Autopilot. This was when they were still on AP1. They were still using Mobileye. Elon back then was looking for a replacement. And he brought me in, and we talked about a
Starting point is 00:27:53 contract where I would deliver something that met Mobileye-level performance. I would get paid $12 million if I could deliver it tomorrow, and I would lose $1 million for every month I didn't deliver. So I was like, okay, this is a great deal. This is a super exciting challenge. You know what, even if it takes me 10 months, I get $2 million. It's good. Maybe I can finish it up in five. Maybe I don't finish it at all,
Starting point is 00:28:15 and I get paid nothing, and I work for 12 months for free. So maybe just take a pause on that. I'm also curious about this, because I've been working in robotics for a long time, and I'm curious to see a person like you just step in, sort of somewhat naive, but brilliant, right? So that's the best place to be, because you basically full-steam take on a problem. How confident were you? From that time, because you know a lot more now, at that time, how hard did you think it was to solve all of autonomous driving?
Starting point is 00:28:42 I remember I suggested to Elon in the meeting putting a GPU behind each camera to keep the compute local. This is an incredibly stupid idea. I leave the meeting 10 minutes later, and I'm like, I could've spent a little bit of time thinking about this problem before I went in. Why is it a stupid idea?
Starting point is 00:29:00 Oh, just send all your cameras to one big GPU. You're much better off doing that. Oh, sorry, you said behind every camera, have a small GPU. Yeah. I was like, oh, I'll put the first few layers of my ConvNet there. Like, why'd I say that?
Starting point is 00:29:12 That's possible. It's possible, but it's a bad idea. It's not obviously a bad idea. Pretty obviously bad. But whether it's actually a bad idea or not, I left that meeting with Elon, like, beating myself up. I'm like, why'd I say something stupid? Yeah, you hadn't, like, you hadn't
Starting point is 00:29:25 at least thought through every aspect. And he's very sharp too. Of course, he'll call you out about it right away. Like, usually in life, I get away with saying stupid things, and then, like, you know, a lot of times people don't even notice,
Starting point is 00:29:42 and I'll, like, correct it and bring the conversation back. But with Elon, it was like, nope. Okay, well. That's not at all why the contract fell through, by the way. I was much more prepared the second time I met him. Yeah, but in general, how hard did you think it was? Like, 12 months is a tough timeline. Oh, I just thought I'd clone the Mobileye EyeQ3. I didn't think I'd solve level 5 self-driving or anything. So the goal there was to do lane keeping, good lane keeping. My friend showed me the outputs from a Mobileye, and the outputs from a Mobileye were just basically two lanes and a position of a lead car.
Starting point is 00:30:14 and the position of a lead car. I can gather a data set and train this net in weeks. And I did. Well, the first time I tried the Mobileye implementation in a Tesla, I was really surprised how good it is. It's quite incredibly good. Because I've done a lot of computer vision, I thought it'd be a lot harder
Starting point is 00:30:34 to create a system that's that stable. So I was personally surprised. You know, I have to admit it, because I was kind of skeptical before trying it. I thought it would go in and out a lot more, it would get disengaged a lot more. And it's pretty robust. So how hard is the problem, when you tackled it? So I think AP1 was great. Like, Elon talked about disengagements on the 405 down in L.A., where the lane marks
Starting point is 00:31:06 are kind of faded and the Mobileye system would drop out. I had something up and working that I would say was the same quality in three months. Same quality, but how do you know? You say stuff like that, and I love it, but the question is, you're kind of going by feel. Absolutely, absolutely. Like, I had my friend's Tesla — I would take AP1 out for a drive,
Starting point is 00:31:36 yeah, and then I would take my system out for a drive, and it seemed reasonably the same. So the 405 — how hard is it to create something that could actually be a product that's deployed? I mean, I've read an article where Elon responded, said something about you, saying that to build Autopilot is more complicated than a single George Hotz level job. How hard is that job, to create something that would work globally? Working globally is the challenge? But Elon followed that up by saying it's going to take two years and a company of 10 people.
Starting point is 00:32:21 And here I am, four years later, with a company of 12 people, and I think we still have another two to go. Two years. So, yeah. So what do you think about how Tesla is progressing with Autopilot, V2, V3? I think we've kept pace with them pretty well. I think Navigate on Autopilot is terrible. We had some demo features internally of the same stuff,
Starting point is 00:32:43 and we would test it, and I'm like, I'm not shipping this even as open source software to people. Why do you think it's terrible? Consumer Reports does a great job of describing it. When it makes a lane change, it does it worse than a human. You shouldn't ship things like Autopilot, OpenPilot, until
Starting point is 00:33:03 they lane keep better than a human. If you turn it on for a stretch of highway, like an hour long, it's never going to touch a lane line. A human will probably touch a lane line twice. You just inspired me — I don't know if you're grounded in data on that. I've read your paper. Okay, but that's interesting. I wonder actually how often we touch lane lines in general, like a little bit, because— I can answer that question pretty easily with the comma data set.
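His claim that the comma data set could answer this easily checks out as a few lines of code. A sketch under assumed inputs: a logged time series of the car's distance to the nearest lane line in meters (this log format is invented for illustration, not comma's actual one), where counting discrete touch events is just counting transitions.

```python
def count_lane_touches(dist_to_line_m, threshold_m=0.0):
    """Count discrete touch events: transitions from 'clear of the
    line' to 'at or over the line' (distance <= threshold)."""
    touches = 0
    touching = False
    for d in dist_to_line_m:
        if d <= threshold_m:
            if not touching:
                touches += 1
            touching = True
        else:
            touching = False
    return touches

# One sample per second over a short drive: the car drifts onto
# the line twice, as two separate events.
log = [0.4, 0.3, 0.1, -0.05, -0.02, 0.2, 0.5, 0.0, 0.3]
print(count_lane_touches(log))  # 2
```

Run over hours of logged drives, the per-hour touch rate for humans versus the lane keeping system would settle the question he raises.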
Starting point is 00:33:32 Yeah, I'm curious. I've never answered it. I don't know. That was just, like, my personal feel — it feels right. Well, that's interesting, because every time you touch a lane, that's the source of a little bit of stress,
Starting point is 00:33:43 and kind of lane keeping is removing that stress. That's ultimately the biggest value add, honestly — just removing the stress of having to stay in lane. And I think, honestly, I don't think people fully realize, first of all, that that's a big value add, but also that that's all it is. And not only do I find it a huge value add — when we moved to San Diego, I drove down in an Enterprise rent-a-car, and I missed it. I missed having the system so much. It's so much more tiring to drive without it. It is that lane centering that's the key feature. Yeah.
Starting point is 00:34:22 And in a way, it's the only feature that actually adds value to people's lives in a ton of vehicles today. Waymo does not add value to people's lives. It's a more expensive, slower Uber. Maybe someday it'll be this big cliff where it adds value, but right now it doesn't. Yeah, it's fascinating. I haven't talked about this. This is good.
Starting point is 00:34:38 Yeah. I've felt it intuitively, but I think we're making it explicit now. I actually believe that really good lane keeping is a reason to buy a car. Will be a reason to buy a car. It's a huge value add. I've never, until we just started talking about it, really quite realized it — that I've felt Elon's chase of level four is not the correct chase. It was off. Because you should just say, Tesla has the best—
Starting point is 00:35:12 as if from a Tesla perspective, say, Tesla has the best lane keeping. Comma.ai should say, Comma.ai has the best lane keeping, and that is it. Yeah, yeah. Do you think— You have to do the longitudinal as well. You can't just lane keep. You have to do ACC,
Starting point is 00:35:29 but ACC is much more forgiving than lane keep, especially on the highway. By the way, is Comma's approach camera only, correct? No, we use the radar. From the car, you're able to get the— We can do camera only now. It's gotten to that point. But we leave the radar there; it's fusion now. Okay, so let's maybe talk through some of the system specs on the hardware.
Starting point is 00:35:55 What's the hardware side of what you're providing? What are the capabilities on the software side, with OpenPilot and so on? So OpenPilot — the box that we sell that it runs on is a phone in a plastic case. It's nothing special. We sell it without the software. So you buy the phone; it'll be easy to set up, but it's sold with no software. OpenPilot right now is about to be 0.6.
Starting point is 00:36:23 When it gets to 1.0, I think we'll be ready for a consumer product. We're not going to add any new features; we're just going to make the lane keeping really, really good. So what do we have right now? It's a Snapdragon 820, it's a Sony IMX298 forward-facing camera. The driver monitoring camera is just a selfie cam on the phone. And a CAN transceiver — a little thing called a panda. They talk over USB to the phone, and they have three CAN buses that they talk to the car on. One of those CAN buses is the radar CAN bus, one of them is the main car CAN bus, and the other one is the proxy camera CAN bus. We leave the existing camera in place so we don't turn AEB off. Right now we still turn AEB off
Starting point is 00:37:08 if you're using our longitudinal, but we're going to fix that before 1.0. Got it. Wow, that's cool. So it can send on the CAN bus both ways? So how are you able to control vehicles? So we proxy. The vehicles that we work with already have a lane keeping assist system. Lane keeping assist can mean a huge variety of things. It can mean it will apply a small torque to the wheel after you've already crossed a lane line by a foot, which is the system in the older Toyotas, versus — I think Tesla still calls it lane keeping assist — where it'll keep you perfectly in the center of the lane
Starting point is 00:37:46 on the highway. You can control the car, like, with a joystick. So these cars already have the capability of drive-by-wire. So is it trivial to convert a car that it operates with, so OpenPilot is able to control the steering? A new car, or a car that we— So we have support now for 45 different makes of cars. What are the cars, generally?
Starting point is 00:38:11 Mostly Hondas and Toyotas. We support almost every Honda and Toyota made this year. And then a bunch of GMs, a bunch of Subarus, a bunch of— It doesn't have to be, like, a Prius — it could be a Corolla as well? Oh, okay. The 2020 Corolla is the best car with OpenPilot. It just came out;
Starting point is 00:38:28 the actuator has less lag than the older Corolla. I think I started watching a video with you — I mean, the way you make videos is awesome. You literally just go to the dealership, streaming, with, like, a friend following you — like, buy your car on stream for an hour. Yeah, and basically, like, if stuff goes a little wrong, you just go with it.
Starting point is 00:38:49 Yeah, I love it. It's real? Yeah, it's real. That's so beautiful, and it's so in contrast to the way other companies would put together a video like that. Why do I like to do it like that? It's good. I mean, if you become super rich one day and successful,
Starting point is 00:39:06 I hope you keep it that way, because I think that's actually what people love — that kind of genuine. Oh, it's all that has value to me. Yeah. Money has no— if I sell out to, like, make money, I sold out, it doesn't matter. What do I get? Yeah, I don't like that.
Starting point is 00:39:21 And I think Tesla actually has a small inkling of that as well, with Autonomy Day. They did reveal more than— of course, it's marketing communications, you can tell, but it's more than most companies would reveal, which is why I hope they go more towards that direction — other companies too, GM, Ford. Oh, Tesla — Tesla is going to win level five. They really are. So let's talk about it.
Starting point is 00:39:44 You think— you're focused on level two currently. Currently. We're going to be one to two years behind Tesla getting to level five. Okay. Hey, we're Android, right? We're Android. You're Android.
Starting point is 00:39:57 I'm just saying, once Tesla gets it, we're one to two years behind. I'm not making any timeline on when Tesla is going to get it. That's right. You did. That was brilliant. I'm sorry, Tesla investors — if you think you're going to have an autonomous robo-taxi fleet by the end of the year, yes, I'll bet against that.
Starting point is 00:40:11 So what do you think about this? Most level four companies are just doing their usual thing — a safety driver doing full-time kind of testing. And then Tesla is basically trying to go from lane keeping to full autonomy. What do you think about that approach? How successful will it be?
Starting point is 00:40:37 It's a ton better approach. Because Tesla is gathering data on a scale that none of them are. They're putting real users behind the wheel of the cars. It's, I think, the only strategy that works. The incremental. Well, so there's a few components to Tesla's approach that are more than just the incremental. As we spoke about, one is the software — the over-the-air software updates. Necessity. I mean, Waymo and Cruise have those too. Those aren't— But those are differentiating things from the automakers. No lane keeping assist systems have that — no cars with lane keeping systems have that — except Tesla.
Starting point is 00:41:12 And the other one is the data — the other direction, which is the ability to query the data. I don't think they're actually collecting as much data as people think, but the ability to turn on collection and turn it off. So I'm both in the robotics world and the psychology, human factors world. Many people believe that level two autonomy is problematic because of the human factor. Like, the more the task is automated, the more there's a vigilance decrement — you start to fall asleep, you start to become complacent, start texting more, and so on. Do you worry about that? Because if you're talking about a transition from lane keeping to full autonomy, if you're spending 80% of the time not supervising the machine, do you worry about what that means for the safety of the drivers?
Starting point is 00:42:03 One, we don't consider OpenPilot to be 1.0 until we have 100% driver monitoring. You can cheat our driver monitoring system right now. There's a few ways to cheat it; they're pretty obvious. We're working on making that better. Before we ship a consumer product that can drive cars, I want to make sure that I have driver monitoring
Starting point is 00:42:20 that you can't cheat. What does a successful driver monitoring system look like? Is it basically, keep your eyes on the road? Well, a few things. So that's what we went with at first for driver monitoring. I'm checking — I'm actually looking at where your head is looking. The camera's not that high resolution; eyes are a little bit hard to get.
Starting point is 00:42:38 Well, the head is this big. I mean, head is good. And actually a lot of it, psychology-wise, is just having that monitor constantly — it reminds you that you have to be paying attention. But we want to go further. We just hired someone full-time to come on to do the driver monitoring. I want to detect a phone in frame, and I want to make sure you're not sleeping. How much does the camera see of the body?
Starting point is 00:43:01 This one not enough. Not enough. The next one, everything. Well, it's interesting, Fish Agues. We're doing just data collection, not real time, but Fish Agues is a beautiful being able to capture the body. And the smartphone is really the biggest problem. I'll show you one of the pictures
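The head-pose-based monitoring he describes could be sketched as a simple attention timer. This is a toy illustration, not comma.ai's actual logic — the yaw threshold, frame rate, and timeout are all invented for the example.

```python
def distraction_alert(head_yaw_deg, dt_s=0.1, yaw_limit_deg=30.0, max_off_s=2.0):
    """Return True if the driver's head has been turned away from the
    road (|yaw| beyond yaw_limit_deg) for longer than max_off_s
    continuously. Thresholds here are illustrative only."""
    off_time = 0.0
    for yaw in head_yaw_deg:
        if abs(yaw) > yaw_limit_deg:
            off_time += dt_s
            if off_time > max_off_s:
                return True
        else:
            off_time = 0.0  # glance back at the road resets the timer
    return False

# 10 Hz head-pose estimates: a long glance away triggers the alert
looking_away = [0.0] * 10 + [45.0] * 25
print(distraction_alert(looking_away))  # True
```

A real system would also feed in the phone-in-frame and eyes-closed detections he mentions, rather than head yaw alone.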
Starting point is 00:43:21 Awesome. So you're basically saying the driver monitoring will be the answer to that. I think the other point that you're raising in your papers is good as well: you're not asking the human to supervise a machine without giving them the ability to take over at any time. Right. Our safety model: you can take over. We disengage on both the gas and the brake. We don't disengage on steering;
Starting point is 00:43:45 I don't feel you have to. But we disengage on gas or brake, so it's very easy for you to take over, and it's very easy for you to re-engage. That switching should be super cheap. Some cars — even Autopilot — require a double press. That's— I don't like that. And then to cancel in Autopilot, you either have to press cancel, which no one knows what that is, so they press the brake. But a lot of times you don't want to press the brake — you want to press the gas — so you should cancel on gas. Or wiggle the steering wheel, which is bad as well.
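The engage/disengage policy he's arguing for can be pictured as a tiny state machine. A sketch, not the actual OpenPilot safety code (which lives in the panda firmware and handles many more cases): single press engages, gas or brake disengages, steering input alone does not.

```python
class EngagementLogic:
    """Toy version of the policy described above: cheap to engage,
    cheap to disengage, and steering never drops you out."""

    def __init__(self):
        self.engaged = False

    def step(self, button=False, gas=False, brake=False, steer_torque=0.0):
        if gas or brake:
            self.engaged = False   # the cheap, obvious way out
        elif button:
            self.engaged = True    # the cheap, single-press way in
        # steer_torque deliberately ignored: the driver can nudge the
        # wheel without the system disengaging
        return self.engaged

ctl = EngagementLogic()
print(ctl.step(button=True))       # True
print(ctl.step(steer_torque=1.5))  # True (still engaged)
print(ctl.step(brake=True))        # False
```

Note that gas/brake wins over the engage button in the same step, so an ambiguous input always resolves toward handing control back to the human.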
Starting point is 00:44:14 Wow. That's brilliant. I haven't heard anyone articulate that point. Oh, I think about it. Because I think Tesla has actually done a better job than most automakers at making that frictionless, but you just described how it could be even better. I love Super Cruise as an experience once it's engaged. I don't know if you've used it, but getting the thing to engage— Yeah, I've driven Super Cruise a lot. So what are your thoughts on the Super Cruise system in general? You disengage Super Cruise and it falls back to ACC,
Starting point is 00:44:49 so my car's still accelerating — it feels weird. Otherwise, when you actually have Super Cruise engaged on the highway, it is phenomenal. We bought that Cadillac — we just sold it — but we bought it just to experience this, and I wanted everyone in the office to be like, this is what we're striving to build. GM pioneering with the driver monitoring.
Starting point is 00:45:08 You like their driver monitoring system? It has some bugs. If there's sun shining back here, it'll be blind to you. But overall, mostly, yeah. That's so cool that you know all this stuff. I don't often talk to people that do, because it's such a rare car, unfortunately, currently. We bought one explicitly for that. We lost like 25K in the depreciation, but I feel like it's worth it.
Starting point is 00:45:33 I was very pleasantly surprised that the GM system was so innovative. And really, it wasn't advertised much, wasn't talked about much, and I was nervous that it would die, that it would disappear. Well, I'd say they put it on the wrong car. They should have put it on the Bolt, and not some weird Cadillac that nobody bought. I think they're saying, at least, it's going to be in their entire fleet. So — as long as we're on the driver monitoring topic — what do you think
Starting point is 00:46:03 about Elon Musk's claim that driver monitoring is not needed? Normally, I love his claims. That one is stupid. That one is stupid, and, you know, he's not going to have his level 5 fleet by the end of the year. Hopefully he's like, okay, I was wrong, I'm going to add driver monitoring.
Starting point is 00:46:23 Because when these systems get to the point that they're only messing up once every 1,000 miles, you absolutely need driver monitoring. So let me play — because I agree with you — but let me play devil's advocate. One possibility is that without driver monitoring, people are able to self-regulate, monitor themselves. So your idea is — you've seen all the people sleeping in Teslas?
Starting point is 00:46:49 Uh, yeah. Well, I'm a little skeptical of all the people sleeping in Teslas, because I've stopped paying attention to that kind of stuff — I want to see real data. It's too glorified; it doesn't feel scientific to me. So I want to know, you know, how many people are really sleeping in Teslas, versus — I was driving here sleep-deprived in a car with no automation, and I was falling asleep. I agree that it's hypey. It's just like, you know what? You should put
Starting point is 00:47:21 driver monitoring in. My last Autopilot experience was, I rented a Model 3 in March and drove it around. The wheel thing is annoying, and the reason the wheel thing is annoying — we use the wheel thing as well, but we don't disengage on wheel. For Tesla, you have to touch the wheel just enough to trigger the torque sensor to tell it that you're there, but not enough as to disengage it. Which — don't use it for two things. Don't disengage on wheel. You don't have to. That whole experience — wow, beautiful. But all those elements, even if you don't have driver monitoring,
Starting point is 00:47:54 that whole experience needs to be better. Driver monitoring, I think, would make— I mean, I think Super Cruise is a better experience once it's engaged, over Autopilot, but Super Cruise's transitions to engagement and disengagement are significantly worse. There's a tricky thing — because if I were to criticize Super Cruise, it's a little too crude. It takes, I think, like, six seconds or something of looking off road before it starts warning you. It's some ridiculously long period of time. And just the way — I think it's basically,
Starting point is 00:48:31 it's binary; it should be adaptive. Yeah, it needs to learn more about you, and it needs to communicate what it sees about you more. Like, you know, Tesla shows what it sees about the external world. It would be nice if Super Cruise would tell us what it sees about the internal world. It's even worse than that. You press the button to engage and it just says, Super Cruise unavailable.
Starting point is 00:48:52 Why? Why? Yeah. Transparency is good. We've renamed the driver monitoring packet to driver state. Driver state. We have a car state packet, which has the state of the car, and a driver state packet, which has the state of the driver.
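The two packets he names could be pictured as something like the sketch below. The field names are illustrative guesses, not comma.ai's actual schema (which is defined in their cereal message format); the point is just the split between what the CAN bus reports and what the interior camera reports.

```python
from dataclasses import dataclass

@dataclass
class CarState:
    """State of the car — roughly, what the CAN bus tells us."""
    speed_ms: float
    steering_angle_deg: float
    gas_pressed: bool
    brake_pressed: bool

@dataclass
class DriverState:
    """State of the driver — roughly, what the interior camera tells us."""
    face_detected: bool
    head_yaw_deg: float
    head_pitch_deg: float
    eyes_closed_prob: float

cs = CarState(speed_ms=29.0, steering_angle_deg=-1.5,
              gas_pressed=False, brake_pressed=False)
ds = DriverState(face_detected=True, head_yaw_deg=4.0,
                 head_pitch_deg=-2.0, eyes_closed_prob=0.03)
print(ds.face_detected and abs(ds.head_yaw_deg) < 30.0)  # True
```

Keeping the two as separate, explicitly named packets is what makes the transparency he's praising possible: either side of the system can be logged and inspected on its own.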
Starting point is 00:49:07 So what is the estimate there of BAC? What's BAC? Blood alcohol content. You think that's possible with computer vision? Absolutely. To me, it's an open question. I haven't looked into it too much — actually, I haven't seriously looked at the literature.
Starting point is 00:49:25 It's not obvious to me that from the eyes and so on, you can tell. You might need stuff from the car as well. Yeah. You might need how they're controlling the car, right? And that's fundamentally at the end of the day what you care about. But I think, especially when people are really drunk,
Starting point is 00:49:38 they're not controlling the car nearly as smoothly as they would — look at them walking, right? The car's like an extension of the body. So I think you could totally detect it. And if you could fix people being drunk, distracted, asleep — if you fix those three — yeah, that's huge. So what are the current limitations of OpenPilot?
Starting point is 00:49:54 What are the main problems that still need to be solved? We're hopefully fixing a few of them in 0.6. We're not as good as Autopilot at stopped cars. So if you're coming up to a red light at, like, 55 — so it's the radar stopped car problem, which is responsible for two Autopilot accidents — it's hard to differentiate a stopped car from, like, a signpost. Yeah, a static object.
Starting point is 00:50:21 So you have to fuse — you have to do this visually. There's no way from the radar data to tell the difference. Maybe you can make a map, but I don't really believe in mapping at all anymore. Really? What? You don't believe in mapping? No. So basically, the OpenPilot solution is saying, react to the environment, just like human beings do.
Starting point is 00:50:40 And then eventually, when you want to do Navigate on OpenPilot, I'll train the net to look at Waze. I'll run Waze in the background; I'll train comma that way. Are you using GPS at all? We use it to ground truth. We use it to very carefully ground truth the paths. We have a stack which can recover relative to 10 centimeters over one minute, and then we use that to ground truth exactly where the car went in that local part of the environment. But it's all local. How are you testing in general — just for yourself, like, experiments and stuff? Where are you located? San Diego.
Starting point is 00:51:12 San Diego. Yeah. Okay. So you basically have a driver out there, collect some data, and watch? We have a simulator now. And we have our simulator — it's really cool. Our simulator is not like a Unity-based simulator. Our simulator lets us load in real state. We can load
Starting point is 00:51:30 in a drive and simulate what the system would have done on the historical data. Ooh, nice. Interesting. Right now, we're only using it for testing, but as soon as we start using it for training— That's it. For now, it's all for testing. What's your feeling about the real world versus simulation? Do you like simulation for training, if this moves to training? So we have to distinguish two types of simulators, right? There's a simulator that, like, is completely fake. I could get my car to drive around in GTA. I feel that this kind of simulator is useless.
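The replay kind of simulator he describes — as opposed to the completely fake kind — could be sketched like this: load a logged drive, run the system on the recorded frames, and compare its outputs to what the human actually did. The log format and the model here are stand-ins for illustration.

```python
def replay_drive(frames, human_steer, model):
    """Run `model` on each logged camera frame and report the mean
    absolute difference between its steering output and the human's
    logged steering. Nothing is rendered: the 'simulator' is just
    historical data played back."""
    assert len(frames) == len(human_steer)
    errors = []
    for frame, human in zip(frames, human_steer):
        predicted = model(frame)
        errors.append(abs(predicted - human))
    return sum(errors) / len(errors)

# Stand-in "model": steer proportionally to a fake lane-offset feature
fake_model = lambda frame: 0.5 * frame["lane_offset"]
frames = [{"lane_offset": 0.2}, {"lane_offset": -0.4}, {"lane_offset": 0.0}]
human_steer = [0.1, -0.2, 0.1]
print(replay_drive(frames, human_steer, fake_model))
```

Because it replays real sensor data, this sidesteps the computer-graphics problem he complains about next: there is no rendering to get wrong, only recorded reality.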
Starting point is 00:52:08 You're never— there's so many— my analogy here is, like, okay, fine, you're not solving the computer vision problem, but you're solving the computer graphics problem. Right. And you don't think you can get very far by creating ultra-realistic graphics? No, because you can create ultra-realistic graphics of the road — now create ultra-realistic behavioral models of the other cars. Oh, well, I'll just use my self-driving. No, you won't. You need real— you need actual human behavior, because that's what you're trying to learn. Driving does not have a spec.
Starting point is 00:52:42 The definition of driving is what humans do when they drive. Whatever Waymo does, I don't think it's driving. Right. Well, I think actually Waymo and others — if there's any use for reinforcement learning, I've seen it used quite well. I study pedestrians a lot, too. They try to train models from real data of how pedestrians move, and try to use reinforcement learning
Starting point is 00:53:03 models to make pedestrians move in human-like ways. By that point, you've already gone so many layers. You detected a pedestrian — did you hand-code the feature vector of their state? Did you guys learn anything from computer vision before deep learning? Well, okay. I feel like this is — perception, to you, is the sticking point. I mean, what's the hardest part of this stack here?
Starting point is 00:53:30 There is no human-understandable feature vector separating perception and planning. That's the best way I can put that. There is no— So it's all together, and it's a joint problem? So you can take localization. Between localization and planning, there is a human-understandable feature vector. I mean, okay, so I have, like, three degrees of position,
Starting point is 00:53:55 three degrees of orientation, and those derivatives, maybe those second derivatives, right? That's human understandable; that's physical. But between perception and planning— So, like, Waymo has a perception stack and then a planner. And one of the things Waymo does right is they have a simulator that can separate those two. They can, like, replay their perception data
Starting point is 00:54:19 and test their system, which is what I'm talking about — like, the two different kinds of simulators. There's the kind that can work on real data, and the kind that can't work on real data. Now, the problem is that I don't think you can hand-code a feature vector, right? Like, you have some list of, like, well, here's my list of cars in the scene, here's my list of pedestrians in the scene —
Starting point is 00:54:37 this isn't what humans are doing. What are humans doing? Something global. You're saying that's too difficult to hand-engineer. I'm saying that there is no state vector. Given a perfect— I could give you the best team of engineers in the world to build a perception system, and the best team to build a planner.
Starting point is 00:54:57 All you have to do is define the state vector that separates those two. I'm missing the state vector that separates those two? What do you mean? So what is the output of your perception system? The output of the perception system — okay, well, there's several ways to do it. One is SLAM-based localization. The other is drivable space. Drivable space, yeah. And then there's the different objects in the scene. Yep. And the different objects in the scene over time, maybe, to give you input
Starting point is 00:55:33 to then try to start modeling the trajectories of those objects. Sure. That's it. I can give you a concrete example of something you missed. What's that? So say there's a bush in the scene. Humans understand that when they see this bush, there may or may not be a car behind that bush.
Starting point is 00:55:51 Drivable area and a list of objects does not include that. Humans are doing this constantly at the simplest intersections. So now you have to talk about occluded area. Right. Right? But even that — what do you mean by occluded? Okay, so, something I can't see.
Starting point is 00:56:05 Well, if it's on the other side of a house, I don't care. What's the likelihood that there's a car in that occluded area? Right. And if you say, okay, we'll add that — I can come up with 10 more examples that you can't add. Certainly, occluded area would be something that a simulator would have, because it's simulating the entire— you know, occlusion is part of it, occlusion is part of a vision stack. But what I'm saying is, if you have a hand-engineered— if your perception system's output can be written in a spec document, it is incomplete.
Starting point is 00:56:46 Yeah, I mean, certainly it's hard to argue with that, because in the end, that's going to be true. And I'll tell you what the output of our perception system is. What's that? It's a 1024-dimensional vector. Trained end to end? Oh— no, it's 1024 dimensions of who knows what, because it's operating on real data.
Starting point is 00:57:01 Yeah. And that's the perception. That's the perception state, right? Think about an autoencoder for faces, right? If you have an autoencoder for faces, and you say it has 256 dimensions in the middle, and I'm taking a face over here and projecting it to a face over here — yeah —
Starting point is 00:57:19 can you hand-label all 256 of those dimensions? Well, no, those are generated automatically. But even if you tried to do it by hand, could you come up with a spec between your encoder and your decoder? No, because it wasn't designed that— No, no, but if you could design it. If you could design a face reconstruction system, could you come up with a spec?
Starting point is 00:57:51 No, but I think we're missing here a little bit. I think that you're just being very poetic about expressing a fundamental problem of simulators: that they are going to be missing so much that the feature vector would just look fundamentally different in the simulated world than in the real world. I'm not making a claim about simulators. I'm making a claim about the spec — the division between perception and planning. And planning— Even in your system. Just in general, right — just in general, if you're trying to build a car that drives, if you're trying to hand-code the output of your perception system — like saying, here's a list of all the cars in the scene, here's a list of all the people, here's a list of the occluded areas, here's a vector of drivable areas — it's insufficient.
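The contrast he's drawing can be caricatured in code: the spec'd perception output is a fixed structure where every field had to be anticipated by an engineer, while the learned output is an opaque vector. Everything below is illustrative — only the 1024-dimension size comes from the conversation.

```python
import random

# The hand-coded spec: a field exists only if someone thought of it.
# The bush that might hide a car has nowhere to live in this dict.
spec_perception = {
    "cars": [(12.0, 1.1)],            # (distance_m, lateral_m) per car
    "pedestrians": [],                # none detected
    "drivable_area": [0.9, 0.8, 0.7], # some per-region drivability score
    # "bush_that_might_hide_a_car": ???   <- not in the spec
}

# The learned representation: whatever the network needed to encode,
# it encoded — no dimension has a human-readable name. (Random numbers
# stand in for real network activations here.)
random.seed(0)
learned_perception = [random.gauss(0.0, 1.0) for _ in range(1024)]

print(len(learned_perception))  # 1024
```

The argument is that any finite dict like the first one invites a counterexample, whereas the vector's meaning is defined only jointly with the planner that consumes it — which is why he says no human-understandable interface exists between the two.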
Starting point is 00:58:33 And if you start to believe that, you realize that what Waymo and Cruise are doing is impossible. Currently, what we're doing is, the perception problem is converting the scene into a chessboard, and then you do some basic reasoning around that chessboard. And you're saying that really there's a lot missing there. First of all, why are we talking about this? Because isn't this full autonomy? Is this something you think about? Oh, I want to win self-driving cars. So your definition of win includes level four, five? Level five. I don't think level four is a real thing. I want to build the AlphaGo of driving.
Starting point is 00:59:17 So AlphaGo is really end to end. Yeah, it's end to end. And do you think this whole problem — is that also kind of what you're getting at with the perception and the planning — is that this whole problem, the right way to do it,
Starting point is 00:59:35 is really to learn the entire thing? I'll argue that not only is it the right way, it's the only way that's going to exceed human performance. Well, it's certainly true for Go. Everyone who tried to hand-code Go-playing things built human-inferior things. And then someone came along and wrote some 10,000-line thing that doesn't know anything about Go that beat everybody. It's 10,000 lines.
Starting point is 00:59:57 True. In that sense, the open question then — that maybe I can ask you — is, driving is much harder than Go. The open question is, how much harder? Because I think the Elon Musk approach here, with planning and perception, is similar to what you're describing: not turning it into some kind of modular thing, but really formulating it as a learning problem, and solving the learning problem with scale. So how many years would it take to solve this problem — or just, how hard is this freaking problem? Well, the cool thing is, I think there's a lot of value that we can deliver along the way. I think that
Starting point is 01:00:48 you can build lane keeping assist — actually, plus adaptive cruise control, plus, okay, looking at Waze — extends to, like, all of driving. Yeah, most of driving. Oh, your adaptive cruise control treats red lights like cars? Okay. So, let's jump around. You mentioned that you didn't like Navigate on Autopilot. What advice — how would you make it better?
Starting point is 01:01:14 Do you think, as a feature, if it's done really well, it's a good feature? I think that it's too reliant on, like, hand-coded hacks for, like, how does Navigate on Autopilot do a lane change? It actually does the same lane change every time, and it feels mechanical. Humans do different lane changes. Humans sometimes will do a slow one, sometimes a fast one. Navigate on Autopilot, at least every time I use it, does the identical lane change. How do you learn— I mean, this is a fundamental thing, actually: the braking and accelerating — something that Tesla probably does better than most cars, but it still doesn't
Starting point is 01:01:52 do a great job of creating a comfortable, natural experience, and Navigate on Autopilot — lane changes are an extension of that. So how do you learn to do a natural lane change? So we have it, and I can talk about how it works. I feel that we have the solution for lateral, but we don't yet have the solution for longitudinal. There are a few reasons longitudinal is harder than lateral. The lane change component — the way that we train on it, very simply, is our model has
Starting point is 01:02:26 an input for whether it's doing a lane change or not. And then when we train the end-to-end model, we hand-label all the lane changes, because you have to. I struggled a long time about not wanting to do that, but I think you have to. For the training data, right? Oh — actually, we have an automatic ground-truther which automatically labels all the lane changes. How is it possible to automatically label lane changes? Yeah, I just detect the lane I see, and when I cross it, right? I don't have to get that high a percent accuracy, but it's like 95% — good enough. Okay. Now I set the bit when it's doing the lane change in the end-to-end learning, and then I set it to zero when it's not doing a lane change.
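A toy sketch of what such an automatic ground-truther could look like (my illustration, not Comma's actual pipeline): watch the car's lateral offset from its original lane center, flag the frames where it crosses the lane boundary, and pad the label a few frames to each side so the whole maneuver gets the bit set.

```python
def label_lane_changes(lateral_offsets, lane_width=3.7, pad=5):
    """Label each frame 1 if a lane change is in progress, else 0.

    lateral_offsets: meters from the original lane center, per frame.
    A crossing is flagged when |offset| exceeds half the lane width;
    frames within `pad` frames of a crossing are labeled too, so the
    whole maneuver carries the bit.
    """
    half = lane_width / 2.0
    labels = [0] * len(lateral_offsets)
    for i, x in enumerate(lateral_offsets):
        if abs(x) > half:
            for j in range(max(0, i - pad), min(len(labels), i + pad + 1)):
                labels[j] = 1
    return labels
```

As Hotz says, a labeler like this doesn't need to be perfect — roughly 95% accuracy on a signal this coarse is plenty for training data.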
Starting point is 01:03:04 So now if I want it to do a lane change at test time, I just set the bit to one, and it'll do a lane change. Yeah, but if you look at the space of lane changes, you know, some percentage — not 100% — that we make as humans is not a pleasant experience, because we mess some part of it up. Yeah, it's nerve-wracking to change lanes. If you look at, say, Tesla — the accelerating — how do we label the ones that are natural and feel good? Because that's your ultimate criticism: the current Navigate on Autopilot just doesn't feel good. Well, the current Navigate on Autopilot is a hand-coded policy written by an engineer in a room who probably went out and tested it a few times on the 280. Probably a better version of that, but yes. That's how we would have written
Starting point is 01:03:47 it. Yeah. Maybe Tesla Tesla they tested it in. I might have been two engineers. Yeah. Um, no, but so if you learn the lane change, if you learn how to do a lane change from data, just like just like you have a label that says lane change and then you put it in when you want to do the lane change, it'll automatically do the lane change that's appropriate for the situation. Now, to get at the problem of some humans do bad lane changes, we haven't worked too much on this problem yet. It's not that much of a problem in practice. My theory is that all good drivers are good in the same way and all bad drivers are bad
Starting point is 01:04:23 in different ways. And we've seen some data to back this up. Beautifully put. So basically, if that hypothesis is true, then your task is to discover the good drivers. The good drivers stand out because they're in one cluster, and the bad drivers are scattered all over the place, and your net learns the cluster. Yeah. So you just learn from the good drivers, and they're easy to cluster. We learn from all of them, and then it automatically learns the policy that's like the majority, but we'll eventually probably have to build some.
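The cluster-versus-scatter hypothesis can be illustrated with a toy example (purely illustrative, not Comma's code): featurize each lane change by, say, its duration, and look for the densest group of values — the hypothesized "good driver" cluster — while the bad ones scatter as outliers.

```python
def dense_cluster(values, tol=0.5):
    """Toy version of the cluster hypothesis: find the largest group of
    lane-change features (here, durations in seconds) that lie within
    `tol` of some observed value. The dense group stands in for the
    'good drivers are good in the same way' cluster; everything outside
    it is the scattered, probably-bad behavior.
    """
    best = []
    for center in values:
        group = [v for v in values if abs(v - center) <= tol]
        if len(group) > len(best):
            best = group
    return sorted(best)
```

A real system would cluster whole trajectories with something far richer than one scalar feature; the point here is only that density, not hand labels, identifies the majority policy.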
Starting point is 01:04:53 If that theory is true — I hope it's true, because the counter-theory is that there are many clusters, maybe arbitrarily many clusters, of good drivers. Because if there's one cluster of good drivers, you can at least discover a set of policies — you can learn a set of policies which would be good universally. That would be nice if it's true. And you're saying that there is some evidence that, let's say, lane changes can be clustered into four clusters.
Starting point is 01:05:26 Right. There's this finite level of... I would argue that all four of those are good clusters. All the things that are random are noise and probably bad. And which one of the four you pick, or maybe it's 10, or maybe it's 20? You can learn that. It's context dependent. It depends on the scene.
Starting point is 01:05:43 And the hope is it's not too dependent on the driver. Yeah, the hope is that it all washes out. The hope is that the distribution is not bimodal. The hope is that it's a nice Gaussian. So what advice would you give to Tesla — how to fix, how to improve Navigate on Autopilot? Those are the lessons you've learned from Comma AI. The only real advice I would give to Tesla is: please put driver monitoring in your cars. With respect to improving it? You can't do that anymore. Sorry, I do interrupt, but you know,
Starting point is 01:06:14 there's a practical nature of many of hundreds of thousands of cars being produced that don't have a good driver facing camera. The Model 3 has a selfie cam. Is it not good enough? Did they not have put IRL IDs for night? That's a good question. But I do know that the is fisheye and it's relatively low resolution. So it's really not designed. It wasn't designed for time, model.
Starting point is 01:06:35 You can hope that you can kind of scrape up and have something from it. But why didn't they put it in today? Put it in today. Put it in today. Every time I've heard Carpathi talk about the problem and talking about life software 2.0 and how the machine learning is gobbling up everything, I think this is absolutely the right strategy. I think that he didn't write and have a date on autopilot. I think somebody else did. And kind of hacked it on top of that stuff. I think when Carpathi
Starting point is 01:07:00 says, wait a second, why did we hand code this Lanchion policy with all these magic numbers? We're going to learn it from data. They'll fix it. They already know what to do there. Well, that's that's Andre's job is to turn everything into a learning problem and collect a huge amount of data. The reality is though, not every problem can be turned into a learning problem in the short term. In the end, everything will be a learning problem. The reality is, In the end, everything will be a learning problem. The reality is, like if you want to build L5 vehicles today, it will likely involve no learning. And that's the reality.
Starting point is 01:07:34 So at which point does learning start? It's the crutch statement that light hours are crutch. Which point will learning get up to part of human performance? It's over human performance on ImageNet, classification, undriving is the question still. It is a question. I'll say this, I'm here to play for 10 years. I'm not here to try to, I'm here to play for 10 years and make money along the way.
Starting point is 01:07:59 I'm not here to try to promise people that I'm going to have my L5 taxi network up and working in two years. Do you think that was a mistake? Yes. What do you think was the motivation behind saying that other companies are also promising L5 vehicles with their different approaches in 2020, 2021? 2022. If anybody would like to bet me that those things do not pan out, I will bet you. Even money?
Starting point is 01:08:24 Even money? I'll bet you as much as you want. So, are you worried about what's going to happen? Because you're not in full agreement on that. What's going to happen when 2020, 2021, come around and nobody has fleets of autonomous vehicles? Well, you can look at the history. If you go back five years ago, they were all promised by 2018 and 2017.
Starting point is 01:08:46 But they weren't that strong of promises. I mean, Ford really declared pretty, I think not many have declared as, as like, definitively as they have now these dates. Well, okay, so let's separate L4 and L5. Do I think that it's possible for Waymo to continue to kind of like hack on their system until it gets to level 4 in Chandler, Arizona? Yes. When you don't know safety driver? Chandler, Arizona? Yeah.
Starting point is 01:09:15 By, what, sorry, which year are we talking about? Oh, I even think that's possible by like 2020, 2021. But level 4, Chandler, Arizona, not level five, New York City. Level four, meaning some very defined streets. Okay, works out really well. Very defined streets. And then practically, these streets are pretty empty. If most of the streets are covered in Waymo's, Waymo can kind of change the definition of what driving is. Right? If your self-driving network change the definition of what driving is. If your self-driving network is the majority of cars in an area, they only need to be safe
Starting point is 01:09:51 with respect to each other and all the humans will need to learn to adapt to them. Now go drive in downtown New York. Oh yeah, that's, I mean, already, you can talk about autonomy and like, like, fun farms, it already works great because you can really just follow the GPS line. So what does success look like for Kamei?
Starting point is 01:10:12 What, what are the milestones like where you can sit back with some champagne and say, we did boys and girls? Well, it's never over. Yeah, but it was drink champagne. Sure. What is a good, what are some wins? A big milestone that we're hoping for by mid-next year is profitability of the company. And we're going to have to revisit the idea of selling a consumer product.
Starting point is 01:10:46 But it's not going to be like the comma one. When we do it, it's going to be perfect. Open Pilot has gotten so much better in the last two years. We're going to have a few few feature. We're going to have a hundred percent driver monitoring. We're going to disable no safety features in the car. Actually, I think it'd be really cool what we're doing right now. Our project this week is where analyzing the data set and looking for all the AEP triggers
Starting point is 01:11:09 from the manufacturer systems. We have better data set on that than the manufacturers. How much does Toyota have 10 million miles of real-world driving to know how many times they're AEP triggered? So let me give you because you asked financial advice. Yeah. Because I work with a lot of automakers. And one possible source of money for you, which I'll be
Starting point is 01:11:32 excited to see you take on, is basically selling the data. So which is something that most people in not selling in a way where here, here at Automaker, but creating, we've done this actually at MIT, not for money purposes, but you could do it for significant money purposes and make the world a better place by creating a consortia where automakers would pay in and then they get to have free access to the data. And I think a lot of people are really hungry for that and we're basically gonna amount of money for it. Here's the problem with that.
Starting point is 01:12:12 I like this idea, all in theory, it'd be very easy for me to give them access to my servers and we already have all open source tools to access this data, it's in a great format, we have a great pipeline, but they're gonna put me in the room with some business development guy. And I'm gonna have to talk to this guy
Starting point is 01:12:29 and he's not gonna know most of the words I'm saying. I don't want to tolerate that. Okay, Mick Jagger. No, no, no, no. What I mean. I think I agree with you. I'm the same way, but you just tell them the terms and there's no discussion needed.
Starting point is 01:12:41 If I could, if I could just tell them the terms. Yeah. And then like, all right, uh, who wants access to my data, I will sell it to you for, uh, let's say, you want to, you want to subscribe, I'll sell it for 100k a month. Anyone? 100k month. 100k month. I'll give you access to this data subscription. Yeah. Um, yeah, I think that's kind of fair. Came up with that number off the top of my head. If somebody sends me like a three line email where it's like we would like to pay 100 came month to get access to your data, we would agree to like reasonable privacy terms of people who are in the data set.
Starting point is 01:13:14 I would be happy to do it, but that's not going to be the email. The email is going to be, hey, do you have some time in the next month where we can sit down and we can, I don't have time for that. We're moving too fast. Yeah. You could politely respond to that email, but not can, I don't have time for that. We're moving too fast. Yeah. You could politely respond to that email, but not saying, I don't have any time for your bullshit. Yeah.
Starting point is 01:13:30 You say, oh, well, unfortunately, these are the terms. And so this is what we try to, we brought the cost down for you in order to minimize the friction and communication. Yeah, absolutely. Here's the, whatever it is, one, two million years, dollars a year, and you have access. And it's not like I get that email from like, but okay, am I gonna reach out? Am I gonna hire a business development person? Is gonna reach out to the automakers?
Starting point is 01:13:52 No way. Yeah, okay. I've got you. I think they reached into me. I'm not gonna ignore the email. I'll come back with something like, yeah, if you're willing to pay 100K monthly for access to the data, I'm happy to set that up.
Starting point is 01:14:02 That's worth my engineering time. That's actually quite insightful of you. You're right. Probably because many of the automakers are quite a bit old school, there will be need to reach out and they want it, but they don't need to be some communication. You're right. Mobileye Circuit 2015 had the lowest R&D spend of any chip maker. Like per per, and you look at all the people who work for them,
Starting point is 01:14:27 and it's all business development people because the car companies are impossible to work with. Yeah, so you have no patience for that, and you're legit Android, huh? I have something to do, right? Like, it's not like, I don't mean to be a dick and say like, I don't have patience for that, but it's like, that stuff doesn't help us
Starting point is 01:14:44 with our goal of winning self-driving cars. If I want money in the short term, if I showed off like the actual, like the learning tech that we have, it's somewhat sad, like, it's years and years ahead of everybody else's. Maybe not Tesla's. I think Tesla has similar stuff to us actually.
Starting point is 01:15:02 Yeah. I think Tesla's similar stuff, but when you compare it to like what the Toyota Research Institute has, you're not even close to what we have. No comments, but I also can't, I have to take your comments, I intuitively believe you, but I have to take it with a grain of salt because I mean, you are an inspiration because you basically don't care about a lot of things
Starting point is 01:15:25 that other companies care about. You don't try to bullshit in a sense like makeup stuff so to drive a valuation, you're really very real and you're trying to solve the problem and admire that a lot. What I don't necessarily fully can't trust you on or don't trust, I can't trust. I can't prove it.
Starting point is 01:15:44 I can only, but I also know how bad others are and so I'll say two I'll say two things about don't trust but verify right I'll say two things about that one is try get in a 2020 Corolla and try open pilot 0.6 when it comes out next month I think already you'll look at this and you'll be like, this is already really good. Then I could be doing that all with hand lablers and all with the same approach that mobile I use this. When we release a model that no longer has the lanes in it,
Starting point is 01:16:17 that only outputs a path. Then think about how we did that machine learning, and then right away when you see, and that's going to be an open pilot. That's going to be an open pilot before 1.0. Then think about how we did that machine learning, and then right away when you see, and that's gonna be an open pilot, that's gonna be an open pilot before 1.0. When you see that model, you'll know that everything I'm saying is true,
Starting point is 01:16:31 because how else did I get that model? Good, one of the things too about the simulator. Yeah, yeah, yeah, the super exciting, that's super exciting. And, but like, I listened to your talk with Kyle, and Kyle was originally building the aftermarket system. And he gave up on it because of technical challenges. Because of the fact that he's going to have to support 20 to 50 cars, we support 45.
Starting point is 01:16:57 Because what is he going to do when the manufacturer ABS system triggers? We have alerts and warnings to deal with all of that and all the cars. And how is he going to formally verify it? Well, I got 10 million miles of data. It's probably better, it's probably better verified than the spec.
Starting point is 01:17:09 Yeah, I'm glad you're here talking to me. This is, I'll remember this day. It's interesting, if you look at Kyle's from Cruz, I'm sure they have a large number of business development folks. And he's working with GM. You could work business development folks and you work with G.M. You could work with R.G.A.I. you work with Ford. It's interesting because chances that you fail, business wise, like bankrupt are pretty high.
Starting point is 01:17:37 And yet, it's the Android model, as you're actually taking on the problem. So that's really inspiring. I mean, I have a long-term way for comedy to make money too. And one of the nice things when you really take on the problem, which is my hope for autopilot, for example, is things you don't expect, ways to make money, or create value that you don't expect will pop up.
Starting point is 01:18:00 Oh, I've known how to do it since kind of 2017 is the first time I set it. Which part to not do how to do what's part? Our long term plan is to be a car insurance company. Insurance. Yeah, I love it. Yep. I make driving twice as safe. Not only that, I have the best data such as who statistically is the safest drivers.
Starting point is 01:18:16 And oh, we see you. We see you driving unsafely. We're not going to ensure you. And that causes a bifurcation in the market, because the only people who can't get common insurance or the bad drivers, Geico can ensure them, their premiums are crazy high, our premiums are crazy low. Well, win car insurance, take over that whole market. Okay, so if we win, if we win, but that's I'm saying, how do you turn comma into a $10
Starting point is 01:18:40 billion company? That's right. So you, Elon Musk, who else is thinking like this and working like this in you view? Who are the competitors? Are there people seriously, I don't think anyone that I'm aware of is seriously taking on lane keeping, you know, like to where is a huge business that turns eventually into full autonomy that then creates, yeah, that creates other businesses on top of it and so on, things insurance, things all kinds
Starting point is 01:19:11 of ideas like that. Do you know who, anyone else think like this? Not really. That's interesting. I mean, my sense is everybody turns to that in like four or five years, like Ford wants to the autonomy, doesn't fall through, but at this time. Elon's the iOS. By the way, he paved the way for all of us. I would not be doing comma AI today,
Starting point is 01:19:37 if it was not for those conversations with Elon. And if it were not for him saying, like, I think he said, like, well, obviously, we're not going use Lidar, we use cameras, humans use cameras. So what do you think about that? How important is Lidar? Everybody else is un-elfized using Lidar.
Starting point is 01:19:53 What are your thoughts on his provocative statement that Lidar is a crutch? See, sometimes he'll say dumb things, like the driver monitoring thing, but sometimes he'll say absolutely, completely 100% obviously true things. Of course, Lidar is a crutch. It's not even a good crutch.
Starting point is 01:20:09 You're not even using it, oh my, they're using it for localization. Yeah. Which isn't good in the first place. If you have to localize your car to centimeters in order to drive, like, yeah, they're not driving. Currently not doing much machine learning, I felt polite at our data, meaning like to help you in the task
Starting point is 01:20:26 of general task of perception. The main goal of those lidars on those cars, I think is actually localization more than perception. Or at least that's what they used them for. Yeah, that's true. If you want to localize to centimeters, you can't use GPS. The fans use GPS and the world can't do it.
Starting point is 01:20:41 Especially if you're under tree covering stuff. The lidar you can do is pretty easily. So you really, they're not taking on, I mean, in some research, they're using it for perception, but and they're certainly not, which said, they're not fusing it well with vision. They do use it for perception. I'm not saying they don't use it for perception, but the thing that they have vision-based and radar-based perception systems as well. You could remove the lidar and keep around a lot of the dynamic object perception. You want to get centimeter accurate localization? Good luck doing that with anything else. So what should Cruz, Waymo do?
Starting point is 01:21:19 What would you be your advice to them now? I mean, Waymo is actually, there's serious, I mean, they're doing, they're serious. Waymo out of the ball of them, quite, so seriously about the long game. If, if, if L5 is a lot, is requires 50 years, I think Waymo will be the only one left standing at the end with a, given the financial back in that they have. They're book Google box. I'll say nice things about both Waymo and Chris. Yeah, let's do it. Nice is good.
Starting point is 01:21:52 Waymo is by far the furthest along with technology. Waymo has a three to five year lead on all the competitors. If the Waymo looking stack works, maybe three year lead. If the Waymo looking stack works works, maybe three-year lead. If the WAMO looking stack works, they have a three-year lead. Now I argue that WAMO has spent too much money to recapitalize, gain back their losses in those three years. Also, self-driving cars have no network effect like that. Uber has a network effect.
Starting point is 01:22:21 You have a market. You have drivers and you have riders. Self-driving cars, you have capital and you have riders. There's no network effect. You have a market. You have drivers and you have riders. Self-driving cars, you have capital and you have riders. There's no network effect. If I want to blanket a new city in self-driving cars, I buy the off-the-shelf Chinese knockoff self-driving cars, and I buy enough of them in the city. I can't do that with drivers. And that's why Uber has a first mover advantage that no self-driving car company will. Can you uh, just think a little bit. Uber, you're not talking about Uber there, Thomas Vickler. You're talking about the Uber card. Yeah.
Starting point is 01:22:48 I open for business in Austin, Texas, let's see. I need to attract both sides of the market. I need to both get drivers on my platform and riders on my platform. And I need to keep them both sufficiently happy, right? Riders aren't going to use it if it takes more than five minutes for an Uber to show up. Drivers aren't going to use it if they have to sit around all day and there's no riders. So you have to carefully balance a market. And whenever you have to carefully balance a market, there's a great first mover advantage
Starting point is 01:23:14 because there's a switching cost for everybody, right? The drivers and the riders would have to switch at the same time. Let's even say that, let's say, Luber shows up and Luber somehow agrees to do things that are at a bigger, you know, we're just gonna, we've done it more efficiently, right? Luber is only takes 5% of a cot instead of the 10% that Uber takes. No one is gonna switch because the switching cost
Starting point is 01:23:40 is higher than that 5%. So you actually can, in markets like that, you have a first-mover advantage. Yeah. Autonomous vehicles of the level 5 variety have no first mover advantage. If the technology becomes commoditized, say I want to go to a new city, look at the scooters. It's going to look a lot more like scooters. Every person with a checkbook can blanket a city in scooters and that's why you have 10 different scooter companies. Which one's going to win? It's a race to the bottom. It's why you have 10 different scooter companies. Yeah. Which one's going to win? It's a race to the bottom.
Starting point is 01:24:06 It's a terrible market to begin because there's no market for scooters. And the scooters don't get a say in whether they want to be bought and deployed to a city or not. Right. So the, yeah, we're going to entice the scooters with subsidies and deals. So whenever you have to invest that capital, it doesn't, it doesn't come back. Yeah. They can't be your main criticism of the waymo approach. So whenever you have to invest that capital, it doesn't come back. Yeah. They can't be your main criticism of the Waymo approach.
Starting point is 01:24:29 Oh, I'm saying even if it does technically work. Even if it does technically work, that's a problem. I don't know. If I were to say, I would say, you're already there. I haven't even thought about that. But I would say the bigger challenge is the technical approach. So Waymo's cruise is not just the technical approach, but creating value. I still don't understand how you beat Uber, the human driven cars. In terms of financially, it doesn't make sense to me that people want to get an autonomous vehicle.
Starting point is 01:25:06 I don't understand how you make money. In the long term, yes, like real long term, but it just feels like there's too much capital investment needed. Oh, and they're going to be worse than Uber's because they're going to stop for every little thing everywhere. I say a nice thing about Cruz. That was my nice thing about way for three years at it. It was a nice thing about Waymo, the three years at him. Well, it was a nice, oh, because there are three years. That the three years technically ahead of everybody. Their tech stack is great.
Starting point is 01:25:31 My nice thing about Cruz is GM buying them was a great move for GM. For $1 billion, GM bought an insurance policy against Waymo. They put, Cruz is three years behind Waymo. They put cruise is three years behind Waymo. That means Google will get a monopoly on the technology for at most three years. And technology works. You might not even be right about the three years,
Starting point is 01:25:57 it might be less. Might be less. Cruise actually might not be that far behind. I don't know how much Waymo has waffled around or how much of it actually is just that long tail. Yeah, okay. If that's the bus you could say in terms of nice things. If that's more of a nice thing for GM, that's smart insurance policy. It's a smart insurance policy. I mean, I think that's how I can't see crews working out any other, for crews to leapfrog Waymo would really surprise me.
Starting point is 01:26:30 Yeah, so let's talk about the underlying assumptions of everything. Why are not going to leapfrog Tesla? Tesla would have to seriously mess up for us because you're a problem. Okay, so the way you leapfrog, right, is you come up with an idea or you take a direction, perhaps secretly, that the other people aren't taking. And so, Cruz, Weimo, even Aurora. I don't know, Aurora, Zooks is the same stack as well. They're all the same codebase even. And they're all the same DARPA Urban Challenge codebase. So, the question is, do you think there's a room
Starting point is 01:27:04 for brilliance and innovation that will change everything? Like, say, okay, so I'll give you examples. It could be if revolution in mapping, for example, that allow you to map things, do HD maps of the whole world, all weather conditions somehow really well, or revolution is simulation. To where the what you said before becomes incorrect. That kind of thing. I hear room for breakthrough innovation. What I said before about, oh, they actually get the whole thing well. I'll say this about, we divide driving into three problems and I actually haven't solved the third yet, but I haven't had yet how to do it.
Starting point is 01:27:51 So there's the static. The static driving problem is assuming you are the only car on the road, right? And this problem can be solved 100% with mapping and localization. This is why farms work the way they do. If all you have to deal with is the static problem and you can statically schedule your machines, right? It's the same as statically scheduling processes. You can statically schedule your tractors to never hit each other on their paths, right?
Starting point is 01:28:12 Because they know the speed they go at. So that's the static driving problem. Maps only helps you with the static driving problem. Yeah. The question about static driving, you just made it sound like it's really easy. Static driving is really easy. How easy? Well, because the whole drifting out of lane,
Starting point is 01:28:33 when Tesla drifts out of lane, it's failing on the fundamental static driving problem. Tesla is drifting out of lane. The static driving problem is not easy for the world. The static driving problem is easy for one route. One route and one weather condition with one state of lane markings and like no deterioration, no cracks in the road. I'm assuming you have a perfect localizer.
Starting point is 01:28:59 So that's solved for the weather condition and the lane marking condition. But that's the problem is how could you how do you have a perfect localizer? Perfect localizers. But that's the problem. How could you, how do you have a perfect localizer? You can build perfect localizers and not that hard to build. Okay, come on now with with with with with lighter. It's lighter, yeah, with lighter. Okay, lighter.
Starting point is 01:29:11 Yeah, but you use lighter, right? Like you use lighter build a perfect localizer. Building a perfect localizer without lighter. It's gonna be, it's gonna be hard. You can get 10 centimeters without lighter. You can get one centimeter with lighter. Maybe we can turn about the one or 10 centimeters. I'm concerned every once in a while you're just way off.
Starting point is 01:29:29 Yeah, so this is why you have to carefully make sure you're always tracking your position. You want to use light arc hammer fusion, but you can get the reliability of that system up to 100,000 miles and then you write some fallback condition where it's not that bad if you're way off right. I think that you can get it to the point it's like azol d that you're you're never in a case where you're way off and you don't know it. Yeah okay so this is brilliant so that's the static static. We can especially with LiDAR and good HTML's you can solve that problem. Easy. No. No it's a word easy.
Starting point is 01:30:06 No, it's a word easy. The static, static problem so easy. Very typical for you to say something is easy. I got it. It's not as challenging as the other ones. Okay. Well, it's okay. Maybe it's obvious how to solve it.
Starting point is 01:30:16 The third one's the hardest. Well, wait a minute. And a lot of people don't even think about the third one. And even see it as different from the second one. So the second one is dynamic. The second one is like, say there's an obvious example, it's like a car stopped at a red light. You can't have that car in your map, because you don't know
Starting point is 01:30:29 whether that car is going to be there or not. So you have to detect that car in real time, and then you have to do the appropriate action. Also, that car is not a fixed object. That car may move, and you have to predict what that car will do. So this is the dynamic problem. Yeah. not a fixed object, that car may move, and you have to predict what that car will do. Right? So this is the dynamic problem.
Starting point is 01:30:47 Yeah. So you have to deal with this. This involves, again, like, you're going to need models of other people's behavior. Are you including in that — I don't want to step on the third one — but are you including in that your influence on people?
Starting point is 01:31:04 That's the third one. We call it the counterfactual. Yeah, brilliant. I just talked to Judea Pearl, who's obsessed with counterfactuals. Oh yeah, yeah, his books are good. So the static and the dynamic —
Starting point is 01:31:18 our approach right now for lateral will scale completely to the static and dynamic. The counterfactual — I don't have a way to do it yet. The thing that I want to do, once we have all these cars, is reinforcement learning on the world. I'm always going to turn the exploit up to max — I'm not going to have them explore — but the only real way to get at the counterfactual is to do reinforcement learning, because the other agents are humans. So that's fascinating.
Starting point is 01:31:45 The breakdown like that — I agree completely. That you've spent your life thinking about this is beautiful. And part of it is because you're slightly insane. Not my life, just the last four years.
Starting point is 01:32:14 No, no, you have some nonzero percent of your brain that has a madman in it, which is a really good feature. But there's a safety component to it that I think, with counterfactuals and so on, would just freak people out. How do you even start to think about this in general? I mean, you've had some friction with NHTSA and so on. I am frankly exhausted by safety engineers — the prioritization of safety over innovation, to a degree where, in my view, it kills safety in the long term. So the counterfactual thing — actually exploring this world of how to interact with dynamic objects and so on — how do you think about safety? You can do reinforcement learning without ever exploring.
Starting point is 01:32:54 And I said that — so, in reinforcement learning there's usually what's called a temperature parameter, and your temperature parameter is how often you deviate from the argmax. I could always set that to zero and still learn, and I feel that you'd always want that set to zero on your actual system. Gotcha. But the problem is, you first don't know very much, and so you're going to make mistakes.
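The temperature parameter George describes can be sketched generically — this is illustrative textbook softmax action selection, not openpilot code. At temperature zero it collapses to pure argmax: no exploration, which is the setting he says a deployed system should use.

```python
import math
import random

def select_action(q_values, temperature):
    """Pick an action index; temperature sets how often we deviate from argmax."""
    if temperature == 0:
        # Pure exploitation: always take the best-known action, never explore.
        return max(range(len(q_values)), key=lambda i: q_values[i])
    # Softmax sampling: higher temperature means more deviation from the argmax.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=weights)[0]
```

Learning can still happen at temperature zero when the training signal comes from somewhere other than deliberate exploration — in this conversation, from human disengagements.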
Starting point is 01:33:16 So the learning, the exploration, happens through these mistakes. Yeah, but — okay, so the consequences of a mistake. openpilot and Autopilot are making mistakes left and right. We have 700 daily active users, a thousand weekly active users. openpilot makes tens of thousands of mistakes a week. These mistakes have zero consequences. These mistakes are, "oh, I wanted to take this exit and it went straight,"
Starting point is 01:33:43 "so I'm just gonna carefully touch the wheel." The humans catch them. The humans catch them. And the human disengagement is labeling that reinforcement learning in a completely consequence-free way. So driver monitoring is the way you ensure they keep paying attention.
Starting point is 01:33:58 How's your messaging — say I gave you a billion dollars, you would be scaling it now? Oh, I couldn't scale it with any amount of money. I'd raise money if I could, if I had a way to scale it. Yeah, you're not focused on that. I don't know how to. I guess I could sell it to more people, but I want to make the system better. Better, better.
Starting point is 01:34:14 I don't know how to. But what's the messaging here? I got a chance to talk to Elon, and he basically said that the human factor doesn't matter. The human doesn't matter, because the system will perform — there will be, sorry to use the term, like a singularity, a point where it gets just much better, and so the human won't really matter. But it seems like that human catching the system when it gets into trouble
Starting point is 01:34:42 is like the thing which will make something like reinforcement learning work. So how do you think messaging for Tesla, for you, for the industry in general should change? I think my messaging's pretty clear, at least like our messaging wasn't that clear in the beginning and I do kind of fault myself for that. We are proud right now to be a level two system.
Starting point is 01:35:05 We are proud to be level two. If we talk about level four, it's not with the current hardware, it's not gonna be just a magical OTA upgrade, it's gonna be new hardware, it's gonna be very carefully thought out. Right now, we are proud to be level two, and we have a rigorous safety model.
Starting point is 01:35:20 I mean, not like — okay, "rigorous," who knows what that means — but we at least have a safety model, and we make it explicit. It's in SAFETY.md in openpilot. And it says — seriously, though, SAFETY.md? SAFETY.md. This is brilliant. Well, this is the safety model.
Starting point is 01:35:38 And I like to have conversations, like — sometimes people will come to you and they're like, "your system's not safe." Okay, have you read my safety docs? Would you like to have an intelligent conversation about this? And the answer is always no. They just, like, scream about "it runs Python!" Okay, what? So you're saying that because Python's not real-time — Python not being real-time never causes disengagements. Disengagements are caused by, you know, the model; the model is QM. But SAFETY.md says the following: first and foremost, the driver must be paying attention at all times. I still consider the software to be alpha software until we can actually enforce that statement,
Starting point is 01:36:16 but I feel it's very well communicated to our users. Two more things. One is, the user must be able to easily take control of the vehicle at all times. So if you step on the gas or brake with openpilot, it gives full manual control back to the user, or press the cancel button. Two, the car will never react so quickly — we define "so quickly" to be about one second — that you can't react in time. And we do this by enforcing torque limits, braking limits, and acceleration limits.
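A hypothetical sketch of what such limits look like in code — the constants here are invented for illustration and are not openpilot's actual values; real limits are tuned per vehicle. The rate clamp is what turns a bad command into a slow, human-correctable drift rather than a jerk of the wheel.

```python
# Illustrative steering limiter -- made-up constants, not a real safety model.
MAX_TORQUE = 1.0         # absolute command limit (arbitrary units)
MAX_TORQUE_DELTA = 0.05  # max change allowed per control step

def limit_steer_torque(requested, last_applied):
    # Absolute clamp: the command can never exceed the torque limit.
    torque = max(-MAX_TORQUE, min(MAX_TORQUE, requested))
    # Rate clamp: the command can only move a little each control step,
    # so the driver always has time (~1 s) to catch and override it.
    lo = last_applied - MAX_TORQUE_DELTA
    hi = last_applied + MAX_TORQUE_DELTA
    return max(lo, min(hi, torque))
```

Even a wildly wrong model output passes through these clamps before it reaches the wheel, so the worst case is a gentle drift the human can catch.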
Starting point is 01:36:55 So we have our torque limits way lower than Tesla's. This is another thing: if I could tweak Autopilot, I would lower their torque limit and I would add driver monitoring. Because Autopilot can jerk the wheel hard; openpilot can't. We limit it, and all this code is open source, readable. And I believe now it's all MISRA C compliant, too. What's that mean? MISRA is like the automotive coding standard. I've been reading the standards lately,
Starting point is 01:37:19 and I've come to respect them. They're actually written by very smart people. Yeah, they're brilliant people, actually. They have a lot of experience. They're sometimes a little too cautious, but in this case it pays off. MISRA is written by, like, computer scientists,
Starting point is 01:37:35 and you can tell by the language they use. They talk about, like, whether certain conditions in MISRA are decidable or undecidable. You mean, like, the halting problem? And yes. Well, all right, you've earned my respect. I will read carefully what you have to say, and we want to make our code compliant with that. All right, so you're proud level two. Beautiful.
Starting point is 01:37:54 So you were the founder and, I think, CEO of comma.ai, then you were the head of research. What the heck are you now? What's your connection to comma.ai? I'm the president — but I'm one of those, like, unelected presidents of, like, a small dictatorship country, not one of those elected presidents. Oh, so you're like Putin when he was the... yeah, I got you. So what's the governance structure?
Starting point is 01:38:18 What's the future of comma.ai? I mean, yeah, it's a business. Are you just focused on getting things right now, making some small amount of money in the meantime, and then when it works, it works and you scale? Our burn rate is about 200K a month, and our revenue is about 100K a month. So we need to 4x our revenue, but we haven't, like, tried very hard at that yet. And the revenue is basically selling stuff online.
Starting point is 01:38:46 Yeah, we sell stuff at shop.comma.ai. Is there other — well, okay, so you'll have to figure out — That's our only revenue stream. But to me, that's, like, respectable revenue. We make it by selling products to consumers, and we're honest and transparent about what they are. Unlike most, actually, level four companies, right? Because you could easily start blowing smoke, like, overselling the hype, and feeding into getting some fundraisers. Oh, "you're the guy, you're a genius, because you hacked the iPhone." Oh, I hate that. I hate that.
Starting point is 01:39:19 Yeah, I can trade my social capital for more money. I did it once; I almost regret doing it the first time. Well, on a small tangent — you seem to not like fame, and yet you're also drawn to fame. Where are you on that currently? Have you had some introspection, some soul-searching? Yeah, I've actually come to a pretty stable position on that, like, after the first time. I realized that I don't want attention from the masses.
Starting point is 01:39:53 I want attention from people who I respect. Who do you respect? I can give a list of people. So these like Elon Musk type characters. Yeah. Or actually, you know what? I'll make it more broad than that. I won't make it about a person. I respect skill.
Starting point is 01:40:10 I respect people who have skills, right? And I would like to be — I'm not gonna say famous, but — known among more people who have, like, real skills. Who in cars do you think has skill — not just who do you respect? Oh, Kyle Vogt has skill. A lot of people at Waymo have skill. And I respect them. I respect them as engineers.
Starting point is 01:40:40 Like, I mean, I think about all the times in my life where I've been dead set on approaches and they turned out to be wrong. Yeah. So I mean, I might be wrong. I accept that. I accept that there's a decent chance that I'm wrong. And actually, I mean, having talked to Chris Urmson, Sterling Anderson, those guys — I deeply respect Chris. I just admire the guy. He's legit. You drive a car through the desert
Starting point is 01:41:05 when everybody thinks it's impossible. That's legit. And then I also really respect the people who are writing the infrastructure of the world, like the Linus Torvalds and the Chris Lattners. Oh yeah, they're doing the real work. They're doing the real work. And having talked with Chris Lattner, you realize —
Starting point is 01:41:21 especially when they're humble, it's like — you realize, oh, you guys, we're just using all the hard work that you did. Yeah, that's incredible. What do you think of Mr. Anthony Levandowski? He's another mad genius. Sharp guy, oh yeah. Do you think he might long-term become a competitor?
Starting point is 01:41:44 Oh, to comma? Well, so I think that he has the other right approach. I think that right now there's two right approaches: one is what we're doing, and one is what he's doing. Can you describe — I think it's called Pronto AI, and he's starting with trucking — do you know what the approach is? I actually don't know. Embark is also doing the same sort of thing.
Starting point is 01:42:01 The idea is almost that you want to — so, I can't partner with Honda and Toyota. Honda and Toyota are, like, 400,000-person companies. It's not even a company at that point — I don't personify it, I think of it like an object. But a trucker drives for a fleet. Some truckers are independent; some truckers drive for fleets with a hundred trucks. There are tons of independent trucking companies out there. Start a trucking company and drive your costs down, or figure out how to drive down the cost of trucking.
Starting point is 01:42:40 Another company that I really respect is Nauto. Actually, I respect their business model. Nauto sells a driver-monitoring camera, and they sell it to fleet owners. If I owned a fleet of cars and I could pay 40 bucks a month to monitor my employees — this is gonna reduce accidents 18% — in this space, that is the business model that I most respect, because they're creating value today.
Starting point is 01:43:11 Yeah, that's a huge one: how do we create value today with some of this? And the lane keeping thing is huge. And it sounds like you're creeping in, or full steam ahead, on the driver monitoring too, which I think is actually where the short-term value is, if you can get it right. I still — I'm not a huge fan of the statement that everything has to have driver monitoring. I agree with that completely, but that statement usually misses the point that getting the experience of it right is not trivial. Oh no, not at all. In fact, right now we have — I think the timeout depends
Starting point is 01:43:44 on the speed of the car, but we want it to depend on the scene state. If you're on an empty highway, it's very different if you don't pay attention than if you're coming up to a traffic light. And long-term, it should probably learn from the driver. Yeah, because I watch a lot of video. We've built a smartphone detector just to analyze how people are using smartphones, and people are using them very differently. It is a
Starting point is 01:44:11 Interesting. So, texting styles — I haven't watched nearly enough of the videos. Yeah, I've got millions of miles of people driving cars. At this moment, I spend a large fraction of my time just watching videos, because it never fails — I've never failed, from a video-watching session, to learn something I didn't know before. In fact, usually, like, when I eat lunch, I'll sit — especially when the weather is good — and just watch pedestrians, with an eye to understand, from a computer vision eye, just to see:
Starting point is 01:44:45 can this be modeled? Can you predict what decisions are made? And there's so many things that we don't understand. This is what I mean about the state vector. Yeah — I'm always trying to think: because I'm understanding it in my human brain, how do we convert that into — how hard is the learning problem here, I guess, is the fundamental question. So, something from a hacking perspective — this always comes up, especially with folks. Well, first, the most popular question is the trolley problem, right? So that's not sort of a serious problem. There are some ethical questions that arise. Do you think there are any serious ethical questions
Starting point is 01:45:27 that are left? We have a solution to the trolley problem at comma.ai. Well, there is actually an alert in our code: "ethical dilemma detected." It's not triggered yet. We don't know how yet to detect the ethical dilemmas, but we're a level two system, so we're going to disengage and leave that decision to the human. You're such a troll. No, but the trolley problem deserves to be trolled. Yeah, that's a beautiful answer, actually. I know.
Starting point is 01:45:49 I gave it to someone who was, like — sometimes people ask, like you asked, about the trolley problem, and you can have a kind of discussion about it when you get someone who's really earnest about it. Because it's the kind of thing where, if you ask a bunch of people in an office whether we should use a SQL stack or a NoSQL stack, if they're not that technical, they have no opinion. But if you ask them what color they want to
Starting point is 01:46:08 paint the office, everyone has an opinion on that. And that's what the trolley problem is. That's a beautiful answer. Yeah — we're able to detect the problem and we're able to pass it on to the human. Wow, I've never heard anyone say it. It's a nice escape route. Okay. But, um, proud level two. I'm proud level two.
Starting point is 01:46:34 I love it. So the other thing that people have some concern about with AI in general is hacking. How hard is it, do you think, to hack an autonomous vehicle, either through physical access or through, more sort of, popping up these adversarial examples on the sensors? Okay, the adversarial examples one. You want to see some adversarial examples that affect humans, right? Oh, well, there used to be a stop sign here, but I put a black bag over the stop sign, and then people ran it, right?
Starting point is 01:46:57 Adversarial, right? Like, there's tons of human adversarial examples too. The question in general about, like, security — something just came out today, and there are always such hype headlines, about how Navigate on Autopilot was fooled by a GPS spoof to take an exit. Right. At least that's all they could do, was take an exit.
Starting point is 01:47:20 If your car is relying on GPS in order to have a safe driving policy, you're doing something wrong. And this is why V2V is such a terrible idea: V2V now relies on both parties getting communication right. This is not even — so I think of security as a special case of safety. Safety is like, we put a little piece of caution tape
Starting point is 01:47:49 around the hole so that people won't walk into it by accident. Security is like, put a 10-foot fence around the hole, with barbed wire on the top and stuff, so you actually physically cannot climb into it, right? So if you're designing systems that are unreliable, they're definitely not secure.
Starting point is 01:48:05 Your car should always do something safe using its local sensors. And then the local sensors should be hardwired. And then, could somebody hack into your CAN bus and turn your steering wheel and your brakes? Yes, but they could do it before comma.ai too, so. Let's think out of the box on some things. Do you think teleoperation has a role in any of this — remotely stepping in and controlling the cars? No. I think that if the safe operation, by design, requires a constant link to the cars, it doesn't work. So that's the same argument you're using for V2I, V2V?
Starting point is 01:48:47 Well, there's a lot of non-safety-critical stuff you can do with V2I. I like V2I. I like V2I way more than V2V, because with V2I, I already have internet in the car, right? There's a lot of great stuff you can do with V2I. For example — well, I already have V2I. Waze is V2I, right?
Starting point is 01:49:05 Waze can route me around traffic jams. That's a great example of V2I. Mm-hmm. And then, okay, the car automatically talks to that same service. Like, it's improving the experience, but it's not a fundamental fallback for safety. No —
Starting point is 01:49:18 if any of your things that require wireless communication are more than QM — like, have an ASIL rating — you shouldn't. You previously said that life is work and you don't do anything to relax. How do you think about hard work? What do you think it takes to accomplish great things? There are a lot of people saying that there needs to be some balance — to accomplish great things, you need to take some time off, you need to reflect
Starting point is 01:49:49 and so on. And then some people are just insanely working, burning the candle at both ends. How do you think about that? I think I was trolling in the Siraj interview when I said that. Off camera, right before, I smoked a little bit of weed — like, you know, come on, it was a joke, right? Like, I do nothing to relax — look where I am, I'm at a party, right? Yeah, sure. That's true.
Starting point is 01:50:11 So, no, of course I don't. When I say that life is work, though, I mean that what gives my life meaning is work. I don't mean that every minute of the day you should be working — I actually think this is not the best way to maximize results. I think that if you're working 12 hours a day, you should be working smarter and not harder. Well, so work gives you meaning. For some people, other sources of meaning are personal relationships, like family and so on.
Starting point is 01:50:41 You've also, in that interview with Siraj, or in the trolling, mentioned that one of the things you look forward to in the future is AI girlfriends. So that's a topic that I am very much fascinated by — not necessarily girlfriends, but just forming a deep connection with AI. What kind of system do you imagine when you say AI girlfriend, whether you were trolling or not? No, that one I'm very serious about. I'm serious about that on both a shallow level and a deep level. I think that VR brothels are coming soon and are going to be really cool. It's not cheating if it's a robot.
Starting point is 01:51:16 I see the slogan already. But there's — I don't know if you've watched it — I just watched the Black Mirror episode. I watched the last one. Yeah. Oh, the Ashley Too one? No, the one where two friends were having sex with each other in the VR game.
Starting point is 01:51:37 In the VR game. Yeah. It's just two guys, but one of them was a female. Yeah. Which is another mind-blowing concept — that in VR you don't have to be the same form. You can be two animals having sex. Weird. I mean, I wonder how nicely the software maps the nerve endings, right?
Starting point is 01:51:56 Yeah, they sweep a lot of the fascinating, really difficult technical challenges under the rug — like, assuming it's possible to do the mapping of the nerve endings, then — I wish. Yeah, I saw that. The way they did it with the little, like, stim unit on the head — that'd be amazing. So, well, on a shallow level, you could set up almost a brothel with, like, real dolls and Oculus Quests, right? Some good software — I think it'd be a cool novelty experience. But no, on a deeper, like, emotional level — I mean, yeah, I would really like to fall in love with a machine. Do you see yourself having a long-term relationship
Starting point is 01:52:41 of the kind, the monogamous relationship that we have now, with a robot, with an AI system — not even just a robot? So, I think about maybe my ideal future. When I was 15, I read Eliezer Yudkowsky's early writings on the singularity, and, like, that AI is going to surpass human intelligence massively. He made some Moore's-law-based predictions that I mostly agree with, and then I really struggled for the next couple years of my life. Like, why should I even bother to learn anything? It's all going to be meaningless when the machines show up.
Starting point is 01:53:22 Right. Maybe when I was that young, I was still a little bit more pure and really clung to that. And then I'm like, well, the machines ain't here yet, you know, and I seem to be pretty good at this stuff. Let's try my best, you know — what's the worst that happens? But the best possible future I see is me sort of merging with the machine, and the way that I personify this is in a long-term monogamous relationship with a machine.
Starting point is 01:53:49 Oh, you don't think there's room for another human in your life if you really, truly merge with a machine? I mean, I see the best interface to my brain as, like, the same relationship interface — to merge with an AI, right? What does that merging feel like? I've seen couples who've been together for a long time, and I almost think of them as one person — couples who spend all their time together.
Starting point is 01:54:17 And that's fascinating. You're actually asking what that merging actually looks like. It's not just a nice channel — like, a lot of people imagine it's just an efficient link, a search link to Wikipedia or something. I don't believe in that. But you're saying it's more — the same kind of relationship you have with another human, a deep relationship — that's what merging looks like. That's pretty... I don't believe that link is possible.
Starting point is 01:54:42 I think that that link — so you're like, oh, I'm going to download Wikipedia right to my brain. My reading speed is not limited by my eyes; my reading speed is limited by my inner processing loop. And to bootstrap that — it sounds kind of unclear how to do it, and horrifying. But if I am with somebody — and I'll say somebody who is making a super-sophisticated model of me and then running simulations on that model. I'm not going to get into the question of whether the simulations are conscious or not; I don't really want to know what it's doing. But using those simulations to play out hypothetical futures for me, deciding what
Starting point is 01:55:19 things to say to me to guide me along a path — that's how I envision it. So on that path to AI of superhuman-level intelligence: you've mentioned that you believe in the singularity, that the singularity is coming. Yeah — again, could be trolling, could be not. All trolling has truth in it. I don't know what that means anymore. What is the singularity? So, yeah, that's really the question. How many years do you think before the singularity, or what form do you think it will take? Does that mean fundamental shifts in capabilities of AI? Does it mean some other kind of idea?
Starting point is 01:55:54 Maybe it's just my roots, but I can buy a human being's worth of compute for, like, a million bucks today. It's about one TPU v3 pod. I think they claim a hundred petaflops — that's being generous; I think humans are actually more like 20. So that's, like, five humans. That's pretty good. Google needs to sell their TPUs.
Starting point is 01:56:10 But I could buy GPUs. I could buy a stack of, like — I'd buy 1080 Tis, build a data center full of them, and for a million bucks I can get a human's worth of compute. But when you look at the total number of flops in the world — when you look at human flops, which go up very, very slowly with the population, and machine flops, which go up exponentially — it's still nowhere near.
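The comparison George is running — a human total that barely moves against machine compute growing exponentially — can be sketched as back-of-the-envelope arithmetic. Every constant here is an assumption for illustration: the ~20 petaflops per human is his estimate above, while the machine total and doubling time are invented.

```python
HUMAN_FLOPS = 20e15        # ~20 petaflops per human (estimate from above)
POPULATION = 8e9           # grows very slowly; treated as constant here
MACHINE_FLOPS = 1e21       # assumed total silicon flops today (made up)
DOUBLING_TIME_YEARS = 2.0  # assumed exponential growth rate (made up)

def crossover_year(start_year=2019):
    """First year in which silicon flops exceed total biological human flops."""
    total_human = HUMAN_FLOPS * POPULATION  # ~1.6e26 flops
    year, machine = start_year, MACHINE_FLOPS
    while machine < total_human:
        machine *= 2 ** (1 / DOUBLING_TIME_YEARS)
        year += 1
    return year
```

Under these made-up inputs the crossing lands a few decades out; the point is the shape of the argument — a constant-ish human total against an exponential — not the particular year.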
Starting point is 01:56:38 I think that's the key thing to talk about for when the singularity happens: when most flops in the world are silicon and not biological, that's kind of the crossing point — like, they are now the dominant species on the planet. And just looking at how technology's progressing, when do you think that could possibly happen? You think it would happen in your lifetime?
Starting point is 01:57:00 Oh yeah, definitely in my lifetime. I've done the math. I like 2038 because it's the Unix timestamp rollover. Yeah, beautifully put. So you've said that the meaning of life is to win. If you look five years into the future, what does winning look like? So... I can go into technical depth on what I mean by "to win." It may not mean — I was criticized for that in the comments, like, doesn't this guy want to save the penguins in Antarctica? Oh man, you know, listen to what I'm saying. I'm not talking about, like, having a yacht or something. I am an agent.
Starting point is 01:57:47 I am put into this world, and I don't really know what my purpose is. But if you're a reinforcement learner — if you're an intelligent agent and you're put into a world — what is the ideal thing to do? Well, the ideal thing, mathematically — you can go back to, like, Schmidhuber's theories about this — is to build a compressive model of the world, to build a maximally compressive model — to explore the world such that your exploration function maximizes the
Starting point is 01:58:13 derivative of compression of the past. Schmidhuber has a paper about this, and I took that kind of as a personal goal function. So when I say "to win," I mean — maybe this is religious, but I think that in the future I might be given a real purpose, or I may decide this purpose myself, and then at that point, now I know what the game is and I know how to win. I think right now I'm still just trying to figure out what the game is. But once I know — So you have imperfect information.
Starting point is 01:58:43 You have a lot of uncertainty about the reward function, and you're discovering it. Exactly. The purpose is to maximize it while you have a lot of uncertainty around it, and you're both reducing the uncertainty and maximizing it at the same time. That's the technical level. If you believe in the universal prior — yeah — what is the universal reward function? That's the better way to put it. So that "win" is interesting. I think I speak for everyone
Starting point is 01:59:12 in saying that I wonder what that reward function is for you, and I look forward to seeing that in five years and ten years. I think a lot of people, including myself, are cheering you on, man. So I'm happy you exist, and I wish you the best of luck. Thanks for talking to me today, man. Thank you.
