Lex Fridman Podcast - Elon Musk: Tesla Autopilot

Starting point is 00:00:00 The following is a conversation with Elon Musk. He's the CEO of Tesla, SpaceX, Neuralink, and a co-founder of several other companies. This conversation is part of the Artificial Intelligence podcast. The series includes leading researchers in academia and industry, including CEOs and CTOs of automotive, robotics, AI, and technology companies.

Starting point is 00:00:24 This conversation happened after the release of the paper from our group at MIT on driver functional vigilance during use of Tesla's Autopilot. The Tesla team reached out to me offering a podcast conversation with Mr. Musk. I accepted with full control of questions I could ask and the choice of what is released publicly. I ended up editing out nothing of substance. I've never spoken with Elon before this conversation, publicly or privately. Neither he nor his companies have any influence on my opinion nor on the rigor and integrity of the scientific method that I practice in my position at MIT. Tesla has never financially supported my research, and I've never owned a Tesla vehicle.

Starting point is 00:01:07 I've never owned Tesla stock. This podcast is not a scientific paper. It is a conversation. I respect Elon as I do all other leaders and engineers I've spoken with. We agree on some things, and disagree on others. My goal, always, with these conversations is to understand the way the guest sees the world. One particular point of disagreement in this conversation

Starting point is 00:01:29 was the extent to which camera-based driver monitoring will improve outcomes, and for how long it will remain relevant for AI-assisted driving. As someone who works on and is fascinated by human-centered artificial intelligence, I believe that if implemented and integrated effectively, camera-based driver monitoring is likely to be of benefit in both the short term and the long term. In contrast, Elon and Tesla's focus is on the improvement of Autopilot such that its statistical safety benefits override any concern with human behavior and psychology. Elon and I may not agree on everything, but I deeply respect the engineering

Starting point is 00:02:13 and innovation behind the efforts that he leads. My goal here is to catalyze a rigorous, nuanced and objective discussion in industry and academia on AI-assisted driving, one that ultimately makes for a safer and better world. And now here's my conversation with Elon Musk. What was the vision, the dream of Autopilot in the beginning, the big-picture system level, when it was first conceived and started being installed in 2014, the hardware in the cars? What was the vision, the dream? I would characterize the vision or dream simply that there are obviously two massive

Starting point is 00:03:12 revolutions in the automobile industry. One is the transition to electrification and then the other is autonomy. And it became obvious to me that in the future, any car that does not have autonomy would be about as useful as a horse. Which is not to say that there's no use. It's just rare and somewhat idiosyncratic

Starting point is 00:03:40 if somebody has a horse at this point. It's just obvious that cars will drive themselves completely. It's just a question of time. And if we did not participate in the autonomy revolution, then our cars would not be useful to people relative to cars that are autonomous. I mean, an autonomous car is arguably worth five to ten times more than a car that is not autonomous. In the long term? Depends what you mean by long term, but let's say at least for the next five years, perhaps ten years. So there are a lot of very interesting design choices with Autopilot early on. First is showing on the instrument cluster,

Starting point is 00:04:26 or in the Model 3 on the center stack display, what the combined sensor suite sees. What was the thinking behind that choice? Was there debate? What was the process? The whole point of the display is to provide a health check on the vehicle's perception of reality. So the vehicle's taking in information from a bunch of sensors, primarily cameras, but also radar and ultrasonics, GPS and so forth. And then that information is rendered into vector space, you know, with a bunch of objects with properties like lane lines and traffic lights and other cars,

Starting point is 00:05:08 and then, in vector space, that is re-rendered onto a display, so you can confirm whether the car knows what's going on or not by looking out the window. Right, I think that's an extremely powerful thing for people to get an understanding, to become one with the system and understand what the system is capable of. Now, have you considered showing more? So if we look at the computer vision, you know, like road segmentation,

Starting point is 00:05:35 lane detection, vehicle detection, object detection, underlying the system, there is at the edges some uncertainty. Have you considered revealing the parts with the uncertainty in the system, the sort of probabilities associated with, say, image recognition or something like that? Yeah, so right now it shows the vehicles in the vicinity as a very clean, crisp image,

Starting point is 00:05:58 and people do confirm that there's a car in front of me and the system sees there's a car in front of me, but to help people build an intuition of what computer vision is by showing some of the uncertainty. Well, I think it's, in my car I always look at the sort of the debug view. And there's two debug views. One is augmented vision, which I'm sure you've seen, where basically we draw boxes and labels around objects that are recognized.

Starting point is 00:06:28 And then there's what we call the visualizer, which is basically a vector space representation, summing up the input from all sensors. That doesn't show any pictures, but it basically shows the car's view of the world in vector space. But I think this is very difficult for people, normal people, to understand. They would not know what they're looking at. So it's almost an HMI challenge. The current things that are being displayed are optimized for the general public's understanding of what the system is capable of?

Starting point is 00:07:05 It's like if you have no idea how computer vision works or anything, you can still look at the screen and see if the car knows what's going on. And then if you're, you know, a development engineer, or if you have the development build, like I do, then you can see, you know, all the debug information. But those would just be like total gibberish to most people. Right. What's your view on how to best distribute effort?

Starting point is 00:07:31 So there's three, I would say, technical aspects of Autopilot that are really important. So it's the underlying algorithms, like the neural network architecture, there's the data that it's trained on, and then there's the hardware development. There may be others, but so, look: algorithm, data, hardware.

Starting point is 00:07:48 You only have so much money, only have so much time. What do you think is the most important thing to allocate resources to? Or do you see it as pretty evenly distributed between those three? We automatically get vast amounts of data because all of our cars have eight external-facing cameras and radar and usually 12 ultrasonic sensors, GPS obviously, and IMU. And so we basically have a fleet that has, and we've got about 400,000 cars on the road that have that level of data. Actually, I think you keep quite close track of it, actually.

Starting point is 00:08:32 Yes. Yeah, so we're approaching half a million cars on the road that have the full sensor suite. Yeah. So this is, I'm not sure how many other cars on the road have this sensor suite, but I'd be surprised if it was more than 5,000, which means that we have 99% of all the data. So there's this huge inflow of data. Absolutely. Massive inflow of data.

Starting point is 00:09:02 And then, it's taken about three years, but now we've finally developed our full self-driving computer, which can process an order of magnitude as much as the Nvidia system that we currently have in the cars. And really, to use it, you unplug the Nvidia computer and plug the Tesla computer in. And that's it. And in fact, we're still exploring the boundaries of its

Starting point is 00:09:25 capabilities. We're able to run the cameras at full frame rate, full resolution, not even crop the images, and it's still got headroom even on one of the systems. The full self-driving computer is really two computers, two systems on a chip that are fully redundant. So you could put a bolt through basically any part of that system and it still works. The redundancy, are they perfect copies of each other? Yeah. So it's purely for redundancy, as opposed to an arguing machine kind of architecture where they're both making decisions.

Starting point is 00:09:56 This is purely for redundancy. Think of it more like, if you have a twin-engine aircraft, commercial aircraft, the system will operate best if both systems are operating, but it's capable of operating safely on one. But as it is right now, we haven't even hit the edge of performance. So there's no need to actually distribute functionality across both SoCs. We can actually just run a full duplicate on each one. You haven't really explored or hit the limit of this. Not yet hit the limit, no. So the magic of deep learning is that it gets better with data. You said there's a huge inflow of data, but the thing about driving is that the really valuable

Starting point is 00:10:50 data to learn from is the edge cases. So I've heard you talk somewhere about Autopilot disengagement being an important moment of time to use. Yes. Are there other edge cases, or perhaps can you speak to those edge cases, what aspects of them might be valuable, or if you have other ideas, how to discover more and more edge cases in driving? Well, there's a lot of things that are learned. There are certainly edge cases where, say, somebody is on Autopilot and they take over, and then, okay, that's a trigger that goes to a system that says, okay, did they take over for

Starting point is 00:11:31 convenience, or did they take over because the Autopilot wasn't working properly? There's also, like, let's say we're trying to figure out what is the optimal spline for traversing an intersection. Then the ones where there are no interventions are the right ones. So you then say, okay, when it looks like this, do the following. And then you get the optimal spline for navigating a complex intersection. So there's kind of the common case. You're trying to capture a huge amount of samples of a particular intersection,

Starting point is 00:12:10 of how things went right, and then there's the edge case where, as you said, not for convenience, but something didn't go exactly right. Somebody took over, somebody asserted manual control from Autopilot. And really, like, the way to look at this is view all input as error. If the user had to do input, if they did something, all input is error. That's a powerful line to think of it that way, because it may very well be error. But if you want to exit the highway, or if it's a navigation decision that Autopilot is not currently designed to do, then the driver takes over. How do you know

Starting point is 00:12:44 the difference? That's going to change with Navigate on Autopilot, which we've just released, and without stalk confirm. So, like, lane-change-based asserting of control in order to do a lane change or exit a freeway or do a highway interchange, the vast majority of that will go away with the release that just went out.

Starting point is 00:13:05 Yeah, so I don't think people quite understand how big of a step that is. Yeah, they don't. If you drive the car, then you do. So you still have to keep your hands on the steering wheel currently when it does the automatic lane change. So there's these big leaps through the development of Autopilot, through its history. What stands out to you as the big leaps? I would say this one, Navigate on Autopilot without confirm, without having to confirm, is a huge leap. It is a huge leap.

Starting point is 00:13:38 It will automatically overtake slow cars. So it's both navigation and seeking the fastest lane. So it'll overtake the slow cars and exit the freeway and take highway interchanges. And then we have traffic light recognition, which is introduced initially as a warning. On the development version that I'm driving, the car fully stops and goes at traffic lights. So those are the steps, right? You've just mentioned something that's an inkling of a step towards full autonomy. What would you say are the biggest technological roadblocks to full self-driving? Actually, I don't think, I think the full self-driving computer that we just,

Starting point is 00:14:31 what we call the FSD computer, that's now in production. So if you order any Model S or X or any Model 3 that has the full self-driving package, you'll get the FSD computer. That's important to have enough base computation, then refining the neural net and the control software, all of which can just be provided as an over-the-air update. The thing that's really profound, and what I'll be emphasizing at the investor day that we're having focused on autonomy, is that the cars currently

Starting point is 00:15:11 being produced, with the hardware currently being produced, are capable of full self-driving. But capable is an interesting word, because the hardware is. And as we refine the software, the capabilities will increase dramatically, and then the reliability will increase dramatically, and then it will receive regulatory approval. So essentially buying a car today is an investment in the future. You're essentially buying,

Starting point is 00:15:40 the most profound thing is that, if you buy a Tesla today, I believe you are buying an appreciating asset, not a depreciating asset. So that's a really important statement there because if hardware is capable enough, that's the hard thing to upgrade usually. Yes. So then the rest is a software problem.

Starting point is 00:16:00 Yes. Software has no marginal cost, really. But what's your intuition on the software side? How hard are the remaining steps to get it to where, you know, the experience, not just the safety, but the full experience is something that people would enjoy? I think people would enjoy it very much on the highways. It's a total game changer for quality of life, using Tesla Autopilot on the highways. So it's really just extending that functionality to city streets, adding in the traffic light recognition, navigating complex intersections,

Starting point is 00:16:48 and then being able to navigate complicated parking lots so the car can exit a parking space and come and find you even if it's in a complete maze of a parking lot. And then it can just drop you off and find a parking spot by itself. Yeah, in terms of enjoyability and something that people would actually find a lot of use from, the parking lot is really a rich source of annoyance

Starting point is 00:17:20 when you have to do it manually. So there's a lot of benefit to be gained from automation there. So let me start injecting the human into this discussion a little bit. So let's talk about full autonomy. If you look at the current level four vehicles being tested on the road, like Waymo and so on, they're only technically autonomous. They're really level two systems

Starting point is 00:17:43 with just a different design philosophy, because there's always a safety driver in almost all cases, and they're monitoring the system. Do you see Tesla's full self-driving as still, for a time to come, requiring supervision of the human being? So its capabilities are powerful enough to drive, but nevertheless it requires a human to still be supervising, just like a safety driver is in other fully autonomous vehicles. I think it will require detecting hands on wheel for at least six months or something like that from here. Really it's a question of, like, from a regulatory standpoint, how much safer than a person does Autopilot need to be for it to be okay to not monitor the car? And this is a debate that one can have. And then you need a large sample, a large amount of data, so that you can prove with

Starting point is 00:18:50 high confidence, statistically speaking, that the car is dramatically safer than a person, and that adding in the person monitoring does not materially affect the safety. So it might need to be like two or three hundred percent safer than a person. And how do you prove that? Incidents per mile. Incidents per mile. So crashes and fatalities.
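
The incidents-per-mile comparison described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every count and mileage below is hypothetical, the function names are made up for this sketch, and incident counts are treated as Poisson with a standard log-normal approximation for the confidence interval of the rate ratio. It is not Tesla's or any regulator's actual method.

```python
import math


def rate_per_million_miles(incidents: int, miles: float) -> float:
    """Incident rate normalized to one million miles driven."""
    return incidents / miles * 1_000_000


def rate_ratio_ci(inc_a: int, miles_a: float, inc_b: int, miles_b: float,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Ratio of incident rates (a / b) with an approximate 95% interval.

    Assumes both counts are Poisson and uses the usual approximation
    SE(log ratio) = sqrt(1/inc_a + 1/inc_b). Both counts must be > 0.
    """
    ratio = (inc_a / miles_a) / (inc_b / miles_b)
    se_log = math.sqrt(1 / inc_a + 1 / inc_b)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Hypothetical numbers, chosen only to show the effect of sample size.
MILES = 1e9
crash_ratio = rate_ratio_ci(2000, MILES, 500, MILES)  # manual vs. automated crashes
fatal_ratio = rate_ratio_ci(10, MILES, 3, MILES)      # manual vs. automated fatalities

for label, (ratio, low, high) in [("crashes", crash_ratio), ("fatalities", fatal_ratio)]:
    verdict = "excludes 1" if low > 1 or high < 1 else "includes 1"
    print(f"{label}: ratio {ratio:.2f}, interval {low:.2f} to {high:.2f} ({verdict})")
```

With these made-up counts, the crash interval sits well above 1, while the fatality interval is wide enough to include 1: rare events need far more miles before a difference in rates can be shown with confidence.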

Starting point is 00:19:13 So yeah, fatalities would be a factor, but there are just not enough fatalities to be statistically significant at scale. But there are enough crashes; there are far more crashes than there are fatalities. So you can assess what is the probability of a crash, then there's another step, which is the probability of injury, and probability of permanent injury, and probability of death. And all of those need to be much better than a person by at least

Starting point is 00:19:46 perhaps 200%. And you think there's the ability to have a healthy discourse with the regulatory bodies on this topic? I mean, there's no question that regulators pay a disproportionate amount of attention to that which generates press. This is just an objective fact. And Tesla generates a lot of press. So, in the United States, there are, I think, almost 40,000 automotive deaths per year. But if there are four in a Tesla, they'll probably receive a thousand times more press than

Starting point is 00:20:24 anyone else. So the psychology of that is actually fascinating. I don't think we'll have enough time to talk about that, but I have to talk to you about the human side of things. So myself and our team at MIT recently released a paper on functional vigilance of drivers while using Autopilot. This is work we've been doing since Autopilot was first released publicly over three years ago, collecting video of driver faces and driver body. So I saw that you tweeted a quote from the abstract, so I can at least guess that you've glanced at it.

Starting point is 00:21:00 Yeah, right. Can I talk you through what we found? Sure. Okay. So it appears that in the data that we've collected, drivers are maintaining functional vigilance, such that we're looking at 18,000 disengagements from Autopilot, 18,900, and annotating: were they able to take over control in a timely manner? So they were there, present, looking at the road, to take over control. Okay, so this goes against what many would predict from the body of literature on vigilance with automation. Now the question is, do you think these results hold across the broader population? So ours is just a small subset. One of the criticisms is that, you know, there's a small minority of drivers that may be highly responsible, where their vigilance

Starting point is 00:21:53 decrement would increase with Autopilot use? I think this is all really going to be swept. I mean, the system's improving so much, so fast, that this is going to be a moot point very soon. Where vigilance is, like, if something's many times safer than a person, then adding a person, the effect on safety is limited. And in fact, it could be negative.

Starting point is 00:22:27 That's really interesting. So the fact that a human, some percent of the population, may exhibit a vigilance decrement will not affect the overall statistics, the numbers of safety. No, in fact, I think it will become very, very quickly, maybe even towards the end of this year, but I'd say I'd be shocked if it's not next year at the latest, that having a human intervene will decrease safety. Decrease. Like, imagine if you're in an elevator. It used to be that there were elevator operators,

Starting point is 00:23:02 and you couldn't go on an elevator by yourself and work the lever to move between floors. And now nobody wants an elevator operator, because the automated elevator that stops at the floors is much safer than the elevator operator. And in fact, it would be quite dangerous to have someone with a lever that can move the elevator between floors.

Starting point is 00:23:26 So that's a really powerful statement, and a really interesting one. But I also have to ask, from a user experience and from a safety perspective, one of the passions for me algorithmically is camera-based detection, just sensing the human, detecting what the driver is looking at, cognitive load, body pose. On the computer vision side, that's a fascinating problem. And there's many in industry who believe you have to have camera-based driver monitoring. Do you think there could be benefit gained from driver monitoring? If you have a system that's at or below human-level reliability, then driver monitoring makes sense.

Starting point is 00:24:06 But if your system is dramatically better, more reliable than a human, then driver monitoring does not help much. And like I said, just like you wouldn't want someone in the elevator, if you're in an elevator, do you really want someone with a big lever, some random person operating the elevator between floors? I wouldn't trust that. I would rather have the buttons. Okay, you're optimistic about the pace of improvement of the system, from what you've seen with the full self-driving car computer. The rate of improvement is exponential. So one of the other very interesting design choices early on that connects to this is the operational design domain of Autopilot.

Starting point is 00:24:55 So where Autopilot is able to be turned on. So, in contrast, another vehicle system that we are studying is the Cadillac Super Cruise system. That's, in terms of ODD, very constrained to particular kinds of highways, well mapped, tested, much narrower than the ODD of Tesla vehicles. It's, like, ADD. Yeah. That's good. That's a good line.

Starting point is 00:25:25 What was the design decision in that different philosophy of thinking? There are pros and cons. What we see with a wide ODD is that Tesla drivers are able to explore more of the limitations of the system, at least early on, and together with the instrument cluster display, they start to understand what the capabilities are. So that's the benefit. The con is you're letting drivers use it basically anywhere. So anywhere it could detect lanes with confidence.

Starting point is 00:25:58 Was there a philosophy, were there design decisions that were challenging, that were being made there? Or from the very beginning, was that done on purpose, with intent? Well, I mean, I think it's, frankly, it's pretty crazy letting people drive a two-ton death machine manually. That's crazy. Like, in the future, people will be like,

Starting point is 00:26:23 I can't believe anyone was just allowed to drive one of these two-ton death machines, and they could just drive wherever they wanted. Just like elevators: you could just move the elevator with the lever wherever you want, you can stop it halfway between floors if you want. It's pretty crazy. So it's going to seem like a mad thing in the future that people were driving cars. So I have a bunch of questions about the human psychology, about behavior and so on. That will become moot. That's totally moot. Yeah, because you have faith in the AI system, not faith, but both on the hardware side and the deep learning

Starting point is 00:27:07 approach of learning from data will make it just far safer than humans. Yeah, exactly. Recently, there were a few hackers who tricked Autopilot to act in unexpected ways with adversarial examples. So, we all know that neural network systems are very sensitive to minor disturbances, these adversarial examples on input. Do you think it's possible to defend against something like this for the industry? Sure.

Starting point is 00:27:33 So, can you elaborate on the confidence behind that answer? Well, the neural net is just basically a bunch of matrix math. You have to be, like, a very sophisticated somebody who really understands neural nets, and basically reverse engineer how the matrix is being built and then create a little thing that just exactly causes the matrix math to be slightly off. But it's very easy to then block that by having basically negative recognition. It's like, if the system sees something that looks like a matrix hack, exclude it. It's such an easy thing to do.
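
To make the "matrix math slightly off" idea concrete, here is a toy Python sketch of an adversarial perturbation against a single linear scoring unit. Everything in it is an assumption made for illustration: the weights, bias, and input values are invented, and a real perception network is far larger and nonlinear. It only shows the general principle that nudging each input a small amount in the direction indicated by the weights can flip a decision.

```python
def score(weights: list[float], bias: float, x: list[float]) -> float:
    """Linear score w . x + b; a positive score is read here as 'car'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(value: float) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def adversarial_perturb(weights: list[float], x: list[float],
                        epsilon: float) -> list[float]:
    """Shift every input by at most epsilon so as to lower the score.

    For a linear model the gradient of the score with respect to x is the
    weight vector itself, so stepping against sign(w) is the most damaging
    change under a per-input bound of epsilon.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and input, chosen only for the demonstration.
weights = [0.5, -0.25, 0.75, -0.5]
bias = -0.1
clean_input = [0.3, 0.2, 0.2, 0.1]

perturbed_input = adversarial_perturb(weights, clean_input, epsilon=0.1)

print("clean score is positive:", score(weights, bias, clean_input) > 0)
print("perturbed score is positive:", score(weights, bias, perturbed_input) > 0)
```

A common defense in the research literature, adversarial training, adds such perturbed inputs to the training set with the correct label so the model learns to resist or reject them.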

Starting point is 00:28:19 So learn both on the valid data and the invalid data. So basically learn on the adversarial examples to be able to exclude them. Yeah, you basically want to both know what is a car and what is definitely not a car. You train for this is a car and this is definitely not a car. Those are two different things. People have no idea about neural nets, really. They probably think neural nets involve, like, you know, a fishing net or something. So, taking a step beyond just Tesla and Autopilot, current deep learning approaches still seem in some ways to be far from general intelligence systems. Do you think the current approaches will take us to general intelligence, or do totally new ideas need to be invented? I think we're missing a few key ideas for general intelligence, general artificial general

Starting point is 00:29:18 intelligence. But it's going to be upon us very quickly. And then we'll need to figure out what shall we do, if we even have that choice. It's amazing how people can't differentiate between, say, the narrow AI that allows a car to figure out what a lane line is and navigate streets, versus general intelligence. Like, these are just very different things. Like your toaster and your computer are both machines, but one's much more sophisticated than another. You're confident with Tesla you can create

Starting point is 00:29:58 the world's best toaster. The world's best toaster, yes. But the world's best self-driving? Yes. To me, right now, this seems game, set, match. I don't, I mean, I don't want to be complacent or overconfident, but that's what it appears. That is just literally how it appears right now. I could be wrong, but it appears to be the case that Tesla is vastly ahead of everyone.

Starting point is 00:30:27 Do you think we will ever create an AI system that we can love, and that loves us back in a deep, meaningful way, like in the movie Her? I think AI will be capable of convincing you to fall in love with it very well. And that's different than us humans? You know, we start getting into a metaphysical question of, like, do emotions and thoughts exist in a different realm than the physical? And maybe they do, maybe they don't. I don't know, but from a physics standpoint, I tend to think of things, you know,

Starting point is 00:31:00 like physics was my main sort of training. And from a physics standpoint, essentially, if it loves you in a way that you can't tell whether it's real or not, it is real. That's a physics view of love. Yeah. If you can't prove that it does not, if there's no test that you can apply that would allow you to tell the difference,

Starting point is 00:31:32 then there is no difference. And it's similar to seeing our world as a simulation: there may not be a test to tell the difference between the real world and the simulation, and therefore, from a physics perspective, it might as well be the same thing. Yes. There may be ways to test whether it's a simulation. There might be, I'm not saying there aren't, but you could certainly imagine that a simulation could correct, that once an entity in the simulation found a way to detect the simulation, it could either restart, you know, pause the simulation, start a new simulation, or do one of many other things that then corrects for that error. So when maybe you or somebody else creates an AGI system and you get to ask her one question,

Starting point is 00:32:19 what would that question be? What's outside the simulation? Elon, thank you so much for talking today. It was a pleasure. All right. Thank you.
