a16z Podcast - Tesla's Road Ahead: The Bitter Lesson in Robotics
Episode Date: October 24, 2024

What does Rich Sutton's "Bitter Lesson" reveal about the decisions Tesla is making in its pursuit of autonomy?

In this episode, we dive into Tesla's recent "We, Robot" event, where they unveiled bold plans for the unsupervised full self-driving Cybercab, Robovan, and Optimus, their humanoid robot, which Elon Musk predicts could become "the biggest product ever."

Joined by a16z partners Anjney Midha and Erin Price-Wright, we explore how these announcements reflect the evolving intersection of hardware and software. We'll unpack the layers of the autonomy stack, the sources of data powering it, and the challenges involved in making these technologies a reality.

Anjney, with his experience in computer vision and multiplayer tech at Ubiquity6, and Erin, an AI expert focused on the physical world, share their unique perspectives on how these advancements could extend far beyond the consumer market.

For more insights, check out Erin's articles linked below.

Resources:
Find Anj on Twitter: https://x.com/anjneymidha
Find Erin on Twitter: https://x.com/espricewright
Read Erin's article 'A Software-Driven Autonomy Stack Is Taking Shape': https://a16z.com/a-software-driven-autonomy-stack-is-taking-shape/
AI for the Physical World: https://a16z.com/ai-for-the-physical-world/

Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
This is the big race in robotics.
The smarter your brain, so to speak,
the less specialized your appendages have to be.
AI has pushed every single one of these
kind of to its limit and to a new state of the art.
The way they're solving precision
is instead of throwing more sensors on the car,
is to basically throw more data at the problem.
Data is absolutely eating the world.
What is good enough?
We used to have the Turing test,
which obviously we've blown past now.
His shorthand for it was like the AWS
of AI. He's got this idea of this distributed swarm of unutilized inference computers.
Whether that's an oil rig, whether that's a mine, whether that's a battlefield, there's so many
different use cases for a lot of this underlying technology that are really starting to see
the light of day. It's basically not an if but a when. An inevitability.
Earlier this month, Elon Musk and the team at Tesla held their We, Robot event, where they
unveiled their plans for the unsupervised full self-driving Cybercab and Robovan. Plus, Optimus,
their answer to consumer-grade humanoid robots, and also what Musk himself predicted would be,
quote, the biggest product ever of any kind. Now, of course, none of these products are on the
market yet, but several demos were on show at the event. Naturally, the response was mixed.
Supporters said we got a glimpse of the future, while critics said the details were missing.
But in today's episode, we're not here to debate that.
What we do want to talk about is what this indicates about the intersection of where hardware and software meet.
So what does Rich Sutton's 2019 blog post, The Bitter Lesson, tell us about the decisions that Tesla's making in autonomy?
And how realistic is the quoted $30,000 price range?
Also, what are the different layers of the autonomy stack?
And where do we get the data to power it?
And what does any of this look like when you exit the consumer sphere?
We cover all this and more with a16z partners Anjney Midha and Erin Price-Wright.
Anjney previously founded Ubiquity6, a pioneering computer vision and multiplayer technology company
that sat right at this intersection of hardware and software, and was eventually acquired by Discord.
Erin, on the other hand, invests on our American Dynamism team with a focus on AI for the physical world.
And if you'd like to dig even deeper here, Erin has penned several articles on the topic that we've linked in our show notes.
All right, let's get to it.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund.
Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com slash disclosures.
So last week, Tesla had their We, Robot event, and Musk announced the Cybercab, the Robovan, or as he liked to call it, the Robovin, and the Optimus.
You guys are so immersed in this hardware software world. I'd love to just get your initial reaction.
From my perspective, it wasn't that there was anything in particular that was super surprising, but what was exciting was just sort of a culmination of one thing that Elon Musk does really well and Tesla has done really
well, which is continue to pour love and energy and money and time into a dream and a vision
that's been going on for a really long time, like well past when most financial investors
and most people kind of lost the luster of self-driving cars after their initial craze in the
mid to late 2010s. And they've just continued to plod along and to continue to make developments
and now we're finally seeing like this glimpse of the future for the first time in a really
long time. I think that's right. I think it was very impressive, but unsurprising.
Yeah. So I think the two schools of thought when people watch the event was, one was absolutely
this whole, oh my God, this is such vaporware. He shared literally nothing on engineering details.
What the hell? Come on. Give us the meat on timelines and dates and prices. And then the opposing
view was like, holy shit, they're still going. They haven't given up on any of this autonomy stuff that
he's been talking about for years. And I'm absolutely more empathetic towards the latter view,
which is that I saw it as an homage to the bitter lesson.
It's a sort of amazing blog post that I'm going to do a terrible job of summarizing by
this great computer scientist Rich Sutton, which basically says that over the last 70 years or so
of computer science history, what we've learned is that general purpose methods basically beat
out any specific methods in artificial intelligence in particular.
Basically, the idea that if you're working on solving a task that requires intelligence,
you're usually better off leveraging Moore's law and more compute and more data than trying to
hand engineer a technique or a set of algorithms to solve a particular task. And broadly speaking,
that's been the big grand debate in self-driving and autonomy, I would say, for the last two decades,
right? It's the sort of general purpose, bitter lesson school versus the let's model self-driving
as a specific task. As a set of discrete decision-making algorithms unconnected to each other.
A system to solve, let's say, edge detection around stop signs, right? Where self-driving is a really hard
problem. And you could totally say, well, there are so many edge cases in the world
that we should map out each of those edge cases.
And I think it was an homage to the bitter lesson.
So that's what I was most excited about is he did share actually details
that their pipeline is basically an end-to-end deep learning approach.
Right, which is incredible and probably true only for the last,
my guess is 18 to 24 months.
Right.
Yeah.
Yeah.
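To make that contrast concrete, here is a minimal, hedged sketch of what an end-to-end approach looks like: one network maps raw camera frames directly to driving controls and improves with more data and compute, instead of chaining hand-written detectors and rules for each edge case. This is our own toy example, not Tesla's pipeline; the architecture and sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class EndToEndDrivingPolicy(nn.Module):
    """Pixels in, controls out; no hand-engineered stop-sign or lane rules."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # generic visual feature extractor
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 2)           # [steering, throttle]

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(frames))

# Supervised training skeleton on (camera frame, logged human control) pairs:
# per the bitter lesson, more data and compute, not more rules, drives progress.
policy = EndToEndDrivingPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
frames = torch.randn(8, 3, 128, 128)           # stand-in for camera frames
controls = torch.randn(8, 2)                   # stand-in for logged driving controls
loss = nn.functional.mse_loss(policy(frames), controls)
loss.backward()
optimizer.step()
```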
And I mean, in the bitter lesson,
he also talks about the fact that it's really appealing to do the opposite
because in the short term, you will get the benefit,
but the broader deep learning approach ends up winning out
in the long term. And a lot of people talk about Musk. Musk says it about himself that the timelines
sometimes are off, but he's basically banking on that premise in the long term. It's basically
not an if but a when. An inevitability. And I think the event was the first time that it really did feel
in an emotional sense for the average American consumer. I'm not talking about the super duper
tech literate people who wanted the details of the underlying models and their weights. But like for
the average American consumer, the first time that this version of the future felt like an inevitability.
And before we get into maybe the specifics around where else hardware and software are intersecting,
I'd love to just talk about that, that average person who's watching because you guys are meeting
with companies and investors and this has been going on for quite some time.
So I'm just curious if maybe you noticed anything under the hood or maybe the meta in that announcement or event
that maybe the average person watching is, you know, what are they seeing? They're saying things like,
oh, maybe it was human-controlled and not fully AI on-device, or other people are commenting on
the fact that these humanoids are shaped like humans. Like, why do we need that? On the topic of
humanoids, I think humanoids are a great choice of embodiment for a robot to really emotionally connect to
and speak to a human being watching because I can relate to a human form factor. Obviously, we've
found out that it was teleoperated, which, in my opinion, still doesn't take away from
like how cool and amazing it was. The human form factor is a way to connect what is happening
with robotics to a regular person who is like, okay, yes, I like see myself in that. This looks
like Star Wars or some other sci-fi movie. In reality, maybe this is like a controversial
opinion. I don't see the vast majority of economic impact over the next decade from robotics
coming from the humanoid form factor. But that doesn't take away from the power of the
symbol of having a humanoid make a drink at this event because it just like connects back to the
sort of science fiction promise of our childhoods getting sort of finally delivered.
The opening sequence, he started with like a sci-fi. I think it was a Blade Runner visual.
And he was like, we all love sci-fi and I want to be wearing that jacket that he's wearing in
the picture, but we don't want any of the other dystopian stuff. And so that definitely stuck out
to me is that he did not start the way he usually does. It's often a technical first sort of
story. But he started with a, here's a vision for where I think the world should go. So it was
much more Disney-esque in that. And it was quite poetic. I think they literally did it on the
Warner Brothers studio lot. And so they recreated a bunch of cities. And I think they had,
on site at the event, the Robovans taking people around these simulated cities.
There was a sort of theatricality to it all that stuck out to me, which I thought was
quite different. And I thought it was refreshing because the core problem with this branch of
AI, which is largely deep learning based and bitter lesson based, is that
it's an empirical field. Unlike, call it Moore's Law, which was predictive, where you basically
know if you double the number of transistors, you get this much more performance on the chip,
and it's just about pure execution. AI is much more empirical. You don't really know when the
model is going to get done training, and when it does get trained, whether it'll converge or not.
Or even what does converged mean, like, what is good enough? We used to have the Turing test,
which obviously we've blown past now. It's a feeling more than it is a set of discrete metrics that
you can really point to. Right. So it made a lot of sense
to me that he's trying to decouple this idea of progress from a specific timeline.
I see.
Because I just think we're setting ourselves up for every time you ask a deep learning researcher.
So when's that GPT-5 model going to show up?
It's like the most frustrating question ever, right?
Because they don't know.
We don't know.
And frankly, sometimes they show up earlier than scheduled and sometimes later.
And by the way, you can look at the stock market's reaction.
It's a prime example of how people have been so conditioned by, I would say, the Steve Jobsian, Apple-like
cadence, year on year, of like, here's your new iPhone. This is the incremental but predictable
forecasting that the tech industry keeps trying to reward. And I think what he's doing is pretty
refreshing, which is saying, look, here's a vision for where we want to go, but it's decoupled.
The second thing on the humanoid piece that I was quite impressed by is actually the quality of
the teleoperation. So everybody's talking about how, oh, this is fake. This is all smoke and mirrors.
It's just people. It's so hard. I was going to say, really hard problem. Why is no one talking about
that? Have you ever tried? I mean, unfortunately. It's so hard.
We were at a company two weeks ago, and they've got these teleop robots.
And the founder was demoing a mechanical arm that he was teleoperating with a game pad.
And he was folding clothes with it.
And I was like, oh, that looks simple.
He's like, here, try it.
It was one of the hardest manipulation things I've ever tried.
And by the way, we tried that with, you know, a VR headset with six-DoF motion controllers,
almost harder to do.
Teleoperating something, especially over the Internet, in a smooth fashion with precision, is incredibly hard.
And I don't think people appreciate the degree to which they've really solved that pipeline.
Yeah, I was actually really impressed by that. And, you know, I think that there's huge opportunity for teleop in sort of production applications that will have, like, massive economic benefit.
Right. Even before we have true robots running around managing themselves. Because if you think about it, there are all these really hard and really dangerous or hard-to-get-to jobs, or there are labor or credentialing constraints where it's a lot harder to hire people to do certain things in certain locations. And if we can imagine
in a future where the teleop that we saw last week at the event is something that's widely
available. That's incredible. Imagine not having to go and service a power line, but you can actually
teleop a robot to do that for you, but still have the level of sort of human training and
precision needed to make a really detailed and specific evaluation. The promise of that is really
cool, even before we get to robots. So that was really exciting. Yeah, it's like a stop along this
journey. And so if we talk about that journey, the arc of hardware and software coming together
in maybe a different way than we've seen in the past, just as an example. So Marc famously said
software is eating the world. That was in 2011. We're in 2024. And it does feel like the last
decade has been a lot of traditional software, not so much integrating with the physical world
around us. And so where would you place us in that trajectory? Because we're seeing it with
autonomous vehicles, but I got the sense that's not the only place where this is happening.
Yeah. This is where I spend 95% of my time in all of these industries that are just starting to see the glimmers of what autonomy and sort of software-driven hardware can bring. What's really interesting is just actually a dearth of skills of people who know how to deal with hardware and software together. You have a lot of people that went and got computer science degrees over the last decade and relatively speaking a lot fewer that went and got electrical engineering or mechanical engineering degrees. And we're starting to see the rise of, oh, shoot, we actually
need people who understand not just maybe how the software works in the cloud with Wi-Fi,
where you have unlimited access to compute and you can retry things as many times as you want
and you can ship code releases all day every day. But you actually have kind of a hardware
deployment where you have limited compute in an environment where you maybe can't rely on
Wi-Fi all the time, where you have to tie your software timelines to your hardware production
timelines. Like these are a really difficult set of challenges to solve, and right
now, there just isn't a lot of standardized tooling for developers on how to do that.
So it's interesting. We're starting to see portfolio companies of ours across really different
industries that are trying to use autonomy, whether it's oil and gas or water treatment or
HVAC or defense. They're like sharing random libraries that they wrote to connect to like particular
sensor types because there's not this like rich ecosystem of tooling that exists for the software
world. So we're really excited about what we're starting to see emerge in the space.
Even Elon said when he's talking about these two different products that he's unveiling,
right, Optimus, and then you have the Robovans or Cybercabs. And those seem like two
completely different things. But he even said in the announcement, he said everything we've
developed for our cars, the batteries, power electronics, advanced motors, gearboxes, AI inference
computer. It all applies to both, right? So you're seeing this overlap. That's super exciting. When I was
watching it, I was just nerding out because my last company
was a computer vision, 3D mapping, and localization company.
So I'd unfortunately spent too much of my life
calibrating LiDAR sensors to our computer vision sensors.
Because our whole thesis when I started back in 2017
was that you could do really precise positioning
just off of computer vision and that you didn't need fancy hardware
like LIDARs or depth sensors.
And to be honest, not a lot of people thought that we could pull it off.
And frankly, I think there were moments when I doubted that too.
And so it was just really fantastic to see that his bet
and the company's bet on computer vision
and a bunch of these sensor fusion techniques
that would not need specialized hardware
would ultimately be able to solve
a lot of the hard navigation problems,
which basically means that the way they're solving precision
is instead of throwing more sensors on the car
is to basically throw more data at the problem.
And so in that sense, data is absolutely eating the world.
And you asked where on the trajectory are we of software eating the world?
And I think we're definitely on an exponential
that has felt like a series of stacked sigmoids.
Often it feels like you're on a plateau,
but a series of plateaus totally makes up an exponential if you zoom out enough.
And earlier in the conversation, we talked about the bitter lesson.
A number of other teams in the autonomy space decided to tackle it as a hardware problem,
not a software problem, right?
Where they said, well, more LiDAR, more expensive LiDAR, more GPS, more GPS.
Right.
And Elon's like, you know, actually I want cheap cars that just have computer vision sensors.
And what he's going to do is take a bunch of the custom, really expensive sensors that many other
companies put on the car at inference time, and just use them at train time instead.
So Tesla does have a bunch of like really custom hardware that's not scalable that drives around the world in their parking lots and simulation environments and so on.
And then they distill the models they train on that custom hardware to a test time package.
And then they send that test time package to their retail cars, which just have computer vision sensors.
And the reality is that's a raw arbitrage, right, between sensor stack.
And it allows the hardware out in the world to be super cheap.
The result there is software is eating the sensor stack out in the world, which makes the cost of these cars so much cheaper that you can have a $30,000 fully autonomous car versus $100,000-plus cars that are fully loaded with these LiDAR sensors and so on.
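Here is a hedged sketch of that train-time versus test-time arbitrage as described above: a teacher model sees the expensive sensor suite during training, and a camera-only student is distilled to imitate it, so only the cheap sensor stack ships in the retail car. The models and feature sizes are placeholders of ours, not Tesla's implementation.

```python
import torch
import torch.nn as nn

# Teacher sees camera + LiDAR features; student sees camera features only.
teacher = nn.Sequential(nn.Linear(512 + 256, 128), nn.ReLU(), nn.Linear(128, 2))
student = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2))

camera_feats = torch.randn(8, 512)   # available at train time AND in retail cars
lidar_feats = torch.randn(8, 256)    # expensive sensor, available at train time only

with torch.no_grad():                # teacher provides the targets to imitate
    targets = teacher(torch.cat([camera_feats, lidar_feats], dim=-1))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss = nn.functional.mse_loss(student(camera_feats), targets)
loss.backward()
optimizer.step()                     # only `student` is deployed to the fleet
```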
But I think in order to have the intuition that you can even do that, you really actually have to understand hardware.
If you just understand software, and hardware is like a sort of a scary monster that lives over here, and maybe you have a special hardware team
that does it, it's going to be hard for you to have the confidence to say, no, we can do it this
way. I think you're totally right, which is that the superpower that Tesla has is his
ability to go full stack, right? Because a lot of other industries often segment out software
versus hardware like you're saying. And that means that people working on algorithms and the
autonomy part just treat hardware as like an abstraction, right? You throw over a spec, it's an API,
it's an interface that I program against and I have no idea what's going on.
You don't have to worry about the details.
You don't have to worry about it. It doesn't matter. Which, by the way, is super powerful. It's unlocked
this whole general purpose wave of models like ChatGPT and so on, right? Because it allows
people who specialize in software to not have to think about the hardware. It's also what's
driven sort of the software renaissance of the last 15 years. Absolutely. Decoupling, right? Composition
and abstraction is sort of the fundamental basis of the entire computing revolution. But I
think when you're like him and you're trying to bring a new device to market, kind of like
what Jobs did with the iPhone, by going full stack, you end up unlocking massive efficiencies
of cost. And I think what may have been lost in the sort of theatricality of this event
is the fact that he's able to deliver an autonomous device to retail consumers at a cost
profile through vertical integration that would just not be possible if it was just a software
team buying hardware from somebody else and building on top. Can we talk about those economics,
by the way, just attacking that head on? Both Optimus and Cybercab were quoted as being under
the 30K range. Is that really realistic? And then, tied into that,
to what you were saying, we see other autonomous vehicles, which are betting more on the LiDAR and the
sensors, which also have come down in price pretty substantially. My guess is Elon is backing in to
the cost based on what people are willing to pay, and he will do whatever it takes to get those
costs to line up. I mean, it's the same thing he did with SpaceX. He will operate within whatever
cost constraints he needs to operate within, even if the rest of the market or the research
community is telling him it's not possible. Obviously, like a 30K humanoid robot is way less than
what most production industrial robotic arms cost today, which I think are more in the 100K
range for the ones that are used in like the high-end factory. So if he can get it down to 30K,
that's really exciting. I also don't necessarily think you need even a 30K humanoid robot to
accomplish a wide swath of the automation tasks that would pretty radically transform the way
our economy functions today.
Yeah, I think Erin's right in that there's probably a top-down directive,
which is do whatever it takes to get into the cost footprint.
This car has to cost $30K.
Right.
But I think if you do a bottoms-up analysis,
I don't think you end up too far, because actually if you just break down the kind of BOM, the bill of materials,
on a Tesla Model 3, you're not dramatically far off from the sensor stack you need
to get to a $30,000 car, right?
This is the beauty of solving your hardware problems with software.
You don't need a $4,000 scanning LiDAR on the car.
So I think on the cyber cab, I feel much more confident that the cost footprint is going to fall in that range because it's, frankly, we kind of have at least an ancestor on the streets, right?
The thing that gets prices up is custom sensors because it's really expensive to build custom sensors in short production runs.
And so you either have scale of manufacturing like an Apple and you make a new CMOS sensor or a new face ID sensor and you get cost economies of scale because you're shipping more like 30 million devices in your first run.
or you just lean on commodity sensors from the past era
and you tackle most of your problems in software,
which is what he's doing.
And to that point, when he's betting on software,
another interesting thing that he announced
was really overspeccing these cars
to almost change the economics potentially
based on the fact that those cars could be used
for distributed computing.
To your point, Ange, if you put a bunch of really expensive sensors
on the car, you can't really distribute the load of that
in any other way than driving the car, right?
But if you actually have this computing layer, that's, again, in his case, he's saying he's planning to over spec, that actually can fundamentally change, like, what this asset is. And you kind of saw the same thing even with Tesla's today where he's talking about this distributed grid, right?
Where all of a sudden, these large batteries are being used not just for the individual asset. So do you have any thoughts on that idea or if we've seen that elsewhere?
He was a bit skimpy on details on that. But I think he did say that the AI5 chip is overspecced. It's probably going to be four to five times
more powerful than the HW4, which is their current chip,
it's going to draw four times more power,
which probably puts it in that like 800 watts or so range,
which for context, your average hair dryer is at about 1,800 watts.
I mean, it's hard to run power on the edge.
But I think what he said was something to the effect of like,
your car's not working 24 hours a day.
So if you're driving, call it eight hours a day in L.A. traffic.
God bless whoever's having to do that.
For real.
Hopefully they're using self-driving.
One would hope.
Actually, he opened his pitch with a story about driving to El Segundo, saying you can fall asleep and wake up on the other side.
But I think the t-shirt size he gave was about 100 gigawatts of unallocated inference compute just sitting out there in the wild.
And I think his shorthand for it was like the AWS of AI.
He's got this idea of this distributed swarm of unutilized inference computers.
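As a rough back-of-the-envelope check, using only the figures quoted in this conversation (our own arithmetic, not numbers from the event): if each car carries roughly an 800-watt inference computer, then 100 gigawatts of idle compute implies a fleet on the order of a hundred-plus million vehicles.

```python
fleet_power_w = 100e9   # the ~100 gigawatt "t-shirt size" quoted above
per_car_w = 800         # the assumed ~800 W draw of the AI5 chip mentioned earlier
print(f"{fleet_power_w / per_car_w:,.0f} vehicles")  # ≈ 125,000,000
```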
And it's a very sexy vision.
I really want to believe in it.
Ground us on, is this realistic?
Well, I know. I think it's realistic for workloads that we don't know yet in the following
sense, right? That the magic of AWS is that it's centralized. And it abstracts away a lot of
the complexity of hardware footprints for developers. And by centralizing almost all their
data centers in single locations with very cheap power and electricity and cooling, these
clouds are able to pass on very cheap inference costs to the developer. Now, what he's got to figure
out is how do you compensate for that in a decentralized fashion? And I think we have kind of
prototypes of this today, like there are these vast decentralized clouds, I think one is literally
called Vast, of people's unallocated gaming rigs. People have millions of NVIDIA 4090 gaming cards
sitting on their desks that aren't used. And historically, those have not yet turned into
great businesses or high-utilized networks because developers are super sensitive to two things,
cost and reliability. And by centralizing things, AWS is able to ensure very high uptime and
reliability, whereas somebody's GPU sitting on their...
Maybe available.
Maybe they're driving to El Segundo.
Right.
Right.
And there are just certain things, especially with AI models that are hard to do on
highly distributed compute where you actually need good interconnect and you need
things to be reasonably close to each other.
Maybe in his vision, there's a world where you have Optimus robots in every home.
And somehow your home Optimus robot can take advantage of additional compute or additional
inference with your Tesla car that's like sitting outside in your driveway.
who knows. Right. Okay, well, this event clearly was focused on different models that are
consumer-facing. So again, Cybercab, that's for someone using an autonomous vehicle. Optimus is
a humanoid robot probably in your home. But, Erin, you've actually been looking at the
hardware software intersection in a bunch of other spaces, right? And as you alluded to earlier,
maybe different applications with better economics at least today. I think long term, there's no market
bigger than the consumer market. So everyone having a robot in their home and a Tesla car in their
driveway that's also a robotaxi has huge economic value. But that's also a really long-term
vision. And there's just so much happening in autonomy that's taking advantage of the momentum
and the developments that companies like Tesla have put forward into the world over the last decade
that actually have the potential to have meaningful impact on our economy in the short term. I think
the biggest broad categories for me are largely the sort of dirty and unsexy industries that have
very high cost of human labor often because of safety or location access, whether that's an oil
rig out in the middle of Oklahoma somewhere that's three hours' drive from the nearest city,
whether that's a mine somewhere in rural Wyoming that freezes over for six months out of the
year so humans can't live there and mine, or whether that's a battlefield,
where we're starting to see autonomous vehicles go out and clear bombs and mines to protect human life.
There's so many different use cases for a lot of this underlying technology that are really starting to see the light of day.
So very excited about that.
And as we think about that opportunity, you've also talked about this software-driven autonomy stack.
So as you think about the stack, what are the layers?
Can you just break that down?
Yeah, sure.
So across whether it's a self-driving car or sort of an autonomous control system, we're seeing
the stack break down into pretty similar categories. So first is perception. You have to see the world
around you, know what's going on, be able to see if there's a trash can, be able to understand
if there's a horizon, if you're a boat. The second is something Anj knows really well, which is
localization and mapping. So, okay, what do I see? How do I find out where I am within that world
based on what I can see and what other sensors I can detect, whether it's GPS, which often isn't
available in battlefields or in warehouses, et cetera.
The third is planning and coordination.
So that's, okay, how do I take a large task and turn it into a series of smaller tasks, versus what is more of an instant reaction?
I don't have to really think about how to take a drink of water, but I might have to think
about how to make a glass of lemonade from scratch.
So how do I think about compute across those different types of
regimes, when something is more of an instinct versus when something has to be sort of taken down and processed into discrete operations? And then the last one is control. So that's, like, how does my brain talk to my hand? Like, how do I know what the nerve endings are doing in order to pick up this water bottle and take a drink out of it? And that's a really interesting kind of field that's existed for decades and decades. But for the first time, probably since the 70s, we're starting to see really interesting stuff happen in the space of controls around autonomy and robotics.
And I would say, like, all of these are pre-existing areas.
None of this is wildly new, but I think in the last two years, especially with everything
that's happening with deep learning video language models, broadly speaking, AI has pushed
every single one of these kind of to its limit and to a new state of the art.
And there just aren't tools that exist to tie all that together.
So every single robotics company, every single autonomous vehicle company is basically
like rebuilding this entire stack from scratch, which we see as
investors as a really interesting opportunity as the ecosystem evolves.
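For readers who want the four layers Erin describes pinned down, here is a minimal sketch of that stack expressed as code interfaces. The class and method names are our own illustrative assumptions, not any company's actual API: perception, localization and mapping, planning, and control, wired together in a single tick.

```python
from dataclasses import dataclass
from typing import Any, List, Protocol, Tuple

@dataclass
class WorldState:
    detections: List[Any]        # perception: objects, free space, the horizon
    pose: Tuple[float, ...]      # localization: where am I within the map?

@dataclass
class Command:
    steering: float              # control output sent to actuators
    throttle: float

class Perception(Protocol):
    def perceive(self, frame: Any) -> List[Any]: ...

class Localizer(Protocol):
    def localize(self, frame: Any, detections: List[Any]) -> Tuple[float, ...]: ...

class Planner(Protocol):
    def plan(self, state: WorldState, goal: Any) -> List[Any]: ...   # big task -> subtasks/waypoints

class Controller(Protocol):
    def act(self, state: WorldState, waypoint: Any) -> Command: ...  # "brain talks to hand"

def autonomy_tick(frame: Any, goal: Any, perception: Perception,
                  localizer: Localizer, planner: Planner,
                  controller: Controller) -> Command:
    """One pass through the stack: perceive, localize, plan, then control."""
    detections = perception.perceive(frame)
    pose = localizer.localize(frame, detections)
    state = WorldState(detections, pose)
    waypoints = planner.plan(state, goal)
    return controller.act(state, waypoints[0])
```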
And as you think about that ecosystem, people kind of say that as soon as you touch hardware,
you're literally working on hard mode compared to just a software-based business.
So what are the unique challenges, even with maybe that AI wave today that's pushing things ahead?
How would you break down what becomes so much harder?
I think Anj touched on this a little bit before, but the more you can commoditize the hardware
stack, the better.
So the most successful hardware companies are the ones that aren't necessarily inventing
a brand new sensor but are just taking stuff off the shelf and putting it together. But still,
like, tying everything together is really hard. Like, when you think about releasing a phone,
for example, Apple has a pretty fast shipping cadence, and they're still releasing a new phone
only once a year. So you have to essentially tie a lot of your software timelines to
hardware timelines in a way that doesn't exist when you can just sort of ship whenever you want
in a cloud. If you need a new sensor type, or you need a different kind of compute, you know,
construct, or you need something fundamentally different in the hardware, you're bound by those
timelines. You're bound by your manufacturer's availability. You're bound by how long it takes to
quality engineer and test a product. You're bound by supply chains. You're bound by figuring out
how these things have to integrate together. So the cycles are often just quite a lot slower.
And then the other thing is when you're interacting with the physical world, you get into
use cases that touch safety in a really different way than we think about with pure software alone.
And so you have to design things for a level of, like, hardiness and reliability that you don't
always have to think about with software by itself. If your ChatGPT is a little slow,
it's fine. You can just try again. But if you have an autonomous vehicle that's like driving a tank
on a battlefield autonomously and something doesn't work, like you're kind of screwed. So you have to
have a much higher level of rigor and testing and safety built into your products, which slows down
the software cycles. The holy grail is sort of general purpose intelligence for robotics,
which we still don't have. When you train a general model, you basically get the ability to
build hardware systems that don't have to be particularly customized. And that reduces hardware
iteration cycles dramatically because you can basically say, look, roughly speaking, these are the
four or five commodity sensors you need, the smarter your brain, so to speak, the less specialized
your appendages have to be. And I think what a number of really talented teams are trying to
solve today is, can you get models to generalize across embodiments, right? Can you train a model
that can work seamlessly on a humanoid form factor or a mechanical arm, a quadruped, whatever
it might be? And I'm quite bullish that it will happen. I think the primary challenge there
that teams are struggling with today is the lack of really high quality data, right? The big
unknown is just how much data, both in quantity and quality, do you really need to get models
to be able to reason about the physical world spatially in a way that abstracts across any
hardware? I'm completely convinced that once we unlock that, the applications are absolutely
enormous, right? Because it frees up hardware teams, like Erin was saying, from having to
couple their software cycles to hardware cycles. It decouples those two things. And I think that's
the holy grail. I think the victory of Tesla's autonomy team
there is having realized, eight years ago, the efficacy of what we call early fusion foundation
models, right, which is the idea that you take a bunch of sensors at training time and different
inputs of vision, depth, you take in video, audio, you take a bunch of different six-DoF sensors,
and you tokenize all of those, and you fuse them all at the point of training, and you build
an internal representation of the world for that model. In contrast, the LLM world does what's
called late fusion. You often start with a language model that's trained just on language data and
then you duct tape on these other modalities, right, like image and video and so on. And I think
the world has now started to realize that early fusion is the way forward. But of course, they
have an eight-year head start. And so I get really excited when I see teams either tackling the data
sort of challenge for general spatial reasoning or teams that are taking these early fusion
foundation model approaches to reasoning that then allow the most talented hardware teams to focus
really on what they know best.
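Here is a minimal sketch of the early-fusion idea Anj describes, under our own illustrative assumptions (the token counts, feature sizes, and projection layers are made up): each sensor modality is tokenized and all tokens go through one shared backbone from the start, so the model builds a joint internal representation, rather than bolting extra modalities onto a language-only model after the fact.

```python
import torch
import torch.nn as nn

d_model = 256
vision_proj = nn.Linear(768, d_model)  # camera patch features -> tokens
depth_proj = nn.Linear(64, d_model)    # depth / LiDAR features -> tokens
imu_proj = nn.Linear(6, d_model)       # six-DoF inertial readings -> tokens

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=4,
)

vision_tokens = vision_proj(torch.randn(2, 32, 768))  # (batch, tokens, features)
depth_tokens = depth_proj(torch.randn(2, 16, 64))
imu_tokens = imu_proj(torch.randn(2, 8, 6))

# Early fusion: concatenate all modality tokens BEFORE the backbone, so the
# model learns one joint internal representation of the world during training.
tokens = torch.cat([vision_tokens, depth_tokens, imu_tokens], dim=1)
world_representation = backbone(tokens)               # shape: (2, 56, 256)
```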
Where are these companies getting training data?
You mentioned Tesla, for example, yes, we've had cars on the road, tons of them with these
cameras and sensors.
I still think that one of the smartest things Elon did was turn on full self-driving
for everybody for like a month-long trial period last summer.
I have a Tesla and I turned it on for my free month, and it was like a life-changing experience
using it, and I obviously couldn't get rid of it.
And so now, not only do I now pay for full self-driving, but I also, I'm just, I'm giving him all my data.
So to me, that's really clever.
And so I'm curious if you talk about some of these other applications, do they have the number of devices, or in this case, cars for Tesla, capturing this data?
Or how else are we going to get this spatial data?
This is the big race in robotics right now.
I think there are several different approaches.
Some people are trying to use video data for training.
Some people are investing a lot in simulation and creating digital 3D worlds.
And then there's a mad rush for every kind of generated data that you could possibly have.
So whether that's robotic teleoperated data, whether that's robotic arms in offices,
most of these robotics companies have pretty big outposts where they're collecting data internally.
They're giving humanoids to their friends to have in their homes.
It's a yes-and scenario right now where everyone is just trying to get their hands on data,
literally however they can.
I think it's the Wild West, but if you're Tesla, then the secret weapon you have is
you've got your own factories, right?
So the Optimus team has a bunch of humanoids walking around the factories, constantly learning
about the factory environment, and that gives them this incredible self-fulfilling sort
of compounding loop.
And then, of course, he's got the Tesla fleet, like Erin was saying earlier with FSD.
I'm proud to have been a month one subscriber for it.
And I'm happy that I'm contributing to that training cycle because it makes my Model X smarter
next time around. So the challenge then is for companies that don't have their own full stack
sort of fully integrated environment, right, where they don't have deployments out in the field.
And to Aaron's point, you can either take the simulation route for that and say we're going to
create these sort of synthetic pipelines, or we're seeing this, yeah, huge buildout of
teleop fleets. Like with language models, you had people all around the world in countries
showing up and labeling data. You have teleop fleets of people piloting mechanical arms halfway
around the world. I think there's an interesting sort of third new
category of efforts we're tracking, which is crowdsourced coalitions, right? So an example of this
is the DeepMind team put out, maybe a year and a half ago, a robotics dataset called RT-X,
where they partnered with a bunch of academic labs and said, hey, you send us your data. We've got
compute and researchers. We'll train the model on your data and then send it back to you. And what's
happening is there's just different labs around the world who have different robots of different kinds.
Some are arms, some are quadrupeds, some are bipeds. And so instead of needing all of those to be
centralized in one place, there's a decoupling happening where some people
are saying, well, we'll specialize in providing the compute and the research talent,
and then you guys bring the data. And then it's a give-to-get model, right, which we saw in
some cases with the internet early on. And Vita is an example of this, where their research team
isn't stacking a bunch of robots in-house. So they're instead partnering with people like
pharma labs who have arms doing pipetting and wet lab experiments and saying, you send us the
data. We've got a bunch of GPUs. We've got some talented deep learning folks. We'll train the model,
send it back to you. And I think it's an interesting experiment. And there's reason to believe this
sort of give-to-get model might end up actually having the highest diversity
of data, but we're definitely in sort of full experimentation land right now. Yeah, and my guess is
we'll need all of it. So it sounds like data is a big gap, and it sounds like some builders are
working on that. But where would you guys like to see more builders focused in this like hardware
software arena, especially because I do think there are some consumer-facing areas where people
are drawn to. They see an event like this and they're like, oh, I want to work on that.
Yeah, I'm pretty excited about the long tail of really unsexy industries that have outsized
impact on our GDP and are often really critical industries, where people haven't really been
building for a while, things like energy, manufacturing, supply chain, defense. These industries
that really carry the U.S. economy, where we have underinvested from a technology perspective,
probably in the last several decades, are poised to be pretty transformed by this sort of hardware
software melding and autonomy. I'd love to see more people there. I'm very excited for all the
applications Erin talked about. And I think to unlock those, we really need a way to solve
this data bottleneck. So startups, builders who are figuring out really novel ways to collect
that data in the world, get it to researchers, make sense of it, curate it. I think that's
sort of a fundamental limiter on progress across all of these industries. We just need to sort of 10x
the rate of experimentation in that space. All right, that is all for today. If you did make it
this far, first of all, thank you. We put a lot of thought into each of these episodes, whether
it's the guests, the calendar Tetris, the cycles with our amazing editor, Tommy, until the music is just
right. So if you like what we put together, consider dropping us a line at ratethispodcast.com
slash a16z. And let us know what your favorite episode is. It'll make my day, and I'm sure Tommy's
too. We'll catch you on the flip side.