No Priors: Artificial Intelligence | Technology | Startups - The Robotics Revolution, with Physical Intelligence’s Cofounder Chelsea Finn
Episode Date: March 20, 2025

This week on No Priors, Elad speaks with Chelsea Finn, cofounder of Physical Intelligence and currently Associate Professor at Stanford, leading the Intelligence through Learning and Interaction Lab. ...They dive into how robots learn, the challenges of training AI models for the physical world, and the importance of diverse data in reaching generalizable intelligence. Chelsea explains the evolving landscape of open-source vs. closed-source robotics and where AI models are likely to have the biggest impact first. They also compare the development of robotics to self-driving cars, explore the future of humanoid and non-humanoid robots, and discuss what’s still missing for AI to function effectively in the real world. If you’re curious about the next phase of AI beyond the digital space, this episode is a must-listen.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @ChelseaFinn

Show Notes:
0:00 Introduction
0:31 Chelsea’s background in robotics
3:10 Physical Intelligence
5:13 Defining their approach and model architecture
7:39 Reaching generalizability and diversifying robot data
9:46 Open source vs. closed source
12:32 Where will PI’s models integrate first?
14:34 Humanoid as a form factor
16:28 Embodied intelligence
17:36 Key turning points in robotics progress
20:05 Hierarchical interactive robot and decision-making
22:21 Choosing data inputs
26:25 Self-driving vs. robotics market
28:37 Advice to robotics founders
29:24 Observational data and data generation
31:57 Future robotic forms
Transcript
Hi, listeners. Welcome to No Priors. This week we're speaking to Chelsea Finn,
co-founder of Physical Intelligence, a company bringing general-purpose AI into the physical world.
Chelsea co-founded Physical Intelligence alongside a team of leading researchers and minds in the field.
She's an associate professor of computer science and electrical engineering at Stanford University,
and prior to that, she worked at Google Brain and was at Berkeley.
Chelsea's research focuses on how AI systems can acquire general purpose skills through interactions
with the world. So Chelsea, thank you so much for joining us today on No Priors. Yeah, thanks for having
me. You've done a lot of really important, storied work in robotics, between your work at Google,
at Stanford, et cetera. So I would just love to hear a little bit firsthand about your background in terms
of your path in the world of robotics, what drew you to it initially and some of the work that
you've done. Yeah, it's been a long road. At the beginning, I was really excited about the impact
that robotics could have in the world, but at the same time, I was also really fascinated by
this problem of developing perception and intelligence in machines. And robots embody all of that.
And also, sometimes there's some cool math that you can do as well that keeps your brain active and makes you think.
And so I think all of that is really fun about working in the field. I started working more seriously in robotics more than 10 years ago at this point at the start of my PhD at Berkeley.
and we were working on neural network control, trying to train neural networks that map from image pixels directly to motor torques on a robot arm. At the time, this was not very popular, and we've come a long way; it's a lot more accepted in robotics now and also just generally something that a lot of people are excited about.
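To make that concrete, here is a minimal sketch of a pixels-to-torques policy of the kind described, assuming PyTorch, purely illustrative layer sizes, and a 7-joint arm; it is not the actual architecture from that early work.

```python
# A minimal sketch (not the actual architecture) of a visuomotor policy that maps
# raw image pixels directly to motor torques for a robot arm. Sizes are illustrative.
import torch
import torch.nn as nn

class PixelsToTorquesPolicy(nn.Module):
    def __init__(self, num_joints: int = 7):
        super().__init__()
        # Small convolutional encoder over RGB images.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP head producing one torque command per joint.
        self.head = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, num_joints),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) in [0, 1]; returns (batch, num_joints) torques.
        return self.head(self.encoder(image))

# Example: one 128x128 camera frame in, seven joint torques out.
torques = PixelsToTorquesPolicy()(torch.rand(1, 3, 128, 128))
```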
Since that beginning point, it was very clear to me that we could train robots to do pretty cool things, but that getting the robot to do one of those things in many scenarios with many objects was a major, major challenge.
So 10 years ago, we were training robots to screw a cap onto a bottle and use a spatula
to lift an object into a bowl and kind of do a tight insertion or hang up like a hanger on
a clothes rack.
And so pretty cool stuff.
But actually getting the robot to do that in many environments with many objects, that's
where a big part of the challenge comes in.
And I've been thinking about ways to make broader data sets, train on those broader data sets, and also different approaches for learning, whether it be reinforcement learning, video prediction, imitation learning, all those things. And so, yeah, I spent a year at Google Brain in between my PhD and joining Stanford, became a professor at Stanford, started a lab there, did a lot of work along all these lines, and then recently started Physical Intelligence, almost a year ago at this point. So I've been on leave from Stanford for that. And it's been really exciting to be able to try to execute on the vision that we co-founders collectively have and do it with a lot of resources and so forth. And I'm also still advising students at Stanford as
well. That's really cool. And I guess you started Physical Intelligence with four other co-founders and an incredibly impressive team. Could you tell us a little bit more about what Physical Intelligence is working on and the approach that you're taking? Because I think it's a pretty unique slant on the whole field and approach. Yeah. So we're trying to build a big neural network model that could ultimately
control any robot to do anything in any scenario. And like a big part of our vision is that
in the past, robotics has focused on trying to go deep on one application, developing a robot to do one thing, and then ultimately getting kind of stuck in that one application. It's really hard to solve one thing and then try to get out of that and broaden. And instead, we're really
in it for the long term to try to address this broader problem of physical intelligence in
the real world. We're thinking a lot about generalization, generalists, and unlike other robotics
companies, we think that being able to leverage all of the possible data is very important.
And this comes down to actually not just leveraging data from one robot, but from any robot platform
that might have six joints or seven joints or two arms or one arm. We've seen a lot of evidence that you could actually transfer a lot of rich information across these different embodiments, and that allows you to use that data. And also, if you iterate on your robot platform, you don't
have to throw all your data away. I have faced a lot of pain in the past where we got a new version
of the robot and then your policy doesn't work. And it's a really painful process to try to get
back to where you were on the previous robot iteration. So yeah, we're trying to build generalist robots and essentially develop foundation models that will power the next generation of robots in the real world.
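One common way to pool data across embodiments with different action dimensions, shown here purely as an illustrative sketch and not necessarily Physical Intelligence's method, is to pad every action into a shared maximum-dimension vector with a validity mask.

```python
# An assumed, illustrative scheme for pooling data from robots with different numbers
# of joints: pad each action vector to a shared maximum dimension and keep a mask
# saying which entries are real.
import numpy as np

MAX_ACTION_DIM = 14  # e.g. enough for a two-arm robot with seven joints per arm

def to_shared_action_space(action: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pad a robot-specific action (6, 7, 14, ... dims) into a shared format."""
    padded = np.zeros(MAX_ACTION_DIM, dtype=np.float32)
    mask = np.zeros(MAX_ACTION_DIM, dtype=np.float32)
    padded[: action.shape[0]] = action
    mask[: action.shape[0]] = 1.0
    return padded, mask

# A 6-joint arm and a 7-joint arm now produce actions the same model can consume.
six_dof, six_mask = to_shared_action_space(np.random.randn(6))
seven_dof, seven_mask = to_shared_action_space(np.random.randn(7))
```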
That's really cool because, I mean, I guess there's a lot of sort of parallels to the large
language model world where, you know, really a mixture of deep learning, the transformer architecture, and scale has really proven out that you can get real generalizability and different forms of transfer between different areas.
Could you tell us a little bit more about the architecture you're taking or the approach
or, you know, how you're thinking about the basis for the foundation model that you're developing?
At the beginning, when we were just getting off the ground, we were trying to scale data collection. And a big part of that is, unlike in language, we don't have Wikipedia or an internet of robot motions. We're really excited about scaling data on real robots in the real world. This kind of real data is what has fueled machine learning advances in the past, and a big part of that is we actually need to collect that data. And that looks like teleoperating robots in the physical world. We're also exploring other ways
of scaling data as well, but the kind of bread and butter is scaling real robot data.
We released something in late October where we showed some of our initial efforts around
scaling data and how we can learn very complex tasks of folding laundry, cleaning tables,
constructing a cardboard box.
Now, where we are in that journey is really thinking a lot about language interaction
and generalization to different environments.
So what we showed in October was the robot in one environment, and it had data in that
environment.
We were able to see some amount of generalization, so it was able to fold shirts that it had never seen before, fold shorts that it had never seen before, but the degree of generalization was very
limited.
And you also couldn't interact with it in any way.
You couldn't prompt it and tell it what you want it to do beyond kind of fairly basic things
that it saw in the training data.
And so being able to handle lots of different prompts and lots of different environments is a big focus right now. And in terms of the architecture,
we're using transformers. And we are using pre-trained models, pre-trained vision language models.
And that allows you to leverage all of the rich information in the internet. We had a research
result a couple years ago where we showed that if you leverage vision language models, then you
could actually get the robot to do tasks that require concepts that were never in the robot's training data but were on the internet. One famous example is that you can ask the robot to pass the Coke can to Taylor Swift, or to a picture of Taylor Swift, and the robot has never seen Taylor Swift in person, but the internet has lots of images of Taylor Swift in it. And you can leverage all of the information in that data, and then the weights of the pre-trained model, to kind of transfer that to the robot. So we're not starting from scratch, and that helps a lot as well. So that's a little
bit about the approach, happy to dive deeper as well. That's really amazing. And then what do you
think is the main basis then for really getting to generalizability? Is it scaling data further? Is it scaling compute? Is it a combination of the two? Is it other forms of post-training? I'm just sort of
curious, like, as you think through the common pieces that people look at now, I'm sort of curious
what you think needs to get filled in. Obviously, on the, again, the more language model world,
people are spending a lot of time on reasoning modules and other things like that as well. So
I'm curious, like, what are the components that you feel are missing right now? Yeah, so I think
the number one thing, and this is kind of the boring thing, is just getting more diverse robot
data. So for that release that we had in late October last year, we collected data in three buildings, technically.
The internet, for example, and everything that has fueled language models and vision models
is way, way more diverse than that, because the internet is pictures that are taken by lots of people
and text written by lots of different people.
And so just trying to collect data in many more diverse places and with many more objects,
many more tasks.
So scaling the diversity of the data, not just the quantity of the data is very important.
And that's a big thing that we're focusing on right now, actually bringing our robots into lots of different places and collecting data in them.
As a side product of that, we also learn what it takes to actually get your robot to be operational
and functional in lots of different places.
And that is a really nice byproduct because if you actually want to get robots to work in
the real world, you need to be able to do that.
So that's the number one thing.
But then we're also exploring other things, leveraging videos of people.
Again, leveraging data from the web, leveraging pre-trained models, thinking about reasoning,
although more basic forms of reasoning, in order to, for example, put a dirty shirt into a hamper.
If you can recognize where the shirt is and where the hamper is and what you need to do to accomplish
that task, that's useful, or if you want to make a sandwich, and the user has a particular
request in mind, you should reason through that request. If they're allergic to pickles, you
probably shouldn't put pickles on the sandwich, things like that. So there's some basic things
around there, although the number one thing is just more diverse robot data. And then I think a lot of the approach you've taken to date has really been an emphasis on releasing open-source models and packages for robotics. Do you think that's the long-term path? Do you think it's open core? Do you think it's eventually proprietary models? Or how do you think about that in the context of the industry? Because it feels like there's a few different robotics companies now, each taking a different approach in terms of either hardware only, excuse me, hardware plus software, where they're focused on a specific hardware footprint. There's software only, and there's closed source versus open source if you're just doing the software. So I'm sort of curious where in that spectrum Physical Intelligence lies.
Definitely. So we've actually been quite open. Not only have we open-sourced some of the weights and released details and technical papers, we've actually also been working with hardware companies and giving designs of robots to hardware companies. And some people, when I tell them that, are sometimes really shocked, like, what about the IP? What about, I don't know, confidentiality and stuff like that? And we've actually made a very
intentional choice around this. There's a couple of reasons for it. One is that we think that the
field, it's really just the beginning. And these models will be so, so much better. And the robots
should be so, so much better in a year, in three years. And we want to support the development of the research. And we want to support the community and support the robots, so that when we hopefully develop the technology of these generalist models,
the world will be more ready for it.
We'll have better, like, more robust robots that are able to leverage those models,
people who have the expertise and understand what it requires to use those models.
And then the other thing is also, like, we have a really fantastic team of researchers
and engineers and really, really fantastic researchers and engineers want to work at companies
that are open, especially researchers, where they can get kind of credit for their work and
share their ideas, talk about their ideas.
And we think that having the best researchers and engineers will be necessary for solving this
problem.
The last thing that I'll mention is that I think the biggest risk with this bet is that it won't
work.
Like, I'm not really worried about competitors.
I'm more worried that no one will solve the problem.
Oh, interesting.
And why do you worry about that?
I think robotics is, it's very hard.
And there have been many, many failures in the past. And unlike when you're recognizing an object in an image, there's very little
tolerance for error. You can miss a grasp on an object, or, like, the difference between making contact and not making contact with an object is so small, and it has a massive impact on the outcome of whether the robot can actually successfully manipulate the
object. And I mean, that's just one example. There's challenges on the data side of collecting
data. Well, just anything involving hardware is hard as well.
I guess we have a number of examples now of robots in the physical world.
You know, everything from autopilot on a jet on through to some forms of pick and pack
or other types of robots in distribution centers.
And there's obviously the different robots involved with manufacturing, particularly in automotive.
So there's been a handful of more constrained environments where people have been using them
in different ways.
Where do you think the impact of these models will first show up?
Because to your point, there are certain things where you have very low tolerance for error. And there's a lot of fields where actually it's okay, or maybe you can constrain
the problem sufficiently relative to the capabilities of the model that it works fine. Where do you
think physical intelligence will have the nearest term impact or in general the field of robotics
and these new approaches will substantiate themselves? Yeah, as a company, we're really focused
on the long-term problem and not on any one particular application, because of the failure modes that can come up when you focus on one application. I don't know where the first applications will be. I think one thing that's actually challenging is that typically in machine learning, in a lot of the successful applications, like recommender systems, language models, image detection, a lot of the consumers of the model outputs are actually humans who can check it, and the humans are good at the thing. A lot of the very natural applications of robots are actually the robot doing something autonomously on its own, where it's not a human consuming the commanded arm position, for example, and then checking it and validating it and so forth.
And so I think we need to think about new ways of having some kind of tolerance for mistakes
or scenarios where that's fine or scenarios where humans and robots can work together.
That's, I think, one big challenge that will come up when trying to actually deploy these.
And some of the language interaction work that we've been doing is actually motivated by this challenge
where we think it's really important for humans to be able to provide input for how they want
the robot to behave and what they want the robot to do, how they want the robot to help
in a particular scenario. That makes sense. I guess the other form of generalizability, to some
extent, at least in our current world, is the human form, right? And so some people are specifically
focused on humanoid robots like Tesla and others under the assumption that the world is designed for
people, and therefore it's the perfect form factor to coexist with people. And then other people
have taken very different approaches in terms of saying, well, I need something that's more specialized
for the home in certain ways or for factories or manufacturing or you name it. What is your view on
kind of humanoid versus not?
On one hand, I think humanoids are really cool, and I have one in my lab at Stanford.
On the other hand, I think that they're a little overrated.
And one way to practically look at it is I think that we're generally fairly bottlenecked
on data right now.
And some people argue that with humanoids, you can maybe collect data more easily because
it matches the human form factor.
And so maybe it'd be easier to mimic humans.
And I've actually heard people make those arguments.
But if you've ever actually tried to teleoperate a humanoid, it's actually a lot harder to teleoperate than a static manipulator or a mobile manipulator with wheels.
Optimizing for being able to collect data, I think is very important because if we can get to the point where we have more data than we could ever want, then it just comes down to research and compute and evaluations.
And so that's one of the things we're optimizing for. We're using cheap robots, robots that we can very easily develop teleoperation interfaces for, in which you can do teleoperation very quickly and collect diverse data, collect lots of data.
Yeah, it's funny, there was that viral fake Kim Kardashian video of her going shopping with a robot following her around carrying all of her shopping bags. When I saw that, I really wanted a humanoid robot to follow me around; that would be really funny to do. So I'm hopeful that someday I can use your software to cause a robot to follow me around to do things. So
exciting future. How do you think about the embodied model of development versus not on some of these things? That's another sort of, I think, set of trade-offs that some people are making or deciding between. A lot of the AI community is very focused on just, like,
language models, vision language models and so forth. And there's like a ton of hype around
like reasoning and stuff like that. Oh, let's create like the most intelligent thing.
I feel like actually people underestimate how much intelligence goes into motor control.
Many, many years of evolution is what led to us being able to use our hands the way that we do.
And there are many animals that can't do it, even though they had so many years of evolution.
And so I think that there's actually so much complexity and intelligence that goes into being able to do something as basic as like make a bowl of cereal or pour a glass of water.
And yeah, so in some ways I think that embodied intelligence or physical intelligence is actually very core to intelligence and maybe kind of underrated compared to some of the less embodied models.
One of the papers that I really loved over the last couple years in robotics was your ALOHA paper.
And I thought it was a very clever approach.
What is some of the research over the last two or three years that you think has really caused
this flurry of activity?
Because I feel like there's been a number of people now starting companies in this area because
a lot of people feel like now is the time to do it.
And I'm a little bit curious what research you feel was the basis for that shift in people thinking this was a good place to work.
At least for us, there were a few things that felt like turning points, where it felt like the field was moving a lot faster compared to where it was before.
One was the SayCan work, where we found that you can plan with language models
as kind of the high-level part and then kind of plug that in with a low-level model
to get a model to do long horizon tasks.
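The high-level-planner-plus-low-level-policy pattern described here might look roughly like the following sketch, where both model calls are stand-in stubs rather than real APIs or any published system's actual code.

```python
# A rough sketch of the hierarchy: a high-level model breaks an open-ended request
# into short language steps, and a low-level policy turns each step plus the current
# camera image into a short chunk of motor commands. Both functions are placeholders.
from typing import List

def high_level_plan(user_prompt: str) -> List[str]:
    # Stand-in for a language/vision-language model that reasons over the prompt,
    # e.g. "make me a vegetarian sandwich, no pickles" -> ordered steps.
    return ["pick up the bread", "pick up the lettuce", "place lettuce on bread"]

def low_level_policy(step: str, image) -> List[float]:
    # Stand-in for a learned policy that outputs the next short chunk of motor
    # commands conditioned on the current step and camera image.
    return [0.0] * 14

def run_task(user_prompt: str, get_camera_image) -> None:
    for step in high_level_plan(user_prompt):
        # A real system would re-plan and accept corrections between chunks.
        actions = low_level_policy(step, get_camera_image())
        print(step, "->", len(actions), "motor commands")

run_task("make me a vegetarian sandwich, no pickles", get_camera_image=lambda: None)
```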
One was the RT-2 work, which showed that you could do the Taylor Swift example that I mentioned earlier and be able to plug in
kind of a lot of the web data and get better generalization on robots.
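As a rough illustration of that pattern, a pre-trained vision-language backbone with a small action head trained on robot data, here is a hedged sketch in which the backbone is a placeholder module, not RT-2 or any real pre-trained model.

```python
# A hedged sketch of the general pattern: start from a pre-trained vision-language
# backbone and add a small action head that outputs robot actions. The backbone is a
# placeholder standing in for a real pre-trained VLM.
import torch
import torch.nn as nn

class PlaceholderVLMBackbone(nn.Module):
    """Stands in for a pre-trained model that embeds an (image, text) pair."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.image_proj = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim))
        self.text_proj = nn.Embedding(1000, embed_dim)  # toy token embedding

    def forward(self, image: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        return self.image_proj(image) + self.text_proj(token_ids).mean(dim=1)

class VLMActionPolicy(nn.Module):
    def __init__(self, backbone: nn.Module, action_dim: int = 14):
        super().__init__()
        self.backbone = backbone                       # ideally loaded with pre-trained weights
        self.action_head = nn.Linear(512, action_dim)  # small head trained on robot data

    def forward(self, image, token_ids):
        return self.action_head(self.backbone(image, token_ids))

policy = VLMActionPolicy(PlaceholderVLMBackbone())
actions = policy(torch.rand(1, 3, 64, 64), torch.randint(0, 1000, (1, 8)))
```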
A third was our RT-X work, where we actually were able to train models across robot embodiments and, significantly, we basically took all the robot data that different
research labs had.
It was a huge effort to aggregate that into a common format and train on it.
And when we trained on that, we actually found that we could take a checkpoint, send that model checkpoint to another lab halfway across the country,
and the grad student at that lab could run the checkpoint on their robot,
and it would actually more often than not do better than the model that they had specifically iterated
on themselves in their own lab.
And that was like another big sign that, like, this stuff is actually starting to work
and that you can get benefit by pooling data across different robots.
And then also, like you mentioned, I think the ALOHA work and later the Mobile ALOHA work showed that you can teleoperate and train models to do pretty complicated dexterous manipulation tasks. We also had a follow-up paper with the shoelace tying. That was a fun project because
someone said that they would retire if they saw a robot tie shoe laces. So did they retire? They did not
retire. Oh, that's awful. We need to force them into retirement. Whoever that person is, we need to
follow up on that. Yeah. So I think those are a few examples. And so yeah, I think we've seen a ton of
progress in the field. Also, it seems like after we started Pi, that was also kind of a signal to others that if the experts are really willing to bet on this, then maybe something will happen. So one thing that you all came out with today from Pi was what you
call a hierarchical interactive robot, or Hi Robot. Can you tell us a little bit more about that?
So this is a really fun project. There's two things that we're trying to look at here. One is that
if you need to do like a longer horizon task, meaning a task that might take minutes to do,
then if you just train a single policy to like output actions based on images,
like if you're trying to make a sandwich and you train a policy that's just outputting the next motor command,
that might not do as well as something that's actually kind of thinking through the steps to accomplish that task.
That was kind of the first component.
That's where the hierarchy comes in.
And the second component is a lot of the times when we train robot policies, we're just saying, like, we'll take our data, we'll annotate it and say, like, this is picking up the sponge.
This is putting the bowl in the bin.
This segment is, I don't know, folding the shirt.
And then you get a policy that can, like, follow those basic commands of, like, fold the shirt or pick up the cup, those sorts of things.
But at the end of the day, we don't want robots just to be able to do that.
We want them to be able to interact with us where we can say, like, oh, I'm a vegetarian.
Can you make me a sandwich?
Oh, and I'm allergic to pickles, so, like, maybe don't include those. And maybe you'd also be able to interject in the middle and say, like, oh, hold off on the tomatoes or something.
It's actually kind of a big gap between something that can just follow, like, an instruction, like, pick up the cup,
and something that's able to handle those kinds of prompts and those situated corrections and so forth.
And so we developed a system that basically has one model that takes as input the prompt and kind of reasons through it, and is able to output the next step that the robot should follow; for example, it might tell it that the next thing will be to pick up the tomato. And then a lower-level model takes as input "pick up the tomato" and outputs the sequence of motor commands for the next half second or so. That's the gist of it. It was a lot of fun because we actually got the robot
to make a vegetarian sandwich or a ham and cheese sandwich or whatever. We also did a grocery
shopping example and a table cleaning example. And I was excited about it, first because it was just cool to see the robot be able to respond to different prompts and do these challenging tasks, and second, because it actually seems like the right approach for solving the
problem. On the technical capability side, one thing I was wondering about a little bit was
if I look at the world of self-driving, there's a few different approaches that are being taken.
And one of the approaches, the more kind of Waymo-centric one, is really incorporating a variety of other types of sensors besides just vision, with LIDAR and a few other things as ways to augment the self-driving capabilities of a vehicle.
Where do you think we are in terms of the sensors that we use in the context of robots?
Is there anything missing?
Is there anything we should add?
Or there are types of inputs or feedback that we need to incorporate that haven't been
incorporated yet?
So we've gotten very far just with vision, with RGB images even.
And we typically will have one or multiple external kind of what we call base cameras
that are looking at the scene.
And also cameras mounted to each of the wrists of the robot. We can get very, very far with that. I would love it if we could give
our robot skin. Unfortunately, a lot of the tactile sensors that are out there are either far less
robust than skin, far more expensive, or very, very low resolution. So there's a lot of kind of challenges
on the hardware side there. And we found that actually mounting RGB cameras to the wrists ends up being very, very helpful and probably gives you a lot of the same information
that tactile sensors can give you.
Because when I think about the set of sensors that are incorporated into a person, obviously
to your point, there's the tactile sensors effectively, right?
And then there's heat sensors.
There's actually a variety of things that are incorporated that people usually don't
really think about much.
Absolutely.
And I'm just sort of curious, like, how many of those are actually necessary in the context
of robotics versus not?
What are some of the things we should think about?
Like, just if we extrapolate off of humans or animals or other, you know.
It's a great question.
I mean, for the sandwich making, you could argue that you'd want the robot to be able
to taste the sandwich to know if it's good or not.
Or smell it at least, you know.
Yeah, I've made a lot of arguments for smell to Sergey in the past, because there's a lot of nice things about smell, although we've never actually attempted it before.
Yeah.
In some ways, the redundancy is nice.
With audio, for example, as a human, if you hear something that's unexpected, it can actually kind of alert you to something. In many cases, it might actually be very, very redundant with your other sensors, because you might be able to actually see something fall, for example, and that redundancy can lead to robustness. For us, it's currently not a priority to look into these sensors, because we think that the bottleneck right now is elsewhere: it's on the data front, it's on kind of the architectures and so forth.
The other thing that I'll mention is that, actually, our policies right now do not have any memory; they only look at the current image frame. They can't remember even
half a second prior. And so I would much rather add memory to our models before we add other sensors.
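One simple way to give a policy short-term memory, shown here only as an assumed illustration rather than Physical Intelligence's plan, is to stack the last few camera frames into the observation.

```python
# An illustrative frame-stacking buffer: the policy sees the last few frames together
# instead of only the current one. History length and shapes are arbitrary choices.
from collections import deque
import numpy as np

class FrameHistory:
    def __init__(self, history_len: int = 4):
        self.frames = deque(maxlen=history_len)

    def push(self, frame: np.ndarray) -> np.ndarray:
        """Add the newest frame and return the stacked observation."""
        self.frames.append(frame)
        # Repeat the oldest frame until the buffer is full (e.g. at episode start).
        while len(self.frames) < self.frames.maxlen:
            self.frames.appendleft(self.frames[0])
        return np.concatenate(list(self.frames), axis=-1)

# Each camera frame is (H, W, 3); the policy now sees (H, W, 3 * history_len).
history = FrameHistory()
stacked = history.push(np.zeros((128, 128, 3), dtype=np.float32))
```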
We can have commercially viable robots for a number of applications without other sensors.
What do you think is a time frame on that?
I have no idea. Yeah. There are some parts of robotics that make it easier than self-driving and some parts that make it harder. On one hand, it's harder because it's just a much higher dimensional space. Even our static robots have 14 dimensions, seven for each arm. You need to be
more precise in many scenarios than driving. We also don't have as much data right off the bat.
On the other hand, with driving, I feel like you kind of need to solve the entire distribution
to have anything that's viable. You have to be able to handle an intersection at any time of day
or with any kind of possible pedestrian scenario or other cars and all that.
Whereas in robotics, I think that there's lots of commercial use cases
where you don't have to handle this whole huge distribution.
And you also don't have as much of a safety risk as well.
That makes me optimistic.
And I think that also like all the results in self-driving have been very encouraging,
especially like the number of Waymos that I see in San Francisco.
Yeah, it's been very impressive to watch them scale up usage.
What I think I found striking about the self-driving world is, you know, there were two dozen startups started roughly, I don't know, 10 to 15 years ago around self-driving. And the industry has largely consolidated, at least in the U.S., and obviously the China market's a bit different, but it's consolidated into Waymo and Tesla, which effectively were two incumbents, right? Google, and Tesla was an automaker. And then there's maybe one or two startups that either SPAC'd and went public or are still kind of working in that area, and then most have kind of fallen off, right?
And the set of players that existed at that starting moment 10, 15 years ago
was kind of the same players that ended up actually winning, right?
There hasn't been a lot of dynamism in the industry other than just consolidation.
Do you think that the main robotics players are the companies that exist today?
And do you think there's any sort of incumbency bias that's likely?
A year ago, like, it would be completely different.
And I think that we've had so many new players recently.
I think the fact that self-driving played out the way it did suggests that it might have been a bit too early 10 years ago, and I think that arguably it was. I think deep learning has come a long, long way since then, and so I think that's also part of it. And I think the same with robotics: if you had asked 10 years ago, or even five years ago, honestly, I think it would have been too early. I think the technology wasn't there yet. We might still be too early, for all we know. I mean, it's a very hard problem, and how hard self-driving has been is, I think, a testament to how hard it is to build intelligence in the physical world. In terms of major players, there's a lot of things
that I've really liked about the startup environment and a lot of things that were very
hard to do when I was at Google. And Google is an amazing place in many, many ways. But like,
as one example, taking a robot off campus was like almost a non-starter just for code security
reasons. And if you want to collect diverse data, taking robots off campus is valuable. You can move a lot faster when you're a smaller company, when you don't have kind of restrictions, red tape, that sort of thing. The really big companies, they have a ton of
capital so they can last longer, but I also think that they're going to move slower too.
If you were to give advice to somebody thinking about starting a robotics company today,
what would you suggest they do or where would you point them in terms of what to focus on?
I think the main advice that I would give someone trying to start a company would be to try to learn
as much as possible, quickly. And I think that actually trying to deploy quickly and learn and iterate quickly, that's probably the main advice: actually get the robots out there and learn from that. I'm also not sure if I'm the best person to be giving startup advice, because I've only been an entrepreneur myself for 11 months. But, yeah, that's probably my advice. That's cool. I mean, you're running an incredibly exciting startup, so I think you have full ability to suggest stuff to people in that area, for sure.
One thing I've heard a number of different groups doing is really using observational data of people
as part of the training set.
So that could be YouTube videos.
It could be things that they're recording specifically for the purpose.
How do you think about that in the context of training robotic models?
I think that data can have a lot of value, but I think that by itself, it won't get you
very far.
And I think that there's actually some really nice analogies you can make where, for example,
if you watch an Olympic swimmer race, even if you had their strength, just their practice at moving their own muscles to accomplish what they're accomplishing is essential for being able to do it.
Or if you're trying to learn how to hit a tennis ball well, you won't be able to learn it
by kind of watching the pros.
Now, maybe these examples seem a little bit contrived because they're talking about like experts.
The reason why I make those analogies is that we humans are experts at motor
control, low-level motor control already for a variety of things that our robots are not.
And I think the robots actually need experience from their own body in order to learn.
And so I think that it's really promising to be able to leverage that form of data,
especially to expand on the robot's own experience, but it's really going to be essential to
actually have the data from the robot itself too.
In some of those cases, is that just general data that you're generating around that robot,
or would you actually have it mimic certain activities or how do you think about the data
generation, because you mentioned a little bit about the transfer and generalizability.
It's interesting to ask, well, what is generalizable or not and what types of data are
and aren't and things like that?
I mean, when we collect data, it's kind of like puppeteering, like the original ALOHA work. And then you can record both the actual motor commands and the sensors, like the camera images. And so that is the experience for the robot.
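A recording loop for such teleoperated demonstrations might look roughly like this sketch, where the Robot and Camera classes are placeholders rather than a real driver API or Physical Intelligence's actual pipeline.

```python
# An illustrative teleop recording loop: at each timestep, log the camera images
# alongside the motor command the human operator produced through the puppeteering
# interface. Classes, names, and rates are placeholders.
import time

class Camera:
    def read(self):
        return b"jpeg bytes"          # placeholder for an RGB frame

class Robot:
    def last_commanded_joints(self):
        return [0.0] * 14             # placeholder for the teleop leader's command

def record_demonstration(robot, cameras, num_steps=100, hz=50.0):
    episode = []
    for _ in range(num_steps):
        episode.append({
            "images": {name: cam.read() for name, cam in cameras.items()},
            "action": robot.last_commanded_joints(),
            "timestamp": time.time(),
        })
        time.sleep(1.0 / hz)
    return episode   # one trajectory of (observation, action) pairs for training

demo = record_demonstration(Robot(), {"base": Camera(), "left_wrist": Camera()})
```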
And then I also think that autonomous experience will play a huge role, just like we've seen in language models after you get an initial language model.
If you can use reinforcement learning to have the language model bootstrap on its own experience, that's extremely valuable.
Yeah.
And then in terms of what's generalizable versus not, I think it all comes down to the breadth of the distribution.
It's really hard to quantify or measure how broad the robot's own experience is, and there's no way to categorize the breadth of the tasks,
like how different one task is from another,
how different one kitchen is from another, that sort of thing.
But we can at least get a rough idea for that breadth
by looking at things like the number of buildings or the number of scenes,
those sorts of things.
And then I guess we talked a lot about humanoid robots and other sort of formats.
If you think ahead in terms of the form factors that are likely to exist in N-years
as this sort of robotic future comes into play, do you think there's sort of one singular form
or are there a handful? Is it a rich ecosystem, just like in biology? How do you think about
what's going to come out of all this? I don't know exactly, but I think that my bet would be on
something where there's actually a really wide range of different robot platforms. I think
Sergey, my co-founder, likes to call it a Cambrian explosion of different robot hardware types and so forth, once we actually have the technology, the intelligence, that can power all those different robots.
And I think it's kind of similar to, like, we have all these different devices in our kitchen,
for example, that can do all these different things for us, rather than just one device that cooks the whole meal for us. And so I think we can envision a world where there's one kind of robot arm that does things in the kitchen, that has some hardware that's optimized for that and maybe also optimized to be cheap for that particular use case, and another piece of hardware that's kind of designed for folding clothes or something like that, dishwashing, those sorts of things.
This is all speculation, of course. But I think a world like that is, yeah, I think different from what a lot of people think about.
In the book The Diamond Age, there's sort of this view of like matter pipes going into homes
and you have these 3D printers that make everything for you.
And in one case, you're like downloading schematics and then you 3D.
print the thing, and then people who are kind of bootlegging some of this stuff end up with
almost evolutionarily based processes to build hardware and then select against certain functionality
is the mechanism by which to optimize things. Do you think a future like that, is it all likely,
or do you think it's more just, hey, you make the foundation model really good, you have a couple
form factors, and you know, you don't need that much specialization if you have enough
generalizability in the actual underlying intelligence? I think a world like that is very possible.
And I think that you can make a cheaper piece of hardware if you are optimizing for a particular use case, and maybe it would also be a lot faster and so forth.
Yeah, obviously very hard to predict.
Yeah, it's super hard to predict because one of the arguments for a smaller number of hardware platforms is just supply chain, right?
It's just going to be cheaper at scale to manufacture all the subcomponents, and therefore you're going to collapse down to fewer things, because unless there's a dramatic cost advantage, those fewer things will be more easily scalable, reproducible, cheap to make, et cetera, right, if you look at sort of general hardware approaches. So it's an interesting question in terms of that trade-off between those two tensions.
Yeah, although maybe we'll have robots in the supply chain that can manufacture any customizable
device that you want. It's robots all the way down. So that's our future.
Well, thanks so much for joining me today. It was a super interesting conversation and we
covered a wide variety of things. So I really appreciate your time. Yeah, this is fun.
Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to
see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new
episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.