No Priors: Artificial Intelligence | Technology | Startups - Building the factories of the future with Covariant CEO Peter Chen
Episode Date: January 25, 2024. Building adaptive AI models that can learn and complete tasks in the physical world requires precision, but these AI robots could completely change manufacturing and logistics processes. Peter Chen, the co-founder and CEO of Covariant, leads the team that is building robots that will increase manufacturing efficiency and safety, and create the warehouses of the future. Today on No Priors, Peter joins Sarah to talk about how the Covariant team is developing multimodal models that have precise grounding and understanding so they can adapt to solve problems in the physical world. They also discuss how they plan their roadmap at Covariant, what could be next for the company, and what use case will bring us to the ChatGPT moment for AI robots. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @peterxichen Show Notes: (0:00) Peter Chen Background (0:58) How robotics AI will drive AI forward (3:00) Moving from research to a commercial company (5:46) The argument for building incrementally (8:13) Manufacturing robotics today (12:21) Put wall use case (15:45) What’s next for Covariant Brain (18:42) Covariant’s customers (19:50) Grounding concepts in AI (25:47) How scaling laws apply to Covariant (29:21) Covariant’s driving thesis (32:54) The ChatGPT moment for robotics (35:12) Manufacturing center of the future (37:02) Safety in AI robotics
Transcript
Hi, listeners.
Welcome to another episode of No Priors.
This week, I'm joined by Peter Chen, the co-founder and CEO of Covariant, a robotics startup that is developing AI robots.
Before he started Covariant, Peter was a research scientist at OpenAI and a researcher at the Berkeley AI Research Lab, where he focused on reinforcement learning, meta-learning, and unsupervised learning.
He is a prolific publisher and now a founder.
I'm so excited to have you on today to talk about what's going on in robotics.
Welcome, Peter.
Thanks, Sarah.
It's great to be here.
There are many exciting reasons to be here.
One is I have been a frequent listener of the podcast.
And the second one is just because of the name, like I just have to be on this show.
So it's great to be here.
Right.
Let's go establish some priors for everybody in a very unknown landscape, right?
Can we start with just why you were drawn to robotics and the beginning of your research journey?
Yeah.
When I was working on research at both UC Berkeley as part of my PhD and at OpenAI, there were two topics that were particularly exciting to me.
One topic is, like, as you have introduced, unsupervised learning: how can we build models that learn from vast amounts of data?
And we now more colloquially know this as generative AI, because we train these large models on large
amounts of text, images, videos, and you learn from them in an unsupervised manner.
That topic has always been very interesting to me because if you want to train very capable
AIs, you want to have a lot of data. And where you can get a lot of data is through this kind
of unsupervised data set. And then the second topic that was really interesting to me was
reinforcement learning. It's not just building models that understand, but building models
that can make decisions, and reinforcement learning teaches these models to make decisions by having them
make trials and errors and learn to do more of the better decisions and less of the worse decisions.
And robotics is just such a great combination of these fields in order to build really capable
robots. They need to really understand the world in a very, very robust way. And they are not just
passive agents that just understand text or what's in an image. They actually need to take actions in the
real world, and the consequences do matter. And so we found robotics to be such a great way to
both utilize the advances in AI, but we also think of it as a way to propel AI forward.
Like this is where you get the grounded data. This is where you get that embodied data of not
just AI that is trained on browsing the Internet, but AI that is trained with physical
interactions with the world. And so we also believe robotics would be a key way to advance
AI. That makes sense. You were at places that are great places to do research. Why did you
decide to start a commercial company? It's a really good question. I mean, there are a lot of
companies that are founded by prior PhDs that are kind of the classic journey of:
there's a technology that was built in a lab environment and it got to enough of a level of maturity
that we should start to commercialize it in the real world. That was kind of not the journey of
Covariant. When we started Covariant, there was not AI that was good enough to make robots do
useful things commercially. And so it was not a classic journey of technology developed in academia
and then transition to a commercial landscape. The key insight that we had at that time when we left
OpenAI in 2017 to start Covariant was the future of AI is going to be the future of foundation
models. These models are truly multitask,
learn from large amounts of data,
and, as such, are more generalizable.
They can solve new tasks more easily
and are also more capable
at every single one of the tasks
because of the transfer you get across tasks.
We just had early conviction
that this was the path to build AI
and that is also going to be true
for the physical world, for robotics.
But there's one big problem,
which is you have no data set
to build robotics foundation model.
There's no data set from which you can build
this AI that understands the physical world and takes actions in the physical world.
And so in order to build these foundation models for robotics,
you really have to build a company that can collect data to do it.
And the only way to collect enough data is to build fleets of robots
that are actually creating value for customers so that you can collect those data in production.
Because even if you try to scale up data collection in a lab environment,
there's a limit on how much you can do that.
In that perspective, we strongly believe in the Tesla approach,
like where they have the most self-driving car data,
because they ship a great car that people want to drive
and a good enough entry-level autopilot that people are willing to use,
and they're creating value for their customers,
like customers use their products,
and the data that they collect can allow them to build much more capable models and AI.
And so why we left OpenAI
and academia to start Covariant is very much this belief that in order to build foundation models
for robots, you have to have a lot of data. And in order to have a lot of data, you have to
build autonomously working systems for customers. And the only way to do that is to build a
company to serve those customers. Yeah, there's a really interesting tension if you're trying to
build a, let's say, AI capability that doesn't exist yet because there's no model that is good
enough, of how much you invest in that upfront versus delivering the product that already exists in the
world, right? Like, you could just go build a bunch of robots and deploy them en masse. Or, you know,
if we draw an analogy to the prior generation or current existing generation of autonomy companies.
Like, you know, I was involved early, in my prior role, in Aurora and Nuro, and then I was
a personal investor in Kodiak, right? Like a lot of these companies, you were trying to build
a brain as an alternative to the Tesla approach. And I think the economics of collect as you go
is getting very, very compelling just in terms of how expensive it is to try to sequence it
the other way. Yeah, like this definitely needs to be an incremental approach. Like you have to
just find like the right sequence of what is the technology advance that I want to build now
that enables enough of a product that I can deliver, which then in turn allows you to build more
capable models, which then in turn open up a larger surface area.
And this is like, I mean, we, we have seen this play out in the non-robotics world as well.
Right.
Like if we think about OpenAI and Anthropic, Cohere, a lot of these big language model players,
like the models that they have are not fully general language models yet, right?
But they are good enough that they can solve a large section of problems that is worth
productionizing them
getting commercial value
out of it, which then in turn
allow you to build the next incrementally
better system. And I think
of it as the same kind of
roadmapping
exercise that you have to do in autonomy.
You cannot just go straight
to the full
general physical AGI
at the beginning. You have to build
something that represents
a justifiable R&D spend as well as a
timeline that you can justify. But that allows you to build something that is valuable that you
can ship to customers. And from that process, you get more data, you get more learning that then
in turn allow you to build a next generation model. So we think of it as very much an iterative
approach and having real products and having real customers allow you to ground that approach
as opposed to just be in a philosophical debate of like how we build this super, super general thing
that is very far out in the future. Then I think the right way to start is actually to ground the
conversation in kind of the application landscape. Can you walk us through the sort of limitations
of robotics in warehousing and manufacturing that are commonplace right now and how much
intelligence these robots have? Robots are extremely common nowadays. So what we typically work on
are robotic arms. So think of these as six axes, seven axes, robotic arms that can do
very flexible movements. They are super precise, super fast, super durable, and very cheap.
Lots of factories around the world have robots, but the challenge is like 99 plus
percent of the robots that are deployed in the world are dumb robots. These robots are pre-programmed to do
the same thing again and again, and they don't really have any kinds of intelligence that can
adapt to new circumstances, communicate with people, and change what they do on the fly. And so think
of the robotics that exist today as extremely rigid. And so really the problem that we are solving
is, we're not trying to make the existing dumb robot use cases better, right?
Like we're not trying to say, oh, instead of manually programming this robot,
you could just have an AI that programs that robot.
We're not talking about that.
Like, we're really talking about, like, opening up a couple orders of magnitude
or more use cases where the robots actually need to be smart.
Like, they need to adapt what they do based on the scenario that is presented to them, right?
So, like, the good way to visualize this is, on one hand, like, think about a robot, for example, in a Tesla factory that is handling a car body.
Okay, this is an incredible feat of engineering that can move, like, a multi-ton object very fast, very precisely, but it's just doing the same thing again and again.
Like, and then imagine another robot in an e-commerce warehouse that has hundreds of thousands of
unique items that it has to distinguish, pick up, and pack carefully into a box that gets shipped
to you. That's a very different kind of diversity that we're talking about. And so when we think
about building AI for robots, when we think about building foundation models for robots,
we're thinking about really lifting robotics as a category from this former category of just
being able to do repeated things to this category of really being able to handle diversity
of environments, changes in the environments, and being able to understand what's around it and make
intelligent decisions and actions to handle a diverse set of circumstances. And we think like this
would enable really a whole different wave of robotics that is not how robotics is used today. And for
Covariant specifically, we are starting from logistics and warehouses as an industry that we
focus on. So think of it as: with the explosive demand that is driven by the growth of
e-commerce, there's a lot of complexity that's been injected into the logistics and supply chain.
And at the same time, coupling that with demographic change, a changing immigration landscape makes
fewer and fewer people want to do these kinds of warehouse jobs, like drive an hour and a half to
the suburbs and then have to work through midnight. Like, these are not the kind of jobs
that people want to do, and our customers have extremely high turnover rates, like an average
warehouse that we serve typically has more than 100% year-over-year turnover. And so like these
are the type of places where we have an extreme shortage of people that want to do those kinds of
jobs. And yet at the same time, there are no prior robots that can solve pick, pack, ship
in warehouses because like traditional robots are just machines that do the motion
that you program them to do repeatedly.
But here, you actually need systems that are actually adaptive
and do it at a very high level of reliability.
Can you describe, like, how we should imagine the physical?
Like, you obviously have the Covariant Brain,
but then you have the physical instantiation.
Like, what's a put-wall just for our listeners?
Yeah, so a common use case that we have for our customers
is what we typically call a put-wall use case.
A put wall is a term that is used in e-commerce fulfillment,
which is like when you click a button to buy something online and then the box
shows up at your door, and you might wonder, like, well, how is that done?
Well, there's a complex set of operations that's happening in the background, and a put wall
is one step of that.
And this step is typically used to sort a mix of customer orders to different customers,
like let's say both you and I have ordered a new generation of iPhone.
And then a robot would be sitting there and picking up one iPhone and say, oh, this one should go to Sarah and this one should go to Peter.
If you think about what that robot needs to do, like the robot needs to have an incredibly great ability to grasp items without damaging them, and have the accurate ability to identify what the item is and then route it to the appropriate customer, like in this case, either you or me.
And so a put wall, you can think of it as a sortation mechanism.
You can think of it as a physical router that exists in the world.
So instead of thinking about network router that sends digital packets around,
you can think about a put wall as a physical router that sends goods to different places.
Is it fair to say that identification and routing are more solved problems than grasping?
I would say identification and routing are typically considered more solved problems than grasping,
because there are other, like, more mechanical ways to solve those problems. Like, you can design a piece of conveyor such that, if you always put an item in the same place, then you can route it to a designated location. And so that becomes mostly a mechanical problem, and anything that is a mechanical problem is typically more solved. And so that is very much true. Like, I would say, out of grasping, identification, and routing, definitely the grasping part involves more
AI. But as we build more advanced AI and bring it into more traditional fields like robotics,
like what we actually find is that even in the identification step, even in the routing steps,
there are a lot of ways that AI can make more traditional mechanical systems smarter, right?
Like for example, a classic way to do identification is through scanning the barcode.
But where's the barcode? Like how do you scan the barcode? Well, that's actually something that
AI can inform. Right. And like oftentimes,
a human can identify an item without even scanning the barcode, because you can read the packaging,
like you can infer like what is in there. And that is also something that AI can help. And so like while
it is true that there are some steps of the problems that can be solved by more traditional mechanical
and robotic systems, what we have found is that like once you have a very flexible AI, you can
actually rethink a lot of the processes. Like you would make something that was previously impossible
possible, like grasping. And then you can also improve a lot of the other steps of the processes
that were previously possible, but now you can do them in a more intelligent way.
Is the next step of expansion that you are excited about for Covariant still within pick and pack,
or are there other tasks within warehousing and logistics that you think are really interesting
to expand into? Or, you know, is there a phase into different robotic applications,
like, you know, humanoid robots like the Tesla Optimus or other industrial applications?
Yeah, a couple things. Like, starting at the very highest level, right?
When we think about the Covariant Brain, this foundation model that we are building,
we are not building it just for warehouse applications.
We are not just building it for pick and place applications within warehouses.
So definitely, like, everything that you're talking about, it's very exciting to us.
So both applications outside of warehouses as well as applications to newer hardware form factors like
humanoid robots, that definitely is the long-term path for us. I would say, like, in the
very immediate future, as a company, we have focused on the manipulation space of warehouses, just
because there is so much demand and there are so many different kinds of use cases that exist
in the warehouse domain already,
because a warehouse for
an apparel company is very different
from a warehouse for a cosmetics company,
which is very different from a warehouse
for a meal-prep company.
And across all of these,
you actually have very different
manipulation skills that you need
and very different kinds of data
that you can collect to train the foundation model,
and also very different large markets
that we can tap into.
But we are very intentional
in how we build the models
in a way that makes sure it's generalizable and so you can actually extend into new domains.
And one more comment on the humanoid question: like, I think one of the most
exciting advances in robotics would be to make the humanoid form factor possible, like, because our
world is designed around human bodies. So the humanoid is the universal hardware form factor that
can be dropped into any place in our world. And so like we really,
we really cannot wait
for humanoids
to be commercially
and also technologically available,
because when that platform
is available, that is really the best mechanism
for us to deploy
the Covariant Brain, this foundation model,
to go to more places, more quickly.
Fortunately, we
are not relying on it.
Even by using the existing
industrial robot hardware,
we can build a scaling
business. We can continue to
bootstrap and build incrementally more capable models.
But when it comes, that would be a really big acceleration for us.
One more question on the sort of application or maybe just the Covariant side before
I would love to talk a little bit more about the research. Can you give our listeners
a sense of, you're five years into Covariant, like how big is the team, do you have robots in
production, what are your types of customers?
Yeah, so Covariant is about a 200-person company, and we are extremely international.
I would say roughly half of our customers are in Europe, half of our customers in North America.
And we have robots deployed across three continents at this point and more than 10 countries.
And what is really remarkable, all of these customers, all of these different robots are networked together.
Like, it's one single foundation model, and everything that
they learn comes back and makes this central model better. And our customers are typically large retailers,
large e-commerce brands, and essentially anyone that runs a large distribution center or a network
of distribution centers would likely choose Covariant as the model that powers their physical world.
Amazing. Can we talk a little bit just about the research? And I think the first thing I want to
ask you to explain, as just a very high-level concept, is the concept of grounding in
understanding of the real world, or, you know, foundation models that understand physics and
object interaction, like what that means or, you know, how that's missing today.
Yeah. So grounding is this interesting idea of, like, if you just read the text on the
internet, like you learn a lot about abstract concepts, right? But they could be like purely
symbolic. Like, you might read, apple is delicious. Okay, I have this association that,
okay, like something that is apple could be delicious. And if I ask for a delicious thing,
you can say apple is a delicious thing. But that is very symbolic. Like, there has,
like, no actual grounding in our physical world. Like, what does an apple look like? If I give
you an image of an apple, can you recognize it? And can you recognize, like, the different
other physical properties of an apple.
And so, like, the first thing that you want to do is, like, grounding is to ground all of
these symbolic abstract concepts into something that is real, that is physical.
And there are actually a lot of advances of this, like even outside of robotics that's happening
already.
Like, we have a lot of multimodal models that exist in the world.
Like, if you go to GPT-4V, like, you can actually give it an image,
and then it can answer something for you intelligently about what's in the image.
So, like, GPT-4V is grounded. Like, these types of multimodal language models,
like, already have an understanding of those grounded concepts.
So where does it get that grounding from?
Like, it gets that grounding from essentially the image and text pairs that exist on the Internet, right?
Like if you look at an Instagram image, it might have a set of captions along with it.
So we can train this kind of multimodal models with a combination of those data.
Like after you have seen enough Instagram images of an apple and enough people tag them as apples,
then after you have trained on a large amount of such data, you start to get that grounding.
You start to pick up those associations.
So that's like, I would say outside of robotics, like how typically grounding happens
and how you typically get this kind of multimodal understanding that understands beyond just
pure symbolic concepts, but actually has an understanding of how it gets associated with
the real physical world, typically manifested through an image of the real world.
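To make that concrete, here is a minimal sketch of the kind of image-caption contrastive objective commonly used to learn this grounding (a generic CLIP-style setup, offered only as an illustration; the encoder names and shapes are assumptions, not Covariant's model or GPT-4V's actual training recipe):

```python
import torch
import torch.nn.functional as F

def contrastive_grounding_loss(image_encoder, text_encoder, images, captions, temperature=0.07):
    # Embed each image and its caption (e.g. an Instagram photo of an apple
    # and the tag "apple") into the same vector space.
    img_emb = F.normalize(image_encoder(images), dim=-1)    # (batch, dim)
    txt_emb = F.normalize(text_encoder(captions), dim=-1)   # (batch, dim)
    # Pairwise similarities between every image and every caption in the batch.
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(images.shape[0], device=logits.device)
    # Pull matched image-caption pairs together and push mismatched pairs apart,
    # so the word "apple" becomes grounded in what apples actually look like.
    loss_i = F.cross_entropy(logits, targets)        # image -> caption
    loss_t = F.cross_entropy(logits.t(), targets)    # caption -> image
    return (loss_i + loss_t) / 2
```

Trained over enough such pairs, the symbol "apple" ends up associated with the visual appearance of apples, which is the grounding described above.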
And if I think about just the concept of an apple, it's in many videos on YouTube,
they are kind of round, they are affected by gravity, they have some mass, like what's missing
from those captioned images and videos when you talk about the data that's missing that you need
to go collect for robotics to improve? Yeah, so there are a couple aspects of it. So like obviously
this kind of internet scale data is very useful. Like you can already pick up a lot of association
and grounding with the physical world. But there's still a lot of things that's missing.
Right. So for example, like when you think about this kind of naturally occurring text and image pair data, they are typically about high level concepts.
Like they're typically not about something that is very precise.
Like, so for example, like when I present an apple to you, like you don't typically describe like the precise shape of the apple, right?
Like is this like a very round shape apple?
Is this like a very full apple?
Like you might use some high level concept to describe it.
But there's really nothing that describes it, say, down to sub-millimeter level precision,
which is kind of like the level of, like, precise understanding that you need to interact with
the real world.
You don't just say, well, there's kind of an apple there, but there might be like up to a two-centimeter,
like, difference in understanding of where the boundary of that apple is and how I should do it.
And so, like, here's like the first dimension of, like, things that is missing, which is,
like, there's really no precise
grounding. There's no precise understanding of the physical world that's naturally occurring on the internet.
So that's like one of the first things where you find kind of the departure of robotics foundation models from like other general
multimodal foundation models. Like it's this idea of precision. Like you now actually need to understand
things to a much higher level of precision that don't otherwise exist in this kind of data set.
And so that's like one big thing.
And then another really big thing is like this ability to understand effects of your own actions.
And a large part of this is just because there were not a lot of robots that are doing interesting things in the world.
And so like there are not a lot of data sets that are in the format of robot does something and you know the outcome of it.
Like is this a good way to pick up something?
Like if I move an item too quickly, like would it damage it?
If I press, like, for example, a tomato, like, what is the force that is appropriate, that
is possible? Like, you don't have a lot of these kinds of action and outcome pairs that exist
in the world. Like, the closest thing to that is probably on YouTube, you have humans doing
those things. But then there's a research question of, like, well, can you have a robot that
learns from just watching a human do it? And you don't actually fully know, like, how hard
a human presses on the tomato or like how precisely they decide something.
So you're still lacking a good amount of the data that like completes this feedback loop.
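There is no standard public schema for the "action and outcome pairs" Peter describes, but a single logged pick attempt might look roughly like this (a hypothetical sketch; every field name here is an illustrative assumption, not Covariant's format):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PickAttempt:
    # What the robot saw before acting, e.g. a path to an RGB-D image of the bin.
    observation_path: str
    # What the robot did: where it grasped and how hard it squeezed.
    grasp_pose_xyz: Tuple[float, float, float]  # meters; sub-millimeter precision matters here
    grip_force_newtons: float
    # What happened: the feedback signal that internet-scale data rarely contains.
    grasp_succeeded: bool
    item_damaged: bool
```

The last two fields are the part that "completes the feedback loop": they only exist once robots are actually attempting picks in production.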
Do you have some sense of like how or if scaling laws apply for you?
Like do you know how many robots you need to deploy or how much data you need to go collect
to get to certain levels of improvement?
Or can you try to predict it now?
So I would say the most technical definition of scaling law does apply and we have seen it
apply in this domain. And it's somewhat not surprising because if you think about the scaling
law in the most technical sense, which is if you scale up data and you scale up your model capacity
and you scale up the compute that you throw at it, you get a lower loss, like a lower training
loss, out of it. And we have seen this play out across so many different domains,
like more than just language models, that it is not surprising. I think the question that you're asking
is probably not the most technical definition of scaling law, but the general
definition of scaling law, which is, as you scale those up, would you get emerging capabilities
out of it? Like, would you kind of like get a model that's like orders of
magnitude smarter in some loose definition of it? Like, which is kind of the thing that we see
from the large language model world, like when you go from GPT-3 to GPT-4, when you go from
Claude 1 to Claude 2, you kind of like see this step change improvement in reliability and in
generalization that you get from it. So I assume that's like probably what you're asking.
Yes. Do you believe in some emergence? So I would say we see some element of it, but it is something
that we rely less on. And here's like where I think there is a really interesting, crucial
distinction between a, call it, fully general model that is designed to solve everything
in the world and what I think of as a domain-specific foundation model, like in our case,
like solving robotic manipulation. So in a fully general model, like, for example, like GPT-5 that you
want to solve everything in the world, then you have this problem of essentially out-of-domain
generalization. Like when we say, like, as you scale it up, like, do you get something that
is much smarter out of it? Like, we are not saying, like, whether GPT-5 would
fit the training data better.
Like we are saying, like, if you give it a scenario that is completely outside of training data,
like, how well does it work?
And that is where you kind of like need to rely on this strong form of scaling law.
But you kind of don't need that when you are in a more restricted domain like robotics.
Because, like, you actually could have so much data coverage that your test scenarios are just part of your training scenario.
So to some degree, like, we actually don't need to rely on this strong form of scaling law to hold for us to build really valuable technology out of it.
And so I expect, like, something similar like that would happen, like, would follow the similar trend that you see in the language world.
But at the same time, like, we don't, we don't require it.
Like, we know that, like, as you get more customers, as you get more data, like, these systems would get better.
and especially if you have targeted data coverage for specific domains, for specific customers,
they would be guaranteed to get better.
So to some degree, like, whether you believe, like, robotics can scale or not,
it's a simpler bet.
Like, it's just like whether you can get data of that domain.
And if you can get it, like, then you can be sure that you can fit it.
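For reference, the "most technical" sense of scaling law discussed above is often written as a power law relating training loss to model size and data; a rough illustrative Chinchilla-style form, with constants fitted per domain (nothing here is specific to Covariant):

$$ L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} $$

where N is the number of parameters, D is the amount of training data (here, robot interactions rather than tokens), E is an irreducible loss term, and A, B, α, β are fitted constants; scaling up N and D, along with the compute to train them, drives the loss down.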
Last question in this research area.
Is there a specific scientific insight, or bet, that Covariant has made?
Or should we think of this as,
not at all trivial, but a full-stack play with the right people, very well-prepared engineers and scientists doing the relevant data collection that doesn't exist today, that will support increased robotic intelligence, versus, let's say, like, an architectural bet or whatever it is?
Yeah, it's like the architecture has changed like maybe five times already.
Like it has gone through like significant transformation like every year.
Like, I don't think you can be married to any single specific architecture in a field that is moving so quickly.
But there is one unique bet that we are placing, right?
So that one unique bet is we believe the future of robotics would be built by whoever has the most robotics data.
And essentially, the whole company is built around that thesis.
And, like, you can say, like, what is an alternative belief?
Like, an alternative belief would be, can we just solely rely on simulation?
Like, we actually don't need much
real-world data.
Like, there would be a different philosophical bet on it.
Like, we also use simulation, but we think of simulation as more of a way to augment the data,
not as the way to replace everything.
There are lots of smart Tesla and ex-Tesla people, where Tesla has been a, I guess,
big proponent of high quality simulation, including for, you know, training data generation.
Right? Where are the gaps? Or why do you believe that's insufficient?
So when we think about simulation, it's actually somewhat different for different kinds of autonomy domain.
So when you think about simulation in self-driving car, like we are really mostly thinking about
systems that hopefully don't physically interact with each other, right?
Like if two cars get in contact with each other, that's a really terrible thing.
And so the simulation there is more about simulation of
multi-agent behaviors, like avoidance of contact.
But if you think about like manipulation, like if you never contact something,
that's also a big problem, like,
because like then you actually don't do any work.
And whenever you involve contact, simulation of those things become very, very difficult.
Like items that can deform, like, the contact dynamics is incredibly challenging.
And so those are where simulation becomes very difficult.
Like it's when it involves contact, complex dynamics.
And then there's the second.
The second thing that makes simulation difficult is, like, I mentioned earlier that a typical
customer that we serve, like, may have 100,000 distinct objects in a warehouse.
Like, so, like, if you want to fully recreate that in your simulation, like, that is actually
more work than just learning a system that can deal with the real world.
Like, so there's a specification problem.
Like, in order to specify the real world in your simulation, like, that actually might
require more data or more work or whatnot.
And that being said, like, we believe in learned world models.
Like, we believe in foundation models that can learn from the real world.
And you can simulate new scenarios of what would happen if you do things differently.
But I think of that as, like, different from the classical simulation that I referred to earlier,
which is program-based, and you are just hard-coding the rules of reality
and then building agents that learn from the mechanical interpretations of the rules of reality
that you encode in your simulator.
So for our last couple of minutes,
should we zoom out and talk a little bit about the future?
Yeah.
So you have said we're pre-ChatGPT for the robotics industry.
What is the ChatGPT moment for robots?
What do you imagine?
The ChatGPT moment for robots:
you want AI that is as general as ChatGPT.
So you would be able to throw a robot into any arbitrary new scenarios,
and it will be able to learn how to deal with it very quickly.
But in addition to that, which is kind of like what ChatGPT allows people to experience, is you can ask it arbitrary problems, like, and then it can solve them to some degree for you.
So you want the same kind of generality.
But in addition to that, what you also need is really high reliability, because like you really don't want robots that only succeed in like the tasks that you ask them to do 70% of the time.
And then there's like, there might be 30% like really catastrophic outcomes
that come with it. So I would say like the bar for the ChatGPT moment for robotics is higher. Like,
you need to solve the generality, like, which is the same kind of problem, but you need to solve
it with a high level of reliability. And this is like where like one of the concepts that we talked about
earlier comes in: like, you really need a large amount of high-quality data to densely cover
like the robotics fields that you want. And so that would be what I think about as the one
side of the ChatGPT moment for robotics.
And then you also need to think about the hardware portion of it, right?
Like even if you have a robot AI that is very smart,
unless you are just interacting with this robot AI in some
metaverse digital 3D world,
you still need some hardware body for robots.
And before humanoids are fully widespread,
I think we will see the ChatGPT moment for robotics being articulated
in industrial
settings earlier than in the commercial settings, like, because those are the places that can
actually justify the hardware investments, because the hardware is being used 24-7, as opposed
to, like, home robots that might only be used two hours a week. Like, that's a very different
ROI from the hardware piece that you need to put in it. What does the, like, warehouse or factory
or logistics center of the future look like? Is it lights out, no humans? I don't think it would be
fully lights out and no humans, at least in the near future. But I think it would be
very robotics-augmented. So think of it as one person would be able to oversee 10, 20, 30 robots.
So like instead of like one person having to manually do all that work, like you actually work
with a fleet of robots. So think of it kind of as a physical co-pilot type of setup.
Like you just get this like large amplification of like what
one person can do. But most likely it wouldn't be completely lights out, like you will still
have people there. I think this form of expression of AI would probably be true not just for robotics,
but many other fields of AI as well. I realize you just said industrial applications first from
an ROI perspective. That makes sense. But do you have a guess or hope for what the first form
or use case is for an intelligent robot that your average human, like your consumer,
interacts with? If I have to guess, it probably would be a home robot that doesn't involve
much manipulation. So think of it as like a home robot that might be like a Roomba. It can
follow you around. Like you can talk to it. So like it has the navigation and movement aspects of it,
but not necessarily the manipulation aspects of it, like not actually manipulating the physical
world around it. I think that would be the most technologically feasible version. So think of it as similar
to Amazon's Astro robot, like this kind of like cute robot that has two wheels that can
follow you around, and if someone calls it, it can go there. And so like I think that type of
form factor would probably be like where we would see it earlier. Robotics AI work triggers
a lot of concern around safety in both like the short term practical sense and in sort of the
AGI breaking into the real world sense. How do you think about safety at Covariant? We have a simple
carve out to this question, like, because we focus on industrial applications. And, well, all
industrial robots, like, have a set of safety rules that they need to conform to, like,
because it's not just AI that can be dangerous. Like, manual programming can be dangerous. Like,
you could make, you could program a robot to do dangerous things already. And so there's a really
robust set of rules around, you have to put safety cages around robots. And if you don't
have safety cages, you need to have certain kinds of certified controllers that make sure a robot
doesn't do anything that's dangerous to the surrounding equipment or people. And so from that sense,
because we're just following the same rules, like any kinds of robots that we build and
deploy are by definition safe or by construction safe. But that is very different from like when
you say, well, what if we hook up like an arbitrarily expressive agent into a home robot? Like,
how do you limit that to be safe? It's much harder. Like, just similar to, like,
if you just hook up a language agent to give it arbitrary Python code execution capability and
arbitrary ability to access the internet, it just becomes very difficult to say, well, how can
you make sure, like, it doesn't do anything dangerous? And that's where the alignment problem
comes in, and that's where a lot of this good safety research comes in. But we have
a simpler carve-out, like, at least for the near term, in this kind of industrial applications.
What advancement in AI research or application outside of robotics are you most personally interested in?
Looking backward or looking forward?
Looking forward. I can only look forward.
I think the same kind of advances that we have seen in the last year, like, we would see at least the same or more of them in the coming year.
It's just, if you look really behind like all these advances in large language models, image generation, they are still using relatively
primitive technology. Like, especially large language models, like, they are
mostly still trained just on next-token prediction, which, for people that study
reinforcement learning, we call behavior cloning, which means you're just asking the AI to clone
the behavior of another agent. And that is like one of the most primitive ways possible to train
this type of system, like, because if you're just mimicking something, like, there's a natural
ceiling on how good you can get on that.
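A minimal sketch of the next-token prediction objective Peter equates with behavior cloning (generic PyTorch-style code under the assumption of a standard decoder-style model; not any particular lab's training code):

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # tokens: (batch, seq_len) ids produced by the "demonstrator" (e.g. human-written text).
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    # Behavior cloning: maximize the likelihood of the demonstrator's actual next token,
    # i.e. imitate the data-generating agent step by step.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```

Because the objective only rewards reproducing the demonstrator's behavior, the demonstrator itself sets the ceiling Peter refers to.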
And then there are just so many other proven toolboxes that we have not deployed yet, that I would say progress is guaranteed in everything that we have seen so far.
And I'm super excited about that.
And I'm also super excited about the open-source movement continuing in the AI world, like where a lot of these advances are made available to a broad set of communities that can continue to build on and experiment with
it. And so I think it will continue to be a very exciting year of AI progress.
Okay. Then looking backward and forward at the same time, last question is your favorite
sci-fi book with robots in it, realistic or not?
It's not a book, but I really like Westworld.
Okay. Great. Westworld, the future comes.
Peter, thank you so much for joining us on No Priors. Until next time.
Thanks.
Find us on Twitter at @NoPriorsPod.
Subscribe to our YouTube channel if you want to
see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no dash priors.com.