No Priors: Artificial Intelligence | Technology | Startups - Building the factories of the future with Covariant CEO Peter Chen
Episode Date: January 25, 2024. Building adaptive AI models that can learn and complete tasks in the physical world requires precision, but these AI robots could completely change manufacturing and logistics processes. Peter Chen, the co-founder and CEO of Covariant, leads the team that is building robots that will increase manufacturing efficiency and safety, and create the warehouses of the future. Today on No Priors, Peter joins Sarah to talk about how the Covariant team is developing multimodal models that have precise grounding and understanding so they can adapt to solve problems in the physical world. They also discuss how they plan their roadmap at Covariant, what could be next for the company, and what use case will bring us to the ChatGPT moment for AI robots. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @peterxichen Show Notes: (0:00) Peter Chen Background (0:58) How robotics AI will drive AI forward (3:00) Moving from research to a commercial company (5:46) The argument for building incrementally (8:13) Manufacturing robotics today (12:21) Put wall use case (15:45) What’s next for Covariant Brain (18:42) Covariant’s customers (19:50) Grounding concepts in AI (25:47) How scaling laws apply to Covariant (29:21) Covariant’s driving thesis (32:54) The ChatGPT moment for robotics (35:12) Manufacturing center of the future (37:02) Safety in AI robotics
Transcript
Hi, listeners.
Welcome to another episode of No Priors.
This week, I'm joined by Peter Chen, the co-founder and CEO of Covariant, a robotics startup that is developing AI robots.
Before he started Covariant, Peter was a research scientist at OpenAI and a researcher at the Berkeley AI Research Lab, where he focused on reinforcement learning, meta-learning, and unsupervised learning.
He is a prolific publisher and now a founder.
I'm so excited to have you on today to talk about what's going on in robotics.
Welcome, Peter.
Thanks, Sarah.
It's great to be here.
There are many exciting reasons to be here.
One is I have been a frequent listener of the podcast.
And the second one is just because of the name, like I just have to be on this show.
So it's great to be here.
Right.
Let's go establish some priors for everybody in a very unknown landscape, right?
Can we start with just why you were drawn to robotics and the beginning of your research journey?
Yeah.
When I was working on research at both UC Berkeley as part of my PhD and at OpenAI, there were two topics that were particularly exciting to me.
One topic is, like, as you have introduced, unsupervised learning: how can we build models that learn from vast amounts of data?
And we now more colloquially know this as generative AI, because we train these large models on large
amounts of text, images, videos, and you learn from them in an unsupervised manner.
That topic has always been very interesting to me because if you want to train very capable
AIs, you want to have a lot of data. And where you can get a lot of data is through this kind
of unsupervised data set. And then the second topic that was really interesting to me was
reinforcement learning. It's not just building models that understand, but building models
that can make decisions, and reinforcement learning teaches these models to make decisions by having them
make trials and errors and learn to do more of the better decisions and less of the worse decisions.
And robotics is just such a great combination of these fields in order to build really capable
robots. They need to really understand the world in a very, very robust way. And they are not just
passive agents that just understand text or what's in an image. They actually need to take actions in the
real world, and the consequences do matter. And so we found robotics to be such a great way to
both utilize the advances in AI, but we also think of it as a way to propel AI forward.
Like this is where you get the grounded data. This is where you get that embodied data of not
just AI that is trained on browsing the Internet, but AI that is trained with physical
interactions with the world. And so we also believe robotics would be a key way to advance
AI. That makes sense. You were at places that are great places to do research. Why did you
decide to start a commercial company? It's a really good question. I mean, there are a lot of
companies that are founded by prior PhDs that are kind of the classic journey of:
there's a technology that was built in a lab environment and it got to enough of a level of maturity
that we should start to commercialize it in the real world. That was kind of not the journey of
Covariant. When we started Covariant, there was not AI that was good enough to make robots do
useful things commercially. And so it was not a classic journey of technology developed in academia
and then transition to a commercial landscape. The key insight that we had at that time when we left
OpenAI in 2017 to start Covariant was the future of AI is going to be the future of foundation
models. These models are truly multitask,
learn from large amounts of data,
and, as such, are more generalizable.
They can solve new tasks more easily
and are also more capable
at every single one of the tasks
because of the transfer you get across tasks.
We just had early conviction
that this was the path to build AI
and that is also going to be true
for the physical world, for robotics.
But there's one big problem,
which is you have no data set
to build robotics foundation model.
There's no data set from which you can build
this AI that understands the physical world and takes actions in the physical world.
And so in order to build these foundation models for robotics,
you really have to build a company that can collect data to do it.
And the only way to collect enough data is to build fleets of robots
that are actually creating value for customers so that you can collect those data in production.
Because even if you try to scale up data collection in a lab environment,
there's a limit on how much you can do that.
In that perspective, we strongly believe in the Tesla approach,
like where they have the most self-driving car data,
because they ship a great car that people want to drive
and a good enough entry-level autopilot that people are willing to use,
and they're creating value for their customers,
like customers use their products,
and the data that they collect can allow them to build much more capable models and AI.
And so why we left OpenAI
and academia to start Covariant is very much this belief that in order to build foundation models
for robots, you have to have a lot of data. And in order to have a lot of data, you have to
build autonomously working systems for customers. And the only way to do that is to build a
company to serve those customers. Yeah, there's a really interesting tension if you're trying to
build a, let's say, AI capability that doesn't exist yet because there's no model that is good
enough, of how much you invest in that upfront versus delivering the product that already exists in the
world, right? Like, you could just go build a bunch of robots and deploy them en masse. Or, you know,
if we draw an analogy to the prior generation or current existing generation of autonomy companies.
Like, you know, I was involved early, in my prior role, in Aurora and Nuro, and then I was
a personal investor in Kodiak, right? Like a lot of these companies, you were trying to build
a brain as an alternative to the Tesla approach. And I think the economics of collect as you go
is getting very, very compelling just in terms of how expensive it is to try to sequence it
the other way. Yeah, like this definitely needs to be an incremental approach. Like you have to
just find like the right sequence of what is the technology advance that I want to build now
that enables enough of a product that I can deliver, which then in turn allows you to build more
capable models, which then in turn open up a larger surface area.
And this is like, I mean, we, we have seen this play out in the non-robotics world as well.
Right.
Like if we think about OpenAI and Anthropic, Cohere, a lot of these big language model players,
like the models that they have are not fully general language models yet, right?
But they are good enough that they can solve a large section of problems that is worth
productionizing them
getting commercial value
out of it, which then in turn
allow you to build the next incrementally
better system. And I think
of it as the same kind of
roadmapping
exercise that you have to do in autonomy.
You cannot just go straight
to the full
general physical AGI
at the beginning. You have to build
something that represents
a justifiable R&D spend as well as a
timeline that you can justify. But that allows you to build something that is valuable that you
can ship to customers. And from that process, you get more data, you get more learning that then
in turn allow you to build a next generation model. So we think of it as very much an iterative
approach and having real products and having real customers allow you to ground that approach
as opposed to just be in a philosophical debate of like how we build this super, super general thing
that is very far out in the future. Then I think the right way to start is actually to ground the
conversation in kind of the application landscape. Can you walk us through the sort of limitations
of robotics in warehousing and manufacturing that are commonplace right now and how much
intelligence these robots have? Robots are extremely common nowadays. So what we typically work on
are robotic arms. So think of these as six axes, seven axes, robotic arms that can do
very flexible movements. They are super precise, super fast, super durable, and very cheap.
Lots of factories around the world have robots, but the challenge is like 99 plus
percent of the robots that are deployed in the world are dumb robots. These robots are pre-programmed to do
the same thing again and again, and they don't really have any kinds of intelligence that can
adapt to new circumstances, communicate with people, and change what they do on the fly. And so think
of the robotics that exist today as extremely rigid. And so really the problem that we are solving
is, we're not trying to make the existing dumb robot use cases better, right?
Like we're not trying to say, oh, instead of manually programming this robot,
you could just have an AI that programs that robot.
We're not talking about that.
Like, we're really talking about, like, opening up a couple orders of magnitude
or more use cases where the robots actually need to be smart.
Like, they need to adapt what they do based on the scenario that is presented to them, right?
So, like, the good way to visualize this is, on one hand, like, think about a robot, for example, in a Tesla factory that is handling a car body.
Okay, this is an incredible feat of engineering that can move, like, a multi-ton object very fast, very precisely, but it's just doing the same thing again and again.
Like, and then imagine another robot in an e-commerce warehouse that has hundreds of thousands of
unique items that it has to distinguish, pick up, and pack carefully into a box that gets shipped
to you. That's a very different kind of diversity that we're talking about. And so when we think
about building AI for robots, when we think about building foundation models for robots,
we're thinking about really lifting robotics as a category from this former category of just
being able to do repeated things to this category of really being able to handle diversity
of environments, changes in the environments, and being able to understand what's around it and make
intelligent decisions and actions to handle a diverse set of circumstances. And we think like this
would enable really a whole different wave of robotics that is not how robotics is used today. And for
Covariant specifically, we are starting from logistics and warehouses as an industry that we
focus on. So think of it as: with the explosive demand that is driven by the growth of
e-commerce, there's a lot of complexity that's been injected into the logistics and supply chain.
And at the same time, coupling that with demographic change, a changing immigration landscape makes
fewer and fewer people want to do these kinds of warehouse jobs, like drive an hour and a half to
the suburbs and then have to work through midnight. Like, these are not the kind of jobs
that people want to do, and our customers have extremely high turnover rates, like an average
warehouse that we serve typically has more than 100% year-over-year turnover. And so like these
are the type of places where we have an extreme shortage of people that want to do those kinds of
jobs. And yet at the same time, there are no prior robots that can solve pick, pack, ship
in warehouses because like traditional robots are just machines that do the motion
that you program them to do repeatedly.
But here, you actually need systems that are actually adaptive
and do it at a very high level of reliability.
Can you describe, like, how we should imagine the physical?
Like, you obviously have the Covariant Brain,
but then you have the physical instantiation.
Like, what's a put-wall just for our listeners?
Yeah, so a common use case that we have for our customers
is what we typically call a put-wall use case.
A put wall is a term that is used in e-commerce fulfillment,
which is like when you click a button to buy something online and then the box
shows up at your door, and you might wonder, like, well, how is that done?
Well, there's a complex set of operations that's happening in the background, and a put wall
is one step of that.
And this step is typically used to sort a mix of customer orders to different customers,
like let's say both you and I have ordered a new generation of iPhone.
And then a robot would be sitting there and picking up one iPhone and say, oh, this one should go to Sarah and this one should go to Peter.
If you think about what that robot needs to do, like the robot needs to have an incredibly great ability to grasp items without damaging them, and have the accurate ability to identify what the item is and then route it to the appropriate customer, like in this case, either you or me.
And so a put wall, you can think of it as a sortation mechanism.
You can think of it as a physical router that exists in the world.
So instead of thinking about network router that sends digital packets around,
you can think about a put wall as a physical router that sends goods to different places.
Is it fair to say that identification and routing are more solved problems than grasping?
I would say identification and routing are typically considered more solved problems than grasping,
because there are other, like, more mechanical ways to solve those problems. Like, you can design a piece of conveyor such that, if you always put an item in the same place, then you can route it to a designated location. And so that becomes mostly a mechanical problem, and anything that is a mechanical problem is typically more solved. And so that is very much true. Like, I would say, out of grasping, identification, and routing, definitely the grasping part involves more
AI. But as we build more advanced AI and bring it into more traditional fields like robotics,
like what we actually find is that even in the identification step, even in the routing steps,
there are a lot of ways that AI can make more traditional mechanical systems smarter, right?
Like for example, a classic way to do identification is through scanning the barcode.
But where's the barcode? Like how do you scan the barcode? Well, that's actually something that
AI can inform. Right. And like oftentimes,
a human can identify an item without even scanning the barcode, because you can read the packaging,
like you can infer like what is in there. And that is also something that AI can help. And so like while
it is true that there are some steps of the problems that can be solved by more traditional mechanical
and robotic systems, what we have found is that like once you have a very flexible AI, you can
actually rethink a lot of the processes. Like you would make something that was previously impossible
possible, like grasping. And then you can also improve a lot of the other steps of the processes
that were previously possible, but now you can do them in a more intelligent way.
Is the next step of expansion that you are excited about for Covariant still within pick and pack,
or are there other tasks within warehousing and logistics that you think are really interesting
to expand into? Or, you know, is there a phase into different robotic applications,
like, you know, humanoid robots like the Tesla Optimus or other industrial applications?
Yeah, a couple things. Like, starting at the very highest level, right?
When we think about the Covariant Brain, this foundation model that we are building,
we are not building it just for warehouse applications.
We are not just building it for pick and place applications within warehouses.
So definitely, like, everything that you're talking about, it's very exciting to us.
So both applications outside of warehouses as well as applications to newer hardware form factors like
humanoid robots, that definitely is the long-term path for us. I would say, like, in the
very immediate future, as a company, we have focused on the manipulation space of warehouses, just
because there is so much demand and there are so many different kinds of use cases that exist
in the warehouse domain already,
because a warehouse for
an apparel company is very different
from a warehouse for a cosmetics company,
which is very different from a warehouse
for a meal-prep company.
And across all of these,
you actually have very different
manipulation skills that you need
and very different kinds of data
that you can collect to train the foundation model,
and also very different large markets
that we can tap into.
But we are very intentional
in how we build the models
in a way that makes sure it's generalizable and so you can actually extend into new domains.
And one more comment on the humanoid question: like, I think one of the most
exciting advances in robotics would be to make the humanoid form factor possible, like, because our
world is designed around human bodies. So the humanoid is the universal hardware form factor that
can be dropped into any place in our world. And so like we really,
we really cannot wait
for humanoids
to be commercially
and also technologically available,
because when that platform
is available, that is really the best mechanism
for us to deploy
the Covariant Brain, this foundation model,
to go to more places, more quickly.
Fortunately, we
are not relying on it.
Even by using the existing
industrial robot hardware,
we can build a scaling
business. We can continue to
bootstrap and build incrementally more capable models.
But when it comes, that would be a really big acceleration for us.
One more question on the sort of application or maybe just the Covariant side before
I would love to talk a little bit more about the research. Can you give our listeners
a sense of, you're five years into Covariant, like how big is the team, do you have robots in
production, what are your types of customers?
Yeah, so Covariant is about a 200-person company, and we are extremely international.
I would say roughly half of our customers are in Europe, half of our customers in North America.
And we have robots deployed across three continents at this point and more than 10 countries.
And what is really remarkable, all of these customers, all of these different robots are networked together.
Like, it's one single foundation model, and everything that
they learn comes back and makes this central model better. And our customers are typically large retailers,
large e-commerce brands, and essentially anyone that runs a large distribution center or a network
of distribution centers would likely choose Covariant as the model that powers their physical world.
Amazing. Can we talk a little bit just about the research? And I think the first thing I want to
ask you to explain, as just a very high-level concept, is the concept of grounding in
understanding of the real world, or, you know, foundation models that understand physics and
object interaction, like what that means or, you know, how that's missing today.
Yeah. So grounding is this interesting idea of, like, if you just read the text on the
internet, like you learn a lot about abstract concepts, right? But they could be like purely
symbolic. Like, you might read, apple is delicious. Okay, I have this association that,
okay, like something that is apple could be delicious. And if I ask for a delicious thing,
you can say apple is a delicious thing. But that is very symbolic. Like, there has,
like, no actual grounding in our physical world. Like, what does an apple look like? If I give
you an image of an apple, can you recognize it? And can you recognize, like, the different
other physical properties of an apple.
And so, like, the first thing that you want to do is, like, grounding is to ground all of
these symbolic abstract concepts into something that is real, that is physical.
And there are actually a lot of advances of this, like even outside of robotics that's happening
already.
Like, we have a lot of multimodal models that exist in the world.
Like, if you go to GPT-4V, like, you can actually give it an image,
and then it can answer something for you intelligently about what's in the image.
So, like, GPT-4V is grounded. Like, these types of multimodal language models,
like, already have an understanding of those grounded concepts.
So where does it get that grounding from?
Like, it gets that grounding from essentially the image and text pairs that exist on the Internet, right?
Like if you look at an Instagram image, it might have a set of captions along with it.
So we can train this kind of multimodal models with a combination of those data.
Like after you have seen enough Instagram images of an apple and enough people tag them as apples,
then after you have trained on a large amount of such data, you start to get that grounding.
You start to pick up those associations.
So that's like, I would say outside of robotics, like how typically grounding happens
and how you typically get this kind of multimodal understanding that understands beyond just
pure symbolic concepts, but actually has an understanding of how it gets associated with
the real physical world, typically manifested through an image of the real world.
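To make that concrete, here is a minimal sketch of the kind of image-caption contrastive objective commonly used to learn this grounding (a generic CLIP-style setup, offered only as an illustration; the encoder names and shapes are assumptions, not Covariant's model or GPT-4V's actual training recipe):

```python
import torch
import torch.nn.functional as F

def contrastive_grounding_loss(image_encoder, text_encoder, images, captions, temperature=0.07):
    # Embed each image and its caption (e.g. an Instagram photo of an apple
    # and the tag "apple") into the same vector space.
    img_emb = F.normalize(image_encoder(images), dim=-1)    # (batch, dim)
    txt_emb = F.normalize(text_encoder(captions), dim=-1)   # (batch, dim)
    # Pairwise similarities between every image and every caption in the batch.
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(images.shape[0], device=logits.device)
    # Pull matched image-caption pairs together and push mismatched pairs apart,
    # so the word "apple" becomes grounded in what apples actually look like.
    loss_i = F.cross_entropy(logits, targets)        # image -> caption
    loss_t = F.cross_entropy(logits.t(), targets)    # caption -> image
    return (loss_i + loss_t) / 2
```

Trained over enough such pairs, the symbol "apple" ends up associated with the visual appearance of apples, which is the grounding described above.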
And if I think about just the concept of an apple, it's in many videos on YouTube,
they are kind of round, they are affected by gravity, they have some mass, like what's missing
from those captioned images and videos when you talk about the data that's missing that you need
to go collect for robotics to improve? Yeah, so there are a couple aspects of it. So like obviously
this kind of internet scale data is very useful. Like you can already pick up a lot of association
and grounding with the physical world. But there's still a lot of things that's missing.
Right. So for example, like when you think about this kind of naturally occurring text and image pair data, they are typically about high level concepts.
Like they're typically not about something that is very precise.
Like, so for example, like when I present an apple to you, like you don't typically describe like the precise shape of the apple, right?
Like is this like a very round shape apple?
Is this like a very full apple?
Like you might use some high level concept to describe it.
But there's really nothing that describes it, say, down to sub-millimeter level precision,
which is kind of like the level of, like, precise understanding that you need to interact with
the real world.
You don't just say, well, there's kind of an apple there, but there might be like up to a two-centimeter,
like, difference in understanding of where the boundary of that apple is and how I should do it.
And so, like, here's like the first dimension of, like, things that is missing, which is,
like, there's really no precise
grounding. There's no precise understanding of the physical world that's naturally occurring on the internet.
So that's like one of the first things where you find kind of the departure of robotics foundation models from like other general
multimodal foundation models. Like it's this idea of precision. Like you now actually need to understand
things to a much higher level of precision that don't otherwise exist in this kind of data set.
And so that's like one big thing.
And then another really big thing is like this ability to understand effects of your own actions.
And a large part of this is just because there were not a lot of robots that are doing interesting things in the world.
And so like there are not a lot of data sets that are in the format of robot does something and you know the outcome of it.
Like is this a good way to pick up something?
Like if I move an item too quickly, like would it damage it?
If I press, like, for example, a tomato, like, what is the force that is appropriate, that
is possible? Like, you don't have a lot of these kinds of action and outcome pairs that exist
in the world. Like, the closest thing to that is probably on YouTube, you have humans doing
those things. But then there's a research question of, like, well, can you have a robot that
learns from just watching a human do it? And you don't actually fully know, like, how hard
a human presses on the tomato or like how precisely they decide something.
So you're still lacking a good amount of the data that like completes this feedback loop.
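There is no standard public schema for the "action and outcome pairs" Peter describes, but a single logged pick attempt might look roughly like this (a hypothetical sketch; every field name here is an illustrative assumption, not Covariant's format):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PickAttempt:
    # What the robot saw before acting, e.g. a path to an RGB-D image of the bin.
    observation_path: str
    # What the robot did: where it grasped and how hard it squeezed.
    grasp_pose_xyz: Tuple[float, float, float]  # meters; sub-millimeter precision matters here
    grip_force_newtons: float
    # What happened: the feedback signal that internet-scale data rarely contains.
    grasp_succeeded: bool
    item_damaged: bool
```

The last two fields are the part that "completes the feedback loop": they only exist once robots are actually attempting picks in production.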
Do you have some sense of like how or if scaling laws apply for you?
Like do you know how many robots you need to deploy or how much data you need to go collect
to get to certain levels of improvement?
Or can you try to predict it now?
So I would say the most technical definition of scaling law does apply and we have seen it
apply in this domain. And it's somewhat not surprising because if you think about the scaling
law in the most technical sense, which is if you scale up data and you scale up your model capacity
and you scale up the compute that you throw at it, you get a lower loss, like a lower training
loss, out of it. And we have seen this play out across so many different domains,
like more than just language models, that it is not surprising. I think the question that you're asking
is probably not the most technical definition of scaling law, but the general
definition of scaling law, which is, as you scale those up, would you get emerging capabilities
out of it? Like, would you kind of like get a model that's like orders of
magnitude smarter in some loose definition of it? Like, which is kind of the thing that we see
from the large language model world, like when you go from GPT-3 to GPT-4, when you go from
Claude 1 to Claude 2, you kind of like see this step change improvement in reliability and in
generalization that you get from it. So I assume that's like probably what you're asking.
Yes. Do you believe in some emergence? So I would say we see some element of it, but it is something
that we rely less on. And here's like where I think there is a really interesting, crucial
distinction between a, call it, fully general model that is designed to solve everything
in the world and what I think of as a domain-specific foundation model, like in our case,
like solving robotic manipulation. So in a fully general model, like, for example, like GPT-5 that you
want to solve everything in the world, then you have this problem of essentially out-of-domain
generalization. Like when we say, like, as you scale it up, like, do you get something that
is much smarter out of it? Like, we are not saying, like, whether GPT-5 would
fit the training data better.
Like we are saying, like, if you give it a scenario that is completely outside of training data,
like, how well does it work?
And that is where you kind of like need to rely on this strong form of scaling law.
But you kind of don't need that when you are in a more restricted domain like robotics.
Because, like, you actually could have so much data coverage that your test scenarios are just part of your training scenario.
So to some degree, like, we actually don't need to rely on this strong form of scaling law to hold for us to build really valuable technology out of it.
And so I expect, like, something similar like that would happen, like, would follow the similar trend that you see in the language world.
But at the same time, like, we don't, we don't require it.
Like, we know that, like, as you get more customers, as you get more data, like, these systems would get better.
and especially if you have targeted data coverage for specific domains, for specific customers,
they would be guaranteed to get better.
So to some degree, like, whether you believe, like, robotics can scale or not,
it's a simpler bet.
Like, it's just like whether you can get data of that domain.
And if you can get it, like, then you can be sure that you can fit it.
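For reference, the "most technical" sense of scaling law discussed above is often written as a power law relating training loss to model size and data; a rough illustrative Chinchilla-style form, with constants fitted per domain (nothing here is specific to Covariant):

$$ L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} $$

where N is the number of parameters, D is the amount of training data (here, robot interactions rather than tokens), E is an irreducible loss term, and A, B, α, β are fitted constants; scaling up N and D, along with the compute to train them, drives the loss down.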
Last question in this research area.
Is there a specific scientific insight, or bet, that Covariant has made?
Or should we think of this as,
not at all trivial, but a full-stack play with the right people, very well-prepared engineers and scientists doing the relevant data collection that doesn't exist today, that will support increased robotic intelligence, versus, let's say, like, an architectural bet or whatever it is?
Yeah, it's like the architecture has changed like maybe five times already.
Like it has gone through like significant transformation like every year.
Like, I don't think you can be married to any single specific architecture in a field that is moving so quickly.
But there is one unique bet that we are placing, right?
So that one unique bet is we believe the future of robotics would be built by whoever has the most robotics data.
And essentially, the whole company is built around that thesis.
And, like, you can say, like, what is an alternative belief?
Like, an alternative belief would be, can we just solely rely on simulation?
Like, we actually don't need much
real-world data.
Like, there would be a different philosophical bet on it.
Like, we also use simulation, but we think of simulation as more of a way to augment the data,
not as the way to replace everything.
There are lots of smart Tesla and ex-Tesla people, where Tesla has been a, I guess,
big proponent of high quality simulation, including for, you know, training data generation.
Right? Where are the gaps? Or why do you believe that's insufficient?
So when we think about simulation, it's actually somewhat different for different kinds of autonomy domain.
So when you think about simulation in self-driving car, like we are really mostly thinking about
systems that hopefully don't physically interact with each other, right?
Like if two cars get in contact with each other, that's a really terrible thing.
And so the simulation there is more about simulation of
multi-agent behaviors, like avoidance of contact.
But if you think about like manipulation, like if you never contact something,
that's also a big problem, like,
because like then you actually don't do any work.
And whenever you involve contact, simulation of those things become very, very difficult.
Like items that can deform, like, the contact dynamics is incredibly challenging.
And so those are where simulation becomes very difficult.
Like it's when it involves contact, complex dynamics.
And then there's the second.
The second thing that makes simulation difficult is, like, I mentioned earlier that a typical
customer that we serve, like, may have 100,000 distinct objects in a warehouse.
Like, so, like, if you want to fully recreate that in your simulation, like, that is actually
more work than just learning a system that can deal with the real world.
Like, so there's a specification problem.
Like, in order to specify the real world in your simulation, like, that actually might
require more data or more work or whatnot.
And that being said, like, we believe in learned world models.
Like, we believe in foundation models that can learn from the real world.
And you can simulate new scenarios of what would happen if you do things differently.
But I think of that as, like, different from the classical simulation that I referred to earlier,
which is program-based, and you are just hard-coding the rules of reality
and then building agents that learn from the mechanical interpretations of the rules of reality
that you encode in your simulator.
So for our last couple of minutes,
should we zoom out and talk a little bit about the future?
Yeah.
So you have said we're pre-ChatGPT for the robotics industry.
What is the ChatGPT moment for robots?
What do you imagine?
The ChatGPT moment for robots:
you want AI that is as general as ChatGPT.
So you would be able to throw a robot into any arbitrary new scenarios,
and it will be able to learn how to deal with it very quickly.
But in addition to that, which is kind of like what ChatGPT allows people to experience, is you can ask it arbitrary problems, like, and then it can solve them to some degree for you.
So you want the same kind of generality.
But in addition to that, what you also need is really high reliability, because like you really don't want robots that only succeed in like the tasks that you ask them to do 70% of the time.
And then there's like, there might be 30% like really catastrophic outcomes
that come with it. So I would say like the bar for the ChatGPT moment for robotics is higher. Like,
you need to solve the generality, like, which is the same kind of problem, but you need to solve
it with a high level of reliability. And this is like where like one of the concepts that we talked about
earlier comes in: like, you really need a large amount of high-quality data to densely cover
like the robotics fields that you want. And so that would be what I think about as the one
side of the ChatGPT moment for robotics.
And then you also need to think about the hardware portion of it, right?
Like even if you have a robot AI that is very smart,
unless you are just interacting with this robot AI in some
metaverse digital 3D world,
you still need some hardware body for robots.
And before humanoids are fully widespread,
I think we will see the ChatGPT moment for robotics being articulated
in industrial
settings earlier than in the commercial settings, like, because those are the places that can
actually justify the hardware investments, because the hardware is being used 24-7, as opposed
to, like, home robots that might only be used two hours a week. Like, that's a very different
ROI from the hardware piece that you need to put in it. What does the, like, warehouse or factory
or logistics center of the future look like? Is it lights out, no humans? I don't think it would be
fully lights out and no humans, at least in the near future. But I think it would be
very robotics-augmented. So think of it as one person would be able to oversee 10, 20, 30 robots.
So like instead of like one person having to manually do all that work, like you actually work
with a fleet of robots. So think of it kind of as a physical co-pilot type of setup.
Like you just get this like large amplification of like what
one person can do. But most likely it wouldn't be completely lights out, like you will still
have people there. I think this form of expression of AI would probably be true not just for robotics,
but many other fields of AI as well. I realize you just said industrial applications first from
an ROI perspective. That makes sense. But do you have a guess or hope for what the first form
or use case is for an intelligent robot that your average human, like your consumer,
interacts with? If I have to guess, it probably would be a home robot that doesn't involve
much manipulation. So think of it as like a home robot that might be like a Roomba. It can
follow you around. Like you can talk to it. So like it has the navigation and movement aspects of it,
but not necessarily the manipulation aspects of it, like not actually manipulating the physical
world around it. I think that would be the most technologically feasible version. So think of it as similar
to Amazon's Astro robot, like this kind of like cute robot that has two wheels that can
follow you around, and if someone calls it, it can go there. And so like I think that type of
form factor would probably be like where we would see it earlier. Robotics AI work triggers
a lot of concern around safety in both like the short term practical sense and in sort of the
AGI breaking into the real world sense. How do you think about safety at Covariant? We have a simple
carve out to this question, like, because we focus on industrial applications. And, well, all
industrial robots, like, have a set of safety rules that they need to conform to, like,
because it's not just AI that can be dangerous. Like, manual programming can be dangerous. Like,
you could make, you could program a robot to do dangerous things already. And so there's a really
robust set of rules around, you have to put safety cages around robots. And if you don't
have safety cages, you need to have certain kinds of certified controllers that make sure a robot
doesn't do anything that's dangerous to the surrounding equipment or people. And so from that sense,
because we're just following the same rules, like any kinds of robots that we build and
deploy are by definition safe or by construction safe. But that is very different from like when
you say, well, what if we hook up like an arbitrarily expressive agent into a home robot? Like,
how do you limit that to be safe? It's much harder. Like, just similar to, like,
if you just hook up a language agent to give it arbitrary Python code execution capability and
arbitrary ability to access the internet, it just becomes very difficult to say, well, how can
you make sure, like, it doesn't do anything dangerous? And that's where the alignment problem
comes in, and that's where a lot of this good safety research comes in. But we have
a simpler carve-out, like, at least for the near term, in this kind of industrial applications.
What advancement in AI research or application outside of robotics are you most personally interested in?
Looking backward or looking forward?
Looking forward. I can only look forward.
I think the same kind of advances that we have seen in the last year, like, we would see at least the same or more of them in the coming year.
It's just, if you look really behind like all these advances in large language models, image generation, they are still using relatively
primitive technology. Like, especially large language models, like, they are
mostly still trained just on next-token prediction, which, for people that study
reinforcement learning, we call behavior cloning, which means you're just asking the AI to clone
the behavior of another agent. And that is like one of the most primitive ways possible to train
this type of system, like, because if you're just mimicking something, like, there's a natural
ceiling on how good you can get on that.
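A minimal sketch of the next-token prediction objective Peter equates with behavior cloning (generic PyTorch-style code under the assumption of a standard decoder-style model; not any particular lab's training code):

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # tokens: (batch, seq_len) ids produced by the "demonstrator" (e.g. human-written text).
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    # Behavior cloning: maximize the likelihood of the demonstrator's actual next token,
    # i.e. imitate the data-generating agent step by step.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```

Because the objective only rewards reproducing the demonstrator's behavior, the demonstrator itself sets the ceiling Peter refers to.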
And then there are just so many other proven toolboxes that we have not deployed yet, that I would say progress is guaranteed in everything that we have seen so far.
And I'm super excited about that.
And I'm also super excited about the open-source movement continuing in the AI world, like where a lot of these advances are made available to a broad set of communities that can continue to build on and experiment with
it. And so I think it will continue to be a very exciting year of AI progress.
Okay. Then looking backward and forward at the same time, last question is your favorite
sci-fi book with robots in it, realistic or not?
It's not a book, but I really like Westworld.
Okay. Great. Westworld, the future comes.
Peter, thank you so much for joining us on No Priors. Until next time.
Thanks.
Find us on Twitter at @NoPriorsPod.
Subscribe to our YouTube channel if you want to
see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no dash priors.com.