This Week in Startups - Reverse-engineering autonomy in humanoid robots with Sanctuary AI CEO Geordie Rose | E1832

Starting point is 00:00:00 I want to just make this very clear that my perspective on AI and automation is that there's an upward spiral. When you have more energy, you have more intelligence, you have more capability. These drive all the metrics of human flourishing up. They don't take. So when we think about the answer, when will you get lights out manufacturing? I think the answer is never because people will always find new things to do with the tools that we've built, even very powerful tools that can think and maybe are even self-aware. These will only increase the number of jobs, they'll increase wages, but there'll be different

Starting point is 00:00:39 kinds of jobs. There'll be the sorts of things that maybe we can't even imagine now. This Week in Startups is brought to you by IntouchCX. Looking for ways to make your startup more efficient? IntouchCX has a groundbreaking suite of AI-powered tools for end-to-end optimization to give your business the edge it needs to thrive. Get started with your free consultation at intouchcx.com slash twist. Fount.

Starting point is 00:01:06 Do you want access to the performance protocols that pro athletes and special ops use? With Fount, an elite military operator supercharges your focus, sleep, recovery, and longevity, all powered by your unique data. Want a true edge in work and life? Go to Fount.bio slash twist for $500 off. And dot-tech domains has a new program called Startups.Tech, where you can get your company featured on This Week in Startups. Go to startups.com slash Jason to find out how.

Starting point is 00:01:40 Hey, everybody, welcome to This Week in Startups. We've been focused a ton on AI this past year. Of course, we talked about it over the last decade on the show, but things have heated up with language models and, you know, the very forgotten category of startups is, of course, robotics. We see once in a while on the internet a trending video. And the trending video tends to be one of Boston Dynamics' robots doing a backflip, or we see maybe some surgery being done on a grape. You've seen all these viral videos.

Starting point is 00:02:17 But the idea of robots leaving the factory floor and going and doing things in the real world, we don't see many startups doing that. We have one in our portfolio, Cafe X, making coffees at SFO airport right now. And of course, our friends over at Tesla are making Optimus, and there's another startup called Figure. They're working on a humanoid robot. Sanctuary AI is today's guest. They're another startup working on this problem, and they're specifically focused on

Starting point is 00:02:44 building robots with general intelligence. What does this mean? Well, it's not verticalized, and they're not just making a cup of coffee, but what if these robots could solve problems in the same way we do as biological creatures, as human beings? And if that works, well, that's going to have the economic impact of humanity. And it's going to go well beyond just the steam engine. And we have the founder, or I should say the co-founder and CEO, of Sanctuary AI on the program. His name is Geordie Rose. Welcome to the program, Geordie. Thanks for having me. Great name. I am reminded of the

Starting point is 00:03:25 Mark Knopfler lyric from the amazing song Sailing to Philadelphia where he says, I am Jeremiah Dixon, I am a Geordie boy. Do you understand the reference, Geordie boy? I do. I do. Yeah. Yeah.

Starting point is 00:03:41 So let's talk a little bit about the company and I know you were founded in 2018, so you've been working on this for a while. You've raised close to 100 million bucks. Where are you at with building this humanoid robot? And I would love to see the latest.

Starting point is 00:03:58 I should start by saying that our approach to the problem and the reasons for us working on it are slightly different than most people who work in robotics. For us, the motivation for doing it was a belief that human-like intelligence and more generally the intelligence of animals, which is kind of our model for what intelligence means, is very intimately tied to our presence in the world. We have a body, we are a thing. We experience the world through our senses, we develop understanding of it through interacting with it, and then we act on it to

Starting point is 00:04:33 achieve our goals. All of those things are very difficult to do if you're not actually physically present in the world. So the starting point of this, which actually goes back more than a decade now through two different companies, was to explore this idea that intelligence, by which I mean general intelligence, emerges as a consequence of having to deal with the real world. The real world, you never see the same thing twice. You have to be able to generalize from your previous experiences to new experiences.

Starting point is 00:05:04 You have to be able to understand the common sense ways that the world is. So we've been building software, which you could call general intelligence or AI, but it's also control systems for robots. And we've always viewed the problem of artificial general intelligence through that lens, which is that for us, a true general intelligence can be thought of as a control system for a robot that converts what it sees, hears, touches, feels about the world into actions that are intended to reach goals. So for us, the robot is somewhat of a means to an end. And because of that thesis, we focused almost exclusively on a very hard but very fundamental problem,

Starting point is 00:05:51 which is the building and use of hands. So much of the humanoid robotics videos are performative. They show robots doing things, but they're not valuable things. And for us, I think that the key value of doing this is to understand how an entity, a robot or a person, understands the world well enough to be able to manipulate it with its five-fingered opposable thumb hands. You know, I believe that the hand played a big part in our technological evolution and also in the development of language,

Starting point is 00:06:28 which are related things. So how did that happen? How did the hand play a role in language? I'm curious. Was it writing or the ability to hold a pen? Speaking. So how does a hand help you speak? Yeah. So although this is speculative, there's a lot of evidence that the earliest spoken language was very strongly connected to the things that our hands do, like point, touch, grasp, and so on. And some of the evidence for that is in neuroscience where the part of your brain that controls the grasping or the use of the hands overlaps with your language center. These things are not disconnected.

Starting point is 00:07:11 And when you actually try to build a system that touches, feels the world, and can interact with it in the way we do, you see this explicitly, that the cognition, you think of it as the domains of intelligence, many of them and maybe all of them are required in order to do something with your hands. It's a remarkable thing that planning, reasoning, logic, all of these things are connected to the way that we interact with the world through our hands. That's fascinating. Like when you were saying that, I was thinking, so I put my hand on my chin. And then, if you and I were navigating the world, if we were, you know, early settlers somewhere, we might point towards the direction we want to go,

Starting point is 00:07:55 or I might put my hand on my chest to refer to myself, or I might put my hand out and my palm up to refer to you in some sort of gracious way. Is that what we're referring to, this sort of instinctual thing that happens with our hands as we're talking? Yes. Our view and the position that we've taken is that the hands and their use and the mind are interwoven in an inseparable way in people. So if you want to understand human-like intelligence, the kind of intelligence that we have, the hands are the appropriate starting point. And that's why we focus so much on them.

Starting point is 00:08:32 Now, you asked to see some things. I can actually show you one of the hands. All right. So I see on the screen here, you've got, yeah, very interesting looking hand, five digits, yeah, four fingers and a thumb and a palm. And it looks like something out of Terminator, but a little more elegant, in fact. Well, I think that I would not characterize it that way. I think that the way that we imagine this hand is that it's the best that the technologies that the global community knows how to build.

Starting point is 00:09:10 It's the best that we can get to human hands. There's a lot of things in this hand that aren't immediately obvious just by looking at it. And those are mostly about the sensors. Our sense of touch is a very important thing for our intelligence and how we are in the world. We tend to take it for granted because it's always there. And when we look at screens and things, you know, people are very visual and they think about the world in terms of seeing, which is fine. But there's an interesting observation that seeing is about the future. It's about planning because the things that you see are away from you.

Starting point is 00:09:47 So, for example, if you look at a cup and you want to pick it up, the part of your brain that plans thinks about the future, but touch is a little different. Touch is an immediate thing that's in the now. Touch doesn't have foresight. It's all about the present moment. And when you make contact with the world, say you're sitting in a chair or you're picking something up or you're turning a Rubik's cube in your hand, the sense of now is intimately connected with the touch sense. And without it, you can't be who you are. So this is an important thing for building robots, is that touch is not a second-class citizen

Starting point is 00:10:26 if you're trying to build a system that behaves and thinks like we do. And so these hands are covered in very sophisticated touch sensors that allow them to feel the world, something like we do. Yeah, so when we're looking at each digit, I guess we have a couple of knuckles. And so the tip of your finger has one pad, then that middle

Starting point is 00:10:48 knuckle, I guess, has another pad, and then there's a longer pad in that third spot. So if you're looking at your own hand, you have those sort of three segments of a finger. They each have a pad on this robot, and the thumb obviously has the same configuration. And I guess when they touch each other, that's telling

Starting point is 00:11:04 it something's hitting that pad. Is that correct? It's more general than that. Yeah. So the sensors in your hand are not just about a yes-or-no question of whether you're touching something, they're very rich. You can feel temperature, you can feel if something's sliding past your fingers, which is very important when you're trying to hold something or turn it in your hand.

Starting point is 00:11:27 Imagine trying to put a key in a lock and turn it. A lot is going on in that thing in your brain, and a lot of it is driven by touch. If you didn't have the sense of touch, it would be very difficult to insert a key into a lock and turn it, even something as simple as that. Because of resistance, right? You have a certain resistance on either side or the bottom of your finger where it's touching the key. I'm imagining this as you're saying it. Yeah, the way that we do things in the world, we take it for granted. There's a thing called Moravec's paradox where the things that we take for granted and are easy

Starting point is 00:12:01 are actually some of the hardest problems that there are. And the reason they're easy is that we've had a billion years of evolution to create a system that is fine-tuned to be able to deal with things like, you know, picking things up, putting things in things and things like this. But the reason why we have AI systems that can write at the level of GPT-4 or create images from scratch that are as beautiful as any human artist could draw, the wonders of the digital age, but we don't have a robot that can do laundry, is that doing laundry is a fundamentally much more difficult problem than any of the ones that modern AI has managed to master. Yeah, it's fascinating when you think about how complex this is.

Starting point is 00:12:50 And that paradox you mentioned, that seems like, you know, like a fascinating evolutionary moment. These systems are so complex that they must be automated. Because to actually, with cognition, to try to think, okay, I'm going to have to give some resistance as this key goes into it. And I'm going to have to feel it click a couple of times and then I'm going to have to twist it left, but I'm going to need to put more pressure on the inside of my index finger versus my thumb.

Starting point is 00:13:18 And if I put too much pressure, I'm going to break the key off in the lock. I mean, it's incredible when you think of all of that occurring. And it's occurring in just an automated fashion. It's just a chunk of a task. Open the door. It's not even one task either. It's probably open the door, which is the key getting taken from your pocket, being put into a lock, twisting, opening the door, closing it, the whole shebang.

Starting point is 00:13:40 It's just abstracted into one instruction set, huh? Well, that's an interesting phrase, and I'm glad you mentioned it because that's the way that most serious embodied cognition efforts work is they have an idea of an instruction set, which is very similar to the way that processors work. I worked the first half of my career in building computer systems, and the marvels that we've built on the computing side are fundamentally based on a very non-trivial fact that every program that you could write boils down to the execution of roughly order 100 different tiny little programs that just happen in different orders. So every program that you can write on a computer is basically composed of only about 100 building blocks in modern processors. The reason you can do that is that processors have a natural way to turn the analog nature

Starting point is 00:14:36 of the world into a digital character that allows error correction, which is the fundamental reason why you can do all the things that we do today, that in the computer, there's a thing called a transistor, which is the basis of this digitization, going from things that are just any number at all when you measure them like voltages and currents to something that's only a zero or a one. In motion, that is the taking of actions in the physical world, you need to be able to find a way to do that same thing. And so what we've done is created this type of instruction set, which is a very small number of building blocks that you can compose in different orders to create massive complexity of tasks and potentially all of them.
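
The instruction-set idea described above can be illustrated with a small sketch. This is not Sanctuary AI's software: the primitive names (`grasp_key`, `insert_key`, `turn_key`, `push_door`), the dictionary-based world state, and the `run_task` helper are all hypothetical, chosen only to show how a handful of building blocks, run in a particular order, add up to a larger task such as opening a door.

```python
from typing import Callable, Dict, List

# Hypothetical world state: a few boolean facts about the scene.
State = Dict[str, bool]
Primitive = Callable[[State], State]


def grasp_key(state: State) -> State:
    # Picking up the key has no precondition in this toy world.
    return {**state, "holding_key": True}


def insert_key(state: State) -> State:
    if not state.get("holding_key"):
        raise RuntimeError("cannot insert a key that is not being held")
    return {**state, "key_in_lock": True}


def turn_key(state: State) -> State:
    if not state.get("key_in_lock"):
        raise RuntimeError("cannot turn a key that is not in the lock")
    return {**state, "unlocked": True}


def push_door(state: State) -> State:
    if not state.get("unlocked"):
        raise RuntimeError("cannot push open a locked door")
    return {**state, "door_open": True}


# The "instruction set": a small, fixed vocabulary of building blocks.
INSTRUCTION_SET: Dict[str, Primitive] = {
    "grasp_key": grasp_key,
    "insert_key": insert_key,
    "turn_key": turn_key,
    "push_door": push_door,
}


def run_task(program: List[str], state: State) -> State:
    """Execute a task expressed as an ordered list of primitive names."""
    for name in program:
        state = INSTRUCTION_SET[name](state)
    return state


# "Open the door" is just one ordering of the same few primitives.
OPEN_DOOR = ["grasp_key", "insert_key", "turn_key", "push_door"]
final_state = run_task(OPEN_DOOR, {})
print(final_state)
```

Running the primitives in a different order, for example `turn_key` first, raises an error in this sketch, which is a crude stand-in for the fact that order matters when composing physical actions.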

Starting point is 00:15:17 So if you think of a robot moving through the world or a person as a program, then you can imagine that any program could be written in, say, maybe just a hundred things in different orders. And that's what we're trying to do here is to figure out what those hundred things are. And then use a technique called task planning, which is the idea that given a goal, like say, I ask the robot in natural language to do something, the robot can figure out how to sequence the things that it knows how to do in order to achieve the goal and thereby achieve general intelligence. Because if I can ask the robot

Starting point is 00:15:52 to do anything at all, like in the human sphere, and the robot can actually perform the task, then it would be fair, I think, to think of these things as reaching the goal of having general intelligence. By the way, I should mention that there's a concept that David Chalmers, who's a philosopher, introduced called a philosophical zombie, where a system can have the appearance of being like us in the sense that it can do things like we do, but it doesn't have the first person conscious experience that we have. So there's a lot of mysteries about the relationship between being able to actually achieve goals, do things, and whether that is related or different than the experience we have of being people,

Starting point is 00:16:35 this thing that it feels like to be a thing. That's a deep, deep mystery. All right, listen. Efficiency needs to be top of mind for every founder in 2023. Fundraising is drying up so you need to extend your runway. And one great way to do that is automation. But it's hard to apply automation in your day-to-day operations, isn't it? So here's an amazing solution.

Starting point is 00:16:58 IntouchCX provides easily integrated automation tools for customer support. You're wondering how it works? Well, let me tell you. IntouchCX provides automated and live chat, email, and voice support. This eliminates unnecessary process and cost and it will make you faster. IntouchCX will streamline your customer support process, cut back on repetitive and time-consuming tasks, and increase productivity by 30%. And it's going to simplify your business.

Starting point is 00:17:27 So revamp your workflows with IntouchCX. IntouchCX partners are experiencing 45% average cost savings in customer support ops so far. Find out how IntouchCX can improve your startup's efficiency. Get a free consultation with their automation experts and get started at intouchcx.com slash twist. That's intouchcx.com slash twist. Yeah, consciousness, the big C is for philosophers and religious people. What is consciousness? Is it just some illusion that we're having in this brain of ours,

Starting point is 00:18:03 which is a collection of a bunch of subroutines as you're sort of alluding to here? Or is it, you know, this God molecule in our brains making us sentient and driving us to do things? I guess this is one of the exciting things about AI is that we're in some way, or in your case, quite literally, trying to deconstruct and then reconstruct what is happening in cognition. But it has to start with, hey, pour me a glass of milk. So it has to know what milk is. Easy enough to do now with visual computing. But pour, okay, we know what that word means.

Starting point is 00:18:40 It's moving some liquid from one place to another, and then we have to then, of course, make it do that accurately. So if we were breaking down a task like that, and you said there's about a hundred things you're teaching it, what are those little subroutines or those microbehaviors or you used a term for it? What was the term you used? We call them micro policies.

Starting point is 00:19:03 So in the world of... Micro policies, interesting. Yeah, in the world of reinforcement learning, which a lot of this is grounded in. We came up through the reinforcement learning school of thinking about cognition. A policy is an action that you take from a current state. So it's a prescription for how you act,

Starting point is 00:19:23 given the observation of the world that you have. So these micro policies are a collection of very specific types of behaviors, like say, for example, turning a key in the lock, that we train individually in isolation from any other use. So the way that works is that we take the robot and a person who is teleoperating the robot, which is a process of a person controlling and being kind of immersed in the robot, receiving the senses of the robot and moving the robot through a rig, which is another type of robot that the person is strapped to.

Starting point is 00:20:01 So the person moves the robot to accomplish the task, because a person knows what it means to pick up a key, put it in a lock and turn it. And we collect on the order of hundreds of episodes, which are instances of solving that problem. And then we use that to seed a thing we call a large behavior model. So a large behavior model is much like a large language model, except the fundamental data is the data of experience. It's vision, audio, proprioception,

Starting point is 00:20:30 which is the information from the servos in the robot, where it is and so on, how fast it's moving, and touch, haptics. So you can use the same idea where you take a bunch of data and instead of predicting the next word or token in a text prompt to response, you take the past, which is the things that have happened to the robot, and you predict the future, which is the sort of analog to the large language model predicting the next tokens. But in an interesting twist, the predictions of proprioception are predictions of where you will be, how you will move. And you can then send those predictions to the actual motors and the motors can move.
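
As a rough illustration of the predict-then-act loop described above, here is a toy sketch. It is emphatically not a large behavior model: a real one would be a neural network trained on vision, audio, proprioception, and touch. Here a nearest-neighbor lookup over a single made-up joint angle stands in for the model, and the names `build_model`, `predict_next`, and `rollout` are hypothetical. The only point is that the predicted next position is itself used as the motor command.

```python
from typing import List, Tuple

# One teleoperated "episode": a joint angle recorded at each time step.
# Real data would be high-dimensional; one number keeps the sketch readable.
Episode = List[float]


def build_model(episodes: List[Episode]) -> List[Tuple[float, float]]:
    """Collect (current, next) pairs from the demonstration episodes."""
    pairs = []
    for episode in episodes:
        for now, nxt in zip(episode, episode[1:]):
            pairs.append((now, nxt))
    return pairs


def predict_next(model: List[Tuple[float, float]], current: float) -> float:
    """Predict the future from the past: return the state that followed
    the most similar recorded state in the demonstrations."""
    _, nxt = min(model, key=lambda pair: abs(pair[0] - current))
    return nxt


def rollout(model: List[Tuple[float, float]], start: float, steps: int) -> List[float]:
    """Repeatedly predict the next position and 'send it to the motor'.
    In this toy world the motor simply moves to the commanded position."""
    trajectory = [start]
    for _ in range(steps):
        command = predict_next(model, trajectory[-1])
        trajectory.append(command)
    return trajectory


# Two made-up demonstrations of the same smooth motion.
demos = [[0.0, 0.1, 0.2, 0.3], [0.0, 0.1, 0.2, 0.3]]
trajectory = rollout(build_model(demos), start=0.0, steps=3)
print(trajectory)
```

The structure mirrors next-token prediction in a language model, except that the predicted "token" is a position the body is about to occupy.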

Starting point is 00:21:15 So one of the most fundamental turning points in my professional career happened about 10, 12 years ago, when I and some colleagues read Jeff Hawkins' book called On Intelligence, which was really the first thing that I read that made sense about a potential model of human cognition. And central to that story was the idea that our brains are predictors, is that we imagine the future,

Starting point is 00:21:44 and then we implement the imagining. So if I decide to pick up a cup, my brain is predicting how my motor signals will fire, and then it sends those predictions to my muscles, and then I perform the task. So these large behavior models that we and others are working on now are of this sort is that they predict the future based on the statistical properties of the data that they've looked at,

Starting point is 00:22:08 and then they execute the tasks, and they work quite well. So for a human being, we're going to perform a task. We're going to pour this glass of milk. We will, in our minds, and we do this either consciously or maybe even subconsciously, okay, I'm going to pick up the glass, I'm going to pour the milk, I'm not going to spill it, it's going to pour in some kind of an arc, I'm going to watch it fill, I don't want to splash, I'm going to stop pouring at a certain point, and you kind of visualize this movie in your head, this potential

Starting point is 00:22:38 future behavior. And so our minds are so powerful, they can actually essentially play a scenario, almost like a screenplay, like a little vignette, and then our muscles actually go play that routine? Is that the concept here in terms of intelligence of what happens in our brains? Yeah. So an analogy would be, let's say you take a piano keyboard and each of these little micro policies that we're talking about are one of the keys on the keyboard. Your brain, your mind, because I want to make a distinction here is that I've come to believe very strongly recently that the mind is a creator of stories about the future. You, the awareness that you are, your conscious presence is not your mind.

Starting point is 00:23:29 It's a different thing. You touched on it briefly. I don't have any proof of this, but I'm much more of the mind that there's a big mystery there about what it is to be the thing that is you. But it's not your mind. Your mind is a machine, just like your heart. And the job of your mind is to produce stories. So in the analogy to the piano, think of the mind as creating sheet music, and then the sheet music is automatically put on the piano and the keys are played and you hear a melody or a song. In the analogy, the song is the behavior, like, for example, picking up a glass of milk and drinking it. All of the behaviors that we exhibit in this model are all different songs that are generated by pressing the keys in different orders and with, you know, different styles. So the mind is the creator of the sheet music that it

Starting point is 00:24:24 does this always. Brains are always doing this. And then it's played on your body like a song. And our conscious perspective is sometimes not aware that these are separate things. You know, we have difficulty introspecting our minds and our behaviors for a variety of reasons. But I think that after working on this problem for a long time and seeing the synthetic analogs of us, how it actually works in robots, I think this is a good model, is that your mind is a machine for creating sheet music that is immediately played on the instrument, which is you,

Starting point is 00:25:05 and your awareness is a separate thing that kind of watches this. And sometimes we get confused and we think we are the mind, but I don't think we are. I think we're something separate. So there's the mind, and then there's the machine. And this machine is going out, conceiving of these tasks, executing on them, playing the sheet music, running through the script. I use the analogy of a film. You're using the analogy of piano.

Starting point is 00:25:29 But the script gets played. The sheet music gets played. It happens. But our awareness that I am a human, I am Jason, you're Jordi. We are on a podcast. We're having a conversation. I'm trying to understand what you're doing. You're trying to understand my questions.

Starting point is 00:25:44 And then there's going to be 100 or 200,000 people who listen to this. And they're also going to try to do that. That's consciousness, the awareness of each other and ourselves and our place in the universe and that there is even a universe. These are two different things. But for some reason, we perceive these two different things that are occurring, the mechanical execution of tasks through this very interesting process, which you're now recreating,

Starting point is 00:26:09 is different than consciousness. And consciousness, who knows when we're going to ever figure that out or if we can figure it out, this idea that we're aware that we are a living being. But we can figure out, at least at this point in time, it feels to you like we're going to figure out and we're close to figuring out how our brains break down complex tasks and do them so elegantly. Am I reframing, am I repeating that back to you correctly? Yes, that's right. There's a spiritual leader, I suppose you could say, called Eckhart Tolle, who refers to the first thing, which is the not knowing that you are different than the plans that your mind makes, as being unconscious, is the phrase he uses.

Starting point is 00:26:57 And I think that it's a natural state of people is to not be aware of the mechanical following of scripts, which is most of our behavior, and that's the sort of thing that there's a shot at doing in a robot. So I'm fairly sure that we can build machines that can do all work, like all of it, at least as it's currently, you know, understood, things like automotive manufacturing, logistics, bringing parcels to your home, I think that all of those things are within scope to do within, say, a decade, at least have the capability. So this idea of building a thing that appears as though it's intelligent and does all the things that you'd want, that's within reach. But the thing

Starting point is 00:27:41 that I'm really kind of taken with is this other notion. You know, I used to be a theoretical physicist a long time ago, and I worked on foundational problems in quantum mechanics and general relativity. And I've always had, you know, at the base of who I am, I've always been interested in understanding how things work at some fundamental level. And it's always bothered me that all we ever experience of the world is this first person thing, the feeling of being you in the moment. But we don't understand that at all. And I think that this neglect of what is probably the most central direct experience we have of the world means that there is a discovery waiting to be made about the relationship between our experience and notions of space and time.

Starting point is 00:28:26 And I think that this project is somehow, in some ways, aimed at that, is that it sort of starts from a weird spot because you see these robots and there's mechanical hands. And then I'm talking about, you know, some fundamental relationship between the emergence of space and time and how we perceive it through our conscious perspective. And these seem to be not related, but I actually think that they are very tightly related. You see these blue light glasses I'm wearing, I'm not wearing for style, although they are very stylish. They've totally changed my life. Why? I started having headaches, right?

Starting point is 00:29:02 And had eye strain. So I got these blue light blocking glasses that do a little magnification because I need readers. Yeah. I look nuts. But my eye strain's gone down. My headaches have gone away and I'm sleeping better. Do you know how I got on this? I got on it because I now have a health coach.

Starting point is 00:29:19 Who's my health coach? It's Fount. F-O-U-N-T. It's a health company that's created custom health and performance programs that are tailored to your body, obviously, also your goals, and they take into account your lifestyle. My coach is incredible. I text with them all the time. They did blood work for me.

Starting point is 00:29:36 They check out my wearable data. and we do weekly calls to see if I'm on track and getting the results I want. They also told me about some supplements I should be taking based on the blood work, and they do it at a fraction of the cost. We upgraded my diet. I'm doing a little more protein. We've optimized my sleep. That's great.

Starting point is 00:29:52 I got the supplement packs. I feel great. I feel like I'm in control of my destiny. If you want to be like me and you're concerned about your health and you want to just try to do better, have some experts on your team. Build your own program. Go to fount.bio slash twist.

Starting point is 00:30:04 That's F-O-U-N-T dot B-I-O slash T-W-I-S-T. Get your free consultation, mention Twist, you get $500 off your first month, and get your own personal health coach. Health is wealth. And if you're running a startup, if you're a CEO, if you're a capital allocator, take it seriously. I love this service.

Starting point is 00:30:20 Fount.bio slash twist. Well, if we think about the experience of being human and our place in this universe that we're trying to figure out, performing the tasks, as you said earlier in our conversation, is how we navigate the world. And it's how we are actually doing this act of trying to figure out what it is to be human. And this all then starts to open up all kinds of possibilities, free will.

Starting point is 00:30:47 When we pour that glass of milk, when we play that sheet music, where is our decision to do that occurring? You know, what parts of it are automated, which parts of it are just rote and just get executed on? And so it does open up, and I agree with you,

Starting point is 00:31:03 this is the question that we will always try to figure out. And this is why science fiction, you know, always winds up here, which is what does it mean to be human, whether we're talking about Blade Runner or Prometheus and the Alien series and Ridley Scott's take on it. So let's get back to reality here. When you're training the robot, you are not saying, hey, we're going to pick up a tennis ball here.

Starting point is 00:31:29 If we're going to pick up a tennis ball, it has this size, therefore we're going to program it to pick it up. That's what people did with robots before they very, very, very much. explicitly had to do some very narrow verticalized task. You're having a human being, like the guy who played Gollum, I guess, in Lord of the Rings, use gloves or something to send the instructions to the robot's actual physical actuator hands, and they're incredibly sensitive and have those pads on them. So we're teaching it, hey, I'm going to just pick up Andy Circus.

Starting point is 00:32:05 was the guy who played Gollum. We're going to actually just pick up the tennis ball, and then the AI that we train is going to know what happened, and that's what's going on here? Yeah, so I have got a, this might be helpful. Can you see the video? Yeah, yeah. So this is a robot, yeah.

Starting point is 00:32:23 Yeah, so this is Phoenix. And what you're seeing is this process that we're talking about, where there's a person in a suit, they have haptic gloves with force feedback so they can feel the world. They see through a heads-up display, so they feel like they're looking through the eyes of the robot,

Starting point is 00:32:41 and they're connected to it, their own robot, so that when they move, the robot moves in an analogous fashion. So the, yeah, so this is what it looks like when you watch the robot side of teleoperation. You can see that these machines are capable of doing lots of different things.

Starting point is 00:33:01 I mean, that might not be obvious from watching this, but the systems are nearly capable of doing anything that a person can do under this type of control, as long as they don't have to move around the world. This is focused on the upper body stuff and the problem that I mentioned, which is the dexterous manipulation of the world. And you're seeing it there, you know, basically pick up an object and then scan it with a barcode as if you were working in an Amazon, let's say, factory and shipping and packing boxes,

Starting point is 00:33:29 or even doing something as delicate as using a Ziploc bag, which we do unconsciously. We feel it. It feels like the Ziploc bag, yellow and blue, made green, and you have that color system, but you also have the feeling of it. So humans do the tasks, and then take me to what the software then does with the human having done the task. What does it do next in terms of building a model

Starting point is 00:33:56 to then go do the next thing in the world? Yeah, so imagine you have a Reddit post, which is some sequence of words that someone says: I really love Diablo II because Amazon is my favorite character. So somebody's written something. That sentence is the expression of a thought that a person has had into words. Now, when you train something like a GPT large language model, that sentence is used to help figure out the statistical likelihood of each word, let's just keep it simple, following the preceding ones.

Starting point is 00:34:36 And if you give this model enough words that people have written, the expression of their thoughts, then if I was to say, type in a prompt, which is my favorite game is, then of all of the words that have ever been written

Starting point is 00:34:51 to some approximation, there's a probability of what the next token will be given that prompt. And then the thing can unroll, which means I put the next word in, and now I ask with all those four words, what's the fifth word? Okay, put the fifth word in. What's the sixth? And each time it's a probabilistic thing.

Starting point is 00:35:07 So you roll a random number and you pick the thing that the random number says it should be. So with this type of model, these large behavior models, the data is a little different. It's the data of the sequence of successive nows. It's the time data from the person performing the task. So if I ask the robot to open a Ziploc bag, let's say that's the micro policy that we're going to train. So a person picks up the bag from the table through the robot, opens the bag and, you know, maybe pulls it open a little bit. So that then becomes the analog of a sentence. It's a piece of data, which we're now going to use to train a model,

Starting point is 00:35:47 where instead of predicting the next word, we're going to predict the next sequence of actions, and we unroll the same way we would a sentence. So every successive prediction becomes a movement pattern for the system. And in this case of the kinds of things we build, while it's similar in some sense, there are some very big technical difficulties in actually doing this that require the synthesis of many different kinds of artificial intelligence advances.
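The analogy described here, where a demonstrated task plays the role of a sentence and the model is unrolled one prediction at a time, can be sketched with a toy count-based model. This is only an illustration under loud assumptions: the function names `train` and `unroll`, the action labels such as `grasp_bag`, and the use of simple frequency counts instead of a neural network are all hypothetical and are not Sanctuary's actual system.

```python
import random
from collections import defaultdict


def train(sequences, context=1):
    """Count how often each token follows each preceding context.

    A "token" can be a word in a sentence or an action in a
    teleoperated demonstration (hypothetical labels, for illustration).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        padded = ["<start>"] * context + list(seq) + ["<end>"]
        for i in range(context, len(padded)):
            ctx = tuple(padded[i - context:i])
            counts[ctx][padded[i]] += 1
    return counts


def unroll(counts, prompt, context=1, max_steps=20, rng=None):
    """Extend the prompt one token at a time by weighted random choice."""
    rng = rng or random.Random()
    out = list(prompt)
    for _ in range(max_steps):
        padded = ["<start>"] * context + out
        options = counts.get(tuple(padded[-context:]))
        if not options:
            break  # context never seen in training data
        tokens = list(options)
        weights = [options[t] for t in tokens]
        # "Roll a random number" and pick the next token accordingly.
        nxt = rng.choices(tokens, weights=weights)[0]
        if nxt == "<end>":
            break
        out.append(nxt)
    return out


# Text: sentences are the training data, the prompt is a partial sentence.
word_model = train([
    ["my", "favorite", "game", "is", "diablo"],
    ["my", "favorite", "game", "is", "chess"],
])
print(unroll(word_model, ["my", "favorite", "game", "is"]))

# Behavior: demonstrations are the training data, each one a sequence
# of (made-up) action labels recorded while a person opens a bag.
action_model = train([
    ["reach", "grasp_bag", "lift", "pinch_seal", "pull_open"],
    ["reach", "grasp_bag", "pinch_seal", "pull_open"],
])
print(unroll(action_model, ["reach"]))
```

The same two functions serve both cases, which is the point of the analogy. A real system would predict continuous motor commands from rich sensor context rather than discrete labels from the previous token alone.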

Starting point is 00:36:20 For example, you could send the pixels from the camera in at every step to one of these models, but the pixels are not the thing that you really care about. What you really care about is where are the things and what are they? Which is a much lower dimensional thing. So machine learning computer vision techniques have been developed that will take the camera feed

Starting point is 00:36:43 and extract what you could think of as the semantic or important information about the scene and those are typically the things that you put into these types of models. And that's not just true for vision. It's true for haptics and audio and proprioception. So on the audio side, the obvious thing is if a person is speaking, you could use the actual audio waveform, but you could also use the text. And text is a much more compressed and high quality version of the data than the actual audio itself. So we tend to do text extraction from speech before we send them into these types of models as well.
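To make the "much lower dimensional" point concrete, here is a small sketch comparing the size of raw sensor data with a compact description of what things are and where they are. Everything in it is assumed for illustration: the `Detection` type, the 640x480 frame, the 16 kHz sample rate, and the idea that some upstream detector (not shown) produced the detections. It is not a description of the pipeline discussed in the episode.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One object in the scene, as a hypothetical detector might report it."""
    label: str  # what the thing is
    x: float    # where it is (normalized image coordinates, 0..1)
    y: float


def frame_size(width, height, channels=3):
    """Number of raw values in one camera frame."""
    return width * height * channels


def scene_features(detections, vocabulary):
    """Flatten detections into [label_id, x, y, label_id, x, y, ...]."""
    features = []
    for d in detections:
        features.extend([float(vocabulary.index(d.label)), d.x, d.y])
    return features


def audio_size(seconds, sample_rate=16000):
    """Number of raw samples in a mono audio clip."""
    return int(seconds * sample_rate)


vocabulary = ["table", "bag"]
scene = [Detection("bag", 0.5, 0.25), Detection("table", 0.5, 0.8)]

print(frame_size(640, 480), "pixel values per frame")
print(len(scene_features(scene, vocabulary)), "values in the scene summary")

utterance = "open the bag"
print(audio_size(2), "audio samples vs", len(utterance), "characters of text")
```

In this toy setting a frame carries 921,600 numbers while the scene summary carries six, and two seconds of audio carries 32,000 samples while the transcribed text is a dozen characters.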

Starting point is 00:37:18 That's fascinating. So you can know with machine learning, hey, there's a bag in this scene and the bag is open, but the bag is upside down, so it needs to be up; we should flip it around so the things don't fall out of the bag, etc. And so where are you at

Starting point is 00:37:33 let's I think I understand what's happening here in terms of the language model analogy and then just translating that into predicting the next best thing to do and so

Starting point is 00:37:46 where are you at in terms of training this in the real world I assume that factories and the example you give looks like a you know packing and shipping, probably one of the most boring, monotonous soul-crushing jobs a human being

Starting point is 00:38:00 could have, so why not give that to a robot? And sure, you could do it 24 hours a day or whatever the robots are going to be capable of. So where are you at in terms of taking this and actually having it at a fulfillment center, packing boxes and making sure that it scans them and puts the right objects into the box and then ships them onto the next person and being on this distribution center floor. To be clear, the initial go-to-market is in automotive manufacturing. It's not in logistics and retail. We focus almost exclusively on that with one exception. In automotive manufacturing, if you take a look at a video that, say, Toyota makes of their factory floor, automotive manufacturing is one of the most automated systems there are in any industry. But if you watch

Starting point is 00:38:52 what actually happens, there are hundreds of thousands of people in automotive manufacturing facilities all the time. The question is why? Why aren't they being automated? So when you look at what they're doing, there's kind of two categories of answer. One is it may be beyond the bounds of science. We may not know how to do the thing that they're doing. But there's another answer, which is that often people are used to connect machines.

Starting point is 00:39:19 So let's say I have a machine for stamping a part and I have a machine for making the part in the first place. So moving the part from the one machine to the other machine is a very difficult process that involves all of these things that we talk about. You need to be able to know what a thing is, where it is, localize it, use your hands to pick it up, sometimes out of a cluttered mess, put it somewhere, which often requires putting something on a jig, which is a difficult thing. You need to be able to move around and so on.

Starting point is 00:39:47 So a lot of the work that's done in automotive manufacturing specifically is a combination of different solved problems that have never been put together in a way that you could make economic. And one of the key factors of these general purpose machines that we and others are building is that this is exactly the kind of thing that's required in order to actually do this for real. Is that if I was to spend all my time and energy building a machine that did one of the things that somebody in this factory floor does, it would be very difficult to build a business. But if you could build a machine that could do, say, 50 of the kinds of things, now we're talking. So our initial use cases are nearly all of this sort.

Starting point is 00:40:31 They're automotive manufacturing. They're the connector problems where you're moving between machines with parts or things of material. And even the things that aren't automotive manufacturing that we've looked at all share the same feature. For example, in warehouses, which was my last business, building robots for e-commerce distribution centers, there's a problem called induction. So induction is the problem of taking things that usually come off trucks, you know, big pallets and just stuff. You know, imagine all of the things that you could buy on Amazon coming into a warehouse. And then taking them from their point of delivery and then getting them wherever they should be in the system,

Starting point is 00:41:15 on a shelf, in a box, whatever. So induction is another kind of problem that's related to this, where you're dealing with a system, things that you need to manipulate with your hands, opening boxes, closing boxes, putting things in boxes, taking things out of boxes, and so on.

Starting point is 00:41:29 And so that's another category of things that's related, but nearly everything we're doing now is helping automotive manufacturers dramatically improve the efficiency and productivity of their workforce. We're back with another Pitch It to J-Cal. This is the segment brought to you by our friends at dot-tech domains. Dot-tech domains are giving Twist listeners the chance to show off their startup on This Week in Startups. So go to

Starting point is 00:41:59 startups.tech slash Jason. That's startups dot T-E-C-H slash J-A-S-O-N to apply. There's only one rule. You need to have a dot-tech domain name to get featured. This week, I received a great pitch from LabelDrive, which you can find at LabelDrive.tech. LabelDrive helps other companies manage their AI data, and they've built a tool for collecting and labeling data that's especially focused on identifying and categorizing objects to save your time, save your money, and build better products. And as we all know, that's crucial for AI training. So I want you to go right now to LabelDrive.tech. And if you're interested in getting featured on This Week in Startups with your new dot-tech domain name, I want you to go to

Starting point is 00:42:44 startups.tech slash Jason and apply today. That's startups dot tech slash Jason and fill out the form to apply. You know, if this works, when do you think you'll have the ability to have the robot, you know, find those 50 different things to do? Let's say you nail that and it feels like you're well on your way. The first question, when do you get that solved and in factories and just doing it day in and day out? The plan that we've got takes us from where we are now to the full automation of

Starting point is 00:43:22 important tasks, by which I mean there are markets for, say, a billion dollars of annually recurring revenue for us. So let's say that's a kind of a thing that we want to target. We enter into agreements with our customers where the first step is that we mock up their situation in our facility here in Vancouver. Think of it as a digital twin, or not a digital twin, a real world twin. There's also a digital twin, by the way, but the real world twin. And then the processes that they pay us to show that we can automate using this type of thing, the kinds of tasks that they want as a first step. So there's a period of roughly two and a half years that we see where we go from where we are today to being able to really do something for real in the

Starting point is 00:44:11 of the sort that you could then scale. So that's the first step. When we start scaling is likely around the middle of 2026, where you're going to start to see the increasing number of these types of machines actually deployed inside automotive manufacturing plants, contributing to the productivity of the plant. So this is the plan of record. Now, I've done quantum computing and all sorts of things

Starting point is 00:44:39 where it's very difficult to predict how things will go. So in something like this, you have a plan. Things could go faster. It could go slower. It's unclear. But that's what we're aiming at. I think you'll start to see the beginnings of large-scale deployments of these machines somewhere in 2026. And so 2026, you start seeing the deployment of these.

Starting point is 00:45:03 And then when do you think factories start to remove humans? I guess they call that the lights out moment. you don't need to have the light, you don't even have to install lights in a space. I know it's funny, but when do you think you have that lights out moment and factories don't need to have humans in it? So I want to make a point about this.

Starting point is 00:45:21 There is a myth that AI and automation reduces labor. It's not true. Throughout history, there have been a series of moral panics where the next big technology thing is believed to do something terrible to employment. It's never happened. Every single time there's been a new thing introduced. And I think that the central reason for us thinking this is that it's the lump of labor fallacy,

Starting point is 00:45:49 the idea that there's a fixed amount of work. And if you give the work to the robots, there's nothing left. That's simply wrong. The way that it actually works in practice is that when you give, say like you give a bunch of labor, like I want to build 80 million cars. So that's a fixed amount. Let's say we could do that all with robots. The amount of work that's available for the general human population expands as a consequence of that. It doesn't shrink. So I want to just make this very clear that my perspective on AI and automation is that there's an upward spiral when you have more energy, you have more intelligence, you have more capability. These drive all the metrics of human flourishing up. They don't take. So when we think about the answer,

Starting point is 00:46:38 when will you get lights out manufacturing? I think the answer is never. Because people will always find new things to do with the tools that we've built, even very powerful tools that can think and maybe are even self-aware. These will only increase the number of jobs, they increase wages, but there'll be different kinds of jobs. There'll be the sorts of things that maybe we can't even imagine now that are made possible by these things. Like, look at the internet, 20 years ago or 30 years ago. This is a great example. Yeah. Yeah. Now we have an entire podcasting history. We have people who take pictures or, you know, there's an incredible company called Songfinch. What they do is you go there and you tell it you want to make a song for your mom or your dad. They pair you with

Starting point is 00:47:23 an artist and you pay them $200 and they'll write a song about your mom for her birthday. That's very cool. Well, I mean, there's humans out there, and I guess these used to be bards or, you know, court jesters or whatever who would do these kind of tasks as well, but we find things. And you're just thinking about your robot and, oh, well, we have this new problem, forest fires. Well, how are we going to, you know, clean up the, how are we going to rake up as our former president, you know, joked of, you know, how are we going to, how are we going to

Starting point is 00:47:50 rake up all that debris under the trees there, you know, in the mountains in California? You know, if somebody had 10 of these robots able to do a test, them, I'd say, oh, you have an interesting idea. Maybe we could clean up and do some deforestation with them. And they will eventually, in your mind, a decade from now or two decades from now, not just be in factories. They'll be in our lives. They'll be side by side with us

Starting point is 00:48:13 solving problems in the real world. Yeah, that's the ultimate vision here, so they can leave the factories. Yeah, I think of them as being a kind of thing like the automotive industry, where at some point they'll be ubiquitous and parts of our entire civilization will be built in synergy with this new thing, like we did with cars,

Starting point is 00:48:32 you know, roads and so on. By the way, I wanted to mention that this happened, this business of the job upgrade happened to me. When I was starting school, there were no quantum computers at all, except maybe theoretically. And we started a company to try to build one.

Starting point is 00:48:49 This is an example where we probably hired over time, I don't know, maybe 300, 400 people who had PhDs in physics in that company. And this is D-Wave, yeah. This is D-Wave, yeah.

Starting point is 00:49:04 That, that was a new kind of job that was created as a consequence of a revolutionary new idea. So this is the sort of thing that always happens with innovation is that, and I'm kind of emphasizing this a little bit

Starting point is 00:49:20 because we're at a very weird time right now where there's an attempt to do regulatory capture in artificial intelligence. It's a very dangerous idea, this idea of deceleration or stagnation or holding back, which are connected to old ideas that were rooted in communism.

Starting point is 00:49:38 These are very dangerous social ideas that I think it's important that we don't stay silent about. And people like me who have very strongly held beliefs that technology is the solution to maybe all of our problems, not only the ones we create, but also the ones that might emerge as a consequence of our natural habitat, you know, global warming or meteors or whatever. The idea that the better we can get at creating new things, the better off we all are is a very important policy idea that I don't think is being communicated effectively by the community of people who build technologies. There are a group of people who want to have the government, and they've specifically

Starting point is 00:50:27 gone to Washington and said, hey, please regulate us. to be the people who are at the, maybe at the forefront or some amongst the people at the forefront. And, you know, building a bunch of regulation into this would benefit the people who have the lead today, as opposed to say open source people or, or,

Starting point is 00:50:43 uh, folks who are coming up. Is that the thinking of what their motivation is? Because this is a group of technologists who are on the cutting edge. Why would they go to Washington and want to in, have a bunch of, you know, uh,

Starting point is 00:50:57 non-technical politicians slow things down? What do you think their motivation is? I think the best answer to this is somebody that you had on, Bill Gurley. Yeah. So his take on this

Starting point is 00:51:12 I really resonated with. It was one of the most, you know, sometimes you watch something and you're like, I'm agreeing with everything this guy's saying right now. I think I would,

Starting point is 00:51:20 if people are interested in this subject and they haven't seen that, I would most definitely recommend it. Bill Gurley, All-In talk. Yeah, we'll put it in the show notes for everybody. Yeah.

Starting point is 00:51:28 But the regulatory capture is what they're going for. It calcifies the winners as the winners. It builds up a moat for them. And this could be just cataclysmic for humans, right? We need this technology to solve problems. Yeah, and I think that's the point, is that the solving of problems comes from innovation and growth. And the forces of stagnation, the people who are pushing for not that, are very strong right now. And I think it's dangerous because my view of this is that civilization's metrics, like how well people are doing, are very strongly correlated with growth.

Starting point is 00:52:15 And there's an idea that we have to slow everything down, which is, I think, a dangerous idea. I think that what would happen if we were to implement policies that were restrictive is the same thing that happened. I'll use an example with nuclear power. A lot of the problems that we face today in the global warming sense and catastrophic, you know, potential futures that we might be looking at are connected very strongly to the precautionary principle, which in the nuclear industry was, well, we don't want to build nuclear power plants because we're afraid of nuclear bombs, which is ridiculous, by the way, because it's not the same thing at all. not the same thing. Yeah.

Starting point is 00:52:58 And, you know, maybe a reactor melted down once or twice. Yeah, Three Mile Island, yeah. You don't count all the deaths that happened in all these other industries, which were massively higher. If we had not done that with nuclear and had instead embraced it, we would not be where we are today. And so there's examples of this fear, which is a rational fear that can become policy that could be very dangerous here, because I think that these technologies, these

Starting point is 00:53:25 technologies we're talking about, which is the AI robotics to a certain extent, but not just those things more generally. We should take the attitude that the upward spiral is the objective. We want more. We want more energy. We want everybody on the planet to come to the energy consumption of us. We don't want to reduce energy consumption. We want to increase it.

Starting point is 00:53:48 And then we want to increase everybody by another thousand times. And we need to be able to find ways that technology can enable that, then enable solving the problems that might come of it from second order effects like global warming. These things are all solved by innovation and technology. Innovation and change is not the enemy. It's our friend. It's a necessary part. And it's connected to who we are as people. You know, people are explorers. We're adventurers. We want novelty. We want to go to, you know, to places that no one's ever gone, either literally or figuratively. And that is the essence of the human spirit to me.

Starting point is 00:54:25 And we want to be advocating for that as technologists and leaders in our fields. Yeah. And it's so paradoxical. I remember when I was a kid, all these great musicians who I loved, Bob Dylan and et cetera, did the no nukes concerts. And we really were indoctrinated into this fear of nuclear. And the second order effect is that we burned more coal and we burned more oil. and we heated up the planet

Starting point is 00:54:54 and now we're trying to solve the problem and the solution was there in the 70s and then sometime in the 80s we decided hey let's stop doing this and now 80s 90s 2000s we're sitting here four decades later and finally people are starting to realize 40 years later oh you know what

Starting point is 00:55:11 maybe that was a mistake should we start building these again and now we've got to reconvince everybody that we went on a 50 year side quest that made no sense and it's incredibly frustrating and you know it's yeah to some of our friends people have been on the pod

Starting point is 00:55:27 Sam Altman, Reid Hoffman, Mustafa, like, I think they are misguided here. We could have conversations about this, right? I mean, there's nothing wrong with having a conversation: hey, how do you make nuclear safer? Hey, could these robots, I mean, it sounds farcical,

Starting point is 00:55:42 but could the robots escape and do bad things in the world? Sure, we could have this conversation, but that doesn't mean that we need to have a bunch of regulators come in and say, oh, somebody in Washington's going to approve your language model and your code? I don't know, it doesn't make much sense to me. That seems like they're doing regulatory capture. I agree 100%.

Starting point is 00:56:00 And you know, the other thing I realized about what you're saying, Geordie, is there's something about solving problems, I realize in this conversation that when we were talking about jobs, and there's a sensitivity to that, with good reason, we automate things and a large amount of jobs could go away quickly. And there could be displacement, of course. But when the mind and consciousness is left alone, our minds are designed in a very interesting way to think and find the next problem to solve. There's something fundamental about human consciousness and this brain and, you know, Darwin and evolution that our species survived, dominated, and evolved with something inherent in our code, which is understand the world and find the next problem to solve.

Starting point is 00:56:48 Does any of that resonate with you? Oh, yeah. I mean, everybody who's lain awake at night and they can't get their mind to stop spinning through all the negative scenarios that could happen. Everybody, I think, experiences this. You're exactly right. Is that this tool that we've got, this beautiful mind that does all these wonderful things, it creates the worst nightmares possible about what will happen as a consequence of it working well. And so with technology, our mind spins up all of these horror science fantasy

Starting point is 00:57:18 ideas, we turn them into movies like Terminator or Black Mirror, none of that is real. I think there's a very important, powerful message here is that the terrible stories our minds tell us when you lie awake at night about your personal life is the same process that generates fear about the outcomes of change. So when we do something new, we innovate, we discover something about the world. There's a natural tendency that all of us have. to imagine what might go wrong. And... Exactly.

Starting point is 00:57:53 Yeah. Yeah. So my, I would advocate for being aware of that, that it's a story your mind is telling you. The Terminator thing is not true. It's not real. It's never going to happen.

Starting point is 00:58:03 It's just a story that somebody made up that resonates with our base nature, fears and concerns about the future and so on. But it's not real. What's real is very different. Yeah. And in our mind, there was a reason this obviously existed, for the person who worried,

Starting point is 00:58:22 hey, I wonder if these berries are poisonous or not, or I wonder if there's something dangerous in that body of water, maybe I should be cautious. Yeah, a little bit of caution, thoughtfulness, probably extended life. And people who were reckless probably had shorter lives.

Starting point is 00:58:37 And so, yeah, the gene pool probably evolved this way. But you must be aware of how catastrophizing it is. I mean, people can get really wound up. We see this with social media presenting us with so much bad news in the world. Our brains are not designed to process that, are they? No, and this is an example of how technology can have unintended consequences that are negative. Social media hijacks this propensity that we have to tribalize, to fear, to other, to see other people as being different.

Starting point is 00:59:17 You know, part of this idea of thinking of this conscious perspective that you have as separate from your brain carries with it another idea that we're all connected. You know, we all have this thing. We all share in it. The analogy that Eckhart Tolle uses is that there's an ocean and we're ripples on the ocean,

Starting point is 00:59:35 but this ocean is the same for all of us. This idea is a powerful one when you're trying to think about why you're reacting in a certain way to certain things. You know, like the social media stuff is an amplifier of the negative aspects of how we function as people. But that doesn't mean that we shouldn't have done it. I think this is the point, is that, like you said before, we want to talk about it.

Starting point is 01:00:01 We want to have a frank discussion about it. But the solution to these things doesn't come from shutting things down. It comes from having this discussion and making good, clear-minded decisions about how to build, not how to build. One of the great paradoxes of all of this might be, we build up this AI and we get to some general intelligence. It might tell us, it's a non-zero chance,

Starting point is 01:00:29 it might explain things to us about our own consciousness, why we're here, and what consciousness is that we ourselves could not come to the answer. So we may unlock some mysteries, that explain our own existence in a way. And that is just to me would be a wonderful gift of accelerating this, you know, is what if this machine, what if this artificial intelligence can be more objective about us and can teach us something, right?

Starting point is 01:01:00 That would be a pretty mind-blowing outcome. It sure would. Yeah. All right, listen, continued success with this, from, yeah, just working on quantum computing and now to robotics and figuring out how to make these sequences play. It's going to be very interesting to watch your progress. And listen, accelerate it all. Let's go. I'm assuming you're hiring and this must be one of the most fascinating places to work in the world. If people are interested in learning more or maybe applying for a position to build this

Starting point is 01:01:35 out and accelerate human intelligence and augment it so beautifully, where can they find out more? So I and one of the other founders of the company, Dr. Suzanne Gildert, have a podcast called the Sanctuary Ground Truth podcast. That's a place that you could look. Also, at our website, sanctuary.ai, there is a careers page. We are hiring and growing quite quickly, and there are positions for all sorts of different kinds of people.

Starting point is 01:02:08 We mostly hire technical people, of course, but there are some other things. And if anybody's interested, please watch the Ground Truth podcast and go to the website and check us out. Amazing. All right. And we'll see you all next time on This Week in Startups. Bye-bye.
