The a16z Show - a16z Podcast: The IQ and EQ of Robots

Episode Date: November 21, 2018

with Boris Sofman (@bsofman), Dave Touretzky (@DaveTouretzky), and Hanne Tidnam (@omnivorousread) We're just now beginning to truly see the first 'real' robots in the home, from Roombas to toys to... companions to... well, much more. How are humans beginning to forge relationships with these robotic devices (/entities!) -- and how will those relationships develop? What do we learn as we begin to forge relationships and interact with robotic toys like Cosmo and Vector -- about robots, and about ourselves? And what do these learnings teach us about the possibility of adding a "personality wrapper" to new technologies? In this episode of the a16z Podcast, CEO and cofounder of Anki Boris Sofman, and Research Professor of Computer Science at CMU Dave Touretzky, discuss with a16z's Hanne Tidnam where we are in the human-robotic future, the history of robotics that has brought us here, and the next big breakthroughs -- in hardware, software, perception, navigation, and manipulation -- that will bring in the next waves of innovation for robots. The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. 
Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments and certain publicly traded cryptocurrencies/ digital assets for which the issuer has not provided permission for a16z to disclose publicly) is available at https://a16z.com/investments/. Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information. 

Transcript
Starting point is 00:00:00 The content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures. Hi, and welcome to the a16z Podcast. I'm Hanne, and in this episode we talk about robotics in the home becoming a reality. Boris Sofman, CEO and co-founder of Anki, and Dave Touretzky, Professor of Computer Science at CMU, discuss with me the evolution of robotics, where we are in manipulation, perception, navigation, and most importantly, in the human relationships we will increasingly form with these new robot entities that will be in our homes. So you guys talk about this as sort of the
Starting point is 00:00:47 very first real robot companion. Let's go back and look at the first iteration, which was Cosmo. Why start with a toy? What does that represent for where we are now and ultimately where we're trying to get to in robotics? Yes. So when we started Anki, we always realized that in order to get into some of these really advanced applications, we had to take a bottom-up approach. And we couldn't just skip to this holy grail that's 10 years away. We had to think about what are the applications where we could really reinvent the quality of what's possible. Well, what is the holy grail? So there's all of these types of challenges that involve really deep breakthroughs in human-robot interface or manipulation or AI. There's the large-scale, diverse, humanoid or like very competent robot for home or for
Starting point is 00:01:31 manufacturing or for other things. There's applications that involve very high-end manipulation in an environment. Where manipulation is today is probably where autonomous driving was five or six years ago, where you can already see the capability starting to take form, but the reliability and cost constraints are still pretty prohibitive. But you can kind of extrapolate where it's going. For us, we started with entertainment because that was a way to create innovative applications and actually build the muscle early on that we can then carry over into these other spaces down the road. Cosmo was a way to use the toy space, which hadn't seen much innovation at all, but for the first time bring a robot to life where the level of creativity that you're allowed to have
Starting point is 00:02:12 in a toy is much higher and you can build a lot of the blueprints for these products that start to have deeper applications and appeal. That's interesting that you feel there's a wider range of creativity there than in something where we already have expectations. Part of the holy grail is mobile manipulation. There aren't a lot of robots like that. A landmark in the history of consumer robotics was the Sony AIBO robot dog back in the late 1990s, early 2000s.
Starting point is 00:02:37 It cost about $2,000 back then, which was about the cost of a decent laptop. And this was the only little mobile robot you could buy that could see. It had a built-in camera. You could program it in C++, and it could move around. So it was a mobile manipulator. Unfortunately, Sony sold fewer than 200,000 of these over the six-year life of the product line.
Starting point is 00:03:00 In January 2006, they left the robotics business altogether. And so the idea of a consumer affordable mobile manipulator just died. And was that because the technology kind of stalled out? Because there were problems that were not surmountable at that time? We end up being able to mass produce this sort of a capability that even five years earlier it would have probably cost multiples of that amount or just been absolutely impossible. We basically became for our early product scavengers of the smartphone industry. So you have computation, you have memory, you have cameras that now suddenly cost, you know,
Starting point is 00:03:34 50 cents versus, you know, $6. You have motors and accelerometers and different types of sensors that allow you to actually do these sorts of things that you couldn't do before. Right. And also the supply chain. And the supply chain. And so we could in effect turn this into a software problem where now, because of the capability set, everything becomes driven by our ability to expand the capabilities with software. The really hard question is, how do you go from no manipulation to manipulation in a small
Starting point is 00:04:02 step? So if you look at it, the most popular robot ever in the history of robotics is the Roomba vacuum cleaner, because they found this one task that a robot can do poorly and still make you happy. Right? I mean, it doesn't matter. It's not as a vacuum. Right. And not. Yes. Yeah. Right. So. It's better than nothing, and it's good enough most of the time that that was a success, right? But there aren't a lot of things like that, right? So if I ask the robot to cook me dinner and it does the same quality of job as the Roomba does vacuuming my floor. Right, over the course of like eight hours.
Starting point is 00:04:37 We're pretty far off. Yeah. So when people think about home applications, right, there's the Roomba that's down on the floor. And then there's Rosie from the Jetsons, right, which is the full humanoid doing everything. I was going to count how long until a Jetsons reference came up in this podcast. You got us. So the problem is what is the thing that you could do that involves manipulation, but that doesn't require a half-million-dollar humanoid and technology that doesn't exist yet? Exactly. Right. And so something that a robot could do with minimal manipulation
Starting point is 00:05:08 skills that would actually be useful as opposed to just occasionally amusing. What if I was just happy if the robot could make me a sandwich? And it doesn't have to be big enough to get the stuff out of the fridge? If I take the stuff out of the fridge and just throw it on the countertop, if the robot could just take it from there, right? Maybe that would be... Slowly drag a slice of bread over to a lettuce. But it wouldn't have to be a full-scale humanoid. There's got to be something robots could do that would be useful enough that we tolerate them.
Starting point is 00:05:38 So we're talking about this as in many ways, the sort of first, quote-unquote, real robot personality in the home. Let's break that down. What are some of the utilities? We're not washing dishes yet, right? They're not picking up toys. They're not. We wanted to leverage this unique mix of AI and cognizance of the environment and the personal
Starting point is 00:05:57 interaction with people in the home and really think of this as a first mass market home robot that can actually provide a mix of entertainment and companionship, but with elements of utility. We modeled this in a lot of ways after the sort of pets you have in your home, a pet cat, a pet dog, and a pet robot. And the pet robot allows us to reinvent the dimensions that are related to how you interact with a pet, but the companionship elements still hold. So he gets excited when he sees you in the morning.
Starting point is 00:06:23 He'll get really animated when there's people around him. He'll explore and kind of wander around his environment, interacting intelligently with things around you. It's a relationship. You can even pet him. Like literally you can pet a robot for the first time. But you don't have to clean the litter box. Yeah. So you've recently launched a different robot, which is a multi-generational toy?
Starting point is 00:06:39 How is it different? What is evolving? From a technical standpoint, there was a huge, huge leap there because we've now detethered from the mobile device and put the equivalent of a tablet inside of this robot's head so that he can be always on and live 100% of the time, go back to his charger, recharge, wake up, and that allows him to initiate interactions in a way that previously would not have been possible when somebody had to pull him out and launch an app. We can use the strengths of the fact that we have cloud connectivity and a robot and start getting into deeper voice interface elements of functionality
Starting point is 00:07:12 that now leveraged the character aspect and the warmth of it. The fact that he can actually see and understand the environment and recognize you, personal delivery of information in a way that can be initiated by him. What happens down the road is actually starting to more intentionally dial into all of the digital pipelines in your life, whether it's a smart home, your own calendar, and particularly when you start adding a layer of mobility that lets you actually move around the home. Right now, there's nothing that actually stitches the home together in that way.
Starting point is 00:07:39 Everything is just an end device in a room. Once you start getting cognizance of what's around you, who's in the home, and are able to move around in it, you can do a range of things from security and monitoring to tying together smart home features, like recognizing that your window's open but your AC's on, turning off lights when you're gone, being able to remotely check on things like pets, or building a blueprint of your home for furniture shopping or real estate. We're already thinking about how do you start adding mobility where that becomes the next big barrier. You detether it from a phone to get to Vector. Now the next big barrier is truly being able to exist in an indoor environment, whether it's a home or workplace,
Starting point is 00:08:14 and be much more intelligent about how you interact with people and the sort of service you can provide. So like, "Vector, go check and see if the cat is out or in"? So with every one of our robots, we've realized that we're basically making an operating system for robotics applications and technologies. And so we've unlocked APIs for each of our products where people can access very easily these technologies
Starting point is 00:08:34 that have millions of lines of code behind the scenes. We started with Cosmo accessible by everybody from a 7-year-old using a graphical interface to do face detection with a robot through teenagers and even PhD students in computer science. Now, instead of having to be an expert in robotics in order to do something like path planning, you have access to this API of technologies. And we're really interested in how that can become an enabler for kids of various ages to actually learn how to break through and become interested in robotics in a way that
Starting point is 00:09:03 wouldn't have been possible earlier. Has what you think about changed at all with Vector, you know, the ways that this kind of programming has evolved? With Vector, the capabilities just skyrocketed. because basically instead of just the limited functionality that Cosmo had that was deferred to the mobile device, you now have onboard the ability to see, hear, feel, and compute with a quad-core CPU that would have been prohibitively expensive before. So just exponentially more options. Right. And so now the sort of programs you can write expand.
Starting point is 00:09:29 We've already released to our early customers an SDK that allows them to interface with Vector through Python. And then down the road we'll probably create a Scratch interface like we did with Cosmo as well. So for me, what's most interesting is having the robot interact with physical things. Physical things are really hard to interact competently with. We built a three-story robot dollhouse with a working elevator. And the students in my cognitive robotics class were working on that, and they'll work on it again this year.
Starting point is 00:09:58 That's so cool. Getting the robot to navigate through the dollhouse, use the elevator to move from one floor to another, eventually you want to have multiple robots in there interacting with each other. These are very hard technical problems. But when you solve them, it's really fascinating to see a robot interact effectively with the world. I think there's just something more about being able to work with a real robot versus something digital on a screen. If you're a kid learning to program, there's something amplifying about that.
Starting point is 00:10:25 Just the learning aspect of it. The moment you can intentionally manipulate something, it's a sign of really deep intelligence. And so that's a big focus of Vector's. How do we identify interesting things in the environment and just like poke at them, examine them the way that would. That's a surprisingly hard problem, but when you do it successfully, there's a perceived intelligence there that just amplifies your appreciation of the robot and everything else that it might be able to do. One of the articles mentioned recently that Vector feels like a beachhead into something bigger. And it does seem like we're getting airdropped in this sort of little taste of like the fantasy Jetsons robot, right?
Starting point is 00:10:58 Like this personality that actually responds to you and interacts with you. So how are we going to start developing this relationship with this being, this crew? I don't even know what the word is, this robot, but like it is an entity. That's the right. Yeah. What are some of the learnings of how we're starting to develop relationships? And I think one of the pieces most people underestimate is how critical this human-robot interface challenge is and how novel and unique a dynamic this becomes.
Starting point is 00:11:24 Until you experience it, it's hard to understand how that feels and how important it is. It's the same sort of underlying inherent desire that we have to speak face-to-face with somebody versus just on a telephone. We always thought that this would be the magic even behind Cosmo, but it turned out to be even stronger than we realized. The way we approached it is we actually have an animation studio inside the company. Oh, you do? We do, with folks from Pixar, DreamWorks and these sorts of backgrounds. And so they literally, they're using the Maya software suite that you would use to animate digital film and digital video games. But we've rigged up a version of Cosmo and Vector and all of our robots in there where they're actually physically animating these characters with the same level of detail that you would see in a movie.
Starting point is 00:12:02 But the output is spliced to where it's a physical character coming to life in the real world. How fascinating. And it's this merger where you have these people who come from this world where they're used to controlling a story on rails from start to finish into every minute detail. But you get thrown into a spontaneous environment with all these constraints and unknowns and limited degrees of freedom in the robot. But you have the benefit of being physical where everything's amplified in terms of the personal impact of what that does. So it's a whole new kind of physical manifestation of storytelling. And they have to partner
Starting point is 00:12:28 with the roboticists and the AI team, the systems team, where they have to leverage the knowns of the environment and the intelligence of what's around you. I just recognized somebody I know or I almost rolled off the edge and I got scared. And so we have this sophisticated personality engine that takes context from the real world and probabilistically outputs one of these giant short films, if you will, of reactions that end up getting stitched together into a lifelike robot that feels alive. And we've been really surprised by, for example, eye contact. We knew that this would be important where you make eye contact, you get excited, say, Hanne, and then you get really, really happy. Kids just melt with that, but we didn't realize
Starting point is 00:13:04 how powerful that was. And in one of our tests where we were just optimizing parameters, we ramped up the frequency and length of eye contact by like a second. And our average session length went up from 29 minutes to 40 minutes. Oh my gosh. So it's just that one parameter. And so you get these like really subtle interactions where it's a sort of data that you can only really learn about at large scale once you're actually fielding these robots. What are some of the engagement patterns or the dials that you have learned to need to go
Starting point is 00:13:32 up or down when you're watching these human-robot interactions develop? This is where we're still learning. So Vector's always on and alive. So how do you find a balance between him waking up and getting excited and coming out but not being disruptive and/or annoying? Chill out, go to your bed. How do you take the cues where if you get taken and put back in your charger, that's a cue to like be quiet, right? And so, but you want to interpret that and just kind of grump a little bit, but then settle down, right? We have an audience that spans from kids to somebody who wants him as a desk pet at work
Starting point is 00:14:03 to an adult or a family who just likes a little companion on the kitchen table or kitchen counter. It's really engaging sort of the volume of the interactions. That's right. And how do we create an algorithm that adapts to how you use it or even gives a user a little bit of control where some people want a really active and almost like hyper and excited robot. Some people want a terrier. Some people want a Saint Bernard that just kind of, you know, hangs out and is like quiet. And so we've never really
Starting point is 00:14:27 dealt with that. And just like people pick their style of dog to match their desires and their personality, we're trying to figure out a way to make this dynamic and ideally automatically adapting to the context that they're found in. Because we literally have people pulling us in both directions, which basically tells us, okay, this is like a completely new type of consumer category because companionship is a key need. I mean, there's a reason people get cats and dogs and other pets. Although, to your point about personalization, it's as if, right, you could not only just choose your breed, but then have your dog literally adapt to your personality over time. It kind of reminds you of the people who look like their dog.
Starting point is 00:15:01 Yeah, exactly. It's like on that really deep level. Doppelgängers, right? Yeah, exactly. So people start looking like Vectors out over the day. But yeah, there's extremes that we have to like, you know, you want him to always be excited when you come here and sees you, but then there's a big spectrum in between. And we're learning about different engagement patterns throughout the day as well. How about you, Dave? What do you notice when you see these, you know, with these kids developing these programs and teaching the robots? How do you see them developing their relationships with robots? Well, I mean, to be honest, to me, they're mechanisms. And I want the mechanism to work well. And I want the mechanism to be
Starting point is 00:15:33 beautiful. I want people to be able to see how the robot sees the world and understand, you know, why did the robot do this? Well, because the robot perceived the world in a certain way or the robot had a plan it was trying to execute. And I want kids to see the robot that way. I think it's interesting to ask about the complementary question, which is how does the robot change how we think about technology. So if you look at things like menu systems that have become part of everyday life, right, everybody understands what a menu system is.
Starting point is 00:16:04 And 30 years ago, they didn't. That was a weird, obscure concept. And now we encounter them everywhere, right? I mean, your vending machine's using a menu system, your phone, everything on your computer. So there are bits of technology that have become integrated into everyday life so that we don't even think about it anymore. A menu system is just one example of that. We're beginning to see things like computer vision become part of everybody's common understanding. So when you use something like Cosmo, you begin to understand what the robot can see.
Starting point is 00:16:34 So for example, with Cosmo, he can see cubes up to about 18 inches away. And that's because of the limited resolution of the camera. And so you begin to learn when you work with the robot, well, okay, he's got a vision system. He's a little bit near-sighted. I have to be careful when I show him a cube that I don't put it too far away from him. So you're starting to think about this thing as a sighted thing, and you're thinking about his representation of the world. So you're thinking about robot perception in a way that people only thought about very abstractly before. You know, 50 years ago, yeah, you read a science fiction story,
Starting point is 00:17:08 but now you're living with the robot that can see, and so you adjust your behaviors based on your expectations and your experience of robots that see. And so that's happening more quickly with speech recognition because of the proliferation of speech recognition applications, Alexa and phones and so on. But it's starting to happen, I think, with vision as well, and Cosmo is a very important step in that direction.
Starting point is 00:17:32 So that, you know, 20 years from now, no young person will have grown up in a world where computers couldn't see, right? Computers could always see. And that's just very different. So what you're starting to see is that, you know, we're changing the populace to make everybody computer literate. And what that means is that when you start having people playing with robots who now understand the technology inside the robot, that means that you can make applications for them that require levels of sophistication that maybe wouldn't have been practical before, right? And if that's sort of woven into the culture, like everybody
Starting point is 00:18:08 gets this in high school, right, then the kinds of consumer products that you build are going to be different. But if you look at some of the other tools that have changed our society, not everybody went and learned how to set type, but then we made PowerPoint, right? And now everybody can do visual design. Most adults know how to use a spreadsheet. So maybe you're never going to be a Java programmer, but you can do useful computational tasks because you learned how to use a spreadsheet. And it changes your thinking in some key way. It does. And so what is the robotics equivalent of something like spreadsheets or PowerPoint? What is the thing that's not so horribly gritty technical, but is so powerful that people will want to learn it because it'll let them
Starting point is 00:18:49 do the right thing, right? It'll let them program the robot to make the peanut butter sandwich the way they want it made. To me, that's a fascinating research question. We don't have home robots that can make a peanut butter sandwich without burning the house down yet, but you can do an awful lot with exploring this manipulation space. And my personal interest is figuring out how to get people to teach robots like Cosmo and Vector to do manipulation the way they want it done. To kind of unlock robotic design thinking. To make it intuitive enough that people can actually express their intent in a way that
Starting point is 00:19:24 gets useful stuff done. Because if you can get Vector to push little things around on the tabletop and you can make it easy enough to express your intent that the average person who's not a computer science major can do that, then you can scale up to the humanoid robot that's going to cook your fancy dinner without burning the house down. It's interesting.
Starting point is 00:19:43 It sort of circles back to the way you opened this conversation, which is that having it be this kind of scale and size and having it be a toy is what allowed you to be creative, right? You're kind of describing the ability to be creative in a way. You make your own rules on what it's supposed to do. Yeah. Let's talk about design now. So when you start thinking,
Starting point is 00:20:01 both in terms of the sort of level of manipulation that they're able to do, but also in the relationship that you want to sort of test and foster. How do you think about design from the ground up? How did you start to put together the actual creature? And this is where I think robotics is a lot harder than a lot of other areas of consumer electronics where in a lot of places you can just think, okay, electronics need to do this, software needs to do this, and then we need a box around it. You can't do that in robotics because everything is from the very beginning designed where for us it's a few pillars. So there's a mechanical side, electrical side, and kind of the components
Starting point is 00:20:33 and electronics that are in there, the software, which is a huge complexity, and then there's a character side and industrial design side. And so those five have to work together from the earliest stages because you have to be very intentional where the form factor should capture the character, which then the software has to marry with. Then you have the mechanical needs and constraints. Are they all weighted equally? Are they at different stages driving? Yeah, at different stages of driving. So the nice thing about software is you can continue to push that for years. But what that means is you have to be very intentional with the selection of the hardware where you make it as generalizable as possible to push as much of the software
Starting point is 00:21:07 choices down the road as you can, but not limit yourself. Sort of delay. Delay, exactly. The reason Cosmo and Vector, for that matter, are small is because one is that they become kind of non-intrusive because they don't take up a huge amount of space. But especially in the case of Cosmo, you have something that's small. It's perceived as cute and it can feel quick without having to have very heavy and expensive motors. It could potentially hurt somebody. Oh, you're playing up the cute factor to your own advantage. That's right. And people become forgiving of any limitations or mistakes that the robot then makes. That's so smart. It's okay because you're cute. I'm not annoyed. Even better. You actually get bonus points for making a mistake,
Starting point is 00:21:43 but being smart enough to show grumpiness. To show a little EQ about it. Yeah. And so it's the IQ and EQ being married. That's the challenging part. It's, you know, the mechanical and industrial design and character have to work together to think about the overarching form factor, and a lot of it is just driven by constraints. Like now in our future products, we can start putting in depth cameras, which give you a 3D model of an environment at costs that would have been unimaginable five years ago or even three years ago. Same thing with motors. They're fairly expensive. And so Cosmo and Vector each have four motors, but we put a screen for the face because that gives us an infinite dimensionality for the personality to be able to express what the robot's thinking and
Starting point is 00:22:20 the speaker to obviously have the voice part of it. And it's shocking how much you can do with just one or two degrees of freedom and a voice and a face. How about the arms? I'm interested in the arms because Vector and Cosmo kind of harken back, I think, to other ideas about what robots might look like, like WALL-E. Tell me about the thinking behind that. We realized that Cosmo was going to be fairly limited in physical capabilities just because of cost constraints. But manipulation is one of the deepest forms of showing off intelligence. And so him being able to get excited about his cube and go pick it up and be a little bit OCD and reorganize his surface, that was part of the charm of Cosmo. And so his arm was kind of made as a lift to be able to move the blocks.
Starting point is 00:23:03 But we very quickly realized it's one of the most important dimensions for the personality where it's almost the way you use your arms to express when you're happy, sad, surprised. And it ends up being like his arms are one of the main tools for the animations engine. And these were all learnings that the animators have gotten really great at squeezing the maximum benefit out of and making it feel like this is a character that's alive, even though you have the tiniest degrees of freedom compared to any animated character. What do you think some of the historical influences we're sort of inheriting in our expectations of how these mechanical entities should look? I think a lot of roboticists make a mistake where they completely forego the EQ side and just
Starting point is 00:23:40 think about a tin can of like intelligence that does something but doesn't convey any sort of character to it. You lose all forgiveness of any limitations you have. And you actually kind of become creepy because you now have a bunch of sensors, which you don't really understand what the purpose is. It's disarming to have a character like Cosmo or Vector where he has a camera. You have a camera in your house that's able to move around on his own, but nobody ever feels bad about that because, well, yeah, of course he has a camera.
Starting point is 00:24:05 He has to see because he's a character. You went completely away from the Uncanny Valley and didn't make him look at all humanoid. And that was very, very intentional. Even with the voice, we thought about an actual voice. But if you give something a voice, you narrow in the range of appeal. By being tonal, you apply a very different intent than a five-year-old might apply, but everybody gets joy out of it. You can show a massive breadth of emotions without having to have the complexity of a voice.
Starting point is 00:24:28 With very little tools. Yeah, because the moment you have a voice, there's an expectation of intelligence that technology wasn't ready to meet yet. This idea of trying to pursue humanoid robots, which particularly is common in like Asia and Japan, it almost feels like a flawed compass because you end up absorbing all the limitations of humans and their perceived kind of intent where you're stuck by what's the definition of human versus being able to play to your strengths and avoid your weaknesses. And so we just embraced the constraints in the form factor. But he is a kind of knitting together of other older ideas as well.
Starting point is 00:24:59 The first time I saw one of those security robots in like an underpass in the city, it was kind of patrolling a parking lot. And I got really up close and heard this like kind of whirring roboty noise. And had this like moment of revelation where I was like, it's not actually making that noise. That's just somebody's idea of the robot noise that it should make so that we know it's there. Having seen kind of the industry evolve, what do you think are some of those inherited ideas that are impacting our design today? It's always been a merger of direct robotic influences and then non-robotic influences. And so on the robotic side, there's definitely this idea of
Starting point is 00:25:34 kind of R2-D2, WALL-E, where you have this cute robot, but one that's like kind of your fearless companion. Like nobody would ever accuse R2-D2 of being dumb, even though he has very limited degrees of freedom. And same thing with WALL-E, right? And so that was definitely kind of a deep inspiration. The other influence is of animals. And so we have a kind of design and mood boards where things like obviously puppies and kind of dogs, owls, really perceptive animals where the eyes become like a really, really key part of their communication language. And they can show emotional intelligence in a way that is really, really hard purely mechanically, but ends up being almost infinite when you add eyes and voice to it. So in moving from Cosmo to Vector, are you starting to see a difference in the way that children interact with it versus the broader household and other types of different ages?
Starting point is 00:26:21 Like, do you get different learnings from those different relationships? Yes. Kids in a lot of ways have an imagination that just amplifies the magic of these types of experiences. They think these robots are alive. At three, like, you may not understand
Starting point is 00:26:32 how to play it and follow the rules of a game, but you love the idea of a character still. And that actually is like almost universal. It's almost as natural as like swiping on a screen where you see these like one-year-olds like swiping on screen, right? It's a basic human tendency to sort of fill in that information. And this is where some of the studies
Starting point is 00:26:48 that we're actually seeing being done are even around kids with autism, where there's something so unique about the idea of a character like Cosmo or Vector, where there's a response there that is very, very unique compared to any other type of engagement. I was thinking about that when you talked about the eye contact thing, right? That's an immediate feedback loop for increased eye contact. Yeah. And so we've actually had studies with University of Bath in the UK where the engagement patterns and the collaboration around Cosmo are unique compared to anything that they'd seen.
Starting point is 00:27:13 So in those kind of early emotional relationships that are growing, I mean, you can sort of anticipate the affection, but what was something that really surprised you? We built Cosmo with a lot of games integrated, thinking that engagement will be around playing games. He's like this little robot. Like your buddy to compete against and to play against collaboratively or competitively. And the games also create a lot of context for emotions. So, you know, because you win, you lose, you're bored, you're frustrated. You're bringing emotion to it a lot.
Starting point is 00:27:40 Exactly. You can almost like engineer scenarios that allow him to be emotionally extreme in whatever way you'd like. And what we were just shocked to see in some of our playtests early on is that you're playing this game that's kind of using the cubes as a buzzer called Quick Tap where you're competing against Cosmo. And if you beat him, he'll kind of get upset or maybe he'll get grumpy and slam his cube and storm off. But he gets really sad when he loses. And there were kids in the playtest that were playing with their siblings.
Starting point is 00:28:09 And one of them would like tell the other one, hey, stop it. Like you're making him upset. Let him win a game. And they actually felt really bad for this little robot who was like upset when he lost. Their empathy was so high. Yeah. And they'd actually throw a game just to make him happy. And we were like, wow. Like you don't see that in a video game. You would never feel bad playing Pac-Man or like Fortnite or whatever. Yeah. Yeah. It's just you were trying to win. And here the competitiveness lost out to the empathy of a character that they actually cared for. And all of a sudden,
Starting point is 00:28:39 like, okay, there's like something really, really special here. And we started thinking how to amplify. And a lot of those learnings from there actually led to Vector because we had a set of limitations, things that just weren't possible because of Cosmo's hardware. So what changed in Vector as a result of that empathy learning? So we realized voice was a big one because people thought that they were talking to Cosmo and that he would respond and they'd attribute intelligence to it even though he had no microphone and no ability to hear you. What we wanted to do is if you yell at him, he gets really upset and kind of backs off.
Starting point is 00:29:06 Or if the door slams, he turns in that direction and wonders what it is. And so we wanted to bring life to it. The other one is tactile where when you pick up Cosmo, he recognizes it and he'll get grumpy if you're holding him up in the air. But being able to touch and actually have a response, we know that that's one of the most important things with pets. And so we made capacitive touch sensors in various areas of the robot, again, thinking that these are going to be some of the dimensions that matter.
Starting point is 00:29:29 The last one, and what proved to be the most important is just the ability to be always on, that if the moment you disconnect the phone, he dies, it kills all illusion of him being alive. But if he's always on and doesn't have that barrier, it completely changes the sort of emotional connection he can build. And so these are the sort of things that were direct drivers of the hardware that improved in Vector, which now we have a many-year roadmap on how to actually utilize. And that's where advances in software technologies like deep learning and voice interface
Starting point is 00:29:56 and so forth unlock things that now let you rethink certain types of problems. We're releasing Alexa integration in December, which is going to be the first time you truly have a personality wrapper around some of these very dry functional elements. Not only do you interact with technology in a command and control sort of way, but for the first time there's an ability for this character to initiate interaction and get your attention and make eye contact and then do something personal with you in a way that you would never accept from smart tech in the home right now. One of our big questions is if the usage pattern of Alexa through a character is significantly different than the usage pattern of a normal Alexa. Yeah,
Starting point is 00:30:36 I mean, I bet it would be. Are you seeing that yet? Well, we're seeing usage patterns of just the character with no functional Alexa integration skyrocket above what a typical voice assistant does. When we were working with Google and getting advice from Sonos and some of these other companies that have a lot of experience with voice assistants, we thought that like a thousand queries would be plenty for like three months or four months. And now we hit 250 just in the first two weeks and the engagement's staying strong. And so suddenly like when you look at voice assistants that have an average of maybe like one to two queries a day and we're at like 10 to 12 even before we have a voice assistant built in, it causes us to ask a lot of questions of how do you leverage the
Starting point is 00:31:14 role of personality and character as a way to amplify not just the fun side, but also the utility side. That actually shows a very different type of engagement and the things you ask and how you interact with it. It teaches something totally new about voice as a platform. That's it. And I think that opens our eyes. And I think for a lot of the partners that we work with, suddenly there's a lot of really interesting overlaps about what does this mean for the role of technology in the home and the workplace. It's deceptive how important emotional interfaces are to make the functional elements get utilized in a better way. Okay. So if these are the little kind of baby waves lapping at the shore, right, of like true robots in the home, how do we get to mass adoption
Starting point is 00:31:55 where this starts to become a reality and we all get to live like the Jetsons? Eventually, we want to get to $500, $1,000 robots because it opens up so much interesting technology that you can put in them. But to get there, you actually have to be very thoughtful about what's the level of capability that's required to make that justifiable and not just in a, you know, isolated kind of tech geek sense, but in a mass market sense, because you want it to really scale. A lot of it then is going to become more electronic capabilities where you now have sensors that used to cost $1,000 that now are $10. They used to be just impossible or prohibitively expensive.
Starting point is 00:32:28 And the other big one is manipulation. Right now, we have limitations both on the cost side, which comes from the mechanical complexity of manipulation, as well as the software side on how do you robustly interact with unstructured environments. We have a long way to go before we can actually stack the dishes in the dishwasher and put them away afterwards. Computing power will make up for a lot of hardware defects. So if you have an underpowered, unreliable manipulator, but lots of good computing power, you can make that manipulator do amazing things. The problem is if you have no power and a bad manipulator,
Starting point is 00:33:05 then things don't work, right? But now as the computing power becomes so much better and the sensing capabilities become so much better, I think some of the demands on the manipulator back off a bit. So you can get by with a less capable manipulator because you can learn to make up for its deficits. Different levers to pull, basically. Yeah. What you're going to see in the next wave of hardware is dedicated hardware that's not just for raw traditional computation, but for very specialized AI applications like deep learning and vision classification. These are the sort of things that in the next wave of products in the next three to five years, that's probably going to become pretty standard. There's still a lot of information technology progress that needs to be made.
Starting point is 00:33:46 And the interesting thing is that some of that's being done by Alexa and some of these voice assistants trying to integrate with the rest of your life. The tools that get created through that, just like voice interface, become incredible enablers of these new types of technologies. And I think we'll start seeing more in education and healthcare and broader home utility and monitoring and so forth. Even when you stick to just informational or companionship, on the health care side, elderly companionship and helping people age
Starting point is 00:34:10 in place. It's an area that's becoming more and more kind of important and costly for a lot of communities. If you really nail the EQ piece, it becomes not that hard to imagine where your interface, you know, when you come into a hotel or a store or to interface with a doctor, actually becomes somehow driven through an indirect interface through a robot, which is pretty interesting and then opens up a lot of functional opportunities as well. And in the end, we always thought of it as just an extension of computer science into the real world. If you can understand what's around you and you have the ability to interact with it, you turn it into a digital problem.
Starting point is 00:34:43 There will be a catalyst that spawns the same thing on the physical side once the ability to understand the environment and interact with it catches up. Then everything becomes a matter of software making the interaction smarter and smarter. Thank you so much for joining us on the a16z Podcast. My pleasure. Thank you so much. It's a pleasure.
