The Vergecast - The 50-year quest for AI virtual assistants

Starting point is 00:00:00 Support for the show comes from Retool. Too many companies run critical operations on duct taped spreadsheets, Slack workflows, and whatever else they could cobble together. Not because they want to, but because building internal tools means weeks of waiting on someone else's backlog. That's where Retool comes in. Build custom internal tools just by describing what you need. Prompts something like,

Starting point is 00:00:22 Build me a revenue dashboard on our Salesforce data. And Retool actually builds it on your company's data, in your cloud with enterprise security built in. Go to retool.com slash vergecast. We all need to retool how we build software. Welcome to the Vergecast, the flagship podcast of Generative Search Queries. I'm your friend David Pierce, and this is the third and final episode in our series about AI. And for this one, I want to talk about one of the oldest stories in AI,

Starting point is 00:00:56 and really one of the oldest stories in the tech industry as a whole, the virtual digital assistant. My favorite story about this is one I learned a few years ago, and I've been thinking about it ever since. So go back to 1979 in Palo Alto, California. This guy named Steve Jobs, you may have heard of him, goes to Xerox PARC, which was this ultra-successful R&D lab owned at the time by Xerox. People at PARC helped create everything from laser printers to Ethernet to personal computers. It was this incredible factory of cool technology. But the thing PARC is probably best known for is what Steve Jobs saw, the graphical user interface,

Starting point is 00:01:34 which went beyond the text-based command-based computers everybody was using, and actually used images, menus, and space, like three-dimensional space, to help people navigate their computers. And as the legend goes, Jobs saw all that, took it back to Apple, made the Macintosh, took over the world. The story is actually slightly more complicated than that. The Macintosh had been in development since about a year earlier, and it seems like what the PARC visit may have really done is help convince Jobs that it was a good idea worth pursuing and building rather than, you know, showing it to him for the first time. But that is not what I want to talk about. The story I want to talk about was happening in a different lab at PARC, where a bunch of employees were teaching a computer to understand and communicate in natural language. They had this idea for what they called the conversational user interface,

Starting point is 00:02:25 where instead of typing commands, essentially like writing code into your computer, you could work with your computer by just asking questions and typing like a person. Remember, this all sounds very normal now. This is almost 50 years ago. And actually forget typing as a whole. They thought maybe eventually you'd just talk to your computer out loud like a person. The team at PARC in general thought that a computer should act more like an assistant than a machine. You'd say, book me a flight to Tucson for next weekend, and it wouldn't launch a program to do that.

Starting point is 00:02:55 It would just do it for you. They actually built this thing called GUS, the Genial Understander System, to show off what they were talking about. And in a paper in 1977, again, 46 years ago, the researchers included a sample dialogue of how this interface might work. I'm going to have two AI voices read it right now just so you get the vibe of how it might work. It starts with GUS, the AI system, introducing itself. Hello, my name is GUS. I can help you plan a simple trip by air. Where do you want to go? I want to go to San Diego on May 28th. What time do you want to leave? I must be in San Diego before 10 in the morning. Would you like PSA flight 102 that arrives at 9:15 a.m.?

Starting point is 00:03:40 What is the next flight? Air California flight 310 that leaves at 8:30 a.m. and arrives at 10 a.m. I'll take the first one. In what name should I make a reservation, please? Dan Bobrow. I have confirmed the following flight, PSA flight 102 on Wednesday, May 28th from San Jose to San Diego that leaves at 7:30 a.m. and arrives at 9:15 a.m. What date do you want to return on?

Starting point is 00:04:07 On Friday in the evening. Would you like the flight that leaves at 7:45 p.m.? That's fine. I have confirmed the following flight, PSA flight 307 on Friday, May 30th from San Diego to San Jose that leaves at 7:45 p.m. and arrives at 9:30 p.m. Thank you for calling. Goodbye. I'm pretty sure the voices there wouldn't have sounded nearly that good in 1977, but you get the idea.

Starting point is 00:04:35 Obviously, the graphical user interface won out at the time, and won for a long time. That thing you just heard is not how we book plane tickets, and it hasn't been for the last four decades. But people did keep working on this stuff. The idea of a virtual digital assistant that can just do things for you, and all you have to do is talk to it like you would talk to a person, has been so seductive to so many people for so long. And in recent years, it actually feels like we're kind of veering back toward the conversational user interface as the future of computing. I mean, ChatGPT is nothing if not a conversational user interface. It's like the whole of the internet in a chat window. Bing is becoming the same thing.

Starting point is 00:05:16 Google has Bard, which is, you know, the same deal. And even with things like Alexa and Siri and Google Assistant, we've been heading down this road for almost a decade. But it actually feels like it's really exploding right now. And this push we're on raises some really interesting questions. How far does the chat window go? Is the future of computing really just like messaging? And are we actually, with this advent of modern large language models and all of the underlying tech that makes AI smarter and more useful, getting close to the kind of virtual assistant we've been dreaming about for basically half a century?

Starting point is 00:05:53 To help me figure out those answers, I called up Richard Socher. I'm Richard Socher and the CEO at You.com. You.com is actually one of the most ambitious AI assistant platforms I've ever tried. It's partly an AI search engine so it can answer your questions or help you find information on the web. That's really where it started. It's also a generative AI tool so you can use it to write code or create images or come up with brilliant emails that just sound like you. And long term, if Richard has it right,

Starting point is 00:06:24 You.com will also become something like the smartphone, a platform on which you can do almost anything all through this one interface. Richard has been in the tech industry for a long time, and he's been working on problems of natural language processing for even longer. He actually did a PhD at Stanford on bringing neural networks, which is kind of an AI system, to natural language processing, which was at that time a huge leap forward in rethinking how computers process speech. The point is he knows the history of all this. He knows how it works.

Starting point is 00:06:55 He knows how hard it has been to pull this off. And he told me that over the last decade, as he saw this tech get better, he thought this might be the moment to do it. What he wanted to build was a chatbot, kind of, and an assistant, sure, but he looked at all his research on language and conversation and natural language processing and how we interact with computers, and he decided the thing he really needed to do was build a different kind of search engine. The biggest application of natural language understanding is in search. That's where billions of people learn something, try to find information, try to get things done, and they start all of these tasks on a search engine. And so do I want to necessarily call that a search engine?

Starting point is 00:07:40 It almost doesn't matter what you call it, but it is that sort of magical spot in the beginning of many people's online journey where you sit to be able to be helpful for those users. And so that's sort of what led me to search. That distinction of what you call it, I actually think is like part of the reason I'm so fascinated by You.com. Because I think you're absolutely, I mean, you're obviously right that search is kind of the dominant interface for this kind of natural language processing and the idea of just like how people search for information.

Starting point is 00:08:14 But simultaneously, even as you're starting to think about how to apply neural networks to this stuff, like obviously you were thinking about this stuff before ChatGPT was around showing, you know, new chatbot-y ideas about how this stuff could work. But you were playing with this technology for a long time. And I can imagine a world that is not that different from this world where you went much more down the like Siri, Alexa, Google Assistant route rather than kind of building something that looks like a search engine, even to accomplish relatively the same goal, right? Am I thinking about that the wrong way? Like, are those two things kind of further apart in your brain than they are in

Starting point is 00:08:52 mine? You couldn't be more spot on. And it's actually very timely too, because we dropped the word search engine from You.com. Oh, interesting. Okay. We now just, like, make it very clear that this is going to be a conversation and that we're an AI chat assistant that's going to help you learn and get more things done, help you summarize, and all of these things. And so that's exactly how we've actually repositioned ourselves. After many years, we've learned that really search engines do not inspire excitement from people, but an AI chat assistant that helps you with finding information and answers and learn and so on does excite people. They're ultimately the same thing. It is a marketing question. We still want to be able to say, this can be the default in your browser. You know,

Starting point is 00:09:45 you can start every question you have in your browser and your URL navbar. It defaults to You.com because that's where you get the most useful first step in many different intents that you have. But yeah, we're moving away from the term search engine. And to be honest, and I probably shouldn't say this, but we don't have it finalized yet what the new terminology should be. The place we want to be is the most helpful assistant that you can be, which honestly is also something that Google has worked on for many years as well. Part of what seems like has changed since you started working at You.com is that I think one thing that you did that was very smart at the beginning was make it look like a search engine, just because I think that's like a thing people understand what to do

Starting point is 00:10:28 with, right? That like even in the way that I think a lot of people open ChatGPT now and it's like, well, what do I do with this? People instinctively understand what to do when they open a web page that looks like Google. So building a web page that kind of looks like Google makes a lot of sense, even if what you're doing on the other side of that ends up being very different. Do you think that's changing that you can make something that looks less and less like a Google query page and start to bring people along with you into all that other stuff now? Yeah, it sounds like you're in our meetings. That's very astute, a very astute observation, too, here. This is something we exactly found that last year we had in the summer,

Starting point is 00:11:08 the first LLMs within a search engine context. And we tried to innovate over the last two and a half years along a lot of different dimensions of what could in the past be called a search engine. We tried, you know, we had horizontal scroll where you just see a few sources and then you can always dive deeper into it very quickly. And we had, you know, different apps that are within that search engine context. And what we would often hear from users until late last year was, that's interesting, but I'm just so used to Google. I wouldn't want to switch to anything that's too different.

Starting point is 00:11:46 And we kept getting pulled back by users in our journey to something that is more and more similar to Google. And it was quite frustrating because we had some good ideas, we thought, and in each of these ideas, we had a core, a hardcore group of power users who did love it, but it just wasn't growing enough. And the majority of feedback we kept getting is it's too different to Google. It's too different to Google. I want something, like, that's more similar, but just a little bit better here and there. And so that all changed thanks to ChatGPT. All of a sudden, a lot of users, a lot of them

Starting point is 00:12:24 early adopters, but more and more just normal folks who basically could conceive a new world of finding and learning about things in the world. And so that is kind of the biggest advantage that we've had since ChatGPT came out. And, you know, two and a half weeks later, we launched YouChat, which basically was the first LLM that is connected to the internet and hence can be up-to-date, factual, and have citations for the facts that it tells you, making it much more trustworthy. Okay. We need to take a break.

Starting point is 00:12:58 And then we're going to come back and talk more with Richard. Support for this show comes from Shopify. Starting something new isn't just hard. It can be really scary, too. So much work goes into this thing that you're not entirely sure will even work. But here's a better thought. What if it did all work? What if your instincts were actually right all along?

Starting point is 00:13:27 Shopify wants to help you get there. They're the commerce platform behind millions of businesses worldwide and nearly 10% of all e-commerce in the U.S. from established brands like Allbirds and Heinz to companies just getting started. Their design tools make it simple to create the exact online presence you're envisioning with hundreds of ready-to-use templates available. And with built-in marketing tools,

Starting point is 00:13:50 you can launch full email and social campaigns in just a few clicks. So you can connect with customers wherever they are. It's time to turn those what-ifs into with Shopify today. You can sign up for your $1 per month trial today at Shopify.com slash vergecast. You can go to Shopify.com slash vergecast.

Starting point is 00:14:11 That's Shopify.com slash vergecast. Welcome back. Like we've been talking about, the uptick in AI and natural language processing and voice assistants has been happening kind of slowly and then all at once over the last decade or so.

Starting point is 00:14:33 And even though Siri still kind of sucks sometimes, the underlying tech really has come an incredibly long way in that time. But I think if you wanted to pick the moment the modern era of conversational user interfaces really started, it would have to be the launch of ChatGPT last fall. Something about this system that felt like it understood all your questions and responded with human and helpful answers felt like magic to a lot of people in a way that even Alexa and Google Assistant just never did.

Starting point is 00:15:05 ChatGPT isn't perfect, obviously, it's a million miles away from perfect, but it seemed to make people think that it might be someday, and that there might be something revolutionary in that. This could be the way we access information, not by going to Google.com or scrolling stuff on your phone, but just by asking for it. That felt huge. But now it's a year later,

Starting point is 00:15:26 and that idea has taken a bit of a hit. For one thing, we've all discovered the many, many, many imperfections of the tech, and also the ways in which conversations maybe aren't all that efficient. I think we're definitely in the low part of the chatbot hype cycle. It'll get better, but it's not what it once was. Richard, though, started You.com before the ChatGPT explosion, and so he's seen everything rise and fall and rise and fall and all the parts of the hype cycle here.

Starting point is 00:15:55 And he's seen people poke at the edges of the tech and really try to figure out where it breaks. So what have we learned? What can a virtual assistant do? And if and when the tech really does work, how do we want it to work? Richard says we're still learning, he's still learning, but he has some pretty big ideas about it. Like we, I love text, right? I love natural language processing, but text isn't always the best way to communicate something. Multimodal answers are the best way. So if you ask YouChat, what's the stock price of Palantir or Salesforce or some company? We're not going to

Starting point is 00:16:31 just give you a bunch of numbers, especially not those that are hallucinated by an LLM, but we show you a stock ticker that's interactive. You can move your mouse over. You can see the daily trends, the weekly trends, the yearly trends, the market cap, and all of these different numbers, and you just see it directly within the chat window, right? And then you can ask like, well, now who's their CEO? And it still captures all the context and remembers which company you talked about before and then, you know, gives you that company CEO, even though if you, you know, just did in a retrieval back end, who's their CEO, it wouldn't give you anything useful. But, you know, we learn to transform those queries as one of the many things that, you know,

Starting point is 00:17:11 are going on in the background that you have to get right in a lot of different ways. And very easy and obvious when you get it wrong, easy mistakes to make. And so we believe the answer in the future is going to be multimodal. And that's why we invented also last year this app store where you can essentially, other people can also submit apps to You.com and then they can show up within the search results and within the chat response. And I think that will ultimately be the best interface, similar to, you know, give me some good restaurants around me.
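As an aside on the "who's their CEO" follow-up described above: the query transformation can be pictured as rewriting a context-dependent question into a standalone one before it reaches a retrieval back end. The sketch below is only a toy illustration under loud assumptions. The names `KNOWN_COMPANIES`, `last_company`, and `rewrite_query` are hypothetical, the entity list is hard-coded, and nothing here reflects how You.com actually does it.

```python
import re

# Hypothetical, hard-coded entity list; a real system would use entity recognition.
KNOWN_COMPANIES = ["Palantir", "Salesforce"]


def last_company(history):
    """Return a known company mentioned in the most recent turn that names one.

    If a single turn names several companies, the first match in
    KNOWN_COMPANIES order wins, which is a simplification.
    """
    for turn in reversed(history):
        for name in KNOWN_COMPANIES:
            if name.lower() in turn.lower():
                return name
    return None


def rewrite_query(query, history):
    """Replace 'their' or 'its' with the remembered company's possessive."""
    company = last_company(history)
    if company is None:
        # Nothing to resolve against, so pass the query through unchanged.
        return query
    return re.sub(
        r"\b(their|its)\b",
        lambda _match: f"{company}'s",
        query,
        flags=re.IGNORECASE,
    )


if __name__ == "__main__":
    history = ["What's the stock price of Salesforce?"]
    print(rewrite_query("Who's their CEO?", history))
```

The rewritten query is something a plain keyword or retrieval back end can answer on its own, which is the point made in the conversation: the raw follow-up carries no company name at all.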

Starting point is 00:17:43 I still remember when speech recognition started to finally work and I started, you know, investing also after my first company's acquisition, I had a bunch of folks say, oh, we're going to use speech recognition to do all of these different things. And it's going to be your personal assistant. I'm like, okay, well, give me an example. And one of the founders was like, well, the example is like, you ask for a restaurant. And I'm like, so you really say I have this like speech system and I talk to it and I ask it, like, what are some good restaurants?

Starting point is 00:18:12 And the response is, there are 150 restaurants around you. Number one has 4.3 stars. Their favorite dish is pad thai and is 0.3 miles away from, and then like 10 hours later, you have a sense of where to go. Like, it doesn't make any sense. And so, you know, I love AI. I love natural language processing, but you have to have some user empathy and try to build it in a way that meets people

Starting point is 00:18:37 where they are and not be too much like, this is the new hammer, everything is a nail kind of thing. Yeah, I mean, it's kind of amazing how many systems work like that read you all the restaurants in the neighborhood thing. Like, that's where we are with so many things right now. I guess the App Store thing was going to be my follow-up question, which is like, how much of this do you feel obligated to build yourself versus how much can you kind of create the ecosystem for?

Starting point is 00:19:02 And I guess I do still wonder that, like especially as it grows and the capabilities grow and there's a lot more things to do, how much of that space is kind of You.com's job versus being, you know, the platform on which all of this other stuff exists. Maybe we should hire you in our product team. That is the big, the billion trillion dollar question. I think long term, it's very clear to me that it cannot be all on us. And an assistant that ultimately lets you accomplish anything and encompasses the entire economy. But it's also clear that there's a cold start problem here.

Starting point is 00:19:38 People don't want to build an app on a platform that doesn't have hundreds of millions of users. Hundreds of millions of users aren't going to come to your platform if you don't already provide enough functionality to fulfill most of their needs in a really, really good way. And some needs better than anyone else ever or anywhere else. And so that's the cold start problem. So in the beginning, it is mostly on us. And then I hope over time it transitions and we'll get more and more high-quality apps into the ecosystem. And I think that will ultimately be a more fair way of structuring that first page of the internet that currently, you know, Google benefits

Starting point is 00:20:16 from everyone else on the internet, but then they don't benefit back that much other than, of course, you know, paying for ads and traffic and having to pay for even their own ads on their own name just so that there's not a competitor that outbids them on their own name and then they show up on top and like all of these things. So I think it's a better way and a fair way to have ultimately a chat portal again, but I think portals also have a bad marketing connotation, and so we might not call it that. Yeah, I always thought one of the things you guys did really well

Starting point is 00:20:47 from the very beginning was kind of let people just tell you the sources they like better. I've never understood why every search engine doesn't just let me say, I really like these four news websites, and I'm not that interested in seeing other news websites. It seems like such a straightforward thing. And I feel like the App Store model really just builds on that in a lot of ways, right? You get to curate, like, here is the stuff I want to see and the ways I want to see it,

Starting point is 00:21:11 and others can tap into that, and that's a pretty powerful kind of feedback loop, which I think is very cool. Yeah, in the pre-chat days, Reddit was one of our favorite apps of our users. How do you think about personalization? I mean, it seems to me that the perfect version of this, you know, years down the road or however long it takes, is an assistant that effectively knows everything about me, right? And knows where I am and what I like and what I'm interested in and on and on and on. And obviously that has big privacy questions, which I want to talk about, but also just like logistically is very hard to do. And I'm just thinking about like search is one thing where

Starting point is 00:21:52 it's just kind of upranking and downranking web pages. But like if you're trying to help me do stuff, getting information from me seems really important, but also like the kind of thing you don't want to do too much of, or you're going to lose people. Like, how do you think about that? That's right. Yeah. It's something we have been working on for many months. You have to know when to do this and when it's not that useful.

Starting point is 00:22:16 If you just want to know, like, when was Barack Obama born, it doesn't really make that much sense to be like, well, maybe he liked mountain biking too sometimes. You know, it's like, that's not what I asked for. I just asked you a simple question. I want to get a simple fact. And so to not overly personalize, but to know when to do it well and when to really use the facts and then also make it seamless for the user. So it just feels like, oh, yeah, like the more I use this, the better it gets for me without being too creepy either. And still

Starting point is 00:22:45 having control over their private information in a way that, you know, makes them feel safe. It's very important. So what are you seeing as folks are really starting to embrace the kind of chat assistant piece of this? Obviously, the, like, help me write an email, help me with my code things seem to be two sort of killer apps for people. Is there other stuff you're seeing that's jumping out at you as kind of nascent use cases here? Yeah, a ton of things. I think students are going to have a really great time on You.com. We just launched a back-to-school campaign, too. It's now the cheapest way to access GPT-4, plus, you know, have access to the internet for that chatbot. And on that front, too, soon we will have a whole new class of questions that are very complex that

Starting point is 00:23:32 no other normal LLM chatbot can currently solve accurately, but often pretends to be able to solve and then does make a lot of mistakes, which I think is very problematic. It's just a whole new class of problems that can be answered with an AI that has just more superpowers than just a large language model. But yeah, for now, I think I still see a lot of potential for parents, you know, for folks who want to find new recipes, for people who want to do travel planning and get like nice ideas for itineraries that are going to be much more personalized also in the coming weeks on You.com and then students getting answers for their questions. I'm excited also to incorporate more and more generative AI capabilities into these.

Starting point is 00:24:21 You know, if you ask like, how can I generate an image with AI? Then we just let you do that right then and there. If you're on YouPro, you can do it as often as you want. And so, yeah, lots of interesting use cases. I think where we're still iterating is like when to make it just as good as Google for the quick navigational stuff and the quick informational stuff. And then when can we really shine by being much more chat first? Right. And you do kind of have to do both, right? Even though the navigational stuff is ultimately not that interesting, that's the like,

Starting point is 00:24:54 that's a big part of the sort of habit building of it. Right. And if you really want to be the kind of homepage of the internet for people, you have to do that as well or better than Google. That's exactly right. You know, and there are a lot of like chatbots out there that some of them, I think, are a little bit, you know, they have the hype moments, but because they are just not connected to the internet, they're not that factual and up to date, they give you sometimes they even give you citations, but don't actually, when you click on the citation, you don't see the fact that they're saying, you know, that was the citation for. It's not at all on that page. And that loses a lot of trust after a while.

Starting point is 00:25:32 And so, you know, there's a lot of like sort of brief hype chats. But I think to have a really long-term shot, you need to have the utility that is more similar to a search engine, not just be like a dictionary lookup kind of thing that you use once a week. Yeah, totally. I think the idea of an AI chatbot as, like, a complementary thing is kind of interesting. But it as, like, a primary interface to the internet is where this stuff starts to get really interesting. And I think that's coming, but displacing Google is harder than most people give it credit for is the thing I have learned over and over and over in the time I've been

Starting point is 00:26:08 in this world. That's exactly right. So last thing, talk to me about privacy. Just for a minute. I know it's a thing you guys have talked a lot about. I know the kind of all of this stuff we're talking about utility and personalization and all this stuff often kind of runs directly against privacy. Do you have a strategy for kind of how to have your cake and eat it too here? Like how do you think you can do right by all those things at the same time? Yeah, great question. We used to talk a lot about privacy and we still care about it, but we don't talk about it as much because, to be honest, it doesn't excite enough people. You just can't grow quickly enough a consumer business by just talking a lot about privacy. There are a few folks that can and they, you know,

Starting point is 00:26:51 DuckDuckGo obviously is like amazing on that front, but we just didn't see the growth from having a strong focus on it. And it does in many, many cases, slow you down. Like, people want to have no bugs and they want to know that things work. But if you were to never, ever do any analytics, you don't know how happy you are making your users, right? And so in many cases, we didn't use, you know, we don't use Google Analytics and so on to have the best privacy of any of these chat engines. When you look at pretty much all our chat competitors, all of them are selling your chat transcripts and your data to others, maybe not always like specifically, but like attached to your IP or something like that. But they're all like using Google Analytics and all of that. And so we're trying our best

Starting point is 00:27:40 to be the most private of those engines. But to be honest, we've also, when you talk about it a lot, there's just some people who will never be happy until you're an open source, like, nonprofit on the Tor network that, like, makes no revenue and has no ads or anything. And that's just not a viable business strategy either. And so, yeah, we care about it. We think we're the best private chat engine and chat assistant out there. But we just don't talk about it as much as other features that people seem to care about more. And it helped us grow to many more millions of users than before.

Starting point is 00:28:18 A little while after Richard and I talked, You.com rolled out a feature that he hinted at a couple of times in our interview. It's called YouAgent, and it's able to do everything from calculations to code execution for you. One of the examples You.com gives is calculating your mortgage. Feed it all the data you care about, the amount of your loan, the time, the interest rate, and the agent will run the numbers and spit a payment back to you. This is a fairly benign example, right? There are a million ways to do mortgage calculations on the internet.
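For readers who want to see the arithmetic behind that mortgage example, here is a minimal sketch using the standard fixed-rate amortization formula, payment = P * r / (1 - (1 + r)^-n), where P is the loan amount, r the monthly interest rate, and n the number of monthly payments. The function name `monthly_payment` and the sample figures are assumptions made for illustration. This is not the agent's actual code, which the episode does not describe.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment for a fully amortizing loan.

    principal   -- amount borrowed
    annual_rate -- nominal yearly interest rate as a fraction (0.06 for 6%)
    years       -- loan term in years, paid monthly
    """
    n = years * 12            # total number of monthly payments
    r = annual_rate / 12      # monthly interest rate
    if r == 0:
        # With no interest, the principal is simply split evenly.
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


if __name__ == "__main__":
    # Hypothetical inputs: a 300,000 loan at 6% over 30 years.
    print(f"{monthly_payment(300_000, 0.06, 30):.2f}")
```

The three inputs match the ones named in the episode: the amount of the loan, the time, and the interest rate. Taxes, insurance, and fees are left out of this sketch.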

Starting point is 00:28:50 But You.com also says it plans to support things like file uploads and graphical outputs and much more. So the idea, I think, is that you could upload all of your mortgage documents, and then essentially have the agent explain them to you and do things with them and even do things on your behalf with them. That is some seriously cool virtual assistant stuff. I know I've said this already, but it's important to keep saying that the tech for this all still is not finished. The underlying models are getting better, but they're wrong too often, they make things up too often, they don't show their work often enough, and they also just sometimes don't understand what you mean at all. Plus, I really increasingly don't buy the vision that

Starting point is 00:29:30 chat is the future of everything. No matter what OpenAI or Bing or whoever else would tell you, I don't think typing into a text box is the future of the universe. But I'm more convinced than ever that the superpowered virtual assistant we've been dreaming about and trying to build for more than half a century might finally be here soon. And it's going to be awesome. All right, that's it for the show today. And that's it for our AI miniseries. We have had so much fun making the series. I really hope you've enjoyed listening to it as well. We've also got a bunch more fun special stuff coming up this fall, and don't worry, we will be back on Wednesday and Friday with your regularly scheduled Vergecast programming. Until then, AI David, you get one more run at the credits. You want to

Starting point is 00:30:13 do it? Here we go. Okay, that's it for the Vergecast today. Thanks to everyone on the show today. And thank you for listening. There's a whole lot more stuff from this conversation at theverge.com. We put some links in the show notes, but also read theverge.com. It's a good website. We like it. If you have thoughts, questions, feelings, or other ideas about how to book plane tickets, you can always email us at Vergecast at theverge.com or keep calling the hotline, 866-VERGE-11. Send us all your thoughts and questions and ideas for what we should do on the show. This show is produced by Andrew Marino and Liam James. Brooke Minters is our editorial director of audio.

Starting point is 00:30:54 The Vergecast is a Verge production and part of the Vox Media podcast network. We'll be back with your regularly scheduled Vergecasts on Wednesday and Friday to talk about DVDs, VR, and all the other news of the week. See you then. Rock and roll.
