The Vergecast - The 50-year quest for AI virtual assistants
Episode Date: September 25, 2023
In part three of our AI mini series, David Pierce dives into the decades-long endeavor to build a virtual assistant. What can a virtual assistant do? What do we want it to do? Email us at vergecast@t...heverge.com or call us at 866-VERGE11, we love hearing from you. Learn more about your ad choices. Visit podcastchoices.com/adchoices
Transcript
Support for the show comes from Retool.
Too many companies run critical operations on duct taped spreadsheets,
Slack workflows, and whatever else they could cobble together.
Not because they want to, but because building internal tools
means weeks of waiting on someone else's backlog.
That's where Retool comes in.
Build custom internal tools just by describing what you need.
Prompt something like,
Build me a revenue dashboard on our Salesforce data.
And Retool actually builds it on your company's data,
in your cloud with enterprise security built in.
Go to retool.com slash vergecast.
We all need to retool how we build software.
Welcome to the Vergecast, the flagship podcast of Generative Search Queries.
I'm your friend David Pierce, and this is the third and final episode in our series about AI.
And for this one, I want to talk about one of the oldest stories in AI,
and really one of the oldest stories in the tech industry as a whole,
the virtual digital assistant.
My favorite story about this is one I learned a few years ago, and I've been thinking about it ever since.
So go back to 1979 in Palo Alto, California.
This guy named Steve Jobs, you may have heard of him, goes to Xerox PARC, which was this ultra-successful R&D lab owned at the time by Xerox.
People at PARC helped create everything from laser printers to Ethernet to personal computers.
It was this incredible factory of cool technology.
But the thing PARC is probably best known for is what Steve Jobs saw, the graphical user interface,
which went beyond the text-based command-based computers everybody was using, and actually used
images, menus, and space, like three-dimensional space, to help people navigate their computers.
And as the legend goes, Jobs saw all that, took it back to Apple, made the Macintosh, took over the world.
The story is actually slightly more complicated than that.
The Macintosh had been in development since about a year earlier, and it seems like what the PARC visit may have really done is help convince Jobs that it was a good idea worth pursuing and building rather than, you know, showing it to him for the first time.
But that is not what I want to talk about.
The story I want to talk about was happening in a different lab at PARC, where a bunch of employees were teaching a computer to understand and communicate in natural language.
They had this idea for what they called the conversational user interface,
where instead of typing commands, essentially like writing code into your computer,
you could work with your computer by just asking questions and typing like a person.
Remember, this all sounds very normal now.
This is almost 50 years ago.
And actually forget typing as a whole.
They thought maybe eventually you'd just talk to your computer out loud like a person.
The team at PARC in general thought that a computer should act more like an assistant than a machine.
You'd say, book me a flight to Tucson for next weekend, and it wouldn't launch a program to do that.
It would just do it for you.
They actually built this thing called GUS, the Genial Understander System, to show off what they were talking about.
And in a paper in 1977, again, 46 years ago, the researchers included a sample dialogue of how this interface might work.
I'm going to have two AI voices read it right now just so you get the vibe of how it might work.
It starts with GUS, the AI system, introducing itself.
Hello, my name is GUS. I can help you plan a simple trip by air. Where do you want to go?
I want to go to San Diego on May 28th.
What time do you want to leave?
I must be in San Diego before 10 in the morning.
Would you like PSA Flight 102 that arrives at 9:15 a.m.?
What is the next flight?
Air California Flight 310 that leaves at 8:30 a.m. and arrives at 10 a.m.
I'll take the first one.
In what name should I make a reservation, please?
Dan Bobrow.
I have confirmed the following flight: PSA Flight 102 on Wednesday, May 28th from San Jose to San Diego that leaves at 7:30 a.m. and arrives at 9:15 a.m.
What date do you want to return on?
On Friday in the evening.
Would you like the flight that leaves at 7:45 p.m.?
That's fine.
I have confirmed the following flight: PSA Flight 307 on Friday, May 30th from San Diego to San Jose that leaves at 7:45 p.m. and arrives at 9:30 p.m.
Thank you for calling. Goodbye.
I'm pretty sure the voices there wouldn't have sounded nearly that good in 1977, but you get the idea.
Obviously, the graphical user interface won out at the time, and won for a long time.
That thing you just heard is not how we book plane tickets, and it hasn't been for the last
four decades. But people did keep working on this stuff. The idea of
a virtual digital assistant that can just do things for you, and all you have to do is talk to it like you would talk to a person, has been so seductive to so many people for so long.
And in recent years, it actually feels like we're kind of veering back toward the conversational user interface as the future of computing.
I mean, ChatGPT is nothing if not a conversational user interface.
It's like the whole of the internet in a chat window.
Bing is becoming the same thing.
Google has Bard, which is, you know, the same deal.
And even with things like Alexa and Siri and Google Assistant, we've been heading down this road for almost a decade.
But it actually feels like it's really exploding right now.
And this push we're on raises some really interesting questions.
How far does the chat window go?
Is the future of computing really just like messaging?
And are we actually, with this advent of modern large language models and all of the underlying tech that makes AI smarter and more useful,
getting close to the kind of virtual assistant we've been dreaming about for basically half a century?
To help me figure out those answers, I called up Richard Socher.
I'm Richard Socher and the CEO at You.com.
You.com is actually one of the most ambitious AI assistant platforms I've ever tried.
It's partly an AI search engine so it can answer your questions or help you find information on the web.
That's really where it started.
It's also a generative AI tool so you can use it to write code,
or create images or come up with brilliant emails that just sound like you.
And long term, if Richard has it right,
You.com will also become something like the smartphone,
a platform on which you can do almost anything all through this one interface.
Richard has been in the tech industry for a long time,
and he's been working on problems of natural language processing for even longer.
He actually did a PhD at Stanford on bringing neural networks,
which are a kind of AI system,
to natural language processing, which was at that time a huge leap forward in rethinking how
computers process speech. The point is he knows the history of all this. He knows how it works.
He knows how hard it has been to pull this off. And he told me that over the last decade,
as he saw this tech get better, he thought this might be the moment to do it.
What he wanted to build was a chatbot, kind of, and an assistant, sure, but he looked at all
his research on language and conversation and natural language processing and how we interact with
computers, and he decided the thing he really needed to do was build a different kind of search engine.
The biggest application of natural language understanding is in search. That's where billions of
people learn something, try to find information, try to get things done, and they start all of these
tasks on a search engine. And so do I want to necessarily call that a search engine?
It almost doesn't matter what you call it, but it is that sort of magical spot in the beginning
of many people's online journey where you sit to be able to be helpful for those users.
And so that's sort of what led me to search.
That distinction of what you call it, I actually think is like part of the reason I'm so
fascinated by you.com.
Because I think you're absolutely, I mean, you're obviously right that search is kind of the
dominant interface for this kind of natural language
processing and the idea of just like how people search for information.
But simultaneously, even as you're starting to think about how to apply neural networks to this
stuff, like obviously you were thinking about this stuff before ChatGPT was around showing,
you know, new chatbot-y ideas about how this stuff could work.
But you were, you were playing with this technology for a long time.
And I can imagine a world that is not that different from this world where you went much more
down the like Siri, Alexa, Google Assistant route rather than kind of building something that
looks like a search engine, even to accomplish relatively the same goal, right? Am I thinking about that
the wrong way? Like, are those two things kind of further apart in your brain than they are in
mine? You couldn't be more spot on. And it's actually very timely too, because we dropped the word
search engine from you.com. Oh, interesting. Okay. We're now just like make it very clear that this is going to be a
conversation and that we're an AI chat assistant that's going to help you learn and get more
things done, help you summarize, and all of these things. And so that's exactly how we've actually
repositioned ourselves. After many years, we've learned that really search engines do not inspire
excitement from people, but an AI chat assistant that helps you with finding information and
answers and learn and so on does excite people. They're ultimately the same thing. It is a marketing
question. We still want to be able to say, this can be the default in your browser. You know,
you can start every question you have in your browser and your URL navbar. It defaults to you.com
because that's where you get the most useful first step in many different intents that you have.
But yeah, we're moving away from the term search engine. And to be honest, and I probably shouldn't say
this, but we don't have it finalized yet what the new terminology should be. The place we want to be
is the most helpful assistant that you can be, which honestly is also something that Google has
worked on for many years as well. Part of what seems like has changed since you started working at
you.com is that I think one thing that you did that was very smart at the beginning was make it
look like a search engine, just because I think that's like a thing people understand what to do
with, right? That like even in the way that I think a lot of people open
chat GPT now and it's like, well, what do I do with this? People instinctively understand what
to do when they open a web page that looks like Google. So building a web page that kind of looks
like Google makes a lot of sense, even if what you're doing on the other side of that
ends up being very different. Do you think that's changing that you can make something that looks
less and less like a Google query page and start to bring people along with you into all that other
stuff now? Yeah, it sounds like you're in our meetings. That's a very astute, very astute
observation too here. This is something we exactly found that last year we had in the summer,
the first LLMs within a search engine context. And we tried to innovate over the last two and a half
years along a lot of different dimensions of what could in the past be called a search engine.
We tried, you know, we had horizontal scroll where you just see a few sources and then you can
always dive deeper into it very quickly. And we had, you know,
different apps that are within that search engine context.
And what we would often hear from users until late last year was, that's interesting,
but I'm just so used to Google.
I wouldn't want to switch to anything that's too different.
And we kept getting pulled back by users in our journey to something that is more and more
similar to Google.
And it was quite frustrating because we had some good ideas,
we thought. And in each of these ideas, we had a core,
a hardcore group of power users who did love it, but it just wasn't growing enough.
And the majority of feedback we kept getting is it's too different to Google. It's too different
to Google. I want something like that's more similar, but just a little bit better here and there.
And so that all changed thanks to ChatGPT. All of a sudden, a lot of users, a lot of them
early adopters, but more and more just normal folks who basically could conceive a new world of
finding and learning about things in the world.
And so that is kind of the biggest advantage that we've had since ChatGPT came out.
And, you know, two and a half weeks later, we launched YouChat, which basically was the
first LLM that is connected to the internet and hence can be up-to-date, factual, and have citations
for the facts that it tells you, making it much more trustworthy.
Okay.
We need to take a break.
And then we're going to come back and talk more with Richard.
Support for this show comes from Shopify.
Starting something new isn't just hard.
It can be really scary, too.
So much work goes into this thing that you're not entirely sure will even work.
But here's a better thought.
What if it did all work?
What if your instincts were actually right all along?
Shopify wants to help you get there.
They're the commerce platform behind millions of businesses worldwide and nearly 10% of all e-commerce in the U.S.
from established brands like Allbirds and Heinz
to companies just getting started.
Their design tools make it simple
to create the exact online presence you're envisioning
with hundreds of ready-to-use templates available.
And with built-in marketing tools,
you can launch full email and social campaigns
in just a few clicks.
So you can connect with customers wherever they are.
It's time to turn those what-ifs into
with Shopify today.
You can sign up for your $1 per month trial today
at Shopify.com slash vergecast.
You can go to Shopify.com slash vergecast.
That's Shopify.com slash vergecast.
Welcome back.
Like we've been talking about,
the uptick in AI and natural language processing
and voice assistants
has been happening kind of slowly
and then all at once over the last decade or so.
And even though Siri still kind of sucks sometimes,
the underlying tech really has come
an incredibly long way in that time.
But I think if you wanted to pick the moment the modern era of conversational user interfaces really started,
it would have to be the launch of ChatGPT last fall.
Something about this system that felt like it understood all your questions
and responded with human and helpful answers felt like magic to a lot of people
in a way that even Alexa and Google Assistant just never did.
ChatGPT isn't perfect, obviously, it's a million miles away from perfect,
but it seemed to make people think that it might be someday,
and that there might be something revolutionary in that.
This could be the way we access information,
not by going to Google.com or scrolling stuff on your phone,
but just by asking for it.
That felt huge.
But now it's a year later,
and that idea has taken a bit of a hit.
For one thing, we've all discovered the many, many, many imperfections of the tech,
and also the ways in which conversations maybe aren't all that efficient.
I think we're definitely in the low part of the chatbot hype cycle.
It'll get better, but it's not
what it once was.
Richard, though, started You.com before the ChatGPT explosion, and so he's seen everything
rise and fall and rise and fall and all the parts of the hype cycle here.
And he's seen people poke at the edges of the tech and really try to figure out where it breaks.
So what have we learned?
What can a virtual assistant do?
And if and when the tech really does work, how do we want it to work?
Richard says we're still learning, he's still learning, but he has some pretty big ideas
about it.
Like we, I love text, right? I love natural language processing, but text isn't always the best
way to communicate something. Multimodal answers are the best way. So if you ask
YouChat, what's the stock price of Palantir or Salesforce or some company, we're not going to
just give you a bunch of numbers, especially not those that are hallucinated by an LLM, but we show
you a stock ticker that's interactive. You can move your mouse over. You can see the
daily trends, the weekly trends, the yearly trends, the market cap, and all of these different
numbers, and you just see it directly within the chat window, right? And then you can ask like,
well, now who's their CEO? And it still captures all the context and remembers which company
you talked about before and then, you know, gives you that company CEO, even though if you,
you know, just did in a retrieval back end, who's their CEO, it wouldn't give you anything
useful. But, you know, we learned to transform those queries, as one of the many things that are, you know,
going on in the background that you have to get right in a lot of different ways.
And very easy and obvious when you get it wrong, easy mistakes to make.
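The query transformation Richard describes can be sketched roughly as follows. This is a toy illustration, not You.com's actual system: the function name `rewrite_query`, the `last_entity` argument, and the two-pronoun pattern are all assumptions made up for this example. The idea is only that a follow-up like "who's their CEO" gets rewritten with the company from the previous turn before it reaches a retrieval back end.

```python
import re
from typing import Optional

# Hypothetical pattern: this sketch only handles the possessives "their" and "its".
POSSESSIVE = re.compile(r"\b(their|its)\b", re.IGNORECASE)


def rewrite_query(query: str, last_entity: Optional[str]) -> str:
    """Replace a possessive pronoun with the entity from the previous turn.

    `last_entity` stands in for whatever conversation memory a real
    assistant keeps; here it is just the most recent company name.
    """
    if not last_entity:
        # No earlier context, so the query is passed through unchanged.
        return query
    return POSSESSIVE.sub(lambda _match: f"{last_entity}'s", query)


# The follow-up question from the example above, after asking about Salesforce.
print(rewrite_query("Who's their CEO?", "Salesforce"))  # Who's Salesforce's CEO?
```

A real system would need far more than pronoun substitution, which is part of why, as Richard says, these are easy mistakes to make.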
And so we believe the answer in the future is going to be multimodal.
And that's why we invented also last year this app store where you can essentially,
other people can also submit apps to you.com and then they can show up within the search
results and within the chat response.
And I think that will ultimately be the best interface,
similar to, you know, give me some good restaurants around me.
I still remember when speech recognition started to finally work and I started, you know,
investing also after my first company's acquisition, I had a bunch of folks say,
oh, we're going to use speech recognition and to do all of these different things.
And it's going to be your personal assistant.
I'm like, okay, well, give me an example.
And one of the founders was like, well, the example is like, you ask for a restaurant.
And I'm like, so you really say I have this like speech system and I talk to it.
and I ask it, like, what are some good restaurants?
And the response is, there are 150 restaurants around you.
Number one has 4.3 stars.
Their favorite dish is pad thai and is 0.3 miles away from,
and then like 10 hours later, you have a sense of where to go.
Like, it doesn't make any sense.
And so, you know, I love AI.
I love natural language processing,
but you have to have some user empathy and try to build it in a way that meets people
where they are and not be too much like,
this is the new hammer, everything is a nail kind of thing.
Yeah, I mean, it's kind of amazing how many systems work like that read you all the
restaurants in the neighborhood thing.
Like, that's where we are with so many things right now.
I guess the App Store thing was going to be my follow-up question, which is like,
how much of this do you feel obligated to build yourself versus how much can you kind of
create the ecosystem for?
And I guess I do still wonder that, like especially as it grows and the capabilities
grow and there's a lot more things to do.
how much of that space is kind of you.com's job versus being, you know, the platform on which all of this other stuff exists.
Maybe we should hire you in our product team.
That is the big, the billion trillion dollar question.
I think long term, it's very clear to me that it cannot be all on us.
And an assistant that ultimately lets you accomplish anything and encompasses the entire economy.
But it's also clear that there's a cold start problem here.
People don't want to build an app on a platform that doesn't have hundreds of millions of users.
Hundreds of millions of users aren't going to come to your platform if you don't already provide enough functionality to fulfill most of their needs in a really, really good way.
And some needs better than anyone else ever or anywhere else.
And so that's the cold start problem.
So in the beginning, it is mostly on us.
And then I hope over time it transitions and we'll
get more and more high-quality apps into the ecosystem. And I think that will ultimately be a more
fair way of structuring that first page of the internet that currently, you know, Google benefits
from everyone else on the internet, but then they don't benefit back that much other than,
of course, you know, paying for ads and traffic and having to pay for even their own ads on their
own name just so that there's not a competitor that outbids them on their own name and then they
show up on top and like all of these things. So I think it's a better way and a fair way
to have ultimately a chat portal again,
but I think portals also have a bad marketing connotation,
and so we might not call it that.
Yeah, I always thought one of the things you guys did really well
from the very beginning was kind of let people just tell you the sources they like better.
I've never understood why every search engine doesn't just let me say,
I really like these four news websites,
and I'm not that interested in seeing other news websites.
It seems like such a straightforward thing.
And I feel like the App Store model really
just builds on that in a lot of ways, right?
You get to curate, like, here is the stuff I want to see and the ways I want to see it,
and others can tap into that, and that's a pretty powerful kind of feedback loop, which I think is very cool.
Yeah, in the pre-chat days, Reddit was one of our favorite apps of our users.
How do you think about personalization?
I mean, it seems to me that the perfect version of this, you know, years down the road or however long it takes,
is an assistant that effectively knows everything about
me, right? And knows where I am and what I like and what I'm interested in and on and on and on.
And obviously that has big privacy questions, which I want to talk about, but also just like
logistically is very hard to do. And I'm just thinking about like search is one thing where
it's just kind of upranking and downranking web pages. But like if you're trying to help me
do stuff, getting information from me seems really important, but also like the kind of thing
you don't want to do too much of, or you're going to lose people.
How do you think about that?
That's right.
Yeah.
It's something we have been working on for many months.
You have to know when to do this and when it's not that useful.
If you just want to know, like, when was Barack Obama born, it doesn't really make that
much sense to be like, well, maybe he liked mountain biking too sometimes.
You know, it's like, that's not what I asked for.
I just ask you a simple question.
I want to get a simple fact.
And so to not overly personalize, but to know when to do it well and when to really use
the facts and then also do it make it seamless for the user. So it just feels like, oh, yeah,
like the more I use this, the better it gets for me without being too creepy either. And still
having control over their private information in a way that, you know, makes them feel safe.
It's very important. So what are you seeing as folks are really starting to embrace the kind of
chat assistant piece of this? Obviously, the like, help me write an email, help me with my code
things seem to be two sort of killer apps for people. Is there other stuff you're seeing that's
jumping out at you as kind of nascent use cases here? Yeah, a ton of things. I think students are going to
have a really great time on you.com. We just launched a back-to-school campaign too. It's now the
cheapest way to access GPT-4, plus, you know, have access to the internet for that chatbot. And on that
front, too, soon we will have a whole new class of questions that are very complex that
no other normal LLM chatbot can currently solve accurately, but often pretends to be able to
solve and then does make a lot of mistakes, which I think is very problematic. It's just a whole
new class of problems that can be answered with an AI that has just more superpowers than just a
large language model. But yeah, for now, I think I still see a lot of potential for parents,
you know, for folks who want to find new recipes, for people who want to do travel planning
and get like nice ideas for itineraries that are going to be much more personalized also
in the coming, coming weeks on you.com and then students getting answers for their questions.
I'm excited also to incorporate more and more generative AI capabilities into these.
You know, if you ask like, how can I generate an image with AI?
Then we just let you do that right then and there.
If you're on YouPro, you can do it as often as you want.
And so, yeah, lots of,
lots of interesting use cases. I think where we're still iterating is like when
to make it just as good as Google for the quick navigational stuff and the quick informational stuff.
And then when can we really shine by being much more chat first? Right. And you do kind of have
to do both, right? Even though the navigational stuff is ultimately not that interesting, that's the like,
that's a big part of the sort of habit building of it. Right. And if you really want to be the kind of
homepage of the internet for people, you have to do that as well or better than Google.
That's exactly right. You know, and there are a lot of like chatbots out there that some of them,
I think, are a little bit, you know, they have the hype moments, but because they are just not
connected to the internet, they're not that factual and up to date, they give you sometimes
they even give you citations, but don't actually, when you click on the citation, you don't see
the fact that they're saying, you know, that was the citation for. It's not at all on that page.
And that loses a lot of trust after a while.
And so, you know, there's a lot of like sort of brief hype chats.
But I think to have a really long-term shot, you need to have the utility that is more similar to a search engine,
not just be like a dictionary lookup kind of thing that you use once a week.
Yeah, totally.
I think the idea of an AI chatbot as like a complementary thing is kind of interesting.
But it as like a primary interface to the internet is where this stuff starts to get really
interesting. And I think that's coming, but displacing Google is harder than most people
give it credit for, is the thing I have learned over and over and over in the time I've been
in this world. That's exactly right. So last thing, talk to me about privacy. Just for a minute.
I know it's a thing you guys have talked a lot about. I know the kind of all of this stuff we're
talking about utility and personalization and all this stuff often kind of runs directly against
privacy. Do you have a strategy for kind of how to have your cake and eat it too here? Like how do you
think you can do right by all those things at the same time? Yeah, great question. We used to
talk a lot about privacy and we still care about it, but we don't talk about it as much because,
to be honest, it doesn't excite enough people. You just can't grow quickly enough a consumer
business by just talking a lot about privacy. There are a few folks that can and they, you know,
DuckDuckGo obviously is like amazing on that front, but we just didn't see the growth from having a strong
focus on it. And it does in many, many cases, slow you down. Like, people want to have no bugs and they
want to know that things work. But if you were to never, ever do any analytics, you don't know how
happy you are making your users, right? And so in many cases, we didn't use, you know, we don't
use Google Analytics and so on to have the best privacy of any of these chat engines. When you look at
pretty much all our chat competitors, all of them are selling your chat transcripts and your data
to others, maybe not always like specifically, but like, like attached to your IP or something like
that. But they're all like using Google Analytics and all of that. And so we're trying our best
to be the most private of those engines. But to be honest, we've also, when you talk about it a lot,
there's just some people who will never be happy until you're an open source like nonprofit on
the Tor network that like makes no revenue and has no ads or anything.
And that's just not a viable, viable business strategy either.
And so, yeah, we care about it.
We think we're the best private chat engine and chat assistant out there.
But we just don't talk about it as much as other features that people seem to care about more.
And it helped us grow to many more millions of users than before.
A little while after Richard and I talked, You.com rolled out a feature that he hinted at a couple of times in our interview.
It's called YouAgent,
and it's able to do everything from calculations to code execution for you.
One of the examples You.com gives is calculating your mortgage.
Feed it all the data you care about, the amount of your loan, the time, the interest rate,
and the agent will run the numbers and spit a payment back to you.
This is a fairly benign example, right?
There are a million ways to do mortgage calculations on the internet.
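One of those ways, as a minimal sketch: the standard fixed-payment amortization formula, M = P * r / (1 - (1 + r)^-n), where P is the loan amount, r is the monthly interest rate, and n is the number of monthly payments. The function name and the sample figures below are assumptions for illustration only; nothing here reflects how YouAgent actually runs the numbers.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment on a fully amortizing loan.

    principal   -- the amount of the loan
    annual_rate -- yearly interest rate as a fraction (0.06 means 6%)
    years       -- the term of the loan in years
    """
    n = years * 12        # total number of monthly payments
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        # With no interest, the loan is simply split evenly across the term.
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


# Hypothetical inputs: a $300,000 loan at 6% over 30 years.
print(f"{monthly_payment(300_000, 0.06, 30):.2f}")  # 1798.65
```

The same three inputs David lists (amount, time, interest rate) are all the formula needs, which is why this is the benign end of what an agent might do.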
But You.com also says it plans to support things like file uploads and graphical outputs and much more.
So the idea, I think, is that you could upload all of your mortgage documents,
and then essentially have the agent explain them to you and do things with them and even do things
on your behalf with them. That is some seriously cool virtual assistant stuff.
I know I've said this already, but it's important to keep saying that the tech for this
all still is not finished. The underlying models are getting better, but they're wrong too often,
they make things up too often, they don't show their work often enough, and they also just sometimes
don't understand what you mean at all. Plus, I really increasingly don't buy the vision that
chat is the future of everything. No matter what OpenAI or Bing or whoever else would tell you,
I don't think typing into a text box is the future of the universe. But I'm more convinced than ever
that the superpowered virtual assistant we've been dreaming about and trying to build for more than
half a century might finally be here soon. And it's going to be awesome. All right, that's it for the
show today. And that's it for our AI miniseries. We have had so much fun making the series. I really
hope you've enjoyed listening to it as well. We've also got a bunch more fun special stuff
coming up this fall, and don't worry, we will be back on Wednesday and Friday with your regularly
scheduled Vergecast programming. Until then, AI David, you get one more run at the credits. You want to
do it? Here we go. Okay, that's it for the Vergecast today. Thanks to everyone on the show today.
And thank you for listening. There's a whole lot more stuff from this conversation at theverge.com.
We put some links in the show notes, but also read theverge.com. It's a good website. We like it.
If you have thoughts, questions, feelings, or other ideas about how to book plane tickets,
you can always email us at Vergecast at theverge.com or keep calling the hotline, 866-VERGE11.
Send us all your thoughts and questions and ideas for what we should do on the show.
This show is produced by Andrew Marino and Liam James.
Brooke Minters is our editorial director of audio.
The Vergecast is a Verge production and part of the Vox Media Podcast Network.
We'll be back with your regularly scheduled Vergecasts on Wednesday and Friday to talk about DVDs, VR, and all the other news of the week.
See you then. Rock and roll.
