Orchestrate all the Things - You.com is taking on Google with AI, apps, privacy, and personalization. Featuring CEO / Founder Richard Socher
Episode Date: June 20, 2022. Award-winning AI research - check. Startup and enterprise experience - check. Venture capital and Marc Benioff backing - check. Is that enough for Richard Socher's you.com to take on Google? Here is why and how he aims to do that. Article published on ZDNet.
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Award-winning AI research? Check.
Startup and enterprise experience? Check.
Venture capital and Marc Benioff backing? Check.
Is that enough for Richard Socher's you.com to take on Google?
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
Boy, I'll try to keep it short. Lots to cover
there. I'm originally from Germany, Dresden, technically, I guess, East Germany. Old enough
to have grown up a little bit during that time. And I'll skip all the stuff, but basically did my undergrad in Germany, with one year in France as an Erasmus student. And did my PhD at Stanford. At the time, in most of my 20s,
I wanted to be a professor and worked super hard to try to get a good faculty job. Like almost my entire 20s, most of my hobbies died.
I was just like working nonstop,
came to the US with, you know, very little money
and a couple thousand bucks and just worked super hard
and got the faculty job.
But, you know, eventually did my PhD at Stanford,
won the best computer
science thesis award there, and felt like deep learning for natural language processing
is clearly the right technology. It was a, you know, kind of a big bet I made during my PhD.
A lot of my fellow PhD students were like, ah, that's kind of your niche, I don't really want to work on that. In the beginning, I really couldn't convince everyone right away. But, you know, ultimately more and more people joined sort of
this field. I gave some of the first lectures then after I graduated on the side about deep
learning for NLP. Eventually we merged the official main NLP class with deep learning for NLP because, a few years after my PhD, every state-of-the-art model in NLP was a deep learning model at that time. So I sort of taught on the side for four years at Stanford, but my main job instead of being a
faculty was actually then doing a startup. Initially, I thought I'd just postpone my
faculty job for a year, do the startup, and then kind of find some people to replace me
and still become a professor.
But then I couldn't really leave the company.
It doesn't really work like that, yeah.
Yeah.
Like some really senior professors at Stanford had made that work,
but they had other PhD students of their team that they brought in
and all of that, and then they could extract themselves,
and I really couldn't have.
But I also started teaching on the side one quarter a year, and then, you know,
not being affiliated with the university the rest of the year, and kind of doing some research
in my startup, and then we eventually got acquired, became the chief scientist at Salesforce,
and then did a lot of research there and started working on applications.
And it felt more impactful than what I could have done as a professor where you have at
most 10, 15 PhD students.
And at Salesforce, I had 100 plus researchers and many hundreds of engineers working on
AI applications that were pretty impactful and
scalable. And that was a lot of fun. But at the end of my PhD, I actually had implemented a first version of a new search engine. And at the time I thought, man, it's just too ambitious. Google is probably going to sue me. All my smart friends are going to work at Google. It's going to be so hard to compete with them. No one was really complaining about Google very much in my circles and online. And so I kind of discarded the idea
and built MetaMind, an enterprise sort of AI platform that worked in medical imaging and
e-commerce images and NLP and a bunch of other things. But it was sort of the horizontal platform play as a machine learning tool for developers.
And anyway, so did that instead, but could never quite shake the idea off.
And then after four and a half wonderful years at Salesforce, decided it is time to really give this big, crazy idea a shot
and try to really build it for a variety of different reasons. And yeah, that's sort of
where we are now. And if you want, I can go into some of the reasons of why I ended up
building that search engine. Yeah, sure. Actually, that's something I
intended to ask you about. Just a brief detour before we go into that, because I have the
impression, you can correct me if I'm wrong, that actually at least part of what you did at
Salesforce was helping develop Einstein, which in my understanding is really more like a catch-all term for, well,
injecting, let's say, AI and NLP capabilities into their core system.
Is that correct?
Yeah, yeah.
I ran a bunch of different teams across the different clouds, service and sales, and a bunch of different ways you can infuse it: in chatbots and call deflection, helping salespeople understand the next best steps to take as they're trying to close a deal, and a whole host of other things. Search capabilities within Salesforce were also part of my team.
So, yeah, it was a bunch of different things.
Okay, great.
So, right, then after this brief detour, which I think actually is sort of important for people to get a feeling of the different kinds of things that you've done
and where you've had your fingers in, let's say, then let's go to the big, crazy,
ambitious thing on taking on Google, basically, because, well, when we're talking about search,
it's sort of inevitable that Google will pop up, as it already has a couple of times, actually.
So the thing about Google is, well, you know, we can argue about the quality of its search engine, and whether it has blind spots and where those are. But the thing is, from a business point of view, let's say, if you're up for taking on Google, then you have a very steep climb ahead of you, because the moat that it has built around this business, which is more or less around its index and its crawling and the amount of data that they have collected, but also around the algorithms and the NLP and everything that they have going for them, basically, under the hood, is pretty monumental.
So I'm trying to get a feeling of what was it
that motivated you to embark on this pretty monumental task,
basically.
Yeah, you're definitely right.
And I'm not in this to do a quick acquisition or a quick flip or something.
I'm motivated enough to work on it for many years, and I think it will take many, many years. I think there are sort of three different groups of reasons for why to take on such a giant: user-specific, macro, and timing. I think from a user perspective, the fact that our privacy gets so
massively invaded at almost every step we take online as our lives go more and more online is
kind of unfortunate. And, you know, I think it all starts, every online journey starts with
search or most of them, I guess, you know, a lot of them are also going to social media and so on and they have their own problems.
But yeah, so I think privacy is a big sort of user-specific issue that I have. And more and more users are becoming aware of it. And so I think that's a good thing, that more people kind of realize, man, I searched for
this one thing and then that follows me around the whole internet or I visited this one site and now
that follows me around in other places and it feels uncomfortable. And then you have, of course,
the ads. As a user, it's just annoying to see five, seven different ads before you see some content.
And once you go to the content and you actually learn a little bit about how it works, you realize
all these SEO, like search engine optimized microsites are also just ads. Like they're
just trying to funnel Google traffic through SEO into Amazon traffic or other like affiliate links
and cookies. And they often don't have any useful stuff. They
just kind of write blah, blah, in order to get you to click on this Amazon link and then have
a 24-hour cookie on that. And so I think, you know, that's another thing that's annoying for users. And it's getting more and more so as Google kind of is trying to increase their sales. They just ran out of ideas and they're like, how do we increase sales?
Well, if we go from four to five ads,
like we'll make even more money, right?
And so that's sort of the innovation
that they've worked on in the last couple of years.
It's just like N plus one ads.
And so that's not ideal.
And on top of that, and this is kind of both macro and user, I think it's important to have a choice
in the kind of information that you consume. A lot of people think about their food diet, but I think our information diet is incredibly important too. And a lot of people are constantly
freaking out or, you know, worried about things. And there are a lot of worrisome things, but it's
also, I think, important to be able to have some control over that information diet and say, I want to see more Reddit or less Reddit, or I want to see the New York Times or ZDNet and others. And, you know, kind of have some say in that, versus just being sold, with your information desires, to the highest-bidding advertiser, and having no control over it. I think that choice will also help us kind of stay in line better when it comes to letting users build a good experience for themselves. So those are the user reasons. Now, macro reasons
are that the entire economy is moving online. And then you have that single gatekeeper at the beginning that wants to sell you to the highest-bidding advertiser. And that, I think, is not an ideal setup for the web, period. And the fact that every company needs to kind of pay a tax to exist on that front page is also highly suboptimal.
And as I've gotten into this space,
I've heard now this story twice where companies were built,
they made money by having content that was useful,
was linked to, came up in organic Google rankings.
And then after a while,
the Google ads team started reaching out to that company and said, hey, it looks like you're getting a lot of traffic from Google, do you want to also buy some ads? And in these two cases, these companies said, no, we're good, we're making a lot of money, we're just getting good traffic from our organic results, from our content. And then boom, next week, they're on page 10 and they lose 95% of their revenue stream and their users and the traffic. And they're like, oh, we're sorry, we'll buy the ads, and then boom, they come back, but now are paying for half of that traffic.
And I mean, it's literally like in a bad movie where you're told, you know, you need protection for your business. And if you don't get it, you have no more business, right? And it's kind of nuts that that's happening.
Now we also have some tailwind for us in terms of, you know, antitrust kind of realizing the issues for the entire economy. And that timing is kind of the third bucket, which is: now's the time. But maybe before we get to that, one more macro point, which is that we're also in an information age and there's more and more information. And 20 years ago, when Google started, it was just kind of amazing to have access to information. Now access is more like table stakes, and the problem is how you deal with all of it, and you need to have AI that summarizes it for you. And so, as I was working on AI and natural language processing for over a decade, I actually think that search, not in how we conceived it originally, but how we're conceiving it now as a summary of information, summary search... Unfortunately, summary isn't a very cool term.
Like no one is like, oh, wow, you do summary.
That's like Terminator or something.
But it's actually one of the hardest AI tasks
when you think about it.
And I'm happy to sort of explain why later.
But it's one of the most impactful AI applications, which is what we're working on now. And I'm really excited to help people get things done and move from just a search engine to kind of a do engine. And so with that, yeah, the third bucket is kind of timing.
I think now is the time. There hasn't really been that much innovation in search. And when you kind of plot time and value, initially Google provided an insane amount of value, but now it has kind of logarithmically flattened off. Whereas the data that you provide to Google has kind of grown linearly. In the beginning it wasn't as valuable, but I think we hit an inflection point maybe a year ago, where it feels like people's data actually becomes more valuable than the services they get from Google, because the one value has kind of leveled off.
And so on top of that, more and more people are kind of realizing that there hasn't been that much innovation in search. They complain that the only way to get something real out of Google is to add site:reddit.com to their query every time, so they get what real people are saying.
And all of those issues are opportunities for a small startup to build something that helps people find what they really want, and that focuses on some niches too, sort of boutique search engines for particular things, like in our case, YouCode.
That just provides more useful things and helps people save a lot of time and in the future also money.
Well, I think many people, perhaps most people and myself included, would agree with a number of your points.
Again, perhaps most of your points.
So just to pick one, I think actually summarization is not just a very important task in AI, but
well, if I were to summarize what I do, then I would say it's also summarization. And that's what many people do, day in, day out. You consume a huge amount of information and you sort of digest it and try to produce something intelligible and useful out of it. So yeah, lots of what we do actually is all about summarization. So it is pretty important and pretty complicated, actually, if you really think about it.
And so I'm tempted to ask you... well, you touched upon a number of things, and some of them really have to do with, I would even go as far as to say, ethics and regulation. So actually making sure that, you know, some of the mishaps that have happened in the past are not even allowed to happen. And, you know, there has to be some oversight. And actually, I think that's honestly, you know, bigger than a single company or a single effort. It's, you know, something that needs to happen on a systemic level, let's say. But it's good that you identify it, and you have the willingness to actually do something about it, to do things differently, let's say. But in terms of, well, how do you actually make that happen, I have an interesting anecdote there. Well, at some point, which seems like a lifetime ago,
you know, with everything that's going on in the world right now,
I was invited to Russia to visit the Yandex campus
and had the opportunity to speak to a number of people there.
And one of the interactions we had that sort of left an impression on me,
we obviously, inevitably, got into the discussion of, well, you know, how do you beat Google? And they said something like, well, in order to do that, in order to make people switch, we have to be not just on par, we actually have to be better; we have to be something like 10 times better. Do you think that's an assessment that makes sense, and if yes, how do you actually do that?
And you've touched on a number of directions there.
And I was wondering, well, okay,
obviously summarization and the whole NLP
and question answering area is one way.
But actually that's no secret.
That's something that Google has been on to as well.
And while you did say that it has sort of plateaued, I sort of beg to differ there. If you watch closely the evolution of Google's search algorithms over the years, you will see that for the last few years, at least, they've actually injected quite a lot of NLP and question answering.
And specifically, they seem to be using BERT now behind the scenes to power the search.
So one part of the question, I guess, is, well, how do you take on that?
And then you also touched on the privacy aspect.
And there's a lot to be said there, but it's already a very long and winding question.
So let's break there and pick on the NLP AI stuff
and then we can go into the privacy aspect.
Yeah, really, really great and insightful question.
So I think when you ask, can you be 10x better?
I think when you think so much about search,
you realize that there are different groups of searches
and seekers and searchers.
So, depending on the person and depending on the search query that you have, we actually have to acknowledge that some searches you just can't make much better. If someone searches for weather, the best you can do is kind of give them the forecast. Is it going to rain today or not? The temperature, the humidity, and simple stuff like that. And there is no space for making it 10x better. If someone asks, who's the president of Jamaica or something, the best you can do is give them the answer in as few milliseconds as possible. And there isn't anything you can do much better than that.
And so I sort of call them quick navigational searches. Someone just wants to go to Facebook, but instead of typing facebook.com, they type in Facebook in their search engine; they want to click on facebook.com as the first result. For those quick navigational searches, there is no space to make it 10x better. You just kind of max out quickly. And we just have to make sure that for these quick informational and quick navigational searches, we do as well as Google and we don't suck. So we're not super slow and things like that. So you kind of have to group them. And then
there are kind of complex informational searches. And in those cases, I actually think we are
already doing better than Google. We just provide so much more rich information. And then there are sort of complex action searches
where you really want to actually accomplish something.
You want to buy something, you want to order something,
book a flight, like these kinds of things.
And there, I think there's a lot of potential
to do much, much better than Google.
And that's sort of what our goal is,
and we'll make these announcements actually in a few weeks together with some other big announcements.
We'd love to talk again in a few weeks, but don't want to spill the beans.
But basically, on these kinds of searches, I think we can do a lot better.
And then you can kind of think about, okay, across this whole spectrum of searches, of which there are so, so many, right? Like some people look for stocks, you want to just show them a stock ticker. Some look for cryptocurrencies, you want to show them a cryptocurrency ticker. Some look for the weather, you want to show them the weather directly. And you look for all these other little things, right, that you have special apps or widgets for, and we call those apps in our app store. And sometimes you like certain things better. And then, once you realize this, you can say, okay, now what are groups of people that make a lot of searches for their work, or for their hobbies and their deep interests? And you could really do a lot better if you understand what they're trying to do. And one, I hope that everyone who hears this or reads this will come to us and tell us how we could make their searches better. And we're always, you know, better, never done, especially in search, as it captures pretty much everything
people do online and in their lives. But one particular group that we chose is coding and developer searches. And there are actually a ton of things that you can do better for learning and for coding.
And so I'll just show, don't tell.
If you look for how to train a sequence
to sequence model in PyTorch,
we just have a Stack Overflow app here
that has the right kinds of answers
and what people have described.
There's a code snippet here
and there's a copy and paste button.
And you just, boom,
you just solve that person's problem potentially.
And if it's not that,
then maybe it's the official PyTorch documentation.
And you show the code snippets of which there are,
you know, in this case here, a ton.
And you kind of help them summarize this
or you help them like understand,
oh, this is what, you know,
people are saying on Reddit about this.
And, you know, you can kind of see how normal people talk about that.
And you have GitHub issues and you see the code and you see exactly what kinds of, you know, issue results people have as they're trying to do this task.
And none of that Google gives you. And if you did the same search, and many other kinds of searches, for, you know, CSS Flex or 'train a single-layer neural net', here we have an AI that actually just literally writes the entirety of the code for you.
And you can do all kinds of other things.
You can say, like, oh, I want to have a Fibonacci function of N, and just have that be generated. And then instead of doing a full-on search, which would probably find it too, you just get the AI to write you that function and you're done. And again, you have a copy and paste button, and you just saved so much time compared to trying to find it out in other ways.
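For illustration, here is roughly the kind of snippet an AI code assistant might hand back for a "Fibonacci function of N" request; this is a generic sketch, not You.com's actual generated output:

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number (0-indexed), computed iteratively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(10))  # 55
```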
And, you know, likewise here, this is the original search on 'train a single-layer neural net'. You know, you can see all of these useful summaries of different content islands on the web. And, you know, sometimes Stack Overflow does have the exact right answer. You see, oh, wow, this is the top answer, lots of upvotes.
And again, you can quickly see the code and copy and paste it.
And so each time you do this search, you save 30 seconds to 30 minutes if the AI just
writes the code for you.
And indeed, that is 10x better if you value your time, which most developers and companies
do.
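As a concrete example of the kind of result being described, here is a minimal sketch of what "train a single-layer neural net" code might look like; a generic PyTorch example, not the output of You.com's code AI:

```python
import torch
import torch.nn as nn

# Toy data: learn y = 2x + 1 with a single linear layer.
x = torch.randn(100, 1)
y = 2 * x + 1

model = nn.Linear(1, 1)                                  # the "single layer"
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())  # should approach 2.0 and 1.0
```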
And so I think that's kind of where the answer can be yes.
But you have to, and yes, there's a ton of AI and NLP in there.
If I look, for instance, for something maybe simpler like best headphones,
we also help you save time by summarizing.
So when you have articles like this here,
you basically can open the side panel and it just tells you like, okay, here's the main thing,
the good, the bad, some specs in some cases, and that's it. And you can kind of, you know,
as you scroll through this, you can kind of quickly see like repeating items and see kind of what people are saying
about this on social media too. I personally like the Reddit app. I should refresh this.
And when I see the Reddit app, I already kind of get a sense of what people are actually saying
about this. And if I don't like the Reddit app, I could also say,
no, I don't like this app.
I don't want to see Reddit results.
And this, I think, is kind of the future of search.
You have control over it.
You get a quick summary for a complex decision.
But then we want to, of course, go even further than that
and eventually help you actually execute
on those decisions, which I think would be even better.
But already, you're saving a ton of time across different types of searches that could actually take a long time in an old search engine, where you have to open 10 different tabs and then go through each tab and try to find the thing. Here, you just get the content that you need right away.
And that's kind of why we're growing now and why we have so much love on Twitter and other
channels for what we've built.
So that's kind of the answer.
And yeah, if all you're doing in search is you want to go to facebook.com and you want to find the weather, we probably can't be 10x better. And then for those kinds of searches, what we've seen done pretty successfully in the past is what DuckDuckGo does: just saying, for every search, you'll have better privacy.
And so to me, privacy is also very important. And so we have a very private mode.
The interesting thing is, I think for a lot of the privacy conversation, and I think you alluded to this too, perfection is kind of the enemy of progress, in that if you say, oh, we're all about privacy, that's our only and main thing, then the hardcore privacy people at that point want you to be a fully encrypted, fully open source, no revenue, no data, nothing kind of project. Essentially, you can't really be a company. And so that will crush you so much that you will never be able to compete with Google, because Google does collect and get preferences from people, and does it sort of sneakily and implicitly, and follows you around the whole internet.
Whereas in our case, we will never be as bad as Google.
We'll never sell your data.
But we do, if you want, let you log in so that you keep your preferences, and we share your IP with services that need it in order to provide localized results. For instance, you might say Chinese restaurants near me, and now we send 'Chinese restaurants' with the location of Palo Alto or something, or that IP, to the Yelp API in order for Yelp to tell us which restaurants are close by to that person
that are Chinese food. And so the problem with that is that you now have to say, oh, we share
your data with third parties because you send this particular kind of query to that particular
kind of service in order to get a localized result. And the truth is that most people would prefer a localized result when they
look for that kind of thing and want that convenience 90% of their lives. And then there
may be some searches that people want a ton of privacy. And there are some obvious ones I'm not
going to mention, but maybe other ones like medical issues and so on, people often forget
and are important too.
And so when you want to have that privacy,
we just have a hardcore privacy mode.
And in that mode, we indeed don't share anything and we don't log anything.
And we basically don't keep track of how it's being used.
That also means we don't find bugs. So if there is a bug, you kind of have to tell us explicitly that you were in private mode and you didn't find something, you had some issue, because we just don't know.
We have no analytics. And so I think that way you kind of get the best of both worlds and you can
switch when you want the privacy, you have it. And when you want the convenience, you also have
it. And then we can also over time build an actually better search engine, which to be honest,
like DuckDuckGo is a very thin wrapper around Bing, and it will never be able to build its own index and sort of doesn't seem to try to do it either.
And I mean, in some ways, that's fine.
But yeah, I think it'll be hard to not be dependent on it.
And you see this with most other sort of non-Google and Bing search engines too.
Yeah, that's a very obvious thing. And thank you for pointing it out. You make my life easier
because I was going to ask precisely about that. I mean, obviously, if you're in the search business,
well, it comes down to having two options. You either have to build your own index from scratch or you have
to use somebody else's, which is what DuckDuckGo is doing. So how do you go about that actually?
Are you building your own index? So we're also partnering. It's actually complicated. So we have all these apps, and half of those apps are based on indices that we've built ourselves, and the other half are not. And when they're not, they're sometimes basically based on other APIs; you know, we don't have satellites, so we need weather data from other providers and things like that. And then we also partner with Bing to get some of their results, and some other APIs for a variety of things like restaurants and weather and stuff like that.
Okay. So you basically have a number of different indexes, some of which are your own,
some of which you're sort of outsourcing to third parties. And depending on the type of query you get, you use the right index for the query. That's right.
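A hypothetical sketch of the routing idea just summarized, with made-up query categories and backend names purely for illustration; You.com's actual dispatch logic is not public:

```python
# Route each query to an appropriate index or third-party API by query type.
BACKENDS = {
    "weather": "third-party weather API",
    "code": "in-house code index",
    "web": "in-house web index or Bing partnership",
}

def classify(query: str) -> str:
    q = query.lower()
    if any(word in q for word in ("weather", "forecast")):
        return "weather"
    if any(word in q for word in ("pytorch", "css", "python")):
        return "code"
    return "web"

query = "how to train a seq2seq model in pytorch"
print(classify(query), "->", BACKENDS[classify(query)])  # code -> in-house code index
```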
Okay. Another thing that sort of popped up to me while listening to you explaining
how you go about things such as code snippets, for example, is the sort of inevitable tensions,
the sort of inevitable trade-offs and decisions you have to make when you do the kind of thing
that you do.
And one of those actually has to do, which you touched upon earlier as well, with sending
traffic to sites, basically.
So when you have something like a pop-up panel, for example,
obviously that means that, well, you're not doing that.
Has that been a problem for you so far?
So I know it has been for Google.
So people complaining like, oh, Google is stealing our traffic and so on.
I'm guessing that it may not have been a problem for you yet,
but it may be eventually.
How are you planning to deal with that?
Yeah, so we've actually reached out to almost everyone who, you know, we crawl and we've built apps for. I actually don't even think of these apps as our apps. And this is kind of what we'll announce in a couple of weeks, so I don't know how to best describe it without fully spilling the beans and announcing it now. But, yeah, we want to be a much more open kind of platform, and you can get a little bit of a sense of what I'm talking about if you go to you.com/apps. I'll just share my screen really quick.
You can kind of see maybe and anticipate
what I'm trying to say and where we're going.
But here you get basically these different kinds of apps, and you can say what you're interested in. And you don't really install a search app. You kind of set a preference for it so that the AI and the ranker can sort of pre-filter. You know, for instance, if you're making a coding query, you don't want to see the weather, right? And so we need to have some AI that pre-filters it. I think that's a pretty impactful, important idea: to make AI controllable by the people that are affected by it, to largely make it useful for them, but still have some control over it. You know, you basically use the ranking that people gave you.
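A minimal sketch of that "use the ranking people gave you" idea, assuming, hypothetically, that each app result carries a relevance score and each user keeps simple per-app preference weights; the real ranker is of course far more involved:

```python
from typing import Dict, List, Tuple

def rerank(results: List[Tuple[str, float]],
           preferences: Dict[str, float]) -> List[Tuple[str, float]]:
    """Combine engine relevance with user preference weights.
    Apps the user has hidden (weight 0.0) are dropped entirely."""
    scored = [
        (app, relevance * preferences.get(app, 1.0))
        for app, relevance in results
        if preferences.get(app, 1.0) > 0.0
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

results = [("stackoverflow", 0.8), ("reddit", 0.7), ("weather", 0.1)]
prefs = {"reddit": 1.5, "weather": 0.0}    # user boosts Reddit, hides weather
print(rerank(results, prefs))              # [('reddit', 1.05), ('stackoverflow', 0.8)]
```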
And so, you know, when you think about all these applications,
obviously we don't feel like we should own these applications, right?
We don't own all of Time Magazine, all of NPR and so on.
And so, yeah, that's kind of, I think, how we're thinking about this.
I think when Google does it,
Google takes all the content and all the revenue.
And then of course, no one wants to be like,
well, like you're just making people
not even come to my site anymore
and you just stole all my traffic and my content.
In this case here, this is actually, in the future, that company's app.
Okay. Thanks. All right. So let's go to another thorny topic, which is the business model,
actually. This was something that stood out for me pretty early on, actually, because you do mention that, well, again, you can correct me if I'm wrong,
but my impression is that one of the key points, let's say,
in your you.com pitch is, well, no ads, basically.
And in order to make that work,
that definitely implies some sort of paid model, basically.
Otherwise, where was your money going to come from?
And I think, well, you're going to tell me if my understanding is correct.
But if it is, I think this is going to be another uphill battle for the simple reason
that, well, people have been used for a number of years to just use search for free.
With everything that, you know, with all the baggage that this comes with: privacy and selling your data and ads and all of that.
So are you going to be pushing for a totally different model?
And if yes, I get the impression that maybe you're going to, or maybe you should be, targeting specific segments. So you mentioned examples such as, well, searching for code, or other, well, domain-specific search apps, let's call them. So what's the idea there? Are you going to be maybe charging for access to those?
Yeah, so I guess currently we're just focused on growth and building the best possible search experience. I also believe in choice and, you know,
here, and like you see this already with the sources, we currently have no ads and we set
very publicly some of our value-based kind of swim lanes and guidelines in that we will never have
ads kind of get preferential treatment and change the ranking. We plan to never have targeted, privacy-invading ads that follow you around the internet, and also, you know, we'll never sell your data. But we may actually get private, that is, query-dependent-only ads in the future. So if you look for an air compressor or an air purifier or something like that, you may see an air compressor ad, but it won't be linked to you. It won't be, you know, sort of: the advertisers don't know you made that search, and it's hard to track you as you click on that link and everything. So it's basically similar to DuckDuckGo. But I actually
think it's important for us to try to not be too dependent on ads.
I think ads are kind of just a backup.
What I'm really excited about are these applications, these apps that we have.
And there are apps that are so useful that I think people would want to pay for them.
And YouWrite is a good example, and our first example of that, where most people who get an AI writing assistant are willing to pay for it, as it saves them a ton of time. And that to me is one example. In the next few weeks, probably in a month and a half or so from now, we'll announce another really big such app, and then after that we want to, yeah, kind of work together with others and build out that whole ecosystem.
Okay. I guess it's a good point for me then to ask you about, well, some of the
non-technology stuff, because we do have to touch upon those at some point as well.
So, well, you mentioned like your sort of mid to long-term business plan. And
I have to ask you then, what sort of backing do you have for you.com? And well,
which really means how long is your runway? And also, I guess, how patient are your investors
really waiting for you to figure this out?
And if you'd like to share some key facts and metrics about the company,
like, I don't know, a headcount or backing and this kind of thing,
that would help as well.
Yeah.
So one thing we have announced already is our seed round from, I guess, two years ago, late 2020, for which our main backers were Marc Benioff and Jim Breyer's Breyer Capital, and a few others, Day One Ventures and Sound Ventures and others. And yeah, so we raised $20 million in that round. And since then we have some more news, which we'll share in three weeks or so, later in the month. And yeah, I guess in terms of team size, we don't really share the exact headcount. There's also contractors and interns and full-time and part-time and all of that. But it's a fairly small team, especially in comparison to Google. I think that's fair to say.
Okay. All right. So let's then rephrase that a
little bit. So how long do you think you have to figure it out, basically?
Several years.
Several years.
Okay.
Okay.
Fair enough.
All right.
Well, then I guess since we only have like a few minutes left, and I do want to pick
your brain on the bigger picture as well, unless you want to add something to what we
already said, which sort of alludes to a number of things
you'll be announcing soon, I guess.
Let's sort of wrap up the you.com deep dive
and sort of level up a little bit because-
I just have one, since you asked, like I'll just throw in one thing, which is YouCode. I think that is kind of a big one, and I think it might be relevant for your very techie and positively geeky audience, which is very sort of reminiscent of and similar to our own team. Of course, a lot of engineers, mostly engineers in the company. And that is, yeah, YouCode is basically a special search engine lens, if you will, that focuses only on programming and really makes you much more efficient, right? And I showed you already the examples, and I'm happy to send you screenshots if you're interested, or GIFs, of, you know, the AI code completion and code snippets and all of that.
So, yeah, that's kind of something I'm really excited about to help that community be more efficient because ultimately there need to be a lot more software developers in the future.
And those software developers will just have infinite work usually to do, and it'll be good
to make them more efficient. So yeah, and now happy to talk about all things AI. I love all AI
from the whole spectrum from highly philosophical to very applied.
Well, actually, what you just mentioned, I may as well use it as a segue.
I mean, the thing that you said about having more programmers in the future, well, maybe
yes and maybe no.
I mean, some of the more sophisticated AI models that have been released, and specifically
the large language models,
they seem to be pretty good at coding as well.
So I don't know, at least for the more boilerplate stuff,
you may actually end up needing fewer programmers than today.
I would liken that to book writing, where you might say, well, now once you have the Gutenberg printing press, maybe you'd think you need fewer people in the book space, because you can just print one book so much more quickly now than before, when a monk had to kind of, you know, manually copy the book for another copy. But really the book space exploded. And so I think we see similar things as people,
as you lower the bar, and I don't think AI will just figure out what things to implement and
build the next TikTok by itself, right? Like it'll be people having to have those ideas,
that empathy with users and people, and also that creativity to actually think about what you want
to build. But you're right. I think coders will become more efficient thanks to AI.
And depending on what they program, they'll be much, much more efficient.
But I think actually that does not mean we need fewer coders. It just means that we can get more things done, and more things will be digitized.
I think there actually is a precedent on what happens when you lower the bar that much.
There are a number of precedents.
Actually, you mentioned one with Gutenberg.
And I would also mention, well, music and, you know,
digital arts in general.
It used to be that, you know, only the really motivated
and the people that were really good got to publish, you know,
their creation because you had to have access to a studio and then a record company and all of that.
And now you can do everything on your laptop pretty much.
That's right.
Yeah.
Uber is another interesting example where it's like, on the one hand, they made it much easier to get a cab.
And you'd think, oh, that's bad for the taxi industry. But if you include Uber, it's essentially just a really, really clever taxi marketplace.
Then the taxi industry exploded and is now much, much larger.
There are so many examples like that.
Okay, so I watched your TED Talk from 2017, which seems like a lifetime ago in a way.
And I'm pretty sure you must feel that way as well,
in which you kind of tried to give a sense of where AI was at the moment
and where it was going.
And it seems to me that you did pretty well in the sense that you identified
two of the key directions that are sort of getting mainstream, let's say, today.
One is the emphasis on language and large language models.
It's like a constant bombardment, really,
of new models that are being released almost,
well, not on a daily basis, but monthly, definitely.
We have new models every month.
And the other direction that you spotted was,
well, what's now called multi-modal AI
or multi-modal training.
So having basically models
that combine different modalities.
So usually it's text and visual,
but well, it can be other modalities as well.
So I know that you have done lots of work in the NLP space.
So I was wondering what your take is on the current state of the art there.
So there has been lots of progress.
Basically, there are two schools of thought, let's say.
There are the scale-up school of thought, which says that basically, well, you have to scale things up.
And eventually, you will cross a threshold and you'll get some sort of emergent intelligent behavior by doing so.
And then there's the other school of thought, which says that, well, you have to inject some sort of domain knowledge or, I don't know, rules or whatever it is you want to call that.
What's your take on this debate, let's say, and where do you see the field going?
Yeah, that's an open-ended question I could talk about for hours.
Let's see, maybe I'll start with the very concrete last one.
I think the sort of knowledge-based and rules-based camps, that school of thought, at least, has largely been superseded by scaling up massively. But even after you scale up massively now, instead of defining rules, you want to give a few examples and do some fine-tuning on top, or even some priming of the language model with a few example rules, and then it'll kind of auto-complete that kind of idea multiple times. I think there's sort of a separation here between short-term progress and really, really long-term, towards-AGI kind of progress. And I think short-term, we will indeed be able to solve a lot of different problems with the current technology. And I think it's actually quite easy these days to have a lot of impact with AI applications. And that's kind of what we see in China too; part of its economic boom is coming from much more automation, more and more of which has AI in it: AI applications built without inventing any new models, just taking the existing models and applying them very carefully at massive scale.
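To make the "priming the language model with a few example rules" idea above concrete, here is a minimal sketch of building a few-shot prompt in plain Python; the examples are invented and no particular model API is assumed:

```python
# Instead of hand-coding a rule, show the model a few worked examples and let
# it complete the pattern for a new input (few-shot "priming").
examples = [
    ("convert 'hello world' to title case", "Hello World"),
    ("convert 'good morning' to title case", "Good Morning"),
]
new_query = "convert 'deep learning' to title case"

prompt = ""
for query, answer in examples:
    prompt += f"Q: {query}\nA: {answer}\n\n"
prompt += f"Q: {new_query}\nA:"

print(prompt)
# This prompt string would then be sent to a large language model, which is
# expected to continue with "Deep Learning", following the primed pattern
# rather than an explicitly programmed rule.
```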
I do think for some long-term sort of AGI goals,
we do eventually need to inject
and be able to learn certain rules.
It's kind of a funny situation that you have the models that have a billion
parameters, billions of floating point operations, multiplications, and so on.
But if you ask the model in natural language, what's 365 times 554.6, the billions of floating
point operations cannot actually solve that one multiplication problem.
And so that kind of tells you something, right?
It tells you that there is space still to try to learn and actually extract these rules. The tricky bit is, how do you do that without rigidly pre-defining those?
And instead, how can you make them so they're actually learned in an abstract
way from data, but then learned in such a way that they will generalize properly to entire sets? So
you need to have basically set theory and some logical types and probabilistic reasoning and
things like that eventually emerge from these models in some
capacity. And it doesn't seem like so far just scaling it up is able to do that, even though,
you know, it is doing amazing, amazing things. And, you know, I've been very excited about language modeling for many years, and multitask learning. And I do believe that multitask learning continues to be a challenge, in the sense that if you want one task to be done really, really well, it usually comes at the cost of other tasks. Like, the best translation model isn't also a pure language model, right? And the pure language model isn't the best sentiment analysis model. Usually people have to fine-tune it a little bit and modify it for that task at the very end. And so I'm still hopeful that we can make a lot more progress. And clearly these general language models, as they get larger and larger, have such a good embedding and, you know, quote-unquote, understanding of language
that they can solve a lot of different tasks quite well right away with zero-shot learning
without extra sort of training data.
But if you want to get the really high performance, you kind of have to give them a little bit
of extra training data after.
Okay, so you already sort of tackled the one strand of, well, criticism, let's say, on those big language models, which is, you know, the actual intelligence part.
The other two core parts, as I see it, have to do with, well, the whole environmental aspect, energy efficiency, and that sort of thing.
I want to do a quick note because you said actual intelligence.
I think that is actually very typical
and you're not the only one.
Usually once we solve a problem
that seemed impossible before
and incredibly hard
and definitely like an AI problem,
once we solve it,
it's not actual intelligence anymore.
It's like, you know,
when AI people worked on chess
before anyone could build
a really good chess algorithm,
everyone thought, wow,
once we solve chess,
we clearly have an artificial intelligence that can solve all the other little things very quickly too.
And that was never true,
but it's also true that the definition of AI
is kind of constantly evolving.
And I did say that five years ago. And now these AIs can do so much in language, and translation gets better and better over time. And, like, there are so many incredible things they can now do without extra training data that would have been an impossible goal five, ten years ago.
Now we're like, oh, it's just kind of, you know, language modeling.
It's not really AI anymore.
And that's just, that's happened for a long time.
I wonder if at some point we will sort of have solved problems that we think now are impossible. And then at that point, it's like, well, that's just, you know, an app on your phone now. It's not real AI because it doesn't get to this other thing. So anyway,
that's just an interesting side note.
Well, yeah, I mean, you're right.
Even though I have to say that personally, I never thought that, you know, solving chess or Go or whatever was like the golden gate.
But yeah, you do have a point there.
Anyway, what I was trying to get at really is your take on the other two strands of criticism, basically.
So, you know, the whole energy efficiency, resource utilization and, you know, like value for money, if you want to call it that, of building these humongous AI models.
And then the whole bias, toxicity, and all of that that comes with it,
which is a sort of byproduct of the way you train the models.
There's been some progress on that.
At least the people who produce those models seem to be aware of those. And where do you think this may go?
Yeah. So the first one, I think the concerns about electricity were a little bit overblown, in that one flight from Europe to the US or something is basically the same amount of carbon that it takes to train a reasonably large AI model, even, you know, a very large one. And so I don't think there's a massive amount of impact that AI has on electricity and sort of CO2 and carbon emissions.
And that doesn't, of course, mean we can't do better
and we should get electricity from green sources,
which I think more and more data centers do.
You.com has also been carbon neutral, including its workforce, from day one.
And so there are certainly issues I care about.
You could argue that if the brain can do certain things with so much less electricity and energy, then there's clearly still a much better architecture to do this kind of computation, right? With many fewer flops or, you know, just less electricity, less energy usage in general. Sorry, not flops, just electricity slash energy.
And so, yeah, I think there is,
there's a lot that we can still improve
on the architectures.
In fact, it's kind of sad, but right now we're mostly constrained to architectures that can mostly rely on fast matrix multiplies, like large matrices multiplied together. And if your model is very good at feeding large matrices into your compute stack, then you can train that model so much faster and more efficiently, and that, hence, shapes how we're thinking about models.
And there are lots of other models like recursive ones
or recurrent ones that have fallen
out of favor largely because the compute isn't as efficient for those. And so it's kind of interesting. When people say, oh, we're searching for these general AI models, it's similar to the analogy of looking for your keys only under the street lamps. It's like you only look under the street lamps of the current compute and not in other places. And so that quite constrains the search space.
And then bias is indeed, I think, one of the biggest real issues that is facing AI. You know,
AI is only as good as the systems of people, the societies as a whole, as well as sort of the organizations that train it and the data that they're using.
And so the same algorithm that can be used to classify, you know, very helpful things in medicine, like is there a brain tumor, yes or no, in the CT scan, that same kind of convolutional neural net idea can be used to discriminate against Uyghurs in China, right? And so it's, you know, the algorithms,
and this is an easy confusion for a lot of non-experts in AI, but, you know, there's sort of,
I'm sure you know this, but I think it's important to mention for your readers, is like,
just sort of the abstract
algorithm, like a convolutional neural network type thing. And then you train that abstract
algorithm on a specific task and data set, and then it becomes like, you know, a solver for just
that particular problem. And sometimes that particular problem is very broad, like predict the
next word in large language models, but sometimes it's very specific, like breast cancer, yes, no, in this particular pathology
kind of cell sample. And so the abstract algorithm is very hard to think about in terms of biases,
because it doesn't have yet much of a bias other than it exists and has certain hardware biases
and so on of how it was invented.
But the actual trained model does have a lot of biases and there we have to kind of think about
what that means. And really it's also hard to talk about that in all of its generality. I think you often have to look at specific industries, specific examples and use cases, and think about the ethics and
also just the impact that it has and try to have some empathy with the users that are affected by
that AI algorithm and think through all the use cases. And that's kind of, in our case, why we have this massive AI system that ranks, in very few milliseconds, all these different apps, but we acknowledge that we can be wrong. And so we let people say, yeah, if I don't like this result, you can still vote on it and change it. And that is one way of dealing with bias: you actually look at the impact that that model has on people, and you let those people give very tight feedback loops to improve the AI and change it back. And that's kind of one of the many ways you can tackle
that bias. Now, there's all kinds of interesting philosophical reasons. I don't want to go on
forever, but I think there will be in the next couple of years, some very interesting ethical
questions of, for instance, let's say, you know, I don't know the exact numbers, but for every
100,000 miles driven, like 10 people die in a car accident on average in this country.
What if, with an AI driving more and more cars, the AI only kills two people every 100,000 miles? So with the AI it's a 5x improvement over people dying in car accidents.
But now that particular company could be said to have quote unquote killed those two people. Right. And maybe there was a bias,
like those people were wearing certain things or walking around at certain times of day or
in certain kinds of areas. And there wasn't enough train data. So it's a bias issue too.
Versus, you know, those 10 people, they were just distracted, fell asleep or texting or something. And you're like, ah, you know, we can't really do much about that, and there are already laws, you know, don't text and drive, that's it. Now, depending on, you know, your ethical stances and sort of beliefs, if you're utilitarian, you're like, well, there's an obvious improvement, a 5x lower death rate, we're saving lives, so ship it, let those AIs run. Whereas someone else might say, obviously not, I would rather keep letting more people die every year than let this company kill people with their AI on the streets.
There is, as far as I can see, no easy answer to this. I tend to think more pragmatically and statistically
about these things. And if we can save lots of lives at an abstract level, that's better.
But that doesn't help anyone who loses a loved one
when the AI drove, right?
And they're still going to hate the AI and its guts
and they're going to sue it to death and everything.
So really tough, but important questions
that we're going to raise the next couple of years.
Yeah, well, I would say that
if there's one takeaway from all of that,
because we also have to wrap up, I guess, is that, well, it's not really all about the technology.
I mean, at the end of the day, it comes down to making choices.
And we're the ones that have to really make those choices, and we'll live with them. The only thing to do there is to really be aware of, you know, the technical background and the implications of your choice, and have an open debate, you know, as a society, on where it is that you want to go, really.
That's right. Yeah. Ultimately, AI is a tool.
It's a very powerful tool. It makes us much more efficient. I think it will be a step function in
human civilization, just like, you know, hunters and gatherers to agriculture and sort of the invention of fire, making things more efficient.
And then the invention of electricity and engines and the Industrial Revolution, all of that. AI is another step function of that kind. And it's up to us to use it in a positive way. And certainly, just like a hammer, cars, or the internet, it can be used in very positive ways and it can be used in very negative ways.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.