Orchestrate all the Things - You.com is taking on Google with AI, apps, privacy, and personalization. Featuring CEO / Founder Richard Socher
Episode Date: June 20, 2022. Award-winning AI research - check. Startup and enterprise experience - check. Venture capital and Marc Benioff backing - check. Is that enough for Richard Socher's you.com to take on Google? Here is why and how he aims to do that. Article published on ZDNet.
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Award-winning AI research? Check.
Startup and enterprise experience? Check.
Venture capital and Marc Benioff backing? Check.
Is that enough for Richard Socher's you.com to take on Google?
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
Boy, I'll try to keep it short. Lots to cover
there. I'm originally from Germany, Dresden, technically, I guess, East Germany. Old enough
to have grown up a little bit during that time. And I'll skip all the stuff, but basically did my undergrad in Germany, with one year in France as an Erasmus student. And did my PhD at Stanford. At the time, in most of my 20s,
I wanted to be a professor and worked super hard to try to get a good faculty job. Like almost my entire 20s, most of my hobbies died.
I was just like working nonstop,
came to the US with, you know, very little money
and a couple thousand bucks and just worked super hard
and got the faculty job.
But, you know, eventually did my PhD at Stanford,
won the best computer
science thesis award there, and felt like deep learning for natural language processing
is clearly the right technology. It was a, you know, kind of a big bet I made during my PhD.
A lot of my fellow PhD students were like, ah, that's kind of your niche, I don't really want to work on that. In the beginning, I really couldn't convince everyone right away. But, you know, ultimately more and more people joined sort of
this field. I gave some of the first lectures then after I graduated on the side about deep
learning for NLP. Eventually we merged the official main NLP class with deep learning for NLP because, a few years after my PhD, every state-of-the-art model in NLP was a deep learning model at that time. So I sort of taught on the side for four years at Stanford, but my main job instead of being a
faculty was actually then doing a startup. Initially, I thought I'd just postpone my
faculty job for a year, do the startup, and then kind of find some people to replace me
and still become a professor.
But then I couldn't really leave the company.
It doesn't really work like that, yeah.
Yeah.
Like some really senior professors at Stanford had made that work,
but they had other PhD students of their team that they brought in
and all of that, and then they could extract themselves,
and I really couldn't have.
But I also started teaching on the side one quarter a year, and then, you know,
not being affiliated with the university the rest of the year, and kind of doing some research
in my startup, and then we eventually got acquired, became the chief scientist at Salesforce,
and then did a lot of research there and started working on applications.
And it felt more impactful than what I could have done as a professor where you have at
most 10, 15 PhD students.
And at Salesforce, I had 100 plus researchers and many hundreds of engineers working on
AI applications that were pretty impactful and
scalable. And that was a lot of fun. But at the end of my PhD, I actually had implemented a first version of a new search engine. And at the time I thought, man, it's just too ambitious. Google is probably going to sue me. All my smart friends are going to work at Google. It's going to be so hard to compete with them. No one was really complaining about Google very much in my circles and online. And so I kind of discarded the idea
and built MetaMind, an enterprise sort of AI platform that worked in medical imaging and
e-commerce images and NLP and a bunch of other things. But it was sort of the horizontal platform play as a machine learning tool for developers.
And anyway, so did that instead, but could never quite shake the idea off.
And then after four and a half wonderful years at Salesforce, decided it is time to really give this big, crazy idea a shot
and try to really build it for a variety of different reasons. And yeah, that's sort of
where we are now. And if you want, I can go into some of the reasons of why I ended up
building that search engine. Yeah, sure. Actually, that's something I
intended to ask you about. Just a brief detour before we go into that, because I have the
impression, you can correct me if I'm wrong, that actually at least part of what you did at
Salesforce was helping develop Einstein, which in my understanding is really more like a catch-all term for, well,
injecting, let's say, AI and NLP capabilities into their core system.
Is that correct?
Yeah, yeah.
I ran a bunch of different teams across the different clouds, service and sales, and a bunch of different ways you can infuse it: in chatbots and call deflection, helping salespeople understand the next best steps to take as they're trying to close a deal, and a whole host of other things. Search capabilities within Salesforce were also part of my team.
So, yeah, it was a bunch of different things.
Okay, great.
So, right, then after this brief detour, which I think actually is sort of important for people to get a feeling of the different kinds of things that you've done
and where you've had your fingers in, let's say, then let's go to the big, crazy,
ambitious thing on taking on Google, basically, because, well, when we're talking about search,
it's sort of inevitable that Google will pop up, as it already has a couple of times, actually.
So the thing about Google is, well, you know, we can argue about the quality of its search engine, and whether it has blind spots and where those are. But the thing is, from a business point of view, let's say, if you're up for taking on Google, then you have a very steep climb ahead of you, because the moat that it has built around this business, which is more or less around its index and its crawling and the amount of data that they have collected, but also around the algorithms and the NLP and everything that they have going for them, basically, under the hood, is pretty monumental.
So I'm trying to get a feeling of what was it
that motivated you to embark on this pretty monumental task,
basically.
Yeah, you're definitely right.
And I'm not in this to do a quick acquisition or a quick flip or something.
I'm motivated enough to work on it for many years, and I think it will take many, many years. I think there are sort of three different groups of reasons for why to take on such a giant: user-specific, macro, and timing. I think from a user perspective, the fact that our privacy gets so
massively invaded at almost every step we take online as our lives go more and more online is
kind of unfortunate. And, you know, I think it all starts, every online journey starts with
search or most of them, I guess, you know, a lot of them are also going to social media and so on and they have their own problems.
But yeah, so I think privacy is a big sort of user-specific issue that I have. And more and more users are becoming aware of it. And so I think that's a good thing, that more people kind of realize, man, I searched for
this one thing and then that follows me around the whole internet or I visited this one site and now
that follows me around in other places and it feels uncomfortable. And then you have, of course,
the ads. As a user, it's just annoying to see five, seven different ads before you see some content.
And once you go to the content and you actually learn a little bit about how it works, you realize
all these SEO, like search engine optimized microsites are also just ads. Like they're
just trying to funnel Google traffic through SEO into Amazon traffic or other like affiliate links
and cookies. And they often don't have any useful stuff. They
just kind of write blah, blah, in order to get you to click on this Amazon link and then have
a 24-hour cookie on that. And so I think, you know, that's another thing that's annoying for users. And it's getting more and more so as Google kind of is trying to increase their sales. They just ran out of ideas and they're like, how do we increase sales?
Well, if we go from four to five ads,
like we'll make even more money, right?
And so that's sort of the innovation
that they've worked on in the last couple of years.
It's just like N plus one ads.
And so that's not ideal.
And on top of that, and this is kind of both macro and user, I think it's important to have a choice
in the kind of information that you consume. A lot of people think about their food diet, but I think our information diet is incredibly important too. And a lot of people are constantly
freaking out or, you know, worried about things. And there are a lot of worrisome things, but it's
also, I think, important to be able to have some control over that information diet and say, I want to see more Reddit or less Reddit, or I want to see the New York Times or ZDNet and others. And, you know, kind of have some say in that, versus just being sold, with your information desires, to the highest-bidding advertiser, and having no control over it. I think that choice will also help us kind of stay in line better when it comes to letting users build a good experience for themselves. So those are the user reasons. Now, macro reasons
are that the entire economy is moving online. And then you have that single gatekeeper at the beginning that wants to sell you to the highest-bidding advertiser. And that, I think, is not an ideal setup for the web, period. And the fact that every company needs to kind of pay a tax to exist on that front page is also highly suboptimal.
And as I've gotten into this space,
I've heard now this story twice where companies were built,
they made money by having content that was useful,
was linked to, came up in organic Google rankings.
And then after a while,
the Google ads team started reaching out to that company and said, hey, it looks like you're getting a lot of traffic from Google, do you want to also buy some ads? And in these two cases, these companies said, no, we're good, we're making a lot of money, we're just getting good traffic from our organic results, from our content. And then boom, next week, they're on page 10 and they lose 95% of their revenue stream and their users and the traffic. And they're like, oh, we're sorry, we'll buy the ads, and then boom, they come back, but now are paying for half of that traffic.
And I mean, it's literally like in a bad movie where you're told, you know, you need protection for your business. And if you don't get it, you have no more business, right? And it's kind of nuts that that's happening.
Now we also have some tailwind for us in terms of, you know, antitrust kind of realizing the issues for the entire economy. And that timing is kind of the third bucket, which is: now's the time. But maybe before we get to that, one more macro point, which is that we're also in an information age and there's more and more information. And 20 years ago, when Google started, it was just kind of amazing to have access to information. Now access is more like table stakes, and the problem is how you deal with all of it, and you need to have AI that summarizes it for you. And so, as I was working on AI and natural language processing for over a decade, I actually think that search, not in how we conceived it originally, but how we're conceiving it now as a summary of information, summary search... Unfortunately, summary isn't a very cool term.
Like no one is like, oh, wow, you do summary.
That's like Terminator or something.
But it's actually one of the hardest AI tasks
when you think about it.
And I'm happy to sort of explain why later.
But it's one of the most impactful AI applications, which is what we're working on now. And I'm really excited to help people get things done and move from just a search engine to kind of a do engine. And so with that, yeah, the third bucket is kind of timing.
I think now is the time. There hasn't really been that much innovation in search. And when you kind of plot time and value, initially Google provided an insane amount of value, but now it has kind of logarithmically flattened off. Whereas the data that you provide to Google has kind of grown linearly. In the beginning it wasn't as valuable, but I think we hit an inflection point maybe a year ago, where it feels like people's data actually becomes more valuable than the services they get from Google, because the one value has kind of leveled off.
And so on top of that, more and more people are kind of realizing that there hasn't been that much innovation in search. They complain that the only way to get something real out of Google is to add site:reddit.com to their query every time, so they get what real people are saying.
And all of those issues are opportunities for a small startup to build something that helps people find what they really want, and that focuses on some niches too, sort of boutique search engines for particular things, like in our case, YouCode.
That just provides more useful things and helps people save a lot of time and in the future also money.
Well, I think many people, perhaps most people and myself included, would agree with a number of your points.
Again, perhaps most of your points.
So just to pick one, I think actually summarization is not just a very important task in AI, but
well, if I were to summarize what I do, then I would say it's also summarization. And that's what many people do, day in, day out. You consume a huge amount of information and you sort of digest it and try to produce something intelligible and useful out of it. So yeah, lots of what we do actually is all about summarization. So it is pretty important and pretty complicated, actually, if you really think about it.
And so I'm tempted to ask you... well, you touched upon a number of things, and some of them really have to do with, I would even go as far as to say, ethics and regulation. So actually making sure that, you know, some of the mishaps that have happened in the past are not even allowed to happen. And, you know, there has to be some oversight. And actually, I think that's honestly, you know, bigger than a single company or a single effort. It's, you know, something that needs to happen on a systemic level, let's say. But it's good that you identify it, and you have the willingness to actually do something about it, to do things differently, let's say. But in terms of, well, how do you actually make that happen, I have an interesting anecdote there. Well, at some point, which seems like a lifetime ago,
you know, with everything that's going on in the world right now,
I was invited to Russia to visit the Yandex campus
and had the opportunity to speak to a number of people there.
And one of the interactions we had that sort of left an impression on me,
we obviously, inevitably, got into the discussion of, well, you know, how do you beat Google? And they said something like, well, in order to do that, in order to make people switch, we have to be not just on par, we actually have to be better; we have to be something like 10 times better. Do you think that's an assessment that makes sense, and if yes, how do you actually do that?
And you've touched on a number of directions there.
And I was wondering, well, okay,
obviously summarization and the whole NLP
and question answering area is one way.
But actually that's no secret.
That's something that Google has been on to as well.
And while you did say that it has sort of plateaued, I sort of beg to differ there. If you watch closely the evolution of Google's search algorithms over the years, you will see that for the last few years, at least, they've actually injected quite a lot of NLP and question answering.
And specifically, they seem to be using BERT now behind the scenes to power the search.
So one part of the question, I guess, is, well, how do you take on that?
And then you also touched on the privacy aspect.
And there's a lot to be said there, but it's already a very long and winding question.
So let's break there and pick on the NLP AI stuff
and then we can go into the privacy aspect.
Yeah, really, really great and insightful question.
So I think when you ask, can you be 10x better?
I think when you think so much about search,
you realize that there are different groups of searches
and seekers and searchers.
So, depending on the person and depending on the search query that you have, we actually have to acknowledge that some searches you just can't make much better. If someone searches for weather, the best you can do is kind of give them the forecast. Is it going to rain today or not? The temperature, the humidity, and simple stuff like that. And there is no space for making it 10x better. If someone asks, who's the president of Jamaica or something, the best you can do is give them the answer in as few milliseconds as possible. And there isn't anything you can do much better than that.
And so I sort of call them quick navigational searches. Someone just wants to go to Facebook, but instead of typing facebook.com, they type in Facebook in their search engine; they want to click on facebook.com as the first result. For those quick navigational searches, there is no space to make it 10x better. You just kind of max out quickly. And we just have to make sure that for these quick informational and quick navigational searches, we do as well as Google and we don't suck. So we're not super slow and things like that. So you kind of have to group them. And then
there are kind of complex informational searches. And in those cases, I actually think we are
already doing better than Google. We just provide so much more rich information. And then there are sort of complex action searches
where you really want to actually accomplish something.
You want to buy something, you want to order something,
book a flight, like these kinds of things.
And there, I think there's a lot of potential
to do much, much better than Google.
And that's sort of what our goal is,
and we'll make these announcements actually in a few weeks together with some other big announcements.
We'd love to talk again in a few weeks, but don't want to spill the beans.
But basically, on these kinds of searches, I think we can do a lot better.
And then you can kind of think about, okay, across this whole spectrum of searches, of which there are so, so many, right? Like some people look for stocks, you want to just show them a stock ticker. Some look for cryptocurrencies, you want to show them a cryptocurrency ticker. Some look for the weather, you want to show them the weather directly. And you look for all these other little things, right, that you have special apps or widgets for, and we call those apps in our app store. And sometimes you like certain things better. And then, once you realize this, you can say, okay, now what are groups of people that make a lot of searches for their work, or for their hobbies and their deep interests? And you could really do a lot better if you understand what they're trying to do. And one, I hope that everyone who hears this or reads this will come to us and tell us how we could make their searches better. And we're always, you know, better, never done, especially in search, as it captures pretty much everything
people do online and in their lives. But one particular group that we chose is coding and developer searches. And there are actually a ton of things that you can do better for learning and for coding.
And so I'll just show, don't tell.
If you look for how to train a sequence
to sequence model in PyTorch,
we just have a Stack Overflow app here
that has the right kinds of answers
and what people have described.
There's a code snippet here
and there's a copy and paste button.
And you just, boom,
you just solve that person's problem potentially.
And if it's not that,
then maybe it's the official PyTorch documentation.
And you show the code snippets of which there are,
you know, in this case here, a ton.
And you kind of help them summarize this
or you help them like understand,
oh, this is what, you know,
people are saying on Reddit about this.
And, you know, you can kind of see how normal people talk about that.
And you have GitHub issues and you see the code and you see exactly what kinds of, you know, issue results people have as they're trying to do this task.
And none of that Google gives you. And if you did the same search, and many other kinds of searches, for, you know, CSS Flex or 'train a single-layer neural net', here we have an AI that actually just literally writes the entirety of the code for you.
And you can do all kinds of other things.
You can say, like, oh, I want to have a Fibonacci function of N, and just have that be generated. And then instead of doing a full-on search, which would probably find it too, you just get the AI to write you that function and you're done. And again, you have a copy and paste button, and you just saved so much time compared to trying to find it out in other ways.
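For illustration, here is roughly the kind of snippet an AI code assistant might hand back for a "Fibonacci function of N" request; this is a generic sketch, not You.com's actual generated output:

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number (0-indexed), computed iteratively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(10))  # 55
```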
And, you know, likewise here, this is the original search on 'train a single-layer neural net'. You know, you can see all of these useful summaries of different content islands on the web. And, you know, sometimes Stack Overflow does have the exact right answer. You see, oh, wow, this is the top answer, lots of upvotes.
And again, you can quickly see the code and copy and paste it.
And so each time you do this search, you save 30 seconds to 30 minutes if the AI just
writes the code for you.
And indeed, that is 10x better if you value your time, which most developers and companies
do.
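As a concrete example of the kind of result being described, here is a minimal sketch of what "train a single-layer neural net" code might look like; a generic PyTorch example, not the output of You.com's code AI:

```python
import torch
import torch.nn as nn

# Toy data: learn y = 2x + 1 with a single linear layer.
x = torch.randn(100, 1)
y = 2 * x + 1

model = nn.Linear(1, 1)                                  # the "single layer"
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())  # should approach 2.0 and 1.0
```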
And so I think that's kind of where the answer can be yes.
But you have to, and yes, there's a ton of AI and NLP in there.
If I look, for instance, for something maybe simpler like best headphones,
we also help you save time by summarizing.
So when you have articles like this here,
you basically can open the side panel and it just tells you like, okay, here's the main thing,
the good, the bad, some specs in some cases, and that's it. And you can kind of, you know,
as you scroll through this, you can kind of quickly see like repeating items and see kind of what people are saying
about this on social media too. I personally like the Reddit app. I should refresh this.
And when I see the Reddit app, I already kind of get a sense of what people are actually saying
about this. And if I don't like the Reddit app, I could also say,
no, I don't like this app.
I don't want to see Reddit results.
And this, I think, is kind of the future of search.
You have control over it.
You get a quick summary for a complex decision.
But then we want to, of course, go even further than that
and eventually help you actually execute
on those decisions, which I think would be even better.
But already, you're saving a ton of time across different types of searches that could actually take a long time in an old search engine, where you have to open 10 different tabs and then go through each tab and try to find the thing. Here, you just get the content that you need right away.
And that's kind of why we're growing now and why we have so much love on Twitter and other
channels for what we've built.
So that's kind of the answer.
And yeah, if all you're doing in search is you want to go to facebook.com and you want to find the weather, we probably can't be 10x better. And then for those kinds of searches, what we've seen done pretty successfully in the past is what DuckDuckGo does: just saying, for every search, you'll have better privacy.
And so to me, privacy is also very important. And so we have a very private mode.
The interesting thing is, I think for a lot of the privacy conversation, and I think you alluded to this too, perfection is kind of the enemy of progress, in that if you say, oh, we're all about privacy, that's our only and main thing, then the hardcore privacy people at that point want you to be a fully encrypted, fully open source, no revenue, no data, nothing kind of project. Essentially, you can't really be a company. And so that will crush you so much that you will never be able to compete with Google, because Google does collect and get preferences from people, and does it sort of sneakily and implicitly, and follows you around the whole internet.
Whereas in our case, we will never be as bad as Google.
We'll never sell your data.
But we do, if you want, let you log in so that you keep your preferences, and we share your IP with services that need it in order to provide localized results. For instance, you might say Chinese restaurants near me, and now we send 'Chinese restaurants' with the location of Palo Alto or something, or that IP, to the Yelp API in order for Yelp to tell us which restaurants are close by to that person
that are Chinese food. And so the problem with that is that you now have to say, oh, we share
your data with third parties because you send this particular kind of query to that particular
kind of service in order to get a localized result. And the truth is that most people would prefer a localized result when they
look for that kind of thing and want that convenience 90% of their lives. And then there
may be some searches that people want a ton of privacy. And there are some obvious ones I'm not
going to mention, but maybe other ones like medical issues and so on, people often forget
and are important too.
And so when you want to have that privacy,
we just have a hardcore privacy mode.
And in that mode, we indeed don't share anything and we don't log anything.
And we basically don't keep track of how it's being used.
That also means we don't find bugs. So if there is a bug, you kind of have to tell us explicitly that you were in private mode and you didn't find something, you had some issue, because we just don't know.
We have no analytics. And so I think that way you kind of get the best of both worlds and you can
switch when you want the privacy, you have it. And when you want the convenience, you also have
it. And then we can also over time build an actually better search engine, which to be honest,
like DuckDuckGo is a very thin wrapper around Bing, and it will never be able to build its own index and sort of doesn't seem to try to do it either.
And I mean, in some ways, that's fine.
But yeah, I think it'll be hard to not be dependent on it.
And you see this with most other sort of non-Google and Bing search engines too.
Yeah, that's a very obvious thing. And thank you for pointing it out. You make my life easier
because I was going to ask precisely about that. I mean, obviously, if you're in the search business,
well, it comes down to having two options. You either have to build your own index from scratch or you have
to use somebody else's, which is what DuckDuckGo is doing. So how do you go about that actually?
Are you building your own index? So we're also partnering. It's actually complicated. So we have all these apps, and half of those apps are based on indices that we've built ourselves, and the other half are not. And when they're not, they're sometimes basically based on other APIs; you know, we don't have satellites, so we need weather data from other providers and things like that. And then we also partner with Bing to get some of their results, and some other APIs for a variety of things like restaurants and weather and stuff like that.
Okay. So you basically have a number of different indexes, some of which are your own,
some of which you're sort of outsourcing to third parties. And depending on the type of query you get, you use the right index for the query. That's right.
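A hypothetical sketch of the routing idea just summarized, with made-up query categories and backend names purely for illustration; You.com's actual dispatch logic is not public:

```python
# Route each query to an appropriate index or third-party API by query type.
BACKENDS = {
    "weather": "third-party weather API",
    "code": "in-house code index",
    "web": "in-house web index or Bing partnership",
}

def classify(query: str) -> str:
    q = query.lower()
    if any(word in q for word in ("weather", "forecast")):
        return "weather"
    if any(word in q for word in ("pytorch", "css", "python")):
        return "code"
    return "web"

query = "how to train a seq2seq model in pytorch"
print(classify(query), "->", BACKENDS[classify(query)])  # code -> in-house code index
```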
Okay. Another thing that sort of popped up to me while listening to you explaining
how you go about things such as code snippets, for example, is the sort of inevitable tensions,
the sort of inevitable trade-offs and decisions you have to make when you do the kind of thing
that you do.
And one of those actually has to do, which you touched upon earlier as well, with sending
traffic to sites, basically.
So when you have something like a pop-up panel, for example,
obviously that means that, well, you're not doing that.
Has that been a problem for you so far?
So I know it has been for Google.
So people complaining like, oh, Google is stealing our traffic and so on.
I'm guessing that it may not have been a problem for you yet,
but it may be eventually.
How are you planning to deal with that?
Yeah, so we've actually reached out to almost everyone who, you know, we crawl and we've built apps for. I actually don't even think of these apps as our apps. And this is kind of what we'll announce in a couple of weeks, so I don't know how to best describe it without fully spilling the beans and announcing it now. But, yeah, we want to be a much more open kind of platform, and you can get a little bit of a sense of what I'm talking about if you go to you.com/apps. I'll just share my screen really quick.
You can kind of see maybe and anticipate
what I'm trying to say and where we're going.
But here you get basically these different kinds of apps, and you can say what you're interested in. And you don't really install a search app. You kind of set a preference for it so that the AI and the ranker can sort of pre-filter. You know, for instance, if you're making a coding query, you don't want to see the weather, right? And so we need to have some AI that pre-filters it. I think that's a pretty impactful, important idea: to make AI controllable by the people that are affected by it, to largely make it useful for them, but still have some control over it. You know, you basically use the ranking that people gave you.
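A minimal sketch of that "use the ranking people gave you" idea, assuming, hypothetically, that each app result carries a relevance score and each user keeps simple per-app preference weights; the real ranker is of course far more involved:

```python
from typing import Dict, List, Tuple

def rerank(results: List[Tuple[str, float]],
           preferences: Dict[str, float]) -> List[Tuple[str, float]]:
    """Combine engine relevance with user preference weights.
    Apps the user has hidden (weight 0.0) are dropped entirely."""
    scored = [
        (app, relevance * preferences.get(app, 1.0))
        for app, relevance in results
        if preferences.get(app, 1.0) > 0.0
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

results = [("stackoverflow", 0.8), ("reddit", 0.7), ("weather", 0.1)]
prefs = {"reddit": 1.5, "weather": 0.0}    # user boosts Reddit, hides weather
print(rerank(results, prefs))              # [('reddit', 1.05), ('stackoverflow', 0.8)]
```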
And so, you know, when you think about all these applications,
obviously we don't feel like we should own these applications, right?
We don't own all of Time Magazine, all of NPR and so on.
And so, yeah, that's kind of, I think, how we're thinking about this.
I think when Google does it,
Google takes all the content and all the revenue.
And then of course, no one wants to be like,
well, like you're just making people
not even come to my site anymore
and you just stole all my traffic and my content.
In this case here, this is actually, in the future, that company's app.
Okay. Thanks. All right. So let's go to another thorny topic, which is the business model,
actually. This was something that stood out for me pretty early on, actually, because you do mention that, well, again, you can correct me if I'm wrong,
but my impression is that one of the key points, let's say,
in your you.com pitch is, well, no ads, basically.
And in order to make that work,
that definitely implies some sort of paid model, basically.
Otherwise, where was your money going to come from?
And I think, well, you're going to tell me if my understanding is correct.
But if it is, I think this is going to be another uphill battle for the simple reason
that, well, people have been used for a number of years to just use search for free.
With everything that, you know, with all the baggage that this comes with: privacy and selling your data and ads and all of that.
So are you going to be pushing for a totally different model?
And if yes, I get the impression that maybe you're going to, or maybe you should be, targeting specific segments. So you mentioned examples such as, well, searching for code, or other, well, domain-specific search apps, let's call them. So what's the idea there? Are you going to be maybe charging for access to those?
Yeah, so I guess currently we're just focused on growth and building the best possible search experience. I also believe in choice and, you know,
here, and like you see this already with the sources, we currently have no ads and we set
very publicly some of our value-based kind of swim lanes and guidelines in that we will never have
ads kind of get preferential treatment and change the ranking. We plan to never have targeted, privacy-invading ads that follow you around the internet, and also, you know, we'll never sell your data. But we may actually get private, that is, query-dependent-only ads in the future. So if you look for an air compressor or an air purifier or something like that, you may see an air compressor ad, but it won't be linked to you. It won't be, you know, sort of: the advertisers don't know you made that search, and it's hard to track you as you click on that link and everything. So it's basically similar to DuckDuckGo. But I actually
think it's important for us to try to not be too dependent on ads.
I think ads are kind of just a backup.
What I'm really excited about are these applications, these apps that we have.
And there are apps that are so useful that I think people would want to pay for them.
And YouWrite is a good example, and our first example of that, where most people who get an AI writing assistant are willing to pay for it, as it saves them a ton of time. And that to me is one example. In the next few weeks, probably in a month and a half or so from now, we'll announce another really big such app, and then after that we want to, yeah, kind of work together with others and build out that whole ecosystem.
Okay. I guess it's a good point for me then to ask you about, well, some of the
non-technology stuff, because we do have to touch upon those at some point as well.
So, well, you mentioned like your sort of mid to long-term business plan. And
I have to ask you then, what sort of backing do you have for you.com? And well,
which really means how long is your runway? And also, I guess, how patient are your investors
really waiting for you to figure this out?
And if you'd like to share some key facts and metrics about the company,
like, I don't know, a headcount or backing and this kind of thing,
that would help as well.
Yeah.
So one thing we have announced already is our seed round from, I guess, two years ago, late 2020, for which our main backers were Marc Benioff and Jim Breyer's Breyer Capital, and a few others, Day One Ventures and Sound Ventures and others. And yeah, so we raised $20 million in that round. And since then we have some more news, which we'll share in three weeks or so, later in the month. And yeah, I guess in terms of team size, we don't really share the exact headcount. There's also contractors and interns and full-time and part-time and all of that. But it's a fairly small team, especially in comparison to Google. I think that's fair to say.
Okay. All right. So let's then rephrase that a
little bit. So how long do you think you have to figure it out, basically?
Several years.
Several years.
Okay.
Okay.
Fair enough.
All right.
Well, then I guess since we only have like a few minutes left, and I do want to pick
your brain on the bigger picture as well, unless you want to add something to what we
already said, which sort of alludes to a number of things
you'll be announcing soon, I guess.
Let's sort of wrap up the you.com deep dive
and sort of level up a little bit because-
I just have one, since you asked, like I'll just throw in one thing, which is YouCode. I think that is kind of a big one, and I think it might be relevant for your very techie and positively geeky audience, which is very sort of reminiscent of and similar to our own team. Of course, a lot of engineers, mostly engineers in the company. And that is, yeah, YouCode is basically a special search engine lens, if you will, that focuses only on programming and really makes you much more efficient, right? And I showed you already the examples, and I'm happy to send you screenshots if you're interested, or GIFs, of, you know, the AI code completion and code snippets and all of that.
So, yeah, that's kind of something I'm really excited about to help that community be more efficient because ultimately there need to be a lot more software developers in the future.
And those software developers will just have infinite work usually to do, and it'll be good
to make them more efficient. So yeah, and now happy to talk about all things AI. I love all AI
from the whole spectrum from highly philosophical to very applied.
Well, actually, what you just mentioned, I may as well use it as a segue.
I mean, the thing that you said about having more programmers in the future, well, maybe
yes and maybe no.
I mean, some of the more sophisticated AI models that have been released, and specifically
the large language models,
they seem to be pretty good at coding as well.
So I don't know, at least for the more boilerplate stuff,
you may actually end up needing fewer programmers than today.
I would liken that to book writing, where you might say, well, now once you have the Gutenberg printing press, maybe you'd think you need fewer people in the book space, because you can just print one book so much more quickly now than before, when a monk had to kind of, you know, manually copy the book for another copy. But really the book space exploded. And so I think we see similar things as people,
as you lower the bar, and I don't think AI will just figure out what things to implement and
build the next TikTok by itself, right? Like it'll be people having to have those ideas,
that empathy with users and people, and also that creativity to actually think about what you want
to build. But you're right. I think coders will become more efficient thanks to AI.
And depending on what they program, they'll be much, much more efficient.
But I think actually that does not mean we need fewer coders. It just means that we can get more things done, and more things will be digitized.
I think there actually is a precedent on what happens when you lower the bar that much.
There are a number of precedents.
Actually, you mentioned one with Gutenberg.
And I would also mention, well, music and, you know,
digital arts in general.
It used to be that, you know, only the really motivated
and the people that were really good got to publish, you know,
their creation because you had to have access to a studio and then a record company and all of that.
And now you can do everything on your laptop pretty much.
That's right.
Yeah.
Uber is another interesting example where it's like, on the one hand, they made it much easier to get a cab.
And you'd think, oh, that's bad for the taxi industry. But if you include Uber, it's essentially just a really, really clever taxi marketplace.
Then the taxi industry exploded and is now much, much larger.
There are so many examples like that.
Okay, so I watched your TED Talk from 2017, which seems like a lifetime ago in a way.
And I'm pretty sure you must feel that way as well,
in which you kind of tried to give a sense of where AI was at the moment
and where it was going.
And it seems to me that you did pretty well in the sense that you identified
two of the key directions that are sort of getting mainstream, let's say, today.
One is the emphasis on language and large language models.
It's like a constant bombardment, really,
of new models that are being released almost,
well, not on a daily basis, but monthly, definitely.
We have new models every month.
And the other direction that you spotted was,
well, what's now called multi-modal AI
or multi-modal training.
So having basically models
that combine different modalities.
So usually it's text and visual,
but well, it can be other modalities as well.
So I know that you have done lots of work in the NLP space.
So I was wondering what your take is on the current state of the art there.
So there has been lots of progress.
Basically, there are two schools of thought, let's say.
There are the scale-up school of thought, which says that basically, well, you have to scale things up.
And eventually, you will cross a threshold and you'll get some sort of emergent intelligent behavior by doing so.
And then there's the other school of thought, which says that, well, you have to inject some sort of domain knowledge or, I don't know, rules or whatever it is you want to call that.
What's your take on this debate, let's say, and where do you see the field going?
Yeah, that's an open-ended question I could talk about for hours.
Let's see, maybe I'll start with the very concrete last one.
I think the sort of knowledge-based and rules-based camps, that school of thought, at least, has largely been superseded by scaling up massively. But even after you scale up massively now, instead of defining rules, you want to give a few examples and do some fine-tuning on top, or even some priming of the language model with a few example rules, and then it'll kind of auto-complete that kind of idea multiple times. I think there's sort of a separation here between short-term progress and really, really long-term, towards-AGI kind of progress. And I think short-term, we will indeed be able to solve a lot of different problems with the current technology. And I think it's actually quite easy these days to have a lot of impact with AI applications. And that's kind of what we see in China too; part of its economic boom is coming from much more automation, more and more of which has AI in it: AI applications built without inventing any new models, just taking the existing models and applying them very carefully at massive scale.
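To make the "priming the language model with a few example rules" idea above concrete, here is a minimal sketch of building a few-shot prompt in plain Python; the examples are invented and no particular model API is assumed:

```python
# Instead of hand-coding a rule, show the model a few worked examples and let
# it complete the pattern for a new input (few-shot "priming").
examples = [
    ("convert 'hello world' to title case", "Hello World"),
    ("convert 'good morning' to title case", "Good Morning"),
]
new_query = "convert 'deep learning' to title case"

prompt = ""
for query, answer in examples:
    prompt += f"Q: {query}\nA: {answer}\n\n"
prompt += f"Q: {new_query}\nA:"

print(prompt)
# This prompt string would then be sent to a large language model, which is
# expected to continue with "Deep Learning", following the primed pattern
# rather than an explicitly programmed rule.
```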
I do think for some long-term sort of AGI goals,
we do eventually need to inject
and be able to learn certain rules.
It's kind of a funny situation that you have the models that have a billion
parameters, billions of floating point operations, multiplications, and so on.
But if you ask the model in natural language, what's 365 times 554.6, the billions of floating
point operations cannot actually solve that one multiplication problem.
And so that kind of tells you something, right?
It tells you that there is space still to try to learn and actually extract these rules. The tricky bit is, how do you do that without rigidly pre-defining those?
And instead, how can you make them so they're actually learned in an abstract
way from data, but then learned in such a way that they will generalize properly to entire sets? So
you need to have basically set theory and some logical types and probabilistic reasoning and
things like that eventually emerge from these models in some
capacity. And it doesn't seem like so far just scaling it up is able to do that, even though,
you know, it is doing amazing, amazing things. And, you know, I've been very excited about language modeling for many years, and multitask learning. And I do believe that multitask learning continues to be a challenge, in the sense that if you want one task to be done really, really well, it usually comes at the cost of other tasks. Like, the best translation model isn't also a pure language model, right? And the pure language model isn't the best sentiment analysis model. Usually people have to fine-tune it a little bit and modify it for that task at the very end. And so I'm still hopeful that we can make a lot more progress. And clearly these general language models, as they get larger and larger, have such a good embedding and, you know, quote-unquote, understanding of language
that they can solve a lot of different tasks quite well right away with zero-shot learning
without extra sort of training data.
But if you want to get the really high performance, you kind of have to give them a little bit
of extra training data after.
Okay, so you already sort of tackled the one strand of, well, criticism, let's say, on those big language models, which is, you know, the actual intelligence part.
The other two core parts, as I see it, have to do with, well, the whole environmental aspect, energy efficiency, and that sort of thing.
I want to do a quick note because you said actual intelligence.
I think that is actually very typical
and you're not the only one.
Usually once we solve a problem
that seemed impossible before
and incredibly hard
and definitely like an AI problem,
once we solve it,
it's not actual intelligence anymore.
It's like, you know,
when AI people worked on chess
before anyone could build
a really good chess algorithm,
everyone thought, wow,
once we solve chess,
we clearly have an artificial intelligence that can solve all the other little things very quickly too.
And that was never true,
but it's also true that the definition of AI
is kind of constantly evolving.
And I did say that five years ago. And now these AIs can do so much in language, and translation gets better and better over time. And, like, there are so many incredible things they can now do without extra training data that would have been an impossible goal five, ten years ago.
Now we're like, oh, it's just kind of, you know, language modeling.
It's not really AI anymore.
And that's just, that's happened for a long time.
I wonder if at some point we will sort of have solved problems that we think now are impossible. And then at that point, it's like, well, that's just, you know, an app on your phone now. It's not real AI because it doesn't get to this other thing. So anyway,
that's just an interesting side note.
Well, yeah, I mean, you're right.
Even though I have to say that personally, I never thought that, you know, solving chess or Go or whatever was like the golden gate.
But yeah, you do have a point there.
Anyway, what I was trying to get at really is your take on the other two strands of criticism, basically.
So, you know, the whole energy efficiency, resource utilization and, you know, like value for money, if you want to call it that, of building these humongous AI models.
And then the whole bias, toxicity, and all of that that comes with it,
which is a sort of byproduct of the way you train the models.
There's been some progress on that.
At least the people who produce those models seem to be aware of those. And where do you think this may go?
Yeah. So the first one, I think the concerns about electricity were a little bit overblown, in that one flight from Europe to the US or something is basically the same amount of carbon that it takes to train a reasonably large AI model, even, you know, a very large one. And so I don't think there's a massive amount of impact that AI has on electricity and sort of CO2 and carbon emissions.
And that doesn't, of course, mean we can't do better
and we should get electricity from green sources,
which I think more and more data centers do.
You.com has also been carbon neutral, including its workforce, from day one.
And so there are certainly issues I care about.
You could argue that if the brain can do certain things with so much less electricity and energy, then there's clearly still a much better architecture to do this kind of computation, right? With many fewer flops or, you know, just less electricity, less energy usage in general. Sorry, not flops, just electricity slash energy.
And so, yeah, I think there is,
there's a lot that we can still improve
on the architectures.
In fact, it's kind of sad, but right now we're mostly constrained to architectures that can mostly rely on fast matrix multiplies, like large matrices multiplied together. And if your model is very good at feeding large matrices into your compute stack, then you can train that model so much faster and more efficiently, and that, hence, shapes how we're thinking about models.
And there are lots of other models like recursive ones
or recurrent ones that have fallen
out of favor largely because the compute isn't as efficient for those. And so it's kind of interesting. When people say, oh, we're searching for these general AI models, it's similar to the analogy of looking for your keys only under the street lamps. It's like you only look under the street lamps of the current compute and not in other places. And so that quite constrains the search space.
And then bias is indeed, I think, one of the biggest real issues that is facing AI. You know,
AI is only as good as the systems of people, the societies as a whole, as well as sort of the organizations that train it and the data that they're using.
And so the same algorithm that can be used to classify, you know, very helpful things in medicine, like is there a brain tumor, yes or no, in the CT scan, that same kind of convolutional neural net idea can be used to discriminate against Uyghurs in China, right? And so it's, you know, the algorithms,
and this is an easy confusion for a lot of non-experts in AI, but, you know, there's sort of,
I'm sure you know this, but I think it's important to mention for your readers, is like,
just sort of the abstract
algorithm, like a convolutional neural network type thing. And then you train that abstract
algorithm on a specific task and data set, and then it becomes like, you know, a solver for just
that particular problem. And sometimes that particular problem is very broad, like predict the
next word in large language models, but sometimes it's very specific, like breast cancer, yes, no, in this particular pathology
kind of cell sample. And so the abstract algorithm is very hard to think about in terms of biases,
because it doesn't have yet much of a bias other than it exists and has certain hardware biases
and so on of how it was invented.
But the actual trained model does have a lot of biases and there we have to kind of think about
what that means. And really it's also hard to talk about that in all of its generality. I think you often have to look at specific industries, specific examples and use cases, and think about the ethics and
also just the impact that it has and try to have some empathy with the users that are affected by
that AI algorithm and think through all the use cases. And that's kind of, in our case, why we have this massive AI system that ranks, in very few milliseconds, all these different apps, but we acknowledge that we can be wrong. And so we let people say, yeah, if I don't like this result, you can still vote on it and change it. And that is one way of dealing with bias: you actually look at the impact that that model has on people, and you let those people give very tight feedback loops to improve the AI and change it back. And that's kind of one of the many ways you can tackle
that bias. Now, there's all kinds of interesting philosophical reasons. I don't want to go on
forever, but I think there will be in the next couple of years, some very interesting ethical
questions of, for instance, let's say, you know, I don't know the exact numbers, but for every
100,000 miles driven, like 10 people die in a car accident on average in this country.
What if, with an AI driving more and more cars, the AI only kills two people every 100,000 miles? So with the AI it's a 5x improvement over people dying in car accidents.
But now that particular company could be said to have quote unquote killed those two people. Right. And maybe there was a bias,
like those people were wearing certain things or walking around at certain times of day or
in certain kinds of areas. And there wasn't enough train data. So it's a bias issue too.
Versus, you know, those 10 people, they were just distracted, fell asleep or texting or something. And you're like, ah, you know, we can't really do much about that, and there are already laws, you know, don't text and drive, that's it. Now, depending on, you know, your ethical stances and sort of beliefs, if you're utilitarian, you're like, well, there's an obvious improvement, a 5x lower death rate, we're saving lives, so ship it, let those AIs run. Whereas someone else might say, obviously not, I would rather keep letting more people die every year than let this company kill people with their AI on the streets.
There is, as far as I can see, no easy answer to this. I tend to think more pragmatically and statistically
about these things. And if we can save lots of lives at an abstract level, that's better.
But that doesn't help anyone who loses a loved one
when the AI drove, right?
And they're still going to hate the AI and its guts
and they're going to sue it to death and everything.
So really tough, but important questions
that we're going to raise the next couple of years.
Yeah, well, I would say that
if there's one takeaway from all of that,
because we also have to wrap up, I guess, is that, well, it's not really all about the technology.
I mean, at the end of the day, it comes down to making choices.
And we're the ones that have to really make those choices, and we'll live with them. The only thing to do there is to really be aware of, you know, the technical background and the implications of your choice, and have an open debate, you know, as a society, on where it is that you want to go, really.
That's right. Yeah. Ultimately, AI is a tool.
It's a very powerful tool. It makes us much more efficient. I think it will be a step function in
human civilization, just like, you know, hunters and gatherers to agriculture and sort of the invention of fire, making things more efficient.
And then the invention of electricity and engines and the Industrial Revolution, all of that. AI is another step function of that kind. And it's up to us to use it in a positive way. And certainly, just like a hammer, cars, or the internet, it can be used in very positive ways and it can be used in very negative ways.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.