Programming Throwdown - 155: The Future of Search with Saahil Jain
Episode Date: April 10, 2023

When it comes to untangling the complexities of what lies ahead for search engines in this age of AI, few are as deeply versed in the subject as You.com Engineer Saahil Jain. Jason and Patrick talk with him in this episode about what search even is, what challenges lie ahead, and where the shift in paradigms can be found.

00:01:16 Introductions
00:02:06 How physics led Saahil to programming
00:07:20 Getting started at Microsoft
00:13:39 Analyzing human text input
00:22:22 The exciting paradigm shift in search
00:29:02 Rationales for direction
00:33:40 Image generation models
00:39:55 Knowledge bases
00:45:12 FIFA
00:49:29 Understanding the query's intent
00:51:18 Expectations
00:55:38 A need to stay connected to authority repositories
01:03:45 About working at You
01:08:18 Farewells

Resources mentioned in this episode:

Join the Programming Throwdown Patreon community today: https://www.patreon.com/programmingthrowdown?ty=h

Links:
Saahil Jain:
Website: http://saahiljain.me/
Email: saahil @ you.com
Github: https://github.com/saahil9jain/
Linkedin: https://www.linkedin.com/in/saahiljain/
Twitter: https://twitter.com/saahil9jain
RadGraph: https://arxiv.org/abs/2106.14463
VisualCheXbert: https://arxiv.org/abs/2102.11467

You.com:
Website: https://you.com/
Twitter: https://twitter.com/YouSearchEngine
Discord: https://discord.gg/f9jRFH5gHP

Others:
On Thorium: https://www.youtube.com/watch?v=ElulEJruhRQ

More Throwdown? Check out these prior episodes:
E143: The Evolution of Search with Marcus Eagan: https://www.programmingthrowdown.com/2022/09/143-evolution-of-search-with-marcus.html
E94: Search at Etsy: https://www.programmingthrowdown.com/2019/10/episode-94-search-at-etsy.html

If you've enjoyed this episode, you can listen to more on Programming Throwdown's website: https://www.programmingthrowdown.com/

Reach out to us via email: programmingthrowdown@gmail.com

You can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM

Join the discussion on our Discord
Help support Programming Throwdown through our Patreon

★ Support this podcast on Patreon ★
Transcript
Programming Throwdown, Episode 155: The Future of Search with Saahil Jain. Take it away, Patrick.
excited to be here for another episode. 155. I
don't know. I guess we can, we remark on the number every time, but that habit, that habit.
We should, we should hire someone to help write intros for us. Oh no, that's what we're going to
get ChatGPT to do. Oh, I should hire a ghostwriter. I was listening to a podcast where they were
talking about getting researchers and then the researchers would basically feed them all their topics.
And I was like, we've been doing this a very long time, Jason, but we've never been that professional.
No, never happened.
I don't even know if we discussed it.
Is it true that musicians don't write their own songs?
Or is that just like a diss or something?
It depends on the musician.
Okay, that's a very good, very noncommittal answer.
Is it true that Stack Overflow writes all your code, Jason?
Yes.
All right.
Well, we're going to welcome to the show Saahil.
He is an engineer at You.com.
Glad to have you here.
Yeah, glad to be here.
So the way we always kind of start this off with guests,
you know, it's always an interesting story to learn a little bit about the different ways people
got into tech, got into programming. So do you have like a first memory, like a first computer,
or first time you did a programming problem, or was like the earliest thing that sort of
got you excited about technology? Yeah, that's a good question. So actually, I think, especially nowadays, I probably got introduced to programming a little bit later compared to most folks. I never actually did much programming in middle school or even high school,
for that matter. I think the first kind of introduction I got was really in college.
Really, I think what I initially wanted to do was become more of a mechanical engineer.
So I was, you know, in high school and stuff, I was really interested in physics. So I thought
in some ways I would maybe focus on, you know, going down the applied physics or just physics
engineering. But then I think I realized very quickly that I wasn't super good with my hands,
but I was much better at, you know, dealing with the world of abstractions. So I think I naturally
gravitated a bit towards programming. I think the first memory was working on a project where I was using a very simple, you know, nearest neighbor classification algorithm to determine whether a cell is malignant or benign. I think that was kind of a really fun experience because it showed me how useful and powerful programming can be.
And it was a really simple thing.
You just basically look at the different attributes of a cell and you match it to the nearest cell you have in your data set.
And you can see whether or not that one was malignant or benign and then classify this one.
And you can benchmark the scores.
I just remember that being one of my first core experiences.
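For a concrete picture of what Saahil describes, here is a minimal sketch of a one-nearest-neighbor classifier. The cell attributes and data below are made up for illustration; this is not the actual course project or dataset.

```python
# A minimal one-nearest-neighbor classifier: copy the label of the
# closest training example. Data and attributes are invented.
import numpy as np

def nearest_neighbor_predict(train_X, train_y, query):
    # Euclidean distance from the query cell to every labeled cell
    distances = np.linalg.norm(train_X - query, axis=1)
    # The prediction is simply the label of the nearest cell
    return train_y[np.argmin(distances)]

# Toy data: two made-up attributes per cell (e.g. size and texture score)
train_X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [7.5, 8.5]])
train_y = np.array(["benign", "benign", "malignant", "malignant"])

print(nearest_neighbor_predict(train_X, train_y, np.array([7.8, 9.2])))
# -> "malignant": the query is closest to the last two training cells
```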
Yeah, I mean, I think that's pretty good. You know, I think it's interesting, you're right,
a lot of folks do have experiences a lot earlier. And so the, I don't want to say
sophistication, that sounds insulting. The level of complexity of the thing that they get exposed
to when they start programming is a lot, a lot lower, just because, you know, when you're,
like you said, you know, a middle schooler or an elementary student, you know, you can be exposed to a lot of programming
ideas, but you don't have the math background. So for you to say, you know, so it's simple,
you look to your neighbors, I mean, even describing, you know, what the nine connected
neighborhood is, or your eight neighbors, and like trying to express how in a Cartesian space,
you know, one might be up or up into the right, you know, explaining that to someone, you know, I have a fifth grader. So explaining that to my fifth grader, like she
gets it, but she doesn't really get it. And so, you know, the complexity at which you can sort of
like, touch on the variety of subjects that programming really involves. I think it's a
great story. I don't know if you were doing an actual malignancy detection thing or whether it was a sort of simplified cell version with attributes, but either way, I mean, just a great, a great topic.
Oh yeah, this is totally a toy problem.
Okay, all right, it's not real.
This is not real at all. This was a first-programming-assignment type thing.
I was going to be impressed if you had, like, OpenCV open and you were looking at microscopy slides and a dye stain and trying to, I was going to be like, that's a pretty high bar for a first programming problem.
No, no, not at all.
No, no, that's still good.
But yeah.
And, you know, you touched on as well, mechanical engineering.
And I think it's often forgotten by me, like even when I was in school, mechanical engineers
still had to do a lot of programming.
They get exposed to it.
And even the CAD software today has a lot of scripting or procedural elements that aren't
that dissimilar. But as you said, I think there are some folks who sort of think they want to do
one thing and switch to the other or back and forth. Very, very common. So you were doing
this first programming assignment, you got really engaged, presumably in a class where and then you
just sort of like decided to kind of like pursue that more, like take more classes? Yeah, I think at that time, I was still unsure. For me,
I think I, I probably decided a little bit later, even after that. But yeah, I think essentially,
I was kind of exploring different topics in parallel. I knew I was, I was basically in the engineering school, which, you know, limited the options a little bit.
But I was very interested in energy systems and that type of stuff at the time as well, which I still am.
More as a side hobby, I guess.
But yeah, so I think, you know, I think I realized that, yeah, I guess, you know, programming is fun, just the joy of it.
And in general, building software.
I think the idea of artificial intelligence also always appealed to me.
I think that was really what kind of drew me towards programming is kind of the appeal
of, you know, building intelligent systems.
So that's really kind of the route that drew me in.
I guess we could hit the energy systems.
I'm not sure exactly what you meant, but I feel like we can summon a discussion about thorium at this point. Just do the tech hipster topics, like thorium, AI. This is the classic thing. So, I don't know, we don't have to go in there. But is that energy systems? You know, you said it's a hobby, so is that, like, energy plant production, or...
Oh, sorry, I said intelligent systems.
Oh, intelligent systems. Oh, I'm sorry. Never mind. I'm just excited. I want to talk about thorium.
All right, cool. Never mind. I just ignored that. We'll keep going.
Keep going on. Is that the same as Tiberium? No, that's Command & Conquer.
Yeah. Oh, okay. I do remember talking to someone else about thorium a while back.
Yeah, I wonder what happened to that. I remember there was a lot of excitement there.
No, no, no. I can't.
I can't.
All right, ho.
We've got to keep going.
Yeah, yeah.
Look it up.
Look it up.
There's YouTube videos.
YouTube videos.
Thorium.
It's good stuff.
No, don't get near it.
Don't get near it.
But read about it.
It's fine.
So, yeah.
So you got this interest in programming.
And then did you end up, you know, when you sort of graduated school, did you take your
first job as, like, a programming job, or wasn't it sort of at that level yet?
Yeah. So when I was deciding, I remember for, I guess, my first job, I was deciding between a couple of startups and then Microsoft as well.
Some of them were engineering roles. Some of them were product roles. I actually ended up for my first role entering as a product manager at Microsoft where I was working on, I guess, cloud infrastructure and Office 365.
Oh, very nice. I don't know if it's still the case, people may not know, but Microsoft worked a little bit differently than a lot of other software companies, in that they had tightly coupled teams with, as you said, product managers, software engineers, and at one time, test engineers.
Is that still, or was that still, the sort of arrangement that they had going, or is that something that they've sort of moved past?
Yeah, I mean, I guess when I was there, which is also, I guess, a little bit of time ago at this point.
But I think it really depends on the team within the company.
I think at a big company like Microsoft, it's almost like you're working at a different company depending on what org you're on.
And the structures are radically different across different teams.
The team I was on, I remember we did have a little bit of that set up.
We had a lot of service engineers because we were more on the cloud infrastructure side.
So we had software engineers, service engineers,
and product managers.
Nice.
And so you were doing product management,
but still had your eye on wanting to do more of a programming role
and continue to do that?
Or how did it sort of shape up in your time in that role?
Yeah, so I mean, I've always been interested in, you know, building things, in whatever form that may take.
So I think the way I've always viewed it is there's a bunch of different roles.
If somebody's interested in tech, there's definitely a lot of different ways to contribute,
one of which is engineering, but one of which is, you know, design.
There's a whole host of, you know, cool ways to contribute, even if you're not necessarily
inclined to code or to engineer
necessarily. But I think for me, I was always, you know, working on side projects or, you know,
interested in writing code in addition to kind of, I guess, my day job at that time.
So I think in some ways, I didn't really identify myself as a product manager or as an engineer, but more just in the spirit of building products,
whatever that may be.
And then I think I ended up again kind of returning
to kind of being interested in artificial intelligence.
So I think immediately after Microsoft,
I ended up becoming a researcher.
So I ended up, I guess, doing research at Stanford
in a machine learning group.
And I think that was a pretty important experience for me and kind of helped me, you know, better
understand what I'm interested in.
Nice.
That's a pretty big transition, right?
To work for Microsoft and then go to be a researcher at Stanford.
How did that sort of come about?
Yeah, so I ended up going to grad school.
So I think the way it came about was, in some sense, I was always interested in, you know, artificial intelligence. And at the time, I was also very interested in healthcare. And I still am. So I think healthcare and search, there are fascinating interactions between the two of them. And I think there's a lot of work to be done in improving health search. But really, I think I came in with the angle of I wanted to, you know,
use AI to improve healthcare. I think there's a lot of different ways healthcare is broken.
And I was also interested in language. So natural language processing has always been kind of of
interest to me. So I think I ended up doing research kind of at the intersection of the two.
A lot of it was, you know, how can we use language to improve health, whether that means mining reports for label data to then train computer vision models for healthcare,
to just thinking a little bit more about, you know, conversational agents in healthcare and,
you know, bias, etc. So I think that was kind of what drew me. It was more just an interest in
topics. I think the topic in that case was just artificial intelligence, natural language
processing, deep learning, healthcare, those types of things. It was a little bit varied, but that's kind of the general theme. Yeah. And so what did you kind of pursue while you
were there? What kind of research were you doing? Yeah. So I guess I was doing research. Yeah,
I guess at the intersection of healthcare and deep learning. Okay. So I guess, you know,
some of the projects I ended up working on,
I guess maybe the first one was a project called CheXbert,
which we ended up releasing.
And essentially what that was is it was a radiology report labeling tool
that, you know, used BERT. At this time, BERT wasn't necessarily new, but it was maybe one or two years after BERT came out.
And a lot of the existing radiology report labelers were very heuristic based, which means they use like a lot of, you
know, hard coded rules, essentially. And they were being used to train a bunch of computer vision
models, some of which were being tested in hospitals. So the idea was that, you know,
we can very easily improve the quality of the labels, which will then downstream improve all
the other models that are trained on the labels using kind of some of the advances
in natural language processing.
So we ended up kind of developing CheXbert,
which we released.
And, you know, it was a great project
and, you know, it's been used by other researchers
for their projects in different ways.
So I think that was kind of one flavor of research.
The other one was building data sets.
So I think I gained a lot of appreciation
for, you know, how important data is in machine learning
and AI.
Like one of the datasets we built was called RadGraph.
Essentially, it was a dataset of entities and relations in radiology reports
annotated at more of a fine-grained level than previous datasets in that healthcare
space.
And the idea was that it can eventually help train multimodal models when matched with computer vision images. So that was kind of a couple
flavors of work. And there were some others along those lines. Nice. So it may be obvious to many
folks, but humor me, it's not obvious to me. So Jason's background is more machine learning by
now. But you mentioned something here that I've heard. So I'll get your opinion on it, or maybe you can help me on this.
So you mentioned, you know, looking at radiology reports and doing natural language processing.
When you say that, do you mean, and you mentioned like sort of computer vision as well.
Are you looking at the sort of like scans of a radiology, like an x-ray?
Or are you looking at like the human text
that like a radiologist would enter? Yeah, that's a great question. And yeah, I think I was breezing
through. So no, no, no, that's okay. No, no, I definitely should have clarified a bit more,
because I definitely think it's not obvious now that I've said it. But essentially, I think what
I was looking at was, in this context, the human written text. So oftentimes, when a
radiologist, you know, looks at their patient or their x rays, they'll then write down clinical
notes. So in general, I was paying a lot of attention to clinical notes that doctors are
writing, and how to essentially structure that information. Because it's fascinating, there's
all these notes that we have that we've collected, you know, doctors have made over, you know, the
course of decades, and it's all very unstructured.
And when you have unstructured data that's in free text, it's very hard to use it for analytics, insights, training machine learning models.
So there's a lot of value we can just get by structuring a lot of the data in healthcare.
So that's kind of maybe the theme of the research a little bit is how can we, how can we leverage, you know, all these like decades of knowledge that doctors have inputted? How can we structure that and then use
it to, you know, be a little bit more algorithmic in the future? Nice. Well, I was going to tee up
a question about natural language processing and maybe large language models and like diverse
fields that aren't related to text. But this is also interesting. So we'll dig in on this one.
And maybe we can get back to that later at some other point, if it comes back up again. But so you
mentioned this, and you know, we hear this as well, a little bit in the in the machine learning,
this taking unstructured text, right? So a radiologist writes what they see, and please correct
me if I'm wrong. But like, they sort of write in, you know, notes about, hey, I looked at this patient's chart, maybe they were given something that they were looking to try to
diagnose. And then they're sort of writing, it's all freeform prose, almost, you know, I guess I
would say, like, oh, I see that, you know, there's this thing here, or that thing there, it looks
like this, it could be that maybe follow up suggestions would be to do this or that, you
know, or, you know, I deem it's okay. And this is very unstructured. And you mentioned sort of making it more, you know, hierarchical.
You also mentioned sort of entities and relationships, and sort of some of this modeling. And as you were saying, some of it may have been previously, like, heuristic-based,
like searching, I assume for sort of keywords saying this keyword corresponds to that keyword.
Therefore in this chart, there's like this entity of, you know, a specific kind of growth.
I don't know the word. And it relates to this patient and this over here.
And you're sort of saying, just giving it to a machine learning model, you know, rather than heuristics, and allowing it to just sort of build those relationships.
Maybe that's too broad of a question, but what does it end up looking like? You feed them text, and the training, I assume, is the sort of desired model output, like the hierarchy that you expect from that text, and you're asking it to sort of do the same thing?
Yeah, yeah, that's a good question. And I think in some ways, when I say structure,
clinical information, the North Star is to capture all of the meaning and all of the nuance in a clinical note.
But there's a lot of work until we can kind of do that.
So the way we started off was very simple.
And we, you know, used existing schemas that, you know, our lab had actually developed.
In this case, it was called the CheXpert report labeling schema.
And really it consisted of like 14 different labels for different conditions.
So this was kind of, you know, formed with a lot of interaction between, you know, machine learning
researchers and also doctors. And it was basically, you know, a very simple task, you know, given a
report, what are the positive, negative or absent mentions of a particular medical condition? So for
example, pneumonia, pneumothorax, cardiomegaly,
these are all examples of different medical conditions. And there was maybe, you know,
a couple of labels, different labels that each of those conditions can have. And that was kind
of the setup of the problem. And then in terms of how we trained it, really, this is also kind of
another interesting thing is that we actually use a lot of these heuristic systems in order to
generate labels. And we actually train some of these, you know, more powerful models using labels from simpler models. So this is generally known in machine learning as, you know, weakly supervised learning, where you can have, you know, noisy labels,
essentially, and you can learn off of them. And then you can have a stronger set of high quality labels that were, you know, devised by radiologists, and then you fine tune on those,
and then you end up getting better performance than the initial labeler that you even used for
training. So in some ways, it's kind of like a student teacher model, where the student ends up
ultimately outperforming the teacher. It's maybe one way of thinking about it.
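To make that recipe concrete, here is a toy sketch of weakly supervised pretraining followed by fine-tuning on a small expert-labeled set, using scikit-learn. Everything in it, the features, the heuristic "teacher" rule, the noise rate, is invented for illustration; the actual CheXbert work fine-tunes BERT on radiology reports, not a linear model on random vectors.

```python
# Toy weak supervision: pretrain a "student" on plentiful noisy labels
# from a cheap heuristic, then fine-tune on a small clean "gold" set.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Step 1: a heuristic labeler produces noisy labels for many "reports."
X_large = rng.normal(size=(5000, 20))        # stand-in for report features
true_y = (X_large[:, 0] > 0).astype(int)     # hidden ground truth
flip = rng.random(5000) < 0.2                # heuristic is wrong ~20% of the time
noisy_y = np.where(flip, 1 - true_y, true_y)

# Step 2: pretrain the student on the large, noisy set.
student = SGDClassifier(loss="log_loss", random_state=0)
student.fit(X_large, noisy_y)

# Step 3: fine-tune on a small set of clean, expert (radiologist) labels.
X_gold = rng.normal(size=(200, 20))
gold_y = (X_gold[:, 0] > 0).astype(int)
for _ in range(10):                          # a few extra passes over the gold set
    student.partial_fit(X_gold, gold_y)

# The fine-tuned student can now outperform the noisy teacher it learned from.
```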
Nice. Yeah, I hear that as a recurring thing now, where rather than sort of homogeneous supervised training, you know, this data set in and then I just get my results out, like you're kind of mentioning, you may have an earlier stage that uses one approach for training, and then you mentioned sort of refinement. Kind of mixing and matching different training styles throughout the larger model seems to be something I see, at least as a layman not in the machine learning space, as a recurring theme more recently. So that's interesting.
Yeah, yeah, no, I mean, it's a, there's a lot of interesting work going on. All right. So you're not at Stanford now. So something happened after that. So you did
your research, you got your degree, and then where did you go next?
Yeah. So I guess while I was doing research and I was becoming interested in language
and natural language processing, at that point in time, you know, towards, I guess, the end
and maybe overlapping a little bit, I had started working with You.com.
So our founders, Brian and Richard, were both previously at Salesforce,
which is kind of this, you know, I guess a big tech company.
And then they had kind of a vision of starting a search engine.
So I know Richard has been thinking of starting a search engine for a while. And he had left Salesforce
at that time with Brian. And I think I had noticed somewhere that they had, you know, posted about it. And I think I had reached out, or maybe they had reached out through some, like, alumni network or whatever. And I ended up, you know, joining and helping build You.com. And, you know, there was definitely a lot of, you know, interesting problems,
and there still is in the search space, which I think is what attracted me. So I think making
the jump from research to working at a startup was something that felt a little bit natural,
especially given the fact that it was such a, you know, ambitious problem. So I think that's something that appealed to me
is the idea of working on something that, you know, is hard. It's definitely not easy to build
a search engine, especially when, you know, there are a lot of people who have a good product out there. Google's a great product. But we thought that, you know, we can,
you know, still provide something of value. So I think that's kind of what sold me a little bit on, I guess, moving from academia more into industry slash startup world.
Awesome.
So, I mean, I think we all sort of have an idea of search, you know, the equivalent, I guess, of your hotkey and then the letter F for, sort of, searching a document for text. We even had an episode where we talked about, you know, how you would do search in databases and this kind of thing. But we've sort of not really covered the topic of search engines,
right? So something where you're going to a website and trying to index all of publicly or maybe even not publicly accessible human information that's up on
the web and allowing people to sort of find the needle in the haystack, right? Find the thing
they're looking for. This is my personal definition; I've no idea if this is a very good definition of a search
engine. But this is a topic that I think everyone bumps up against at some time. Like you said, we've all used Google or, you know, another search engine and sort of, you know, put in your text. And like
you said, there's already, you know, you sort of mentioned your background was natural language
processing. There are many pieces to that. I can think of a thousand ways to sort of like start
into the conversation. So maybe I'll just, you know, kick it over to you. Like when you think
about this space, is there a sort of approach that you think about the sort of high level
components of, you know, sort of, I type text in a box, or what does that text even look like, or its
formatting to, you know, I get, you know, a link or even just the information that I'm looking for
on the internet? Like, how do you either think about what it is today or think about where you want to see it going?
Yeah, yeah.
I think, yeah, there's definitely a lot of excitement
around search right now.
I think search is, in some ways,
it's been a little bit,
I wouldn't say, I think it's always been evolving
over the last couple of decades.
But I think right now is particularly a time
when we're seeing almost like a paradigm shift in search, which I can kind of get into in a bit.
But in general, like when I think about search, and I think you mentioned search over databases,
I think when I think about search kind of in this concept that you're talking about,
when we think about, you know, essentially, you know, search engines, I think one of the
differences is basically in the types of information that you're searching over and the goals.
So I think when you're doing something like a database lookup, you're basically looking over data that's already been, you know, very structured. And you're essentially kind of
looking for almost like a very procedural method of finding like an exact match or something.
And I think that definitely has value and is very interesting. In general, when I talk about search,
I think about, you know about the discipline most aligned with search
would be kind of this area known as information retrieval.
And in information retrieval,
I think there's a couple characteristics,
one of which is we're kind of doing search
over a large corpus of data.
So if it's a small corpus, it's not really a search problem.
A lot of the point of a search engine is that
you're searching over kind of like an infinite amount of documents
or a very large amount of documents.
That's one characteristic that makes it a bit different sometimes than other settings.
And then the other one is that these documents tend to be very unstructured.
So for example, web pages are super unstructured, information about news, et cetera.
So in general, you're looking for unstructured information.
There's a lot of it.
That's basically the setting in which search operates.
And then in terms of, I think you mentioned, you know, search being about,
you know, you type in something, you get some links.
I think that's the way it's historically been.
And I think, you know, when we look at existing systems, a lot of times it's become kind of
very ad dominated.
So you end up, you know, searching, you get links and you get links with ads above it.
And in general, that's kind of in the paradigm.
And obviously there's been a lot of content as well. If you look at a lot of search engines, there's
knowledge panels, etc, with kind of extracted content, which is very useful. But I think what
we're going to start seeing is a move towards search being more about getting you the answer
and letting you do things. So, with our search engine, we think about, you know, being a do engine, not just a search engine: how can we let you do things to kind of achieve your goals? So that's kind of maybe one distinction that we
see search going from. So we kind of see it also as maybe going away from like a list of blue links
to a list of kind of, you know, different types of organizations of content. So this can be kind
of just giving you the answer straight up. You know, it could be in the form of a chatbot. So
we've been thinking a lot about conversation.
So I think we'll see search evolve in many different ways.
But yeah, I definitely think we're moving away from the, you know,
you search something, you get a bunch of blue links,
you click through a bunch of them,
and then you eventually find what you're looking for to being a more kind of
user centric, like streamlined experience.
I was going to just make a funny observation
that getting blue links isn't as bad as searching something
and getting the purple links.
And you're like, oh, no, I've already tried these.
Like, I need new links.
So, okay, so blue links is, I get what you're saying.
But anyways, I recall, like, I'm thinking back to,
you kind of mentioned this, like, if you have a small set of data,
it's sort of a different set of problems, where small, I guess, is a bit of a relative term. But I recall, like, early on, I guess, you know, I'm kind of old, so when I would first, you know, move from a card catalog at the library to, like, they had a computer and you could look up not just what books were in your library but, in ours, it was a countywide library where I was growing up. So actually, there were many branches, and you could search and see what books were in
stock across the entire library system in that county. And I recall one of the things, one,
it was pretty terrible. Two, like you had to put in text that was pretty close to what you were
searching for. And then you needed to tell it like, do you want it to search, you know, author,
title, subject, like literally in a drop-down, what field are you searching. And then they would give these things which, at the time, I didn't really know were Boolean operators, like, I want this and this, or not this. And so you could sort of say, I'm trying to think of a good example, I want cooking but not food, I don't know what that would be. But like, you know, yeah, oh, shoes, but not sneakers.
So like, I guess dress shoes or, you know, brake pad shoes.
Are there shoes?
I don't know.
And so like there was this, you know, sort of Boolean, very structured, almost like programming
concepts, these logical concepts that you needed to sort of go in and put
down into the search box and then run your search. And you know, it would come back with inevitably,
basically nothing. But you know, you would try to get one thing. And then you were sort of saying,
you know, to moving to a more conversational. So I think over time, we've seen a lot of changes
where people have moved to, I guess, with sort of Google coming to be there, like,
you can type in a word. And it used to be before Google, like you would type in that word, and you
would just find websites that had that word repeated, like 1000 times in like white on white
at the bottom of the page. And they were just ranking, like, by frequency, like, which had the most.
And then, at least for me, like starting to use Google was like you type in a word and
that word didn't even need to appear in the page you were looking for. Like it seemed to try to
understand related words or concepts or, you know, what you were searching for. And to sort of I'm
giving my own narrative, but feel free to fill in gaps there. But like today, you sort of mentioned
now it becomes almost more even conversational. Like I'm not even just typing words, like I'm
typing whole questions or sentences in sometimes.
Sometimes it works well.
Sometimes it doesn't.
I feel like often it's just ignoring a lot of the context
you're trying to put in your sentence.
But, you know, that's sort of, I feel like myself,
even how I interact with it,
I guess the systems are training us
in as much as we train the system,
like giving you feedback about the quality of the links
and teaching you, like, you got to give me better inputs.
But I don't know, like, is there, do you feel like you mentioned this move to conversational?
Do you feel like we're at an end to this sort of like, I don't know what the right word
would be like, strict, I'm looking for this word or concept directly to appear to something
that's more akin to how we would ask a question to
someone behind the desk at a bookstore or, you know, someone at a university? Is that the sort
of like direction that you think we're kind of moving? Yeah, definitely. I think you pretty
much nailed it with your analysis, especially of the card catalogs and maybe the bookstores
you were looking for books at. I think that's a great example of kind of the general direction in which it's been evolving.
I think in some ways, you know, the reason why it's bad
is because it's a tough problem.
So, you know, when you have really, you know, long user queries
with ambiguous context and a lot of assumptions,
it can definitely be tricky in order to kind of surface
the right document or result for you or answer even.
But I think that's definitely the direction in which we're heading.
And a lot of this has been enabled by a ton of advancements that have happened in natural language processing over the last five, six years. It's actually been tremendous. I think there have been multiple paradigm shifts, essentially, in terms of, you know, the way in which we can deal with language with machine learning, essentially, that have opened up new possibilities. So yeah, I think if I were to give kind of a simple example of how we're moving in
this direction, there's kind of this one concept in search known as semantic search. So you know,
oftentimes, I think what you're referring to, and a lot of classical information retrieval, centers around keyword search. And keyword search, no doubt, is still important today. But the idea behind keyword search is that when you look up something, you basically look for the exact word in a
document. And you can obviously do some math, kind of, you know, normalize over the length of the
document and other types of things. But now we're actually also, you know, moving into a world in
which semantic search is becoming, you know, more important and better, essentially, where you can basically have a user's query or question
and you can do what's called, or basically what we call, embedding the question in a way in which you can kind of understand the context a bit more, and then find similar documents.
So even if the keywords don't exactly match up,
but the spirit of the question
is similar to kind of the answer,
we could still, you know, surface those now.
And there's also been subsequent works,
you know, dealing with large language models, et cetera,
that kind of, you know,
even further pushed the possibilities.
But yeah, I think to answer your question,
that was a long-winded way of saying yes.
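To make the keyword-versus-semantic distinction concrete, here is a small illustrative sketch. The keyword scorer is a bare-bones word-overlap count (real engines refine this with TF-IDF or BM25 and length normalization), and the semantic side assumes the open-source sentence-transformers library with a commonly used public model; these are illustrative choices, not what any particular search engine runs.

```python
# Illustrative sketch: exact keyword overlap vs. embedding similarity.
from sentence_transformers import SentenceTransformer

docs = [
    "Finding the closest point to a query with k-nearest neighbors",
    "The nearest gas station to me",
]
query = "algorithm that finds whichever stored example is most similar"

def keyword_score(query: str, doc: str) -> int:
    # Classical keyword matching: count exact word overlaps. Here no
    # query word appears verbatim in either document, so both score 0.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

# Semantic matching: embed query and documents into the same vector
# space and rank by cosine similarity, so paraphrases can still match.
model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode([query] + docs, normalize_embeddings=True)
semantic_scores = vecs[1:] @ vecs[0]  # cosine similarity (unit vectors)

for doc, sem in zip(docs, semantic_scores):
    print(f"keyword={keyword_score(query, doc)}  semantic={sem:.2f}  {doc}")
```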
No, no, no, it's great. Yeah, so I guess this keyword versus, I mean, I don't have the right terms for it, so I think that was really good. Thank you. That helps me put words to it. I guess there are still times where you
can go to DuckDuckGo and sort of, like, put in, you know, I want to search the cppreference documents for, like, I know the function I'm using, I just always forget the order of inputs, you know, the order of the input parameters or whatever.
So like I want, I know exactly what I want.
I want this, you know, document this, you know, thing,
but it's still, you know, I don't know, for Python or for C++, maybe tens of thousands of pages.
You can't, we don't even print out books anymore. You know,
we used to have like, you know,
language manuals or whatever you'd flip through.
But now like you couldn't do that.
Like it wouldn't be practical for, like, the C standard library; it'd just be gigantic. Or Boost or something. You know, and so now, yeah, we still rely on maybe that keyword search, like, I actually do know what I'm looking for very specifically and I just want you to, you know, find that thing. But I feel like that, when I'm... maybe that's true. But that's sort of the exception. Often, like I was saying, there's even the difference of not knowing semantic versus keyword.
A lot of times you were saying like nearest neighbors earlier, you may not know the term
nearest neighbors is like the academic way of describing finding the nearest cell to you by
distance for, you know, matching an attribute set. And so you try to go to the search
engine, and you're just trying to describe the problem, finding the closest thing to me algorithm.
And, you know, you're hoping that the search engine can sort of like, deduce that you're
looking for, like you said, a document that describes how to do that thing. And from there, that this concept becomes obvious of like nearest neighbor search, and then maybe you can refine or go back.
So that, I guess that's the semantics of it, like understanding what you're asking,
and then knowing the answer to that question and giving you useful results to that.
Yeah, exactly. Yeah, that makes sense. Yeah, I think that's a good way of kind of rephrasing
it in a little bit more of an understandable way. But yeah, I think there's a lot of value to be
gained by also knowing when to do each. I think search, in some ways,
it's a very vague problem and it's a very all-encompassing one. So almost a lot of things
that we do can be rephrased as search problems. You could think of all of coding as being one giant search problem where you're given a goal and you have to figure out what to do.
And I think we're slowly going to kind of keep bridging that gap.
And we'll see search kind of expand its scope a little bit into more, you know, what we call do actions instead of just being giving you information, allowing you to do things and eventually even doing things for you. So, you know, one example, you know, might be, and this is something that,
and maybe this is kind of going in a little bit of an odd direction, but,
you know, typically, you know, in search engines, you are getting content,
but I think also now you can kind of, with a lot of advancements,
you could think about, you know, making it take action for you.
So I'll give you a simple example.
There's been a lot of advances in kind of image generation.
So I'm sure, I don't know if, I'm sure a lot of the listeners have played with some of these really cool tools out there.
So I think one of them would be things like Midjourney, Stable Diffusion.
There's these models that you can give it text, and it'll literally create a really high-fidelity image for you.
So when you type in a search engine now, or even in our chatbot,
something like generate an image of a cat playing the piano, you'll be able to get that.
So this is kind of a little bit of a wonky example. It doesn't really feel like a search problem.
But in some ways, I think we're going to start seeing a lot more of these types of commands and
actions happening within the context of search. And similarly within programming, if you say, you know, I want to, you know, implement, I don't know, a binary tree with this unique case,
you know, you may actually end up running into like a, you know, there may be, you know,
the search engine might have some type of, you know, code generation module running,
and it may just give you your code. That way, you don't have to search through a bunch of
documentation, make it yourself.
So I think what we're going to see is search is generally going to move a little bit more in this direction of allowing you to do things,
as opposed to giving you information that you then have to read, synthesize yourself, and then act on.
So I think that's going to be another really interesting shift we'll see.
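For anyone who wants to try the text-to-image piece Saahil mentions, here is a minimal sketch using the open-source Hugging Face diffusers library. The checkpoint name is just a common public choice, and this is of course a toy example, not You.com's actual image generation stack.

```python
# Minimal text-to-image sketch with diffusers. The checkpoint
# "runwayml/stable-diffusion-v1-5" is an illustrative public choice;
# any Stable Diffusion checkpoint works. A GPU makes this practical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # move the denoising model to the GPU

# The same kind of query given as an example in the conversation
image = pipe("a cat playing the piano").images[0]
image.save("cat_playing_piano.png")
```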
Yeah, I guess that's an interesting extension.
You said it might be a little bit of a weird direction, but I kind of see it as we were describing like a shift in the way that you provide the information to almost, I don't even want to call it a search
engine because like you said, it sort of moves beyond that, but like into the text box, I want
to put words or, I mean, to be fair, like I, we've been talking about text box and search engine. I
mean, the same is true for any of the smart assistants that you talk to verbally.
It almost becomes the same problem, right?
It used to be, you had a very formulaic, I need to speak in this precise way.
And they were all very like singular points of engagement.
Like I asked for one thing.
And if I ask again, there's no, no context.
We were talking about that.
Jason was mentioning that with ChatGPT.
We were sort of going through it and sort of talking about the difference between remembering what you've been,
the context versus not remembering. And so I think when we go to the search engine text box
or the smart assistants, that not only is there context outside of that text, like that you've
previously entered or things it knows about you, personalization is one of those. But then you're sort of describing, that's, like, if I think about that as, like, the left-hand inputs, but now you're talking about the outputs. It's not, let me give you back a piece of information that already exists, so a Stack Overflow page or C++ documentation or Python documentation or your email maybe, even, like, whatever it might be that you have in
your corpus of documents, you're actually saying now, and you kind of mentioned it happens sometimes
like sort of knowledge panels, or like image generation, where rather than saying, hey, I want
a picture of a cat, and let me do an image search, where there's obviously smarts going on, there's the whole image processing stuff for knowing that this is an image of a cat, to match it to "cat," because obviously most of those images don't have the word cat in them, in the image itself. So gaining that context, but then, as you were mentioning, going even one step further, which is, you know, using a lot of the advances we're seeing now with a lot of these systems to generate that response. That makes sense.
And so do you think that that's something where, I guess, one, that's pretty cool, but it could be problematic, in that, as unauthoritative as the web already is in ways, how do we sort of know that it makes sense? And then, is there a thing where those machine learning systems hang off of the text directly, or is there some sort of cleanup system that sits in between? Which is sort of how I think about it happening today. Like, today we put text in, some sort of system sits in between to, like you said, do this embedding and then search the embedding space, right, and sort of do this cleanup. And so does something exist where I put in what I want and it knows how to cue the systems downstream,
or did they just become end to end? We just have like these monolithic, I don't want to call it
general AI, but these things that just know how to answer questions, generate text. I want to watch
a Seinfeld episode, except they're on the moon. And it just like knows how to go create like,
you know, a movie for me that's like Seinfeld, but on the moon.
Yeah, this is a great question. And I think the answer is probably a bit of both. So I think in
some ways, you know, you know, we definitely should not end up, you know, moving away from
authoritative content. I think having content that has citations is very important. So I think,
for example, if you're looking up information, sometimes it's very important to, you know, read like kind of the details and the
source documentation, and to dig up answers for yourself, and not just kind of see what a search
engine condenses and gives to you. So I think that can also introduce a new form of bias that can
really be a little bit problematic. So I think, you know, the way I think about it is, it really
depends. I think search is such a diverse task. Even within search,
there's so many different categories of types of things people are looking to accomplish from
search. Some of those things, I think, make sense for generations. So for example, if you ask,
right now you ask a search engine to write me a poem. I think in that case, it's okay if it
doesn't necessarily cite, if it's giving you ideas. Or if you ask questions like, you know, who won the World Cup, in that case, if you can authoritatively just say Argentina, that's the correct answer, and that's okay to show there. And obviously citations are very important, but I think in some ways it's also important to, you know, surface documentation
and really ground truth material that allows people to kind of make sure they
have, they're in touch with kind of reality in some ways and not just, you know, immediately
ingesting anything an AI tells them.
So I think that's the great balance.
I think that's something we're trying to find as well is how do we balance the need for,
you know, the user's desire to get quick content with kind of allowing them to be able to dig
into facts themselves and have trustworthy information, like citations.
So I think that's kind of maybe a balance we're thinking about
is how to integrate the two of them together.
And the way we think about this is also that I think knowledge bases
are still going to be very important in the future.
So even if we live in a world where, you know,
it seems like ChatGPT is able to answer all sorts of things,
in some ways it's still going to be very important
to make sure that, you know, we have knowledge bases that these AI systems can interface with.
And I think that's ultimately going to be the way in which we solve this is that you'll have, you know, you'll ask ChatGPT something.
Or in this case, you know, maybe let's not use ChatGPT, but with You.com even, you ask us something and we'll use a lot of AI to kind of generate the answer for you, but hopefully the generation will be backed very strongly by, you know, knowledge bases and authoritative information that we'll give to you with citations. And that's kind of the North Star, I think, for me at least, and for You.com, is how to bridge that gap. So I think you pointed out a really important, I think one of the toughest, problems in search right now, and in generative AI in general, which is, you know, what is the balance there?
For what use cases is it OK to have pure generated content with no citations?
When do you need citations?
You know, to what degree do you need to surface the citations?
To what degree can you adapt them and paraphrase?
So these are all really thorny questions that I think we're going to have to come to grips with.
Yeah, it's a bit of a
side tangent. So I'll, I'll indulge it briefly, and then we can return. But one of the things
when you were sort of talking there about citations and authoritative and not to be like,
dystopian and say, it's very actually difficult to say what's true. So you can say, Oh, who won
the World Cup? Well, even that, I mean, we could get into it. But I guess at a surface level, it's pretty straightforward to sort of find someone who generally people would agree is authoritative. Like, oh, I could go to the FIFA website, and the FIFA website is determined to be the, like, source of truth for this information, and so whatever they say. And let's just ignore the fact that they could get hacked or whatever. But there are other questions that are just very difficult to answer, or maybe contentious, or become political. I don't want to give any examples, it'll color the conversation too much. But I think there are just certain, you know, questions that you ask where actually it's very difficult. And not to tag another, you know, I guess, mean thing to talk about, cryptocurrency.
But I was reading about sort of how oracles work in cryptocurrency systems where you want to do like a betting market.
So I want to bet on, you know, who won the presidential election or who's going to win the World Cup. But of course, they like what system is going to provide the answer.
And how do you resolve disputes where someone says, no, actually, that that wasn't the answer.
And so sometimes the question itself could be ill posed, like it's not answerable in its current form or
not undisputedly answerable. And so how do you have this, like, staking of reputation, or in this case, money, or, you know, coins? It's just a very interesting way to sort of start from a very decentralized, I-trust-no-one position,
and then it just ultimately ends up boiling down to a vote and who's willing to like,
put their sort of like resources, their coins behind, you know, one side or the other,
and hope that they're, they sort of understand the, what the general answer is going to be.
Again, not trying to get into the cryptocurrency discussion per se, although I'm happy to go there.
But I think just this, this thing you mentioned that we're going to with these systems of
text generation, search engines, like many systems are going to be trying to give answers that some
people aren't going to listen to the warnings, and they're just going to take as truth. But even how
do they train themselves and what is true? And whatever source you rely on,
people can become an attack vector
where people attempt to overwhelm that source of truth
to like say something else, right?
So, oh, the FIFA website, right?
Like we said, like it could get hacked.
You could try to like, you know,
say that the FIFA website is somehow biased.
We don't need to go in there.
The bribery scandals, whatever, right?
Like, oh, you can't actually trust them, you need to trust this other thing. And I think it's just very interesting that this somewhat subtle question, like, the hard problem seems to be just, how do you give an answer? But then when you peel back the onion layer once and say, like, okay, but what is the actual answer? So even if we all agreed on what the question was, are we actually able to agree on the answer? So again, I know that's a bit of a side tangent. But just when you were sort of saying
it leads me to this, like, so exciting to see this progress where you can type in something,
and a lot of times it's right or really close. And then you go that one step further, like,
oh, wait a minute, hang on, there's like some still like foundational level questions where,
you know, is that Gödel, the incompleteness theorem?
Like you can't actually have like fundamental axioms that like describe your
entire, like some things are sort of like not provable.
And this is somewhat dissatisfactory that you can't build an entire, okay.
More tangents, but that's just like what tipped me off when you were sort of
talking.
Yeah. Yeah. I mean, those are, I think you made a lot of good points. And yeah, I think in some ways, again, it's like a balance because
we have to make assumptions about the world. You know, we do this every day that could be wrong.
I think the same is true of search is that, you know, when we provide an answer, we will have to
make assumptions. Like, for example, if you ask who won the World Cup, we have to base that on some type of information, especially, well, I guess, yeah, in order to give an answer.
And then the sources we choose, like, for example, you mentioned the FIFA website.
In general, we can assume it's quite reputable. There are other websites, news outlets, etc.
And I think in some ways, you're right that we do have to make assumptions there.
And those assumptions can often be wrong, right? Things can be changing.
I think one of the key kind of ideas or principles is also to make sure that information can
be cited as much as possible.
So for example, say that FIFA was hacked. The statement "Argentina won the World Cup" would then be wrong if it turns out that France actually won. But the statement "according to FIFA.com, Argentina won the World Cup" is self-contained, and it's still somewhat right. So in some ways, you know, you can at least provide answers that are rooted in independent context, where the user would look at that answer, and even if it's wrong, it would be kind of self-contained in its wrongness. And all the assumptions, you'd be able to investigate them within the
answer. And I think that's kind of in general, the importance of citations is being able to
kind of dig in and understand. So, for example, if you look up, you know, what is the nutritional
value of a banana? Or how much protein is in a banana? And we just spit an answer out at you. And that answer could be,
I think in some ways it's contested, right?
Like maybe it's not clear exactly how much protein is in a banana.
Which study, what, what, what area of the world are you in?
Like the bananas are different depending on where they're grown.
So there's all this like nuance, right? So you have to make assumptions.
And it seems like kind of one way we get around that is by trying to provide the sources so that if the user wants
to dig in more, they can. Oftentimes, people won't want to, they just want a very rough answer.
And it's okay. But you know, you don't know that ahead of time. So I think I think that's one kind
of, you know, line of thinking that we're going about. But it's tricky.
I think in general, it's somewhat of a historically unsolved problem.
And it'll continue to be kind of a big issue in the future.
Yeah, this sort of escape hatch you mentioned is really interesting.
And people like to debate this explainable AI, right?
That, like, being able to tell. So in your case, you may not know how it got it. It may not be able to explain how it got to the answer,
maybe not important, but it's giving you the equivalent of what you would have expected
before, which is I click on, I click on the link and I go to the FIFA website. And like you said,
you sort of get out of it because, hey, I used this information and I gave it back to you. And
I maybe gave it to you in a more palatable form. I wonder if that adds complication.
And then like, but you mentioned sort of moving beyond, which is like, oh, I want to generate
me the nearest neighbors code with a limit of, you know, 50 meters, with the data set.
I mean, I could just go on and describe a problem and we could get generation that may
touch on, you know, a hundred different, you know, GitHub repositories or whatever.
Right.
And so this sort of explanation there becomes a
bit tricky. You can give your sources, I used all of these inputs, here's a list of 100.
But if I go to any one of the 100, I'm not really able to fact check. In the code example,
it's a little easier because one, hopefully you write unit tests or you can compile the code and
make sure it works, which isn't always true of what comes out of the systems today.
But like, you know, you can kind of go through that.
But the same topic sort of appears, right?
Which is like, you just mentioned, like, maybe a poem is subjective, so it's a little harder.
But when you sort of touch across many things, does that end up becoming, I don't want to say like a limitation,
like forcing the system to be required to sort of cite its sources? Does that, it adds
an extra step. Does it end up becoming tricky or an issue itself? Like how do you even rank those
sources? How do you, like what order do they appear in? What's the correct or, you know, best order
becomes like yet more things that the system has to do? Yeah, yeah. I mean, I think you're right
that it is strictly a harder problem to provide a generation with citations than to provide a generation that's somewhat ungrounded. But I think it's a problem worth solving, and it's something that we're interested in going deep on and trying to dig into. I think the other thing is,
you know, I think it's important to be able to understand the intent of the query,
and to know when to kind of use which system. So I think, again, when it comes to search,
these discussions can be kind of complicated because of the fact that when we talk about
search, search is not really one, you know, single constrained task. It's really a set of so many
different types of tasks,
some of which are frivolous,
some of which are extremely important,
some of which are even borderline,
like life or death situations.
And I think all of that is encompassed within search.
So I think we basically need to be very good at deciding when to engage with one technique
versus the other. And I think
the other important piece is around user expectations. So I think when it comes to
search, I think it's important that users have the right expectations when they use the product.
So I think like if you're in a life or death situation, I would not suggest that, you know,
you ask ChatGPT what to do. Definitely good advice.
Yes. I don't think OpenAI would suggest
that either. So I think, you know, in some ways, it's really important, no matter how good
a product is to always have the right expectations with it. Like if you're feeling sick, or you have
some issue with health, I mean, you should probably still see a doctor. Obviously, you know, there's
amazing tools now and then search can be very helpful for it. But at the end of the day,
the user expectation should be that,
you know, you should probably trust your medical professionals.
Maybe one day it'll be true that,
you know, we'll be so good
at doing certain types of things with search
that that won't be possible
or won't be as necessary,
but at least right now
and far into the foreseeable future,
I mean, it's the case that we should trust,
you know, healthcare professionals.
So I think in some ways,
like the expectations are also important. So, you know, people should know how to use search. I think
that's always been the case for search, right? Like we use Google for very specific
things. Like if you ask Google, should I accept this job or not, we know what we'll get. So it's
important to kind of have the right expectations when using a product. And it's also on the creator of the products.
It's their responsibility to make sure that the expectations are communicated appropriately
so people use it the right way.
I think with AI, we're going to see this happen a lot more
where there's going to be a mismatch
between kind of expectations
and what the product can deliver.
And that'll lead to some issues.
So I think we're trying to think hard about,
you know, making sure that we can promote
responsible usage of whatever we build.
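As a rough illustration of knowing when to engage which technique, a router in front of the answering systems might look something like this sketch. The intent labels, keyword rules, and handler stubs are all invented for the example; a production system would presumably use learned classifiers rather than keyword matching:

```python
def lookup_knowledge_base(query: str) -> str:
    # Stub: a real system would hit a structured, regularly refreshed index.
    return f"(knowledge-base lookup for: {query})"

def generate_answer(query: str) -> str:
    # Stub: a real system would call a language model, ideally with citations.
    return f"(generated answer for: {query})"

def classify_intent(query: str) -> str:
    # Toy keyword rules standing in for a learned intent classifier.
    q = query.lower()
    if any(w in q for w in ("chest pain", "overdose", "emergency")):
        return "high_stakes"
    if any(w in q for w in ("stock price", "score today", "weather")):
        return "live_fact"
    return "general"

def route(query: str) -> str:
    intent = classify_intent(query)
    if intent == "high_stakes":
        # Set expectations explicitly instead of generating an answer.
        return "Please contact a medical professional or emergency services."
    if intent == "live_fact":
        return lookup_knowledge_base(query)  # retrieval, not generation
    return generate_answer(query)

print(route("what is the stock price of Microsoft"))
print(route("write a poem about bananas"))
```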
So I just wanted to jump in and ask a question about structured data and how to incorporate that into something like ChatGPT. We've seen a lot of articles talking about,
like, oh, I asked ChatGPT what seven plus three is, and it said it was 13. It's its own self-contained
neural network, which can't rely on things from the outside, so it can't go and ask Wolfram
Alpha for an answer, right? And so I'm wondering, where do you see that going?
Like, how could we train something which knows to start writing
text, and then at some point to stop and say, I need to make a SQL query, and then put the result
of the SQL query there and then keep writing text? How do you think
it's going to unfold? Yeah, that's a great question. And I think in some ways, it depends
on what the purpose is. So the purpose
of kind of the model and the user expectations around it will determine to what degree you need
to do that. So I think, you know, in some ways search is a hard problem.
In some ways it's definitionally impossible to be perfect at,
because there's so much ambiguity in user requests as well. And you never really know somebody's true intent when they're asking a question,
but you can always try to approximate it. But going back to your question,
I guess, could you rephrase it? What were you basically asking again? Oh, yeah. I was wondering, like, right now, ChatGPT, you know, it just keeps calling the neural
network.
Yeah, it goes like, okay, the dog jumped over the fence.
I'm wondering if it could get to a point where it could say, you know, yeah, the net worth
of Shaquille O'Neal is... and then, instead of just calling the neural network again,
it knows at this point that it needs to like make a SQL query to like net worth database
or something like that.
Yeah, yeah.
Okay, sorry.
I remember I got lost in a tangent.
So thank you for bringing me back.
No, it happens to all three of us.
But basically, I think what you're talking about
is something that maybe I had mentioned
a little bit earlier about knowledge bases being important. So, okay, it depends on what ChatGPT's goal
is. So in some cases, you know, they have an API that they allow other people to use,
and they don't necessarily need to solve that problem. Maybe they will. But it does seem like
it's the responsibility of people building the products to know when to use a tool and how to
use it. But I think the way, you know way I think about it at least is that it's going
to be really important to know when to plug in knowledge bases. So if you're looking up what is
the stock price of Microsoft, it's very unlikely that a neural network will be able to answer that
question properly. Maybe one day if it has a live feed plugged in or whatever. But even then you need some component.
There needs to be something that's kind of monitoring the real world and is
aware of events that are happening.
So I think it's kind of clear when you look at it that way,
that knowledge bases are important and we need some way of kind of having AI
interfacing with them.
And that's kind of maybe what I was talking about earlier when, you know,
I think that, you know, the way that we're going to go about this,
and I think that maybe the optimal way
is to really think about how to combine
some of these advances in AI
with a lot of the knowledge that we've collected,
crawled, and kind of indexed as a search engine.
So I think we'll need a mix of both
and we'll need to know essentially
when to draw on one versus the other
and how to basically combine a lot of the advances in generative AI
with a lot of the advances in traditional
and kind of, I guess, neural information retrieval.
So I think it's a hard problem.
But yeah, I think you're right that I think that it will be important
to connect to valid authoritative sources of information
so people can trust the content that we provide.
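A toy version of that write-text, stop, run-a-query, keep-writing loop might look like the following. The [LOOKUP:...] marker, the stubbed model, and the in-memory SQL table are all invented for illustration; real systems would use a trained model and a live, curated knowledge base:

```python
import re
import sqlite3

# Toy knowledge base standing in for a live, curated index.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE net_worth (person TEXT, amount_usd INTEGER)")
db.execute("INSERT INTO net_worth VALUES ('Shaquille O''Neal', 400000000)")

def run_lookup(person: str) -> str:
    # Execute the structured query the model asked for.
    row = db.execute(
        "SELECT amount_usd FROM net_worth WHERE person = ?", (person,)
    ).fetchone()
    return f"${row[0]:,}" if row else "unknown"

def model_generate(prompt: str) -> str:
    # Stub for a model trained to emit a tool call instead of guessing
    # a fact it cannot know. The [LOOKUP:...] syntax is invented here.
    return "The net worth of Shaquille O'Neal is [LOOKUP:Shaquille O'Neal]."

def answer(prompt: str) -> str:
    # Generate, then splice each tool call's result back into the text.
    text = model_generate(prompt)
    return re.sub(r"\[LOOKUP:([^\]]+)\]",
                  lambda m: run_lookup(m.group(1)), text)

print(answer("What is Shaquille O'Neal's net worth?"))
# The net worth of Shaquille O'Neal is $400,000,000.
```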
Yeah, that makes sense. I think the hardest part is on the, um, data curation side,
you know, if you're going to, for example, scrape Wikipedia, well, Wikipedia, you know,
humans went in and wrote, like, Shaquille O'Neal's net worth is X, but they didn't make X,
uh, some kind of token that is updated.
It's just someone maybe looked it up on the internet, found it,
and then hard-coded whatever the dollar amount is.
And so we have to somehow reverse that process.
We have to say, okay, this sentence from Wikipedia
or whatever our input corpus is says,
Shaquille's net worth is, I don't know, like 100 million or something, and we have to be able to say,
okay, that number actually represents some kind of structure, and there's a way
I could get the latest version of that number. Yeah, I know that OpenAI spends a ton of time and energy on, you know,
curating the data set. And I think that's going
to be even more important going forward. Yeah, I agree. We also think a lot about
curating data sets and having really good data to kind of build AI on and, eventually,
I guess, conversational search. And I think the way we think about this, at least, is we have these ideas of you.com apps within our search engine. And, you know, a lot of those apps
are, you know, ones that we've built in partnerships with other companies. We have,
for example, a Stack Overflow app that, you know, provides kind of trusted programming content
that you can use to kind of supplement kind of generated code content.
But I think in general, we also have this idea of like an open platform where, you know, we can have other people plug in
and build apps with their own data.
And eventually that will be used in conversational search as well.
So I think when it comes to data curation, it's important.
And I also think it's important that it's opened up.
So there's not just one search engine really, you know,
controlling all of kind of the data that's being curated,
but it's really kind of an open community in some ways that is, you know,
providing data and, you know, updating the quality and users can kind of,
you know, basically prioritize which sources of data that, you know,
they like and they think is trustworthy.
So I think that maybe went a little bit further than what you were suggesting, but I think it's
a great point about data curation. No, it makes sense. That was a fantastic answer.
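To picture the token-that-is-updated idea: instead of hard-coding the dollar amount into a sentence, a curated corpus could store a placeholder that resolves against a refreshable fact store at answer time. Everything below, including the placeholder syntax and the sample value, is a hypothetical sketch:

```python
import re
from datetime import date

# Toy fact store keyed by (entity, attribute); a curation pipeline
# would refresh these values rather than re-editing prose.
facts = {
    ("Shaquille O'Neal", "net_worth_usd"): (400_000_000, date(2023, 4, 10)),
}

def render(template: str) -> str:
    # Resolve {fact:Entity|attribute} placeholders to current values.
    def fill(match: re.Match) -> str:
        entity, attr = match.group(1).split("|")
        value, as_of = facts[(entity, attr)]
        return f"${value:,} (as of {as_of})"
    return re.sub(r"\{fact:([^}]+)\}", fill, template)

sentence = "Shaquille O'Neal's net worth is {fact:Shaquille O'Neal|net_worth_usd}."
print(render(sentence))
# Shaquille O'Neal's net worth is $400,000,000 (as of 2023-04-10).
```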
So we're getting a little close to the end of time. We've referenced this conversational search
a couple of times, so I just wanted to give an opportunity
to talk about that before we sort of wrap up. But if I take the words for what I assume
they mean, conversational search, where you're having more of a conversation rather than the
traditional posing of search queries we sort of worked through. And then when you say conversational,
I always think about it as two ways. So it's like, I'm asking a question, I'm getting some answer,
and then maybe I'm giving refinements or feedback or additional context, and, right, you know, it becomes a two-way dialogue.
Am I in the right vicinity of what we're talking about?
Yeah.
All right.
Awesome.
And so for that, I mean, in some ways, I could see wanting to do that on some days and sometimes
not.
But I guess like if I start thinking about hard problems there and then, you know, maybe
you can, you know, follow up with other hard problems or tell me that mine aren't really that hard.
If you sort of have this ongoing conversation, and I won't say that it's my wife, but definitely my wife,
when we have some conversation, sometimes the topic switches and you're not aware the topic has switched.
Right. And so even for, I won't say fully functioning adults, but for humans
generically, when two humans are conversing, the topic can switch,
and keeping up with the topic switches is actually a challenge, right? It's not that you're not paying
attention, despite whatever someone might say; you're definitely listening, you're definitely
paying attention. And the subtlety of a context switch, a topic switch, can be very difficult to discern.
Yeah, I think this is definitely something we're thinking a lot about and trying to improve. But
you're right, when we think about conversational search, one of the main differences between
conversational search and normal search is that with normal search, you're typing in a new query
essentially every time, and you're starting from scratch. You can't really leverage context from what you've done previously, and you have to basically make sure that whatever
you're typing in is self-contained and contains all the assumptions that you've learned
along the way. So, you know, there are different technical words
for this type of work, but one of which is query
rewriting: how do you essentially
write the query?
And this is what we do all the time when we're doing search, we're rewriting our queries
in order to include more context, to include less context, in order to get the results
we want.
With conversational search, you're essentially doing a chat and then you're doing another
chat and you kind of expect the previous chat's context to apply or not apply, according to some assumptions.
So, for example, if you look up what is the stock price of Microsoft, and then you look up what is its price-earnings ratio, you're going to assume that you want the context from the previous chat to be relevant.
But now let's say you suddenly say, who won the presidential election in, I don't know, 2008.
You don't really need the context to apply and you wouldn't expect it to be related to Microsoft or anything. And the chat bot and
conversational search should be able to handle that for you. I think it's a North Star goal.
There's definitely cases where the context becomes really ambiguous. And it's ambiguous whether or
not you're starting something new, or you're referring to kind of
things that you've talked about maybe even two or three previous chats ago, and this is even
hard for humans. So it's definitely going to be hard for any conversational agent, but this is
kind of an active area of research that we're working on: thinking a lot about, you know, being very good at the conversational flow.
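A crude sketch of that query-rewriting step: decide whether the new message leans on earlier turns, and if it does, fold the prior context in so the query is self-contained. The pronoun heuristic below is deliberately naive and purely illustrative; this is the kind of decision a learned model would make in practice:

```python
def needs_context(message: str) -> bool:
    # Crude heuristic: pronouns and elliptical phrasing suggest the
    # message leans on earlier turns. Real systems learn this decision.
    referring = ("it", "its", "they", "their", "that", "this", "he", "she")
    words = message.lower().rstrip("?.!").split()
    return any(w in referring for w in words)

def rewrite_query(history: list[str], message: str) -> str:
    """Rewrite a conversational message into a self-contained search query."""
    if needs_context(message) and history:
        # Toy rewrite: prepend the previous topic. A real rewriter would
        # resolve the references properly ("its" -> "Microsoft's").
        return f"{message} (in the context of: {history[-1]})"
    return message  # topic switch: start fresh

history = ["what is the stock price of Microsoft"]
print(rewrite_query(history, "what is its price earnings ratio"))
# what is its price earnings ratio (in the context of: what is the stock price of Microsoft)
print(rewrite_query(history, "who won the presidential election in 2008"))
# who won the presidential election in 2008
```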
Yeah, I feel like, thinking back to your example, I think
you said 2008, like, who was elected president. And 99.9% of people who say those words want
to know about the presidential election. But the prior, that you were just talking about Microsoft,
could make it ambiguous, if Microsoft also held a shareholder election for president in that year,
right? And I don't think they did, but let's just
say they did, right? The term could apply, although awkwardly. And now you're faced with,
one, making sure that the chat understands there's two potential things, and then, how do you
communicate that to the user in a not incongruous way? Like, I'm not sure which
of the things you're talking about, because to them it's super obvious, right? They know which one they intended. But now
all of a sudden you're stuck with a dilemma of
saying a non sequitur, like, oh,
well, you know, so-and-so won the US
presidential election, you know,
and they're like, well, I was talking about Microsoft.
Like you said, assumption is the word there.
There's a lot of assumptions that you didn't necessarily
put down, and so how could anyone, or any
system, have known it?
Yeah, it seems like a very difficult problem.
Yeah, yeah.
No, I think you nailed it. Yeah, well put.
There's definitely assumptions that need to be made.
And I think we have to make those assumptions clear as well.
So if we're giving you the answer for the presidential election,
there should be kind of some context in the answer that points that we're talking about the presidential election.
Oh, that's even better.
You just don't get the name.
Yeah. Wait, I didn't know he was president of Microsoft. Well, awesome. All right, so, I mean, that was a pretty whirlwind tour through a variety of
different topics, but I definitely learned some stuff, so, you know, I had a super good time.
We always like to give people a little bit of opportunity at the end to talk about, you know,
sort of the company they work for, sort of the culture there, are you hiring. I can tee
them up one by one, or you can just sort of go wherever you want, but let's start off:
you know, you mentioned you.com a few times, you've been there for a few years now. How is it, like,
working there? You know, would you recommend other people to join? Maybe that's a tough one, but tell
us a little bit about the company and the culture. Yeah, definitely.
So, yeah, I mean, I think working at you.com has definitely been, I think the best word
to describe it is like an adventure.
And I think we're, you know, still evolving and trying to figure out how to, you know,
technology is changing, user needs are changing all the time, and how do we best meet those
needs.
And I think it's kind of an exciting space, search, conversational search, et cetera.
So essentially, we're focused very deeply
on these types of questions.
If you find them interesting, definitely reach out.
I think our culture could probably be described
as one that is collaborative, but also fast moving.
So we definitely work at kind of the pace of a startup.
So we're very much focused on iterating fast, learning fast from our users, and, you know, really building something. So I think that's maybe one of the main points, along with kind of collaboration. Yeah, I mean, if you're looking to join or reach out, you can definitely reach out to me
at, I guess, saahil at you.com, so s-a-a-h-i-l at you.com. We'll have that. If you didn't want to chat about
search topics, you can also feel free to message me. I'm more than, you know, happy to chat about
anything. Uh, we've said chat too many times, people are going to be confused what you mean.
No, I'm just kidding. So, yeah, I mean, do you guys have, like, physical locations, or are you doing mostly virtual? Where are you guys at?
Yeah, so we're remote, fully remote, since the start. Nice. Did anyone at your
company move somewhere really exotic? We recently just had someone move to Hawaii.
Oh, really? Um, I don't know, I don't think anybody has necessarily moved anywhere exotic,
although we do have employees in, you know, different places, some of them, you know,
outside of the country as well. Okay, all right. Yeah, it's tempting. I mean,
you could just go to Hawaii now. I never really thought about it until someone said,
I'm in Hawaii now. I said, oh, yeah, okay, that's interesting. Yeah, I mean, now it could be, I'm on vacation slash working, or, like,
I live here now. But yes. Yeah, definitely, there is a temptation, I think, for being displaced. I
don't think a lot of companies want to dwell on that, but, you know, in most companies,
there's at least more flexibility or openness to that, even if it's only part time. And I really think that that's something that people should use, and find new ways. It'll be
a little difficult, right? Like, if I go on vacation with my family, they're normally used to me being
present with them and on vacation the whole time. Now you have an option, like you could take,
you know, two weeks somewhere else, you know, and visit and, you know, have them exposed,
but you might be working part of the time. I think there's some interesting
sort of like calibration
for everyone to do to each other.
But I definitely think that's going to be
an opportunity, at least for me,
like I hope to be able to leverage.
And I think a lot of folks will as well,
like being able to,
even if your culture is fully remote,
making sure you're not just remote
in your house all the time,
like taking opportunities
to be remote in other places.
So yeah, I think that's
pretty awesome. Yeah. Anything else about you.com you want to sort of pitch, Saahil?
So I assume interns, full-time, anything else that you want to kind of like encourage people
to check out? Check out the website, of course. You know, you can just go, you can try
some of these things we were talking about today. They're already up there on the website.
Yeah, yeah. No, I would basically suggest, you know, if any
interns are interested, or full-time engineers, designers, marketers, really anybody is welcome
to kind of reach out. And I think there's definitely a place for a lot of people here,
especially people who are interested in, you know, making conversational search and search in general even better.
And I think the other, yeah, I guess the other aspect would be to definitely check out you.com, check out YouChat,
which is our conversational search.
And also if you're interested in kind of being part of the community,
you can also join the community.
So if you go scroll to the bottom at you.com,
there's a join community button.
You can click that and we have a Slack group of kind of different users.
I guess we call them beta users,
but at this point they're, I guess, all sorts of users.
And you're definitely welcome to kind of contribute,
share your thoughts, ideas,
and yeah, be part of the community.
Provide more training data for conversations.
No, no, no.
This is not about that.
Yeah, sorry, sorry.
I shouldn't say that.
Definitely.
And that's you.com,
just so no one types the letter U.
You.com.
And yeah, check it out.
Well, thank you all for being on the show.
It's been a great time.
I've really enjoyed talking about this.
And thank you to all our listeners
for hanging with us another episode.
And we'll see you next time
Music by Eric Barndollar. Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license.
You're free to share, copy, distribute, transmit the work, to remix, adapt the work,
but you must provide attribution to Patrick and me, and share alike in kind.