Microsoft Research Podcast - 090 - HCI, IR and the search for better search with Dr. Susan Dumais
Episode Date: September 18, 2019
Dr. Susan Dumais knows you have things to do, and if you need help finding stuff to get them done (and you probably do) then her long and illustrious career in search technologies has been worth it. Situated firmly in Louis Pasteur's quadrant of the research grid (the square where you answer "yes" to both the quest for fundamental understanding and use-based applications) the Microsoft Technical Fellow, and Deputy Lab Director of MSR AI, has made finding information the focus of her career, and has probably made your life a little more productive in the process. Today, Dr. Dumais tells us how the landscape of information retrieval has evolved over the past twenty years; reminds us that queries don't fall from the sky but are grounded in the context of real people, real events and real time; talks about her current interest in non-web-based search (or how I can easily put my hands on my own digital belongings) and reveals what apples and Michael Jordan have in common with search research.
https://www.microsoft.com/research
Transcript
I think more and more information retrieval is moving from helping people find information to helping people get things done.
I've spent a lot of my life thinking about search.
It is nobody's end goal.
You don't get up in the morning and say, I'm going to search for the next two minutes.
You're trying to accomplish a task.
And search is a means by which you do that.
And I think we shouldn't ever forget
that. You're listening to the Microsoft Research Podcast, a show that brings you closer to the
cutting edge of technology research and the scientists behind it. I'm your host, Gretchen
Huizinga. Dr. Susan Dumais knows you have things to do.
And if you need help finding stuff to get them done, and you probably do,
then her long and illustrious career in search technologies has been worth it.
Situated firmly in Louis Pasteur's quadrant of the research grid, the square where you answer yes to both the quest for fundamental understanding and use-based applications,
the Microsoft Technical Fellow
and Deputy Lab Director of MSR AI
has made finding information the focus of her career
and has probably made your life
a little more productive in the process.
Today, Dr. Dumais tells us
how the landscape of information retrieval
has evolved over the past 20 years,
reminds us that queries don't fall from the sky,
but are grounded in the context of real people,
real events, and real time,
talks about her current interest in non-web-based search,
or how I can easily put my hands
on my own digital belongings,
and reveals what Apples and Michael Jordan
have in common with search research.
That and much more on this episode of the Microsoft Research Podcast.
Susan Dumais, welcome to the podcast. Thank you, Gretchen.
Listen, I've been waiting a long time to get you on. Way back in 2017, Eric Horvitz said,
you got to get Susan on the podcast.
I guess you're kind of like a hot Manhattan restaurant.
You have to book two years out.
Well, it's finally come true and it's fun to be here.
I like to start by situating my guests and their research.
So let's get situated.
You're a Microsoft Technical Fellow and the Deputy Lab Director of Microsoft Research AI. And your work lives at the intersection of information retrieval and human-computer
interaction. Actually, as we've noted, it's a much larger intersection than that, but we'll
keep it at those two roads for now. And you have more papers, patents, and honors than it would be
prudent to list in a half-hour podcast. But it's worth noting that there's a common theme running
through all the accomplishments and accolades.
So tell us in broad strokes, what's the driving motivation behind the work you do and why you do it?
What gets you up in the morning?
I think there are two commonalities and themes in my work.
One is topical.
So as you said, I'm really interested in understanding problems from a very user-centric point of view.
I care a lot about people, their motivations, the problems they have.
I also care about solving those problems with new algorithms, new techniques,
and so on. So a lot of my work involves this intersection of people and technology, thinking
about how work practices co-evolve with new technological developments. And so thematically,
that's an area that I really like. I like this ability to go back and forth between
understanding people,
how they think, how they reason, how they learn, how they find information, and finding solutions that work for them. I mean, in the end, if something doesn't work for people, it doesn't
work. In addition to topically, I approach problems in a way that is motivated oftentimes
by things that I find frustrating. We may talk a little bit
later about my work in latent semantic indexing, but that grew out of a frustration with trying
to learn the Unix operating system. Work I've done on email spam grew out of frustration in
mitigating the vast amount of junk that I was getting. So I tend to be motivated by problems
that I have now or that I anticipate that our customers and people will have in general, given the emerging technology trends.
And I approach it not just from a use-based perspective, understanding situations that will likely happen, but also try to generalize a bit and provide a more theoretical and generalizable foundation.
Donald Stokes wrote a fascinating book about basic science
and technology innovation. And he talks about Pasteur's quadrant, which is use-based fundamental
research. And I characterize myself as living in Pasteur's quadrant.
That's a good place to live.
Yeah. I love the idea that you talk about things that frustrate you and you want to
solve them, because if it frustrates you, it's probably frustrating me too. And so I'm glad to know that.
Things frustrate me, and I try to understand how broadly applicable those ideas are. There are things that frustrate me that, if I spent a career solving them, would not benefit lots of other people. But my work is really very much motivated by pain points.
Well, let's do a little then and now on the search landscape.
Because contrary to what we experience today, high quality search results were not always a click away.
So give us a snapshot of the field 20 years ago and tell us how things have evolved in part because of the work you've done over the ensuing decades.
Yeah, you're absolutely right.
If you're under 20 years of age, you have probably not lived in a world where you don't have at your fingertips access to an increasingly broad set of information 24-7.
Even in, we'll say, the mid-90s, the first web search engines were just starting. And by web search engine, I mean a system that crawls for content, indexes that content, and provides it in a browser. We clearly had libraries. We had library
catalogs. But the ability to have at your fingertips an amazing breadth of information
is really fairly new. Some of the early search engines, things like InfoSeek, AltaVista, Lycos,
were operating in a very different time. Lycos, I think, in the mid-90s
indexed a few hundred thousand web pages. They had a thousand or two thousand queries a day.
Fast forward to today, and there are billions of web pages, billions of queries per day. And so
the world has evolved, you know, a lot in terms of size. It's evolved a lot in terms of diversity
of content. Mostly the web then was HTML pages.
It wasn't videos.
It wasn't images.
It wasn't news.
And so more and more, a variety of different kinds of information are there.
The depth of the analysis that's provided has changed tremendously.
We used to just look at simple keywords.
More and more, we're going beyond keywords to do a deeper understanding
of the language, the objects, the entities.
And think about something like your phone when you're on the go.
You're asking queries verbally often.
That's just such a far cry from typing in 2.1 words into a rectangle on the screen.
How it's presented, how you iterate through it, it's becoming much more of a dialogue. So the world has gone from a situation where search was really this arcane skill.
So you needed almost a graduate degree in library science.
There were librarians.
We went to them and asked for information.
To a case where today search is just ubiquitous.
You expect it to be there.
When it's not, it's incredibly
frustrating. So we've gone from something which was a real specialty skill to something that's
just a core fabric of everything we do. You use it to find information. You use it to buy things,
to learn about medical conditions, to learn about household or electronic troubleshooting.
To find someone you're looking for.
Exactly. Yeah. Sure. And that was available in different ways, not through web search engines.
The ubiquity, I think, makes it more exciting for me in many ways. It's more important to understand people, what they're trying to accomplish, and really to help them
generate, make sense of, and find information. Well, that's an amazing segue into what you're
actually doing about it, because there's a
lot that went on behind the scenes, from being a very specialty thing to something that I
can use very, very easily every day.
And in fact, my sister's three-year-old grandchild can do it better than I can, right?
You know what they call a magazine? An iPad that doesn't work.
That's right.
So I want to talk specifically today about three areas where your research contributions have, as you say, built bridges among several communities, notably human-computer interaction, information retrieval, or IR, and web.
So first, let's start with the work you did way back at Bell Labs, before you even came to Microsoft Research, in what you referred to a little bit earlier as latent semantic indexing, or LSI.
So this work addresses what's known as vocabulary mismatch in IR systems.
You'll unpack that for us.
I will.
Explain the problem first, how you addressed it, and then tell us why this work from the 1990s is still relevant and highly cited today.
Yeah, right?
A century ago.
In graduate school, I pursued research interests in cognitive science.
So a lot of my work there revolved around building models of how people learn and retrieve information from their own memories.
And when I moved to Bell Labs and really started interacting much more with what was becoming a very ubiquitous computer industry at the time,
I got very interested in how people find information from external sources.
So not their own heads, but other people, computers.
And one of the problems that kept coming up over and over and over again
was this kind of impedance mismatch between the way that I seek information
and the way that you as an author might have written that information.
It was very acute at Bell Labs because I was trying to learn the Unix operating system,
and I wanted to find the function that allowed me to find a word in a document that I had,
and it was called grep, for global regular expression print.
Who in their right mind would have done that?
An engineer.
Well, somebody who did not understand the broad set of users who might wind up using those systems.
And so there are two aspects to the problem, and they're both due to fundamental characteristics of how people generate text.
The first is called synonymy, that we use many different words to describe the same object.
So you might refer to a medical professional as a doctor or a physician.
Apple means fruit, and in the last 40 years or so, it's meant a computer system.
Even people like Michael Jordan, this very famous computer scientist named Michael Jordan.
There's also a more famous basketball player named Michael Jordan. Sad for the computer scientist. No, no, actually, we take care of
him in web search engines. I bet you do. And so one problem is that there are lots of ways of
saying the same thing. And the other problem, which I just mentioned, is that the same word
can have many different meanings. And both of those present problems for retrieval. I think the key insight in latent semantic indexing was that we tried to represent words not as isolated tokens,
but as a richer representation of the context in which they appear. So we projected words into a
much lower dimensional space. And the impact was it brought together words that shared similar
contexts. So physician and doctor occur in the same company. And that allowed those words to be very similar in this reduced dimension or what we
call semantic space. There's been a tremendous resurgence of interest in these word embeddings
or context embeddings in the last five years or so. Many of the modern word embedding techniques,
whether it's Word2Vec or GloVe or BERT or GPT-2, really share the same
goal of uncovering latent structure. That problem still exists because people write and read and
understand text, and there's tremendous variability in that. What has changed tremendously are the
data resources. It's easy to get billions of web pages, hundreds of thousands of
Wikipedia pages. The computational capabilities have increased and really
the representational richness of the models have changed tremendously by
orders and orders of magnitude. And so I think there's been a resurgence in
rethinking what you can do with some of these approaches.
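To ground the idea in code, here is a minimal sketch of LSI on a toy corpus, assuming scikit-learn is available: it builds a term-document matrix, reduces it with truncated SVD, and checks that words that share contexts, like doctor and physician, land near each other in the reduced space. This is only an illustration of the technique, not the original Bell Labs implementation.

```python
# A minimal sketch of latent semantic indexing (LSI), assuming scikit-learn.
# Illustration only, not the original Bell Labs code: build a term-document
# matrix, project it into a low-dimensional "semantic" space with truncated
# SVD, and observe that words sharing contexts (e.g. "doctor"/"physician")
# end up close together even when they never co-occur in a query.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the doctor examined the patient at the clinic",
    "the physician treated the patient in the hospital",
    "the apple orchard produced fruit this season",
    "the new apple computer shipped with a faster chip",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)            # documents x terms

svd = TruncatedSVD(n_components=2, random_state=0)
doc_vecs = svd.fit_transform(X)               # documents in the reduced space
term_vecs = svd.components_.T                 # terms in the same reduced space

vocab = vectorizer.vocabulary_

def term_sim(a, b):
    """Cosine similarity between two terms in the reduced 'semantic' space."""
    return cosine_similarity(term_vecs[[vocab[a]]], term_vecs[[vocab[b]]])[0, 0]

print("doctor ~ physician:", round(term_sim("doctor", "physician"), 3))
print("doctor ~ orchard:  ", round(term_sim("doctor", "orchard"), 3))
```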
Well, another area in which you and your colleagues have made a significant contribution is in the area of context in search.
Context in anything makes a difference
with language. And this is integrally linked to the idea of personalization, which is a
buzzword in almost every area of computer science research these days. How can we give
people a valet service experience with their technical devices and systems? So tell us
about the technical approaches you've taken on context
in search and how they've enabled machines to better recognize or understand the rich
contextual signals, as you call them, that can help humans improve their access to information.
If you take a step back and consider what a web search engine is, it's incredibly
difficult to understand what somebody is looking for given
typically two to three words. These two to three words appear in a search box, and what you try to
do is match those words against billions of documents. That's a really daunting challenge.
That challenge becomes a little easier if you can understand things about where the query is coming
from. It doesn't fall from the sky, right? It's issued by a real live human being. They've searched for things in the longer term,
maybe more acutely in the current session. It's situated in a particular location and time.
All of those signals are what we call context. They help understand why somebody might be
searching and more importantly, what you might do to help them, what they might mean by that.
You know, again, it's much easier to understand queries if you have a little bit of context about
it. If I search for Michael Jordan, and you know I'm a computer scientist, that provides you a
signal. If today I type in Hong Kong airport, I probably don't want to know about all the
concession stores in the Hong Kong airport.
I want to know about ongoing protests there.
A lot of searches are motivated by things that happen in the real world.
And so that's what context means, just trying to understand a little bit about where the request is coming from,
what larger task it might be embedded in, what contextual situation it might be embedded in. If you have a single web
search engine and you return exactly the same results for the same query to everyone at every
point in time in every location, you're going to have suboptimal performance.
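As a much-simplified illustration of the kind of signal she is describing, here is a hypothetical sketch of contextual re-ranking: an ambiguous query like Michael Jordan is re-scored against a profile built from the user's recent queries. The profile, scoring, and blending weight are invented for illustration; this is not how a production search engine actually does it.

```python
# Hypothetical sketch of contextual re-ranking (not any real engine's code):
# blend a base relevance score with how well each result matches a profile
# built from the user's recent queries.
from collections import Counter

def profile_from_history(history):
    """Build a simple bag-of-words interest profile from past queries."""
    return Counter(word for q in history for word in q.lower().split())

def context_score(doc_terms, profile):
    """Fraction of profile mass that overlaps with the document's terms."""
    total = sum(profile.values()) or 1
    return sum(profile[t] for t in doc_terms) / total

def rerank(results, profile, alpha=0.7):
    """results: list of (title, base_score, terms). Higher score is better."""
    return sorted(
        results,
        key=lambda r: alpha * r[1] + (1 - alpha) * context_score(r[2], profile),
        reverse=True,
    )

history = ["machine learning tutorial", "graphical models berkeley", "variational inference"]
results = [
    ("Michael Jordan - NBA career stats", 0.90, {"nba", "basketball", "bulls"}),
    ("Michael I. Jordan - machine learning researcher", 0.85, {"machine", "learning", "berkeley", "inference"}),
]

# With this history, the computer scientist outranks the basketball player.
for title, *_ in rerank(results, profile_from_history(history)):
    print(title)
```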
All right, so going a little deeper on the technical approaches that you've taken to bring context. I'm leaving traces wherever I go
online. I'll leave a little footprint or fingerprint, and that becomes part of this
inferred data about who I am, what I'm doing. Like you said, if I searched the Hong Kong airport
maybe six months ago, I wouldn't get the same results today.
Right. What you just highlighted is what I would call contextualization.
So in that case, there are spikes in queries.
Queries do not occur uniformly over time.
And so when a query starts spiking, things like Hong Kong airport or Hong Kong in general,
you better figure out what's going on.
In many cases, it's driven by external events.
That's not you as an individual. It's the aggregate of people who are approaching search engines, asking different
queries over time. So you can think about it at an aggregate level, you know, at a more personal
level or in a session. If you've asked a query that's related to basketball, and then you ask
about Michael Jordan, that gives you a hint about how to handle what might be otherwise a
very ambiguous query.
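A toy sketch of the spiking-query observation she mentions: compare today's volume for a query against its trailing history and flag queries that jump far above their usual level. The counts and threshold below are made up for illustration; real trend detection is far more elaborate.

```python
# Toy sketch of detecting a spiking query: flag a query whose volume today is
# several standard deviations above its trailing mean. Purely illustrative,
# not a production trending-queries pipeline.
from statistics import mean, stdev

def is_spiking(daily_counts, today_count, threshold=3.0):
    """daily_counts: query volume for trailing days; today_count: current volume."""
    mu = mean(daily_counts)
    sigma = stdev(daily_counts) or 1.0
    return (today_count - mu) / sigma > threshold

history = [120, 135, 110, 128, 140, 125, 130]   # e.g. "hong kong airport", past week
print(is_spiking(history, 9500))   # True: likely event-driven, treat as fresh intent
print(is_spiking(history, 150))    # False: normal fluctuation
```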
Well, a third area of contribution I want to talk about has to do with the temporal dynamics of information. This rests on the notion that information isn't static.
And when you say it out loud, it seems kind of like a no-brainer. Of course, it isn't static.
But the tools we've traditionally used tend to focus on snapshots of information rather than the dynamic nature of our information.
So tell us again what technical approaches you've explored to help people interact with the reality of dynamic information.
So as you said, the world is constantly changing around us, whether it's the world of information or the physical world in which we live.
In web search, what's changing
is the content. The web is not static. We're crawling new content all the time. The questions
people ask are changing as a function of events that are going on in the world, as a function of
events in their personal lives. And what's most interesting is that what's relevant changes.
So let me just give you an example to ground the pervasiveness. If you typed in the query U.S. Open, do you mean last year or this year? It's an event.
Or do I mean tennis or golf?
Exactly. Even if you said U.S. Open 2019, what's relevant depends on where you are relative to
that event. So right now, you're probably not interested in the scores and results because
they don't exist. You want to buy tickets. During the event, you care about the results. And we've done a couple of things to
try to address that. One is on the algorithmic front. So we've tried to model things like how
the content on web pages changes. We also model how people's interactions change, the queries they
issue, what's clicked on. And by combining those in a kind of time series analysis, you can understand how to weight
new information versus older information.
Search engines learn from people as they interact with things, what's relevant to a particular
query.
But that means new information is disadvantaged because it doesn't have that historical interaction
data.
And so by being smart and modeling things as a time series, knowing how things change over time, you can do a much
better job of finding information.
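Here is a rough sketch of that time-series intuition, under simple assumptions: weight each recorded click with an exponential decay so recent interactions count more than stale ones, which keeps a new page from being permanently disadvantaged by its short history. The half-life and numbers are illustrative only, not the actual model.

```python
# Sketch of time-decayed interaction evidence (illustrative assumptions only):
# weight each click by an exponential decay so recent behavior counts more
# than old behavior when estimating a result's relevance for a query.
import math

def decayed_click_score(click_ages_days, half_life_days=14.0):
    """click_ages_days: age in days of each click on this (query, result) pair."""
    lam = math.log(2) / half_life_days
    return sum(math.exp(-lam * age) for age in click_ages_days)

# An older page with many stale clicks vs. a new page with a few fresh ones.
old_page_clicks = [300, 280, 260, 240, 220, 200]   # days ago
new_page_clicks = [0.5, 1, 2, 3]                   # days ago

print("old page:", round(decayed_click_score(old_page_clicks), 3))
print("new page:", round(decayed_click_score(new_page_clicks), 3))
```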
We also built what I think is a really fun system. It's still one of my favorite systems. It was a browser plugin called DiffIE, not a very well-named system.
I complained about grep earlier. This is not a lot better.
Still coming up with dumb names.
Right. Exactly. This was a prototype we built to help people understand how the world was
changing around them. And what I mean by that is the system, all in the browser,
as you visited a web page, would look at how that was different than the version of the page that was in the web cache and highlight those changes to you.
Wow.
So yeah, it was a totally fun system.
So imagine going to a news site and you'd see what the changes were,
not relative to what a news editor thought, but since you had last been there.
Sure.
If you hadn't been there in two days, it might be what the headlines were.
If you were following a story, it would just show you what was different.
It really brought to light for people how information changes in ways that they had never seen before. So if I would go
to somebody's webpage, I might see new publications highlighted. I might see a new job title. And that
really brought the dynamics to people in ways that were really previously hidden. And so that was a
really fun project that touched not so much
on the underlying algorithms, but how we can help people understand and experience that
change.
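In the spirit of DiffIE, here is a rough sketch of the core mechanic: cache a copy of a page when you visit it and, on the next visit, surface only the lines that changed since your own last look. It uses Python's standard difflib and a hypothetical local cache directory and example URL; it approximates the idea and is not the actual browser plug-in.

```python
# Rough sketch of the DiffIE idea (not the actual plug-in): cache a page the
# first time you see it, and on later visits show only what changed since
# your own last visit. Works on raw HTML here for simplicity.
import difflib, hashlib, pathlib, urllib.request

CACHE = pathlib.Path("page_cache")
CACHE.mkdir(exist_ok=True)

def fetch_text(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def changes_since_last_visit(url):
    cache_file = CACHE / (hashlib.sha1(url.encode()).hexdigest() + ".txt")
    current = fetch_text(url)
    previous = cache_file.read_text() if cache_file.exists() else ""
    cache_file.write_text(current)
    diff = difflib.unified_diff(previous.splitlines(), current.splitlines(), lineterm="")
    # Keep only added lines -- what is new since you last looked.
    return [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]

# Example with a hypothetical URL: print what changed since your previous visit.
for line in changes_since_last_visit("https://example.com/"):
    print(line)
```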
The interesting thing about the temporal dynamics of things, I mean, just yesterday,
my husband came home and he said, there was a huge accident on 405. So I go and search accident on 405. Well, I get five days ago, seven years ago, 405 in California. It's like, there's still a lot of work that needs to be done on this temporal dynamics.
Right. There's a tension between things that are a priori really important that you want to make sure to continue to retrieve,
and the dynamics of information. In that case, it's also possible that the content wasn't there.
But the fact that you thought about going to web search suggests that you expect to find that kind of information there. And as you say, there's a long way to go in a lot of this.
Well, you know where it led me was to the Washington State DOT Twitter feed,
which is immediate. You know, somebody's on that, but it doesn't hit the web as news necessarily
if it's just happened in the last half hour.
Right. That gets to the point of trying to integrate different sources of information.
But you need to stay on top of that.
How do web search engines decide what to crawl and what frequency to crawl it at?
Or is some of the information pushed?
This highlights a couple of different dimensions.
One is getting the data in the first place.
And then how you take all this stuff from web pages to news to maybe Twitter feeds
to structured data like Wikipedia feeds and compose those into an environment
or representation that can really help people solve their needs.
And that's much harder if you're on a phone.
Because if you have a big screen in front of you, you can show a lot of information. You can allow
people's visual systems to quickly scan it. If you're on a phone, you need to take your best
guess, iterate, start a conversation with people. It's a much more temporal processing of the
information than a spatial one.
Also, you look at the generational aspect of this. My daughter rarely goes on her computer
unless she's doing something for school. She's on her phone. That is her primary source. And so
that data point has got to be where a lot of researchers' brains are heading is,
well, what is the mobile first generation? How are we going to adapt something innovative that
we did into this milieu? The world's constantly changing and you need to evolve. We've clearly gone off as
an industry search and even beyond that from the desktop into the real world.
Right.
And I think that raises all sorts of interesting opportunities as well as challenges.
We're not even going to talk about HoloLens or any of the other wearable technologies
that I've had other researchers in the booth talk about, saying, hey, even your phone, looking at your rectangle, is going to be obsolete sooner than you think.
So, Susan, I can find almost any piece of general information by searching the web,
but my own information is fragmented and scattered everywhere on apps, bookmarks,
email folders, devices, etc. Tell us how your current interest in non-web search applications is going to help people
like me access my personal information better.
Right.
It's interesting.
I think the search industry for a while was focused actually on finding information on
your desktop, finding information in email.
And with the advent of the web, a lot of public information moved online.
And you've seen a tremendous set of innovations in that arena.
But search is really much more prevalent.
And a particular pain point for me, I told you I was motivated by things that annoy me,
are that we haven't done as good a job of helping people make sense of their own kind of personal space of information is the way I like to think about it.
In many ways, it's stuff you've seen before,
stuff you've interacted with.
It's web pages, it's email, it's documents,
apps of all kinds.
There are so many times when you say,
I know I saw this article or I saw this photo.
Where is it?
Yeah, was it on Twitter?
Was it on the web?
Was it on Instagram?
Was it on Facebook?
And there's no reason that you should have to remember that. And so I think the challenge
is providing people with unified access to that information without necessarily making copies of
it everywhere. At Microsoft, we are certainly working on it from within the Microsoft ecosystem.
It's increasingly easy to find not just files, but shared files, email with the click of a
button. In research, Shane Williams and others have developed a prototype called TaskEasy that
tries to improve that. But it's an area that I think still has a lot of opportunity for improvement.
Let me ask you a little off-script question, because this is a frustration of mine.
When I do a web search and I misspell
something by accident, it tells me, did you mean, or looking for results for. On other websites,
if I spell your name wrong, no results. I get nothing. It's a pain point. And the same was
true in web search 20 years ago. If you mistyped something,
you didn't get anything. You didn't, or you got somebody else who randomly
typed things in the same way. One of the things that search engines and lots of other web services
do is understand what people are looking for and the ways in which they're doing it.
Web search engines have gotten better at searching,
not because the algorithms are better, but because you can observe in aggregate lots of people
searching for things, failing to find them. There were some really interesting observations that
folks published very early on about web search. They were things that were unexpected to people
who were in the search industry.
We all thought that people would go to web search and type in these beautiful informational
requests. The most common queries at that time in the late 90s were things like eBay,
Hotmail, Pokemon, Weather, Horoscope. They weren't asking for information. They were
using the web to navigate to things. Getting back to your spelling example, there are many queries, things like, I think, Abercrombie and Fitch, Arnold Schwarzenegger, that are misspelled more than they're spelled correctly.
But it's learning by people typing things incorrectly, looking at their reformulations, and then figuring out how to improve the spelling correction to handle those cases.
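A toy sketch of mining spelling corrections from behavior: if many users follow one query with a near-identical reformulation, that pair becomes a candidate correction. The log, similarity measure, and thresholds here are all invented for illustration and are not how a search engine's speller actually works.

```python
# Toy sketch of mining spelling corrections from query reformulations
# (log format and thresholds are made up): if users often follow query A
# with a very similar query B, treat B as a candidate correction for A.
from collections import Counter
from difflib import SequenceMatcher

# (previous_query, next_query_in_same_session) pairs from a hypothetical log.
reformulations = [
    ("arnold schwarzeneger", "arnold schwarzenegger"),
    ("arnold schwarzeneger", "arnold schwarzenegger"),
    ("abercrombie and fich", "abercrombie and fitch"),
    ("arnold schwarzeneger", "arnold schwarzenegger movies"),
]

def candidate_corrections(pairs, min_support=2, min_similarity=0.85):
    """Keep frequent reformulation pairs whose strings are nearly identical."""
    counts = Counter(p for p in pairs
                     if SequenceMatcher(None, p[0], p[1]).ratio() >= min_similarity)
    return {a: b for (a, b), n in counts.items() if n >= min_support}

print(candidate_corrections(reformulations))
# e.g. {"arnold schwarzeneger": "arnold schwarzenegger"}
```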
We've talked about personalization, which in theory is something we all want,
but there's always some big trade-offs here. We'll get to the pitfalls in a second
and the discussion of the downside of large-scale behavior analysis. But right now,
tell us about the potential of large-scale behavioral analysis
that helps you contextualize things.
One of the things that's happened
over the last two decades
is that web-based services,
whether it's a website that you go to,
travel sites, shopping sites,
web-based services like this,
because they see lots and lots of information,
have provided this really
new lens on to how people are interacting with their systems. They
provide insights about how you can improve those systems. This is a lens on
to people's behavior that we just never had before. Even when I joined Microsoft,
when I first joined, folks from Office Help came and said, help us fix Office Help search.
And so my first question was, what are the most common queries?
And they go, we don't know.
What are people looking for?
We don't know.
The reason they didn't know is search for Office Help happened on your desktop machine.
All the Office Help was downloaded to your desktop.
All of the searching was done on your desktop.
We knew nothing about what people were asking or whether they were being successful.
Because your desktop was Las Vegas.
Whatever happened there stayed there.
Exactly.
My desktop is a little cleaner than Las Vegas, but yeah.
Good to know.
And the minute they moved office search onto the web, you learned all sorts of things.
And so by knowing what people are seeking, doing, we can create the relevant content.
We can create the relevant algorithms.
And so this has been an amazingly rich lens, this virtuous feedback cycle between delivering content and using it to understand what it is that people are looking for and where the failure points are.
It's hard to overstate how much systems have really changed because of that.
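As a small illustration of what moving help search into a web service makes possible, here is a sketch of answering "what are the most common queries, and where do people fail?" from a log. The log fields and records are made up for illustration.

```python
# Small illustration of aggregate query-log analysis (field names are made up):
# once search runs as a service, you can answer "what do people ask, and where
# do they fail?" -- the questions the Office Help team couldn't answer before.
from collections import Counter

# Each record: (query, clicked_a_result)
log = [
    ("insert page numbers", True),
    ("insert page numbers", True),
    ("mail merge", False),
    ("mail merge", False),
    ("pivot table", True),
    ("insert page numbers", False),
]

volume = Counter(q for q, _ in log)
failures = Counter(q for q, clicked in log if not clicked)

print("top queries:", volume.most_common(3))
print("abandonment rate per query:",
      {q: round(failures[q] / volume[q], 2) for q in volume})
```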
Well, like a recurring nightmare, here we are again at What Could Possibly Go Wrong.
Right.
And you've done a lot and seen a lot over the course of your career.
One thing that's of great interest to me is this idea that in order to help us get better search results, and I want that, I want personalization on the one hand, but the things I have to give up about my own personal information,
my privacy, I'm giving up to the web to help you make my search better.
So talk about the potential pitfalls here, because I know you're thinking about them.
What keeps you up at night?
Yeah, sure.
In general, there's really a need to balance in a very thoughtful and responsible way the
benefits that accrue from seeing various
kinds of interaction, understanding how people are interacting with systems, and the potential
risks for storing information about individuals that enable these services. For some of the things
that we talked about in terms of spelling correction, the fact that there were navigational
queries that people had not anticipated, those happen at the aggregate level.
And frankly, a lot of the insights happen at the aggregate level or group level.
And some of them happen at the individual level, but many of them happen at a much higher level.
We all make these tradeoffs every day.
You know, I give credit card information to some services because it's easier for me.
I want to save my purchase history in some places because it's much easier to go back and refine things.
And I think, you know, as a company, Microsoft is tremendously invested in protecting people's privacy, the security of information that people entrust us with.
So I think as an industry, what we need to do is work first and foremost to protect whatever data we have.
Also to be clear on what information is being stored, be transparent about it, and provide people with ways of opting out of that.
When I type in the beginning of a name, I would like it to autocomplete.
When I move to a new computer where that's not the case, I find it frustrating.
And again, there are ways that that data can be stored over different time horizons. It can be aggregated and anonymized. I think search engines in
particular, but really almost any web service, tries to strike the right balance between
understanding things at a very fine level and then aggregating things where that's relevant
and appropriate. All right. It's story time.
I happen to know you didn't start out thinking,
I'm going to be a computer scientist or the deputy lab director of MSR AI.
So tell us how it all began for you, maybe not back to when you were a baby,
but, you know, kind of academically,
and how you landed here at MSR in your leadership role today.
Well, what you just said is certainly true.
Microsoft didn't
exist when I was in high school and in graduate school, so I had no aspirations of being there.
If I did, I would be incredibly wealthy right now. Yeah, when I look back on my career, I think it's
fun to reflect on a few pivot points because the road from where I was as a high school student
and undergraduate in Maine to Redmond, Washington in the tech industry is not one that I had planned
for or meant to end up on. And I was able and lucky enough to be in environments where I could take some
risks and take some turns. So let me just tell you a few of them that really stand out in my mind.
I started college as a math major intending to go to law school. I wanted to do environmental law.
I took a course when I was a junior called Mathematical Psychology,
which was a course that talked about how people learn information, how they recall information,
and how you can precisely describe the evolution of learning and retrieval of information from
memory. And I was just smitten. I just thought it was the most fascinating thing, blending algorithms with the ability to understand people and how they work.
And so I just decided that I was going to go to psychology graduate school. I had no idea what it was. My parents were even more concerned.
But I did it. I had a blast doing it. And then when I finished my PhD, I had every intent of teaching at a university. And when I was looking for jobs, I got a call from Bell Labs.
And they had just started the industry's first human-computer interaction lab.
And I was still all set on going to a university.
And my undergraduate advisor called and said, I hear you're going to Bell Labs.
And I said, no, I don't think so.
And he literally said, you ought to have your head
examined. And I asked why. And he made a very good point, which was, you really have nothing to lose
by this and a lot to gain. You're at the beginning of something that could be a really important
future direction. And if you decide you don't like it, you can leave. And two years later,
you'll be better off than you are now in looking
for jobs. And almost 40 years later, you can say that it suited me very well. And my transition
from Bell Labs to Microsoft was also based on opportunities that I decided to seize. We had had
a postdoc at Bell Labs who was a product manager in Office at the time on FindFast. And he said,
hey, Microsoft Research
is looking for somebody in information retrieval. I told them they should reach out to you. And
again, I said, I'm not looking to move. But I came. I really enjoyed meeting the people,
the problems, the scale of problems. I could just see being very, very different from what I had.
And again, now 22 years later, it's been maybe one of the best decisions in my
life, in part because what I'm interested in, helping people create, find, manage, make sense
of information, is exactly what Microsoft is about. So every question I have, every innovation
I have has really natural outlets. And I find that really sort of exciting and fun. MSR is also just this amazingly vibrant
intellectual environment that I love.
People from lots of different perspectives coming together.
Well, as we close,
and I'm sad that we're closing because you're fun.
There are a handful of people
who've really earned the right
through length, depth, and quality of career
to give advice to people,
and you're one of those people.
Let's frame the final question in terms of your leadership role in cultivating the next generation of talent here at MSR.
Tell our audience from your perspective what's on the horizon in the field,
and why is now a good time to be a researcher?
When I think back on my career and I look at other successful people, I think we all share some traits that I think are important to think about.
One is to have a purpose, but also be willing to seize a new opportunity.
And I just told you several times how really pivotal points in my life came from having a true north, but also be willing to take not the obvious and
straight path to it.
So Jack Sparrow's compass?
Exactly.
Wherever.
No, no, actually not wherever.
I had a goal, but I was also willing to deviate when there were opportunities.
The second is to be passionate about what you do.
I think I'm incredibly fortunate to be in an environment where my passion and what people
pay me to do align.
But in any endeavor, you're going to work hard. You're going to work long hours.
Find something that speaks to you. It might be an application area. It might be a particular
theoretical framework, a methodology. But make sure that at the end of the day, when you've
worked really hard, you're proud of that outcome. And perhaps the most important thing is to persevere. Be persistent in what you do.
There is no straight path to an aspiration and how you get there. And I think it's often
deceptive because students will see this brilliant talk by somebody who's very well known in the
field. Oh my gosh, this person is just brilliant.
Sure, they may be brilliant,
but they've also worked hard behind the scenes.
They've tried lots of things that failed.
And I think it's really important to stick with it
and learn from failures, but also celebrate successes.
In terms of, I think, some of the interesting areas
moving forward, let me just mention three.
One of them is that I think more and more information retrieval is moving from helping people find information to helping people get things done.
I've spent a lot of my life thinking about search.
It is nobody's end goal.
You don't get up in the morning and say, I'm going to search for the next two minutes.
You're trying to accomplish a task.
And search is a means by which you do that.
And I think we shouldn't ever forget that.
So really trying to go from finding information to using that information in a way that helps you solve the problem.
The other one we mentioned briefly before, it's moving off the desktop into the world.
More and more our systems are
interacting. There's this interesting mix of digital and physical worlds. And I guess the
last is a personal one. I think there are really interesting opportunities moving forward to
combine insights from computation, cognitive science, and neuroscience. It's an area that
I haven't had as much time to spend as I would like,
but I think there's some interesting things coming together in that space.
You know, I'm glad that you're passionate and persistent about what you're doing because it's helped my life in many, many ways. You're right. I don't get up and say, I'm going to go search.
I have to find something and I need that click to be the one I want.
Susan Dumais, thank you so much, finally, for coming on the podcast.
Thanks, Gretchen. It's really been fun to talk with you.
To learn more about Dr. Susan Dumais and how the search for better search goes on, visit microsoft.com slash research.