Microsoft Research Podcast - 090 - HCI, IR and the search for better search with Dr. Susan Dumais
Episode Date: September 18, 2019
Dr. Susan Dumais knows you have things to do, and if you need help finding stuff to get them done (and you probably do) then her long and illustrious career in search technologies has been worth it. Situated firmly in Louis Pasteur's quadrant of the research grid (the square where you answer "yes" to both the quest for fundamental understanding and use-based applications) the Microsoft Technical Fellow, and Deputy Lab Director of MSR AI, has made finding information the focus of her career, and has probably made your life a little more productive in the process. Today, Dr. Dumais tells us how the landscape of information retrieval has evolved over the past twenty years; reminds us that queries don't fall from the sky but are grounded in the context of real people, real events and real time; talks about her current interest in non-web-based search (or how I can easily put my hands on my own digital belongings) and reveals what apples and Michael Jordan have in common with search research.
https://www.microsoft.com/research
Transcript
I think more and more information retrieval is moving from helping people find information to helping people get things done.
I've spent a lot of my life thinking about search.
It is nobody's end goal.
You don't get up in the morning and say, I'm going to search for the next two minutes.
You're trying to accomplish a task.
And search is a means by which you do that.
And I think we shouldn't ever forget
that. You're listening to the Microsoft Research Podcast, a show that brings you closer to the
cutting edge of technology research and the scientists behind it. I'm your host, Gretchen
Huizinga. Dr. Susan Dumais knows you have things to do.
And if you need help finding stuff to get them done, and you probably do,
then her long and illustrious career in search technologies has been worth it.
Situated firmly in Louis Pasteur's quadrant of the research grid, the square where you answer yes to both the quest for fundamental understanding and use-based applications,
the Microsoft Technical Fellow
and Deputy Lab Director of MSR AI
has made finding information the focus of her career
and has probably made your life
a little more productive in the process.
Today, Dr. Dumais tells us
how the landscape of information retrieval
has evolved over the past 20 years,
reminds us that queries don't fall from the sky,
but are grounded in the context of real people,
real events, and real time,
talks about her current interest in non-web-based search,
or how I can easily put my hands
on my own digital belongings,
and reveals what Apples and Michael Jordan
have in common with search research.
That and much more on this episode of the Microsoft Research Podcast.
Susan Dumais, welcome to the podcast. Thank you, Gretchen.
Listen, I've been waiting a long time to get you on. Way back in 2017, Eric Horvitz said,
you got to get Susan on the podcast.
I guess you're kind of like a hot Manhattan restaurant.
You have to book two years out.
Well, it's finally come true and it's fun to be here.
I like to start by situating my guests and their research.
So let's get situated.
You're a Microsoft Technical Fellow and the Deputy Lab Director of Microsoft Research AI. And your work lives at the intersection of information retrieval and human-computer
interaction. Actually, as we've noted, it's a much larger intersection than that, but we'll
keep it at those two roads for now. And you have more papers, patents, and honors than it would be
prudent to list in a half-hour podcast. But it's worth noting that there's a common theme running
through all the accomplishments and accolades.
So tell us in broad strokes, what's the driving motivation behind the work you do and why you do it?
What gets you up in the morning?
I think there are two commonalities and themes in my work.
One is topical.
So as you said, I'm really interested in understanding problems from a very user-centric point of view.
I care a lot about people, their motivations, the problems they have.
I also care about solving those problems with new algorithms, new techniques,
and so on. So a lot of my work involves this intersection of people and technology, thinking
about how work practices co-evolve with new technological developments. And so thematically,
that's an area that I really like. I like this ability to go back and forth between
understanding people,
how they think, how they reason, how they learn, how they find information, and finding solutions that work for them. I mean, in the end, if something doesn't work for people, it doesn't
work. In addition to topically, I approach problems in a way that is motivated oftentimes
by things that I find frustrating. We may talk a little bit
later about my work in latent semantic indexing, but that grew out of a frustration with trying
to learn the Unix operating system. Work I've done on email spam grew out of frustration in
mitigating the vast amount of junk that I was getting. So I tend to be motivated by problems
that I have now or that I anticipate that our customers and people will have in general, given the emerging technology trends.
And I approach it not just from a use-based perspective, understanding situations that will likely happen, but also try to generalize a bit and provide a more theoretical and generalizable foundation.
Donald Stokes wrote a fascinating book about basic science
and technology innovation. And he talks about Pasteur's quadrant, which is use-based fundamental
research. And I characterize myself as living in Pasteur's quadrant.
That's a good place to live.
Yeah. I love the idea that you talk about things that frustrate you and you want to
solve them, because if it frustrates you, it's probably frustrating me too. And so I'm glad to know that.
Things frustrate me, and I try to understand how broadly applicable those ideas are. There are things that frustrate me that, if I spent a career solving them, would not benefit lots of other people. But my work is really very much motivated by pain points.
Well, let's do a little then and now on the search landscape.
Because contrary to what we experience today, high quality search results were not always a click away.
So give us a snapshot of the field 20 years ago and tell us how things have evolved in part because of the work you've done over the ensuing decades.
Yeah, you're absolutely right.
If you're under 20 years of age, you have probably not lived in a world where you don't have at your fingertips access to an increasingly broad set of information 24-7.
Even in, we'll say, the mid-90s, the first web search engines were just starting. And by web search engine, I mean a system that crawls for content, indexes that content, and provides it in a browser. We clearly had libraries. We had library
catalogs. But the ability to have at your fingertips an amazing breadth of information
is really fairly new. Some of the early search engines, things like InfoSeek, AltaVista, Lycos,
were operating in a very different time. Lycos, I think, in the mid-90s
indexed a few hundred thousand web pages. They had a thousand or two thousand queries a day.
Fast forward to today, and there are billions of web pages, billions of queries per day. And so
the world has evolved, you know, a lot in terms of size. It's evolved a lot in terms of diversity
of content. Mostly the web then was HTML pages.
It wasn't videos.
It wasn't images.
It wasn't news.
And so more and more, a variety of different kinds of information are there.
The depth of the analysis that's provided has changed tremendously.
We used to just look at simple keywords.
More and more, we're going beyond keywords to do a deeper understanding
of the language, the objects, the entities.
And think about something like your phone when you're on the go.
You're asking queries verbally often.
That's just such a far cry from typing in 2.1 words into a rectangle on the screen.
How it's presented, how you iterate through it, it's becoming much more of a dialogue. So the world has gone from a situation where search was really this arcane skill.
So you needed almost a graduate degree in library science.
There were librarians.
We went to them and asked for information.
To a case where today search is just ubiquitous.
You expect it to be there.
When it's not, it's incredibly
frustrating. So we've gone from something which was a real specialty skill to something that's
just a core fabric of everything we do. You use it to find information. You use it to buy things,
to learn about medical conditions, to learn about household or electronic troubleshooting.
To find someone you're looking for.
Exactly. Yeah. Sure. And that was available in different ways, not through web search engines.
The ubiquity, I think, makes it more exciting for me in many ways. It's more important to understand people, what they're trying to accomplish, and really to help them
generate, make sense of, and find information. Well, that's an amazing segue into what you're
actually doing about it, because there's a
lot that went on behind the scenes, from being a very specialty thing to something that I
can use very, very easily every day.
And in fact, my sister's three-year-old grandchild can do it better than I can, right?
You know what they call a magazine? An iPad that doesn't work.
That's right.
So I want to talk specifically today about three areas where your research contributions have, as you say, built bridges among several communities, notably human-computer interaction, information retrieval, or IR, and web.
So first, let's start with the work you did way back at Bell Labs, before you even came to Microsoft Research, in what you referred to a little bit earlier as latent semantic indexing, or LSI.
So this work addresses what's known as vocabulary mismatch in IR systems.
You'll unpack that for us.
I will.
Explain the problem first, how you addressed it, and then tell us why this work from the 1990s is still relevant and highly cited today.
Yeah, right?
A century ago.
In graduate school, I pursued research interests in cognitive science.
So a lot of my work there revolved around building models of how people learn and retrieve information from their own memories.
And when I moved to Bell Labs and really started interacting much more with what was becoming a very ubiquitous computer industry at the time,
I got very interested in how people find information from external sources.
So not their own heads, but other people, computers.
And one of the problems that kept coming up over and over and over again
was this kind of impedance mismatch between the way that I seek information
and the way that you as an author might have written that information.
It was very acute at Bell Labs because I was trying to learn the Unix operating system,
and I wanted to find the function that allowed me to find a word in a document that I had,
and it was called grep, for global regular expression print.
Who in their right mind would have done that?
An engineer.
Well, somebody who did not understand the broad set of users who might wind up using those systems.
And so there are two aspects to the problem, and they're both due to fundamental characteristics of how people generate text.
The first is called synonymy, that we use many different words to describe the same object.
So you might refer to a medical professional as a doctor or a physician.
Apple means fruit, and in the last 40 years or so, it's meant a computer system.
Even people like Michael Jordan, this very famous computer scientist named Michael Jordan.
There's also a more famous basketball player named Michael Jordan. Sad for the computer scientist. No, no, actually, we take care of
him in web search engines. I bet you do. And so one problem is that there are lots of ways of
saying the same thing. And the other problem, which I just mentioned, is that the same word
can have many different meanings. And both of those present problems for retrieval. I think the key insight in latent semantic indexing was that we tried to represent words not as isolated tokens,
but as a richer representation of the context in which they appear. So we projected words into a
much lower dimensional space. And the impact was it brought together words that shared similar
contexts. So physician and doctor occur in the same company. And that allowed those words to be very similar in this reduced dimension or what we
call semantic space. There's been a tremendous resurgence of interest in these word embeddings
or context embeddings in the last five years or so. Many of the modern word embedding techniques,
whether it's Word2Vec or GloVe or BERT or GPT-2, really share the same
goal of uncovering latent structure. That problem still exists because people write and read and
understand text, and there's tremendous variability in that. What has changed tremendously are the
data resources. It's easy to get billions of web pages, hundreds of thousands of
Wikipedia pages. The computational capabilities have increased and really
the representational richness of the models have changed tremendously by
orders and orders of magnitude. And so I think there's been a resurgence in
rethinking what you can do with some of these approaches.
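To ground the idea in code, here is a minimal sketch of LSI on a toy corpus, assuming scikit-learn is available: it builds a term-document matrix, reduces it with truncated SVD, and checks that words that share contexts, like doctor and physician, land near each other in the reduced space. This is only an illustration of the technique, not the original Bell Labs implementation.

```python
# A minimal sketch of latent semantic indexing (LSI), assuming scikit-learn.
# Illustration only, not the original Bell Labs code: build a term-document
# matrix, project it into a low-dimensional "semantic" space with truncated
# SVD, and observe that words sharing contexts (e.g. "doctor"/"physician")
# end up close together even when they never co-occur in a query.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the doctor examined the patient at the clinic",
    "the physician treated the patient in the hospital",
    "the apple orchard produced fruit this season",
    "the new apple computer shipped with a faster chip",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)            # documents x terms

svd = TruncatedSVD(n_components=2, random_state=0)
doc_vecs = svd.fit_transform(X)               # documents in the reduced space
term_vecs = svd.components_.T                 # terms in the same reduced space

vocab = vectorizer.vocabulary_

def term_sim(a, b):
    """Cosine similarity between two terms in the reduced 'semantic' space."""
    return cosine_similarity(term_vecs[[vocab[a]]], term_vecs[[vocab[b]]])[0, 0]

print("doctor ~ physician:", round(term_sim("doctor", "physician"), 3))
print("doctor ~ orchard:  ", round(term_sim("doctor", "orchard"), 3))
```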
Well, another area in which you and your colleagues have made a significant contribution is in the area of context in search.
Context in anything makes a difference
with language. And this is integrally linked to the idea of personalization, which is a
buzzword in almost every area of computer science research these days. How can we give
people a valet service experience with their technical devices and systems? So tell us
about the technical approaches you've taken on context
in search and how they've enabled machines to better recognize or understand the rich
contextual signals, as you call them, that can help humans improve their access to information.
If you take a step back and consider what a web search engine is, it's incredibly
difficult to understand what somebody is looking for given
typically two to three words. These two to three words appear in a search box, and what you try to
do is match those words against billions of documents. That's a really daunting challenge.
That challenge becomes a little easier if you can understand things about where the query is coming
from. It doesn't fall from the sky, right? It's issued by a real live human being. They've searched for things in the longer term,
maybe more acutely in the current session. It's situated in a particular location and time.
All of those signals are what we call context. They help understand why somebody might be
searching and more importantly, what you might do to help them, what they might mean by that.
You know, again, it's much easier to understand queries if you have a little bit of context about
it. If I search for Michael Jordan, and you know I'm a computer scientist, that provides you a
signal. If today I type in Hong Kong airport, I probably don't want to know about all the
concession stores in the Hong Kong airport.
I want to know about ongoing protests there.
A lot of searches are motivated by things that happen in the real world.
And so that's what context means, just trying to understand a little bit about where the request is coming from,
what larger task it might be embedded in, what contextual situation it might be embedded in. If you have a single web
search engine and you return exactly the same results for the same query to everyone at every
point in time in every location, you're going to have suboptimal performance.
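As a much-simplified illustration of the kind of signal she is describing, here is a hypothetical sketch of contextual re-ranking: an ambiguous query like Michael Jordan is re-scored against a profile built from the user's recent queries. The profile, scoring, and blending weight are invented for illustration; this is not how a production search engine actually does it.

```python
# Hypothetical sketch of contextual re-ranking (not any real engine's code):
# blend a base relevance score with how well each result matches a profile
# built from the user's recent queries.
from collections import Counter

def profile_from_history(history):
    """Build a simple bag-of-words interest profile from past queries."""
    return Counter(word for q in history for word in q.lower().split())

def context_score(doc_terms, profile):
    """Fraction of profile mass that overlaps with the document's terms."""
    total = sum(profile.values()) or 1
    return sum(profile[t] for t in doc_terms) / total

def rerank(results, profile, alpha=0.7):
    """results: list of (title, base_score, terms). Higher score is better."""
    return sorted(
        results,
        key=lambda r: alpha * r[1] + (1 - alpha) * context_score(r[2], profile),
        reverse=True,
    )

history = ["machine learning tutorial", "graphical models berkeley", "variational inference"]
results = [
    ("Michael Jordan - NBA career stats", 0.90, {"nba", "basketball", "bulls"}),
    ("Michael I. Jordan - machine learning researcher", 0.85, {"machine", "learning", "berkeley", "inference"}),
]

# With this history, the computer scientist outranks the basketball player.
for title, *_ in rerank(results, profile_from_history(history)):
    print(title)
```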
All right, so going a little deeper on the technical approaches that you've taken to bring context. I'm leaving traces wherever I go
online. I'll leave a little footprint or fingerprint, and that becomes part of this
inferred data about who I am, what I'm doing. Like you said, if I searched the Hong Kong airport
maybe six months ago, I wouldn't get the same results today.
Right. What you just highlighted is what I would call contextualization.
So in that case, there are spikes in queries.
Queries do not occur uniformly over time.
And so when a query starts spiking, things like Hong Kong airport or Hong Kong in general,
you better figure out what's going on.
In many cases, it's driven by external events.
That's not you as an individual. It's the aggregate of people who are approaching search engines, asking different
queries over time. So you can think about it at an aggregate level, you know, at a more personal
level or in a session. If you've asked a query that's related to basketball, and then you ask
about Michael Jordan, that gives you a hint about how to handle what might be otherwise a
very ambiguous query.
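A toy sketch of the spiking-query observation she mentions: compare today's volume for a query against its trailing history and flag queries that jump far above their usual level. The counts and threshold below are made up for illustration; real trend detection is far more elaborate.

```python
# Toy sketch of detecting a spiking query: flag a query whose volume today is
# several standard deviations above its trailing mean. Purely illustrative,
# not a production trending-queries pipeline.
from statistics import mean, stdev

def is_spiking(daily_counts, today_count, threshold=3.0):
    """daily_counts: query volume for trailing days; today_count: current volume."""
    mu = mean(daily_counts)
    sigma = stdev(daily_counts) or 1.0
    return (today_count - mu) / sigma > threshold

history = [120, 135, 110, 128, 140, 125, 130]   # e.g. "hong kong airport", past week
print(is_spiking(history, 9500))   # True: likely event-driven, treat as fresh intent
print(is_spiking(history, 150))    # False: normal fluctuation
```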
Well, a third area of contribution I want to talk about has to do with the temporal dynamics of information. This rests on the notion that information isn't static.
And when you say it out loud, it seems kind of like a no-brainer. Of course, it isn't static.
But the tools we've traditionally used tend to focus on snapshots of information rather than the dynamic nature of our information.
So tell us again what technical approaches you've explored to help people interact with the reality of dynamic information.
So as you said, the world is constantly changing around us, whether it's the world of information or the physical world in which we live.
In web search, what's changing
is the content. The web is not static. We're crawling new content all the time. The questions
people ask are changing as a function of events that are going on in the world, as a function of
events in their personal lives. And what's most interesting is that what's relevant changes.
So let me just give you an example to ground the pervasiveness. If you typed in the query U.S. Open, do you mean last year or this year? It's an event.
Or do I mean tennis or golf?
Exactly. Even if you said U.S. Open 2019, what's relevant depends on where you are relative to
that event. So right now, you're probably not interested in the scores and results because
they don't exist. You want to buy tickets. During the event, you care about the results. And we've done a couple of things to
try to address that. One is on the algorithmic front. So we've tried to model things like how
the content on web pages changes. We also model how people's interactions change, the queries they
issue, what's clicked on. And by combining those in a kind of time series analysis, you can understand how to weight
new information versus older information.
Search engines learn from people as they interact with things, what's relevant to a particular
query.
But that means new information is disadvantaged because it doesn't have that historical interaction
data.
And so by being smart and modeling things as a time series, knowing how things change over time, you can do a much
better job of finding information.
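Here is a rough sketch of that time-series intuition, under simple assumptions: weight each recorded click with an exponential decay so recent interactions count more than stale ones, which keeps a new page from being permanently disadvantaged by its short history. The half-life and numbers are illustrative only, not the actual model.

```python
# Sketch of time-decayed interaction evidence (illustrative assumptions only):
# weight each click by an exponential decay so recent behavior counts more
# than old behavior when estimating a result's relevance for a query.
import math

def decayed_click_score(click_ages_days, half_life_days=14.0):
    """click_ages_days: age in days of each click on this (query, result) pair."""
    lam = math.log(2) / half_life_days
    return sum(math.exp(-lam * age) for age in click_ages_days)

# An older page with many stale clicks vs. a new page with a few fresh ones.
old_page_clicks = [300, 280, 260, 240, 220, 200]   # days ago
new_page_clicks = [0.5, 1, 2, 3]                   # days ago

print("old page:", round(decayed_click_score(old_page_clicks), 3))
print("new page:", round(decayed_click_score(new_page_clicks), 3))
```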
We also built what I think is a really fun system. It's still one of my favorite systems. It was a browser plugin called DiffIE, not a very well-named system.
I complained about grep earlier. This is not a lot better.
Still coming up with dumb names.
Right. Exactly. This was a prototype we built to help people understand how the world was
changing around them. And what I mean by that is the system, all in the browser,
as you visited a web page, would look at how that was different than the version of the page that was in the web cache and highlight those changes to you.
Wow.
So yeah, it was a totally fun system.
So imagine going to a news site and you'd see what the changes were,
not relative to what a news editor thought, but since you had last been there.
Sure.
If you hadn't been there in two days, it might be what the headlines were.
If you were following a story, it would just show you what was different.
It really brought to light for people how information changes in ways that they had never seen before. So if I would go
to somebody's webpage, I might see new publications highlighted. I might see a new job title. And that
really brought the dynamics to people in ways that were really previously hidden. And so that was a
really fun project that touched not so much
on the underlying algorithms, but how we can help people understand and experience that
change.
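In the spirit of DiffIE, here is a rough sketch of the core mechanic: cache a copy of a page when you visit it and, on the next visit, surface only the lines that changed since your own last look. It uses Python's standard difflib and a hypothetical local cache directory and example URL; it approximates the idea and is not the actual browser plug-in.

```python
# Rough sketch of the DiffIE idea (not the actual plug-in): cache a page the
# first time you see it, and on later visits show only what changed since
# your own last visit. Works on raw HTML here for simplicity.
import difflib, hashlib, pathlib, urllib.request

CACHE = pathlib.Path("page_cache")
CACHE.mkdir(exist_ok=True)

def fetch_text(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def changes_since_last_visit(url):
    cache_file = CACHE / (hashlib.sha1(url.encode()).hexdigest() + ".txt")
    current = fetch_text(url)
    previous = cache_file.read_text() if cache_file.exists() else ""
    cache_file.write_text(current)
    diff = difflib.unified_diff(previous.splitlines(), current.splitlines(), lineterm="")
    # Keep only added lines -- what is new since you last looked.
    return [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]

# Example with a hypothetical URL: print what changed since your previous visit.
for line in changes_since_last_visit("https://example.com/"):
    print(line)
```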
The interesting thing about the temporal dynamics of things, I mean, just yesterday,
my husband came home and he said, there was a huge accident on 405. So I go and search accident on 405. Well, I get five days ago, seven years ago, 405 in California. It's like, there's still a lot of work that needs to be done on this temporal dynamics.
Right. There's a tension between things that are a priori really important that you want to make sure to continue to retrieve,
and the dynamics of information. In that case, it's also possible that the content wasn't there.
But the fact that you thought about going to web search suggests that you expect to find that kind of information there. And as you say, there's a long way to go in a lot of this.
Well, you know where it led me was to the Washington State DOT Twitter feed,
which is immediate. You know, somebody's on that, but it doesn't hit the web as news necessarily
if it's just happened in the last half hour.
Right. That gets to the point of trying to integrate different sources of information.
But you need to stay on top of that.
How do web search engines decide what to crawl and what frequency to crawl it at?
Or is some of the information pushed?
This highlights a couple of different dimensions.
One is getting the data in the first place.
And then how you take all this stuff from web pages to news to maybe Twitter feeds
to structured data like Wikipedia feeds and compose those into an environment
or representation that can really help people solve their needs.
And that's much harder if you're on a phone.
Because if you have a big screen in front of you, you can show a lot of information. You can allow
people's visual systems to quickly scan it. If you're on a phone, you need to take your best
guess, iterate, start a conversation with people. It's a much more temporal processing of the
information than a spatial one.
Also, you look at the generational aspect of this. My daughter rarely goes on her computer
unless she's doing something for school. She's on her phone. That is her primary source. And so
that data point has got to be where a lot of researchers' brains are heading is,
well, what is the mobile first generation? How are we going to adapt something innovative that
we did into this milieu? The world's constantly changing and you need to evolve. We've clearly gone off as
an industry search and even beyond that from the desktop into the real world.
Right.
And I think that raises all sorts of interesting opportunities as well as challenges.
We're not even going to talk about HoloLens or any of the other wearable technologies
that I've had other researchers in the booth talk about, saying, hey, even your phone, looking at your rectangle, is going to be obsolete sooner than you think.
So, Susan, I can find almost any piece of general information by searching the web,
but my own information is fragmented and scattered everywhere on apps, bookmarks,
email folders, devices, etc. Tell us how your current interest in non-web search applications is going to help people
like me access my personal information better.
Right.
It's interesting.
I think the search industry for a while was focused actually on finding information on
your desktop, finding information in email.
And with the advent of the web, a lot of public information moved online.
And you've seen a tremendous set of innovations in that arena.
But search is really much more prevalent.
And a particular pain point for me, I told you I was motivated by things that annoy me,
are that we haven't done as good a job of helping people make sense of their own kind of personal space of information is the way I like to think about it.
In many ways, it's stuff you've seen before,
stuff you've interacted with.
It's web pages, it's email, it's documents,
apps of all kinds.
There are so many times when you say,
I know I saw this article or I saw this photo.
Where is it?
Yeah, was it on Twitter?
Was it on the web?
Was it on Instagram?
Was it on Facebook?
And there's no reason that you should have to remember that. And so I think the challenge
is providing people with unified access to that information without necessarily making copies of
it everywhere. At Microsoft, we are certainly working on it from within the Microsoft ecosystem.
It's increasingly easy to find not just files, but shared files, email with the click of a
button. In research, Shane Williams and others have developed a prototype called TaskEasy that
tries to improve that. But it's an area that I think still has a lot of opportunity for improvement.
Let me ask you a little off-script question, because this is a frustration of mine.
When I do a web search and I misspell
something by accident, it tells me, did you mean, or looking for results for. On other websites,
if I spell your name wrong, no results. I get nothing. It's a pain point. And the same was
true in web search 20 years ago. If you mistyped something,
you didn't get anything. You didn't, or you got somebody else who randomly
typed things in the same way. One of the things that search engines and lots of other web services
do is understand what people are looking for and the ways in which they're doing it.
Web search engines have gotten better at searching,
not because the algorithms are better, but because you can observe in aggregate lots of people
searching for things, failing to find them. There were some really interesting observations that
folks published very early on about web search. They were things that were unexpected to people
who were in the search industry.
We all thought that people would go to web search and type in these beautiful informational
requests. The most common queries at that time in the late 90s were things like eBay,
Hotmail, Pokemon, Weather, Horoscope. They weren't asking for information. They were
using the web to navigate to things. Getting back to your spelling example, there are many queries, things like, I think, Abercrombie and Fitch, Arnold Schwarzenegger, that are misspelled more than they're spelled correctly.
But it's learning by people typing things incorrectly, looking at their reformulations, and then figuring out how to improve the spelling correction to handle those cases.
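A toy sketch of mining spelling corrections from behavior: if many users follow one query with a near-identical reformulation, that pair becomes a candidate correction. The log, similarity measure, and thresholds here are all invented for illustration and are not how a search engine's speller actually works.

```python
# Toy sketch of mining spelling corrections from query reformulations
# (log format and thresholds are made up): if users often follow query A
# with a very similar query B, treat B as a candidate correction for A.
from collections import Counter
from difflib import SequenceMatcher

# (previous_query, next_query_in_same_session) pairs from a hypothetical log.
reformulations = [
    ("arnold schwarzeneger", "arnold schwarzenegger"),
    ("arnold schwarzeneger", "arnold schwarzenegger"),
    ("abercrombie and fich", "abercrombie and fitch"),
    ("arnold schwarzeneger", "arnold schwarzenegger movies"),
]

def candidate_corrections(pairs, min_support=2, min_similarity=0.85):
    """Keep frequent reformulation pairs whose strings are nearly identical."""
    counts = Counter(p for p in pairs
                     if SequenceMatcher(None, p[0], p[1]).ratio() >= min_similarity)
    return {a: b for (a, b), n in counts.items() if n >= min_support}

print(candidate_corrections(reformulations))
# e.g. {"arnold schwarzeneger": "arnold schwarzenegger"}
```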
We've talked about personalization, which in theory is something we all want,
but there's always some big trade-offs here. We'll get to the pitfalls in a second
and the discussion of the downside of large-scale behavior analysis. But right now,
tell us about the potential of large-scale behavioral analysis
that helps you contextualize things.
One of the things that's happened
over the last two decades
is that web-based services,
whether it's a website that you go to,
travel sites, shopping sites,
web-based services like this,
because they see lots and lots of information,
have provided this really
new lens on to how people are interacting with their systems. They
provide insights about how you can improve those systems. This is a lens on
to people's behavior that we just never had before. Even when I joined Microsoft,
when I first joined, folks from Office Help came and said, help us fix Office Help search.
And so my first question was, what are the most common queries?
And they go, we don't know.
What are people looking for?
We don't know.
The reason they didn't know is search for Office Help happened on your desktop machine.
All the Office Help was downloaded to your desktop.
All of the searching was done on your desktop.
We knew nothing about what people were asking or whether they were being successful.
Because your desktop was Las Vegas.
Whatever happened there stayed there.
Exactly.
My desktop is a little cleaner than Las Vegas, but yeah.
Good to know.
And the minute they moved office search onto the web, you learned all sorts of things.
And so by knowing what people are seeking, doing, we can create the relevant content.
We can create the relevant algorithms.
And so this has been an amazingly rich lens, this virtuous feedback cycle between delivering content and using it to understand what it is that people are looking for and where the failure points are.
It's hard to overstate how much systems have really changed because of that.
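As a small illustration of what moving help search into a web service makes possible, here is a sketch of answering "what are the most common queries, and where do people fail?" from a log. The log fields and records are made up for illustration.

```python
# Small illustration of aggregate query-log analysis (field names are made up):
# once search runs as a service, you can answer "what do people ask, and where
# do they fail?" -- the questions the Office Help team couldn't answer before.
from collections import Counter

# Each record: (query, clicked_a_result)
log = [
    ("insert page numbers", True),
    ("insert page numbers", True),
    ("mail merge", False),
    ("mail merge", False),
    ("pivot table", True),
    ("insert page numbers", False),
]

volume = Counter(q for q, _ in log)
failures = Counter(q for q, clicked in log if not clicked)

print("top queries:", volume.most_common(3))
print("abandonment rate per query:",
      {q: round(failures[q] / volume[q], 2) for q in volume})
```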
Well, like a recurring nightmare, here we are again at What Could Possibly Go Wrong.
Right.
And you've done a lot and seen a lot over the course of your career.
One thing that's of great interest to me is this idea that in order to help us get better search results, and I want that, I want personalization on the one hand, but the things I have to give up about my own personal information,
my privacy, I'm giving up to the web to help you make my search better.
So talk about the potential pitfalls here, because I know you're thinking about them.
What keeps you up at night?
Yeah, sure.
In general, there's really a need to balance in a very thoughtful and responsible way the
benefits that accrue from seeing various
kinds of interaction, understanding how people are interacting with systems, and the potential
risks for storing information about individuals that enable these services. For some of the things
that we talked about in terms of spelling correction, the fact that there were navigational
queries that people had not anticipated, those happen at the aggregate level.
And frankly, a lot of the insights happen at the aggregate level or group level.
And some of them happen at the individual level, but many of them happen at a much higher level.
We all make these tradeoffs every day.
You know, I give credit card information to some services because it's easier for me.
I want to save my purchase history in some places because it's much easier to go back and refine things.
And I think, you know, as a company, Microsoft is tremendously invested in protecting people's privacy, the security of information that people entrust us with.
So I think as an industry, what we need to do is work first and foremost to protect whatever data we have.
Also to be clear on what information is being stored, be transparent about it, and provide people with ways of opting out of that.
When I type in the beginning of a name, I would like it to autocomplete.
When I move to a new computer where that's not the case, I find it frustrating.
And again, there are ways that that data can be stored over different time horizons. It can be aggregated and anonymized. I think search engines in
particular, but really almost any web service, tries to strike the right balance between
understanding things at a very fine level and then aggregating things where that's relevant
and appropriate. All right. It's story time.
I happen to know you didn't start out thinking,
I'm going to be a computer scientist or the deputy lab director of MSR AI.
So tell us how it all began for you, maybe not back to when you were a baby,
but, you know, kind of academically,
and how you landed here at MSR in your leadership role today.
Well, what you just said is certainly true.
Microsoft didn't
exist when I was in high school and in graduate school, so I had no aspirations of being there.
If I did, I would be incredibly wealthy right now. Yeah, when I look back on my career, I think it's
fun to reflect on a few pivot points because the road from where I was as a high school student
and undergraduate in Maine to Redmond, Washington in the tech industry is not one that I had planned
for or meant to end up on. And I was able and lucky enough to be in environments where I could take some
risks and take some turns. So let me just tell you a few of them that really stand out in my mind.
I started college as a math major intending to go to law school. I wanted to do environmental law.
I took a course when I was a junior called Mathematical Psychology,
which was a course that talked about how people learn information, how they recall information,
and how you can precisely describe the evolution of learning and retrieval of information from
memory. And I was just smitten. I just thought it was the most fascinating thing, blending algorithms with the ability to understand people and how they work.
And so I just decided that I was going to go to psychology graduate school. I had no idea what it was. My parents were even more concerned.
But I did it. I had a blast doing it. And then when I finished my PhD, I had every intent of teaching at a university. And when I was looking for jobs, I got a call from Bell Labs.
And they had just started the industry's first human-computer interaction lab.
And I was still all set on going to a university.
And my undergraduate advisor called and said, I hear you're going to Bell Labs.
And I said, no, I don't think so.
And he literally said, you ought to have your head
examined. And I asked why. And he made a very good point, which was, you really have nothing to lose
by this and a lot to gain. You're at the beginning of something that could be a really important
future direction. And if you decide you don't like it, you can leave. And two years later,
you'll be better off than you are now in looking
for jobs. And almost 40 years later, you can say that it suited me very well. And my transition
from Bell Labs to Microsoft was also based on opportunities that I decided to seize. We had had
a postdoc at Bell Labs who was a product manager in Office at the time on FindFast. And he said,
hey, Microsoft Research
is looking for somebody in information retrieval. I told them they should reach out to you. And
again, I said, I'm not looking to move. But I came. I really enjoyed meeting the people,
the problems, the scale of problems. I could just see being very, very different from what I had.
And again, now 22 years later, it's been maybe one of the best decisions in my
life, in part because what I'm interested in, helping people create, find, manage, make sense
of information, is exactly what Microsoft is about. So every question I have, every innovation
I have has really natural outlets. And I find that really sort of exciting and fun. MSR is also just this amazingly vibrant
intellectual environment that I love.
People from lots of different perspectives coming together.
Well, as we close,
and I'm sad that we're closing because you're fun.
There are a handful of people
who've really earned the right
through length, depth, and quality of career
to give advice to people,
and you're one of those people.
Let's frame the final question in terms of your leadership role in cultivating the next generation of talent here at MSR.
Tell our audience from your perspective what's on the horizon in the field,
and why is now a good time to be a researcher?
When I think back on my career and I look at other successful people, I think we all share some traits that I think are important to think about.
One is to have a purpose, but also be willing to seize a new opportunity.
And I just told you several times how really pivotal points in my life came from having a true north, but also be willing to take not the obvious and
straight path to it.
So Jack Sparrow's compass?
Exactly.
Wherever.
No, no, actually not wherever.
I had a goal, but I was also willing to deviate when there were opportunities.
The second is to be passionate about what you do.
I think I'm incredibly fortunate to be in an environment where my passion and what people
pay me to do align.
But in any endeavor, you're going to work hard. You're going to work long hours.
Find something that speaks to you. It might be an application area. It might be a particular
theoretical framework, a methodology. But make sure that at the end of the day, when you've
worked really hard, you're proud of that outcome. And perhaps the most important thing is to persevere. Be persistent in what you do.
There is no straight path to an aspiration and how you get there. And I think it's often
deceptive because students will see this brilliant talk by somebody who's very well known in the
field. Oh my gosh, this person is just brilliant.
Sure, they may be brilliant,
but they've also worked hard behind the scenes.
They've tried lots of things that failed.
And I think it's really important to stick with it
and learn from failures, but also celebrate successes.
In terms of, I think, some of the interesting areas
moving forward, let me just mention three.
One of them is that I think more and more information retrieval is moving from helping people find information to helping people get things done.
I've spent a lot of my life thinking about search.
It is nobody's end goal.
You don't get up in the morning and say, I'm going to search for the next two minutes.
You're trying to accomplish a task.
And search is a means by which you do that.
And I think we shouldn't ever forget that.
So really trying to go from finding information to using that information in a way that helps you solve the problem.
The other one we mentioned briefly before, it's moving off the desktop into the world.
More and more our systems are
interacting. There's this interesting mix of digital and physical worlds. And I guess the
last is a personal one. I think there are really interesting opportunities moving forward to
combine insights from computation, cognitive science, and neuroscience. It's an area that
I haven't had as much time to spend as I would like,
but I think there's some interesting things coming together in that space.
You know, I'm glad that you're passionate and persistent about what you're doing because it's helped my life in many, many ways. You're right. I don't get up and say, I'm going to go search.
I have to find something and I need that click to be the one I want.
Susan Dumais, thank you so much, finally, for coming on the podcast.
Thanks, Gretchen. It's really been fun to talk with you.
To learn more about Dr. Susan Dumais and how the search for better search goes on, visit microsoft.com slash research.