Limitless: An AI Podcast - How Multimodal AI Is Changing Search | Google’s VP of Product Robby Stein

Starting point is 00:00:03 Since seemingly the beginning of time, search has come in the form of Google and its glorious 10 blue links. Ask any question in the world, and the search engine goes off, aggregates the world's knowledge, and presents it to you nearly instantly, in what can only really be described as magic. But now the interface is shifting. It's shifting from this black box to multimodality and things like Google Lens, where you now have sensors that engage with the real world and can give you search in many dimensions: voice, audio, visual, all of the things. And I've been curious about this. Ejaaz, I know you've been really curious about this.

Starting point is 00:00:33 And thankfully, another person has been really interested in this, and that's Robby Stein, who is now on the show. Welcome, Robby. Robby is the VP of Product at Google Search, so you seem like the absolute perfect person to discuss this topic with us. And I think the place I want to start is: what does the new version of 10 blue links look like? Is search going to continue to be the same? And what is the plan for Google Search moving forward in a world of AI? Yeah, well, first of all, thanks for having me on the show. Really fun to be here and to have this conversation. You know, I think ultimately this AI moment truly is expansionary. We talk about the fact that it's doing more for people than ever before. So when you think about what the future of search looks like, it actually starts from what it is today.

Starting point is 00:01:12 The everyday need of someone grabbing a quick phone number, paying a bill online, or finding a direct link somewhere doesn't really go away; it's that Google Search can now do so much more. So I think what you're finding is that the search experience has been enhanced, and it's become a lot more powerful. And now there are AI experiences that start to show up: a little preview at the top of the page where AI could be helpful. You can imagine a world where someone asks a really hard question; you get more and more of those.

Starting point is 00:01:38 And by tapping into there, you now have this more conversational version of search. So the first big theme is that you can have an AI-driven experience, one that's a back and forth. You have refinement, you have follow-up questions, you have more curiosity. And that's a different paradigm. But it really makes sense for a specific kind of question. If you're just kind of browsing and don't really know what you're trying to ask, a lot of the core search experience is actually optimized for that kind of need, and that might be the best experience, which is why you don't see AI for every single query.

Starting point is 00:02:07 But I think the first big change you're going to observe is that conversationality. I think the second is multimodality. In the past, AI, and search in general, has been more constrained to a web experience that feels like a web page. You type text in, you get a page back. But increasingly, it's this kind of ambient, knowledgeable AI. Imagine something that has been encoded to understand as much of the world's information as possible, as much of the web, and which parts of the web are most helpful

Starting point is 00:02:37 for every single given question, and it's in this brain. You can talk to it. One way is through text, through the web interface I just mentioned, and you can have a whole conversation. Another way is live. So in our apps right now, you can have a live conversation while you're driving with the exact same model and just start learning about the world

Starting point is 00:02:53 as it just talks to you. It's kind of unbelievable. The other one is images. We see a 70% increase in the amount of visual searching year over year on Google. It's actually one of the fastest ways people are searching, because people just want to take their camera out and ask questions about what they're seeing, or ask questions about what's on their device. So you kind of move from a world of a page to a world where you can just ask about what you're observing.

Starting point is 00:03:15 And it can fit the form factor that you need given your life, whether you're going for a walk, you're driving, or you're just on your couch on your phone. Google's there with you. I love this description of it as an ambient brain that's trying to break through the confines of whatever medium it's been chained to. And originally, you know, it's these 10 links, the homepage of Google, the search ranking. This is something that's pretty nostalgic to me, right? And so if it's the doorstep to the internet, anything that evolves with it is super important. And one thing I've been asking myself is, as AI has become popular through LLMs, through ChatGPT, through Google Gemini,

Starting point is 00:03:53 seeing this search page evolve and get to where it is today, I would love to understand the timeline and decision points that got you to this final medium. So the way I think about it is we started with the search engine, and we got a ranked search page of 10 blue links, right? Ranked from most popular to least popular. And then we had Google Gemini appear, this LLM chat interface. And then this LLM chat interface got access to web search. And then we had Google Search powered by Gemini, where we started seeing more of these AI overviews, and now we even have Google agents that are doing autonomous work behind the

Starting point is 00:04:29 scenes. Can you briefly walk us through the timeline and decision points of how we got here? Yeah, I think there are a couple of pieces in between that are helpful to clarify. Google a long time ago had, you know, blue links as this prototypical search page that everyone knows and loves. But actually, it's evolved a lot over the years since then. Think about universal search: if you ask for local information, maps information shows up; if you ask visual questions, you get an image search universal and you'll see related images.

Starting point is 00:04:58 Or if you ask about trending information, you'll get a top stories unit with articles at the top, right? And then you also have very specific questions someone might ask where we would feature one web result with an extra-large snippet at the top of the page. That might be, I don't know, a very specific question that lots of people ask, and we just kind of highlight that.

Starting point is 00:05:19 So each of those, I think, has been a fairly large evolution of the experience. And I think off the back of that is where AI overviews came from. Because we had these rich experiences: you ask about the weather, you get weather information right at the top of the page. Right at the top, you could get more specific information when you had a specific question. And so a natural extension of that is how AI could unlock the ability to really ask these harder, longer questions in natural language. So you could ask a very specific question.

Starting point is 00:05:46 Even if there's not one specific web page that has that information, you could provide this AI overview. And so that was the first big move in the evolution. And the next one was, okay, once you had that, we actually observed that people were trying to get it to trigger more and show up. People would actually put the word "AI" in the search box. And the other thing was they weren't done there. They wanted to ask follow-up questions, and there was no easy way to do that. And so really that led to this idea of AI mode, which is a way to have a conversational experience with search. And it's increasingly easy to get to from an AI overview. So you search, you get the little AI preview at the top, you expand it, and you can go deeper in AI mode and have a whole conversation.
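The AI mode described here keeps a conversation going so that follow-up questions build on earlier turns. As a rough illustration only, the hypothetical Python sketch below stores prior question/answer pairs and prepends the most recent ones to each follow-up. The `Conversation` class, its prompt format, and the `max_context_turns` cap are assumptions for illustration, not how Google implements AI mode.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical multi-turn session: each follow-up is sent along with recent turns."""

    # (question, answer) pairs from earlier turns, oldest first.
    turns: list[tuple[str, str]] = field(default_factory=list)
    # Assumed cap on how much history is carried into the next prompt.
    max_context_turns: int = 20

    def record(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def build_prompt(self, follow_up: str) -> str:
        # Keep only the newest turns so the prompt stays bounded.
        # (Guard against 0, because turns[-0:] would return the whole list.)
        recent = self.turns[-self.max_context_turns:] if self.max_context_turns > 0 else []
        parts = [f"Q: {q}\nA: {a}" for q, a in recent]
        parts.append(f"Q: {follow_up}")
        return "\n".join(parts)


chat = Conversation()
chat.record("good beginner houseplants", "(model answer here)")
print(chat.build_prompt("which of those handle low light?"))
```

Carrying the earlier turns forward is what lets a bare follow-up like "which of those..." make sense; a real system would likely also summarize or compress long histories rather than simply truncating them.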

Starting point is 00:06:24 You can have a 10- or 20-turn back-and-forth conversation with Google now. And so all of that was kind of the main arc of how it's evolved. And then I think it just ascends this curve of complexity and user need, where we're like, okay, what's the next thing the user is trying to do? And you just keep asking that question, and how else we can be helpful. So, for example, let's say you want to find a restaurant for date night in San Francisco this Friday at 7 o'clock specifically. It's possible that you could put that into Google, get an AI overview, and

Starting point is 00:06:55 go back and forth with AI mode until you got to some list of great restaurants. But you're not going to Google just to figure out a restaurant. You actually want the reservation; that's the ultimate end of that journey. And so your next question is, okay, well, what could we do agentically to actually book that table for you, or help you book it? We've now implemented, and it's live, a way for an agent to browse Tock and OpenTable and the web for reservations and then bring back in the list not only great restaurants, using, you know,

Starting point is 00:07:27 analysis and reasoning to present that list for you, but then, in that list, the times the tables are available, which is really, really magical. And so hopefully that tells a little bit of the story of how we evolved the experience from one that felt like a page

Starting point is 00:07:42 to one that felt like an interactive AI conversation, and one that could increasingly do things for you and with you. That interactive conversation is exactly what I think this new world looks like, but I'm kind of struggling to get my head around what that mental leap looks like, right? So I'm used to search. I'm used to tapping keys on my keyboard, typing words, getting the kind of information that I want, and then scrolling, clicking, and getting to an action. What does the multimodal version of this look like? And maybe we could even do it in the form of a mental model. So if classic search was kind of like

Starting point is 00:08:17 type-to-links, then what is it when my camera, voice, screenshots, and TV are the way I interface? What does that analogy look like? Yeah. So I guess the first thing to say is I don't think people want to dramatically change their mental model of Google. Basically, and this is my personal view, particularly how I feel as a user, in general you just want to say, I have a question. I want to ask Google this question. You put it into Google. And right now, the main way that happens is text, but it's very quickly growing to be multimodal: visual, voice, et cetera. But they're all just different modes of the same root need, which is, I have a question. It could be a question about something you're looking at. You could take

Starting point is 00:08:58 a screenshot on social media and say, I wonder what this outfit is and how I could buy it. It could be a tree you're walking past. You take a picture because you want to know what that tree is, but it's a question nevertheless. And so you take your question and put it into Google, and then you kind of decide what's the easiest way to do it. Well, if you're on your computer, on your desktop, or maybe you have a homework question, you'd copy and paste it, or you'd just type it. If you're out and about on your phone,

Starting point is 00:09:21 you'd take a picture. If you happen to be on an Android device and you're looking at something on your phone, you use Circle to Search. People are already doing this at enormous scale. It's like a billion people a month using Lens, using these products where you can take a photo. So I think this is a pretty commonly understood pattern at this point. And then it just goes to Google.
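Robby's point is that typing, screenshots, photos, and Circle to Search are all different ways of getting the same thing, a question, into Google. A minimal sketch of that idea follows, assuming hypothetical input types (`TextInput`, `ImageInput`, `VoiceInput`) and assuming speech has already been transcribed upstream. Each mode is normalized into one `Question` object that the rest of a pipeline could treat uniformly. None of these names come from a real Google API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class TextInput:
    text: str


@dataclass
class ImageInput:
    image_bytes: bytes
    prompt: Optional[str] = None  # e.g. "what tree is this?"; may be absent


@dataclass
class VoiceInput:
    transcript: str  # assumes speech-to-text already ran upstream


UserInput = Union[TextInput, ImageInput, VoiceInput]


@dataclass
class Question:
    """The single 'root need' every mode is reduced to."""
    text: str
    image: Optional[bytes] = None


def to_question(inp: UserInput) -> Question:
    if isinstance(inp, TextInput):
        return Question(text=inp.text)
    if isinstance(inp, VoiceInput):
        return Question(text=inp.transcript)
    if isinstance(inp, ImageInput):
        # A bare photo is still a question; default to asking what it shows.
        return Question(text=inp.prompt or "What is this?", image=inp.image_bytes)
    raise TypeError(f"unsupported input: {type(inp).__name__}")
```

Keeping the image alongside the text, rather than converting it to a caption first, lets a downstream multimodal model look at the pixels itself; that is a design assumption of this sketch.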

Starting point is 00:09:38 And from the user's perspective, they don't really want, I think, to be bothered with much more than that. They just want to get their question into Google, and then Google should do the work to give them the best possible information. And so if you ask a question that's really basic and you want to browse about it, like, let's say there's a new starting quarterback you've never heard of, you put the person's name in. Sure, maybe you want a quick description, but typically with a people search like that you actually want to browse.

Starting point is 00:10:01 You want to see photos of the person, a quick knowledge panel. You want to see their recent posts on social media. You want to see articles that have been written about them. In many cases, an AI response is actually not great for that kind of question, which is more of a browsy kind of need. But if you ask a really specific question, boom, you're going to get AI right at the top with an overview and a way to dive in and have that back and forth.
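The distinction drawn here, browsy lookups versus specific questions, amounts to picking a results layout per query. The toy heuristic below is purely illustrative. The word-count threshold, the question-word list, the `is_known_entity` flag (which would come from some entity-recognition step), and the layout names are all assumptions, and the conversation does not describe Google's actual triggering logic.

```python
# Hypothetical signal words that suggest a specific question rather than a lookup.
QUESTION_WORDS = {"how", "why", "what", "which", "should", "can", "does", "is"}


def choose_layout(query: str, is_known_entity: bool) -> list[str]:
    """Return the hypothetical result modules to show, top to bottom."""
    words = query.lower().split()
    looks_specific = len(words) >= 6 or (bool(words) and words[0] in QUESTION_WORDS)

    if is_known_entity and not looks_specific:
        # Browsy need (e.g. a name you've never heard of): let people scan.
        return ["knowledge_panel", "photos", "social_posts", "articles", "web_results"]
    if looks_specific:
        # Specific question: an AI overview on top, with links to dive in.
        return ["ai_overview", "web_results"]
    # Everyday navigational or quick-fact queries keep the classic page.
    return ["web_results"]
```

A production system would learn these decisions from signals rather than hand-coded rules; the sketch only makes the "best format given the question" idea concrete.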

Starting point is 00:10:22 And so you kind of rely on Google to always give you the best format given the question. And if it's predictable, then you kind of know: okay, if you ask for something inspirational and visual, you get images, right? You get visual stuff. If you ask a specific question about knowledge, you get AI. If you ask a question about a person, you see photos of that person, you get

Starting point is 00:10:40 a description of them. So that's how I think about it, and I think that's what we hear from users. So you're talking about the form factor a lot, which is something I'm interested in, because a lot of the time when people talk about AI, ourselves included, we kind of imagine, well, there's a chatbot, or, in terms of search, there's a chatbot and then there's a text box. And there's not really much in between. And I guess what I'm curious about is the way you see the final form of this evolving over time, because you describe a text box with Google, and then we have multimodality with cameras. And when I hear this, it reminds me of the early computer, where we started with just text-based stuff, which was

Starting point is 00:11:15 better than punch cards. And then we evolved to a graphical user interface, and then to web browsers. And there was this natural evolution that wasn't obvious, but in hindsight it was so clear. And I'm curious if you have any ideas on where that evolution goes from here, because I know you pioneered Google Lens, which is amazing. I use the product all the time. It's my favorite way of engaging with the world, because it feels like a magic wand: I can point it at anything and get answers. And I'm curious where you see that progressing over time. Yeah, well, earlier you asked me about the pattern, the mental model. And I guess in hearing you rearticulate it, the most

Starting point is 00:11:47 succinct way of saying it is that it's the mental model of conversation. If you think about it, when you have a conversation with someone, you have a text field, you have a way to upload photos. You can go live with that person and see them right now, right? You can have a live video chat with that person. They're all just different modes of accessing that person. And I do think that technology is moving in a direction where it's as simple to communicate with as talking to a person. It accepts all of the modes and all the ways of communicating with a person, too. Now, it just turns out that most people, for most needs, don't need to have a long conversation. They're kind of single-shot moments in time. It's like you're

Starting point is 00:12:28 sitting at your computer and you're like, oh, I totally forgot, I need to call this place. You just Google it really quick, phone number, whatever, call them, right? So you don't need to have a 20-minute conversation with the thing. But if you think about it, in that little moment you were kind of communicating with Google: you're expressing a need, you're getting information back. And so I think increasingly that's the right way to think about it from a mental-model perspective. And so if you think about where things are going, you can kind of just articulate, you know,

Starting point is 00:12:56 what else. At least for Google, we think a lot about informational needs. So what other kinds of informational tasks could we do to be helpful for you? It could be new modes; if new modes come up, those could be useful. But it could also be doing more for you given a certain question. The same way you might text someone to help you with something and say, hey, can you do me a favor and scan all this stuff and tell me what I should know about it? Or, you know, can you book this event?

Starting point is 00:13:22 I'm trying to get tickets to this game and it's complicated. You might even hire someone to help you, in some cases, get the perfect ticket for a Super Bowl game or some other hard-to-get ticket. Maybe an agent could go do that work for you. On the back end, it's almost like texting a person and the person getting back to you and saying, give me a few minutes, I'm going to look into this. And then they give you some result: actually, these two tickets are available, they're next to each other, and they're at this price point. Except that the AI can do things that most people couldn't pay for, or that would be really hard for a person to do, because it would require searching hundreds of things before getting back to you. So I think that's kind of a helpful way of thinking about it. To that last point, the searching of hundreds of things: we asked our audience what they wanted to ask you, and one of the questions is, how is Google going to survive when the perception is that AI is kind of cannibalizing search, and one has an ad model and one doesn't? And I kind of want to ask you, because I have a feeling what you're going to say: I want you to help me falsify this zero-sum

Starting point is 00:14:18 theory as it relates to chatbots. I've heard you say, and I've heard Google as a whole say, that AI overviews and AI as a whole kind of make people search more. In a way, we're seeing Jevons paradox, where as there's more data, there's more of an inclination to search more, to create more queries. And I want you to help us understand that phenomenon, where in the case that there is AI, there's actually a lot more search happening. The number of searches doesn't go down. It actually goes up.

Starting point is 00:14:45 Yeah. So I think AI is an expansionary thing. People had all these questions, and they could have asked so much more, but they didn't because there were limitations. And I think the best example of that is something like Lens with AI, right? You can now take a picture of your bookshelf and say, given these books, what should I read next?

Starting point is 00:15:04 Or you could take a screenshot of, you know, a celebrity outfit and say, where could I buy this jacket? It's possible someone would have tried to put that kind of question into Google 10-plus years ago, but it would have been pretty hard to do. In the same way, it would be pretty hard for someone to type in a 20-sentence question. It's like, I'm going on this trip. I have a kid. The kid has this allergy.

Starting point is 00:15:27 They have this need. I have to go to a hotel. The hotel needs to be far. You just couldn't do that. And so people would just kind of give up or not do it. You get growth when you're unlocking new needs for people, and that's what we're seeing. So I think the best way I can summarize it is: the everyday need I just mentioned, people getting fast, efficient information from search, isn't really changing. But now you can ask technology so many more questions, and that's where the growth is coming from.

Starting point is 00:15:53 And we've talked about this: with AI overviews, we're seeing growth. AI overviews show up when people have a more specific question. They put a longer, more specific question into Google, and they get this AI response. Those kinds of questions are up about 10% in large markets like the U.S., which at Google scale is a pretty enormous growth number year over year.

Starting point is 00:16:15 And then these visual searches I mentioned are up 70% year over year. So there's huge growth in these areas where the market's expanding the fastest, and that's exactly what we're seeing. But obviously the pie is growing, and there are lots and lots of opportunities for people to get information from lots of people. And that's exciting. I'm curious: how much importance do you place on consumer hardware devices when you're thinking about building out this vision? You know, Meta has been attempting to revamp the form factor with glasses. OpenAI is rumored to be building its own consumer device; goodness knows what that looks like. And, you know, obviously, Google

Starting point is 00:16:50 values its partnership with Apple as the primary search engine in Safari. Is this something you consider at all? Is this something that Google wants to own and build itself? Or is it more like, let's see what happens, and search will be integrated everywhere? I think search is so ubiquitous that we think of it as a service that should just be accessible on any device or in any context where someone needs help, whether you're on your phone and you have a question about what you're looking at, or you're taking a picture of something. We want people to get the value of Google in whatever that context looks like. I'm personally a big fan of the saying, "the best camera is the one in your pocket."

Starting point is 00:17:30 I think it's just a really apt point. There are lots of cool new things that will probably be created, and they'll probably take time. But really, the convenience of what people use every day is important. People rely on certain technologies, certain ways of asking questions. For some, it's their camera. They have it in their pocket and they're constantly taking photos.

Starting point is 00:17:50 And so if you can be close to that experience, great. That means you can be one tap away from sending your photo to Google, let's say, and asking a question about what you're looking at. And I think that's how a lot of people want to interact with technology. And obviously, if there's an exciting breakout hardware category, that's something I think we've historically always wanted: to be a part of every new,

Starting point is 00:18:10 exciting way that people are interacting with technology. Robby, the number one question we get from viewers and listeners of our show is: this sounds magical, but how does it work? Help me understand what's happening under the hood. I'd like to point that question toward Gemini and Google Search in general. With the AI overviews and everything you've explained so far, can you give us a high-level breakdown of what's happening under the hood?

Starting point is 00:18:37 So at Google, what we're really trying to do with AI is create the world's most knowledgeable AI: one that's really connected to the vast information that obviously sits at Google, but also around the web. It's about connecting to and understanding the world, and that's unique. And so we have an opportunity to build an AI on top of all that. There are billions of products in the shopping catalog and hundreds of millions of places where businesses are updating local information every day. There are a trillion facts in the knowledge base,

Starting point is 00:19:06 updated all the time. There's information about live finance data, sports, travel, live prices for flights, all that stuff. We want to make that easily and quickly accessible to anyone. And then you obviously have the vastness of the web that we want to connect you to. So the models we've built: there's an AI model based on Gemini's foundation model, a large language model that understands natural language and multimodal questions, is able to generate responses, and is also able to understand all that knowledge. So you can ask a question, and what will happen on the back end is the AI model will actually start to generate Google

Starting point is 00:19:45 searches to start researching it. Given the complexity of the question, it may actually spend time thinking, reasoning, and doing research. So if you ask a question about what kind of sunglasses you should get, wanting to learn more about polarized versus not and their benefits, there might be dozens of questions connected to that question. What happens under the hood is that the model is actually searching. It's issuing a bunch of Google searches, as a person would, and it's potentially using APIs like the Shopping Graph to do research as well. It then retrieves all of the relevant information. And because of all of the knowledge and signals that are available in search, we have

Starting point is 00:20:23 a good understanding of what information is great for a given question. All of that is brought back into the model, and the model reasons about it and generates a response with links to dive in, learn more, potentially buy the thing you're looking for, and continue your journey. And that all happens, you know,

Starting point is 00:20:40 through this AI experience. This is largely previewed in the AI overview. So if you put a hard question in right now, say, how do I get ketchup stains out of a light white couch, and put that into Google, you're very likely to get a little AI preview at the top, an AI overview. And if you were to expand it and click into AI mode, you can have a whole back and forth with it.
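The flow Robby outlines (the model plans searches, issues them, gathers results, ranks them using search signals, then reasons over them to write an answer with links) is a pattern commonly called query fan-out. Here is a minimal, runnable sketch of that loop under stated assumptions: `plan_queries`, `search`, and `synthesize` are hypothetical stand-ins passed in as plain functions, and the `score` field is a placeholder for whatever relevance signals a real system uses. It is not Google's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    url: str
    snippet: str
    score: float  # placeholder for ranking signals about how helpful a page is


def fan_out_answer(
    question: str,
    plan_queries: Callable[[str], list[str]],     # hypothetical: model proposes searches
    search: Callable[[str], list[Doc]],           # hypothetical: one search per sub-query
    synthesize: Callable[[str, list[Doc]], str],  # hypothetical: model writes the answer
    top_k: int = 5,
) -> tuple[str, list[str]]:
    sub_queries = plan_queries(question)
    # Issue the sub-queries concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(search, sub_queries))
    # Merge and de-duplicate by URL, keeping the best-scoring copy of each page.
    best: dict[str, Doc] = {}
    for docs in results:
        for d in docs:
            if d.url not in best or d.score > best[d.url].score:
                best[d.url] = d
    evidence = sorted(best.values(), key=lambda d: d.score, reverse=True)[:top_k]
    answer = synthesize(question, evidence)
    return answer, [d.url for d in evidence]  # links for the reader to dive in
```

Passing the model and search calls in as functions keeps the sketch testable with stubs; in a real system `plan_queries` and `synthesize` would be model calls and `search` would hit an index.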

Starting point is 00:20:59 And that's the model that's doing that. Blot the excess ketchup immediately, Robby, and apply a solution of cold water and mild dish soap. I knew that was my problem: I let it hang out too long. Robby, I have to confess something, which is that I'm a bit of a fanboy, and I didn't quite realize until recently that you had a stint at Instagram. And you mentioned it a little earlier:

Starting point is 00:21:21 The best camera is the one in your pocket. I love photography. I love taking photos. And particularly, I love Instagram stories because it's my favorite way of sharing content in the world on the internet. And what I found out is that you were the guy responsible for implementing that at Instagram, which was amazing. It's such a great product application and probably the single reason why I still use Instagram. That's great to hear. A lot of people worked on making that successful, but I was very privileged to have a chance to work on that as well.

Starting point is 00:21:46 Yes. Oh, it's such a great feature. So what I wanted to ask you about is how you think about implementing features like that as it relates to search, because I imagine this wasn't an original idea. Famously, Snapchat had it first, but then Instagram implemented it in what I believe was a much better way, and that's the one that I use. And I remember hearing you in another interview describing the way you reasoned through it: on Snapchat, I couldn't upload my own photos. I had to use the inline camera, and that was just a poor experience. But on Instagram, I could upload my own photos, which I thought were much more beautiful, and I much prefer that. So I'm wondering about the thought process behind that and how you apply it at a company like Google, where you're now developing products for this new AI technology. Yeah, interestingly, I feel like there are a lot of similarities

Starting point is 00:22:30 in terms of what to learn about product building from that experience. And I think, you know, the main one is that if you have a product that's beloved and used by lots and lots of people, you don't want to dramatically upend it on people, because there's a natural, well-worn path that people are traveling every day, billions of times a day. And you don't want to just show up one day

Starting point is 00:22:47 and have it feel like upside-down world. That's just not a service to anyone and will create a bunch of problems. Now, that said, if you're building in a space where the need for your product is directly connected to what people already come for, in this case information at Google, but people just want to do more with it,

Starting point is 00:23:03 there's a natural opportunity to expand what you can do for people. In the same way, people came to Instagram to share through photos, and it turned out there was a potentially even better way to do that with friends through Stories, because it was this low-pressure, ephemeral format, and it allowed you to get a DM and have a fun conversation with your friends

Starting point is 00:23:20 and feel connected. And that whole system really worked well. But it didn't replace Instagram. It became an additional way that Instagram could help you. And I think of AI and search in the same way. People come to Google every day for information billions and billions of times a day. And actually people have tried typing in crazy stuff

Starting point is 00:23:38 into search, even before AI existed, really. And before, you couldn't really do much; you might even get to the end of the search results page and it would say we couldn't find anything. But now you can really help with almost anything, so that feels like a natural thing to do. But from the same learning, you have to really design for the needs of your users. And so in the same way for search, remember when you asked a question about what was happening today, and models

Starting point is 00:23:59 used to say, oh, I don't know, I was trained up until a year ago or something? It always seemed crazy to me that you couldn't get information within 100 milliseconds about what was happening in the world just because of the way the technology evolved. But now this has evolved to be very different, and particularly at Google, it's finding information on a near-real-time basis across all of our knowledge. So that's something I think we can do uniquely well. Another example is around visual and inspiration. People come to search all the time.

Starting point is 00:24:27 They search for images. Image search is a huge search engine in itself. People look for design. They want wallpapers. They want uplighting ideas. They want to redecorate their kids' bedrooms. And they browse for these images. And if you ask AI these kinds of questions,

Starting point is 00:24:42 it'll describe in text how to design a bedroom, which I always thought was really weird. So now with visual AI mode, you can ask, help me design my daughter's bedroom; I'm looking for ideas, looking for inspiration. It could be about anything. You could be shopping for fashion, dresses. And AI mode will actually go find inspirational images, and then you can have a multi-turn conversation. You could say, actually, I want maximalist dark tones and a super brooding theme. It will know what that means and, using a lot of our Lens technology for imagery, go change the whole grid from something airy

Starting point is 00:25:15 and light in California into this dark lodge vibe, and it knows what that means visually. And I think these are ways that Google, based on what Google users need, can add unique value to the world, you know,

Starting point is 00:25:29 versus just trying to implement another general-purpose chatbot, which isn't our intention. I'm curious to understand what advantage is uniquely Google's, because, to that example, the reason I'm using a virtual background is that I have nothing on my walls. I'd love some assistance with that. And I understand Google's good for that,

Starting point is 00:25:44 but I'd love to understand why Google is uniquely good at that. Because if I ask another model, if I ask xAI or Grok, they'll actually go and search the internet, which I assume is mostly indexed by Google. Is there a unique advantage to Google being Google versus having to query against Google? Like the unique data set, the unique kind of profile and indexing that Google does, that separates you from a lot of other companies in the same space? Yeah, I mean, there's a bunch of things that I think allow us to be really uniquely helpful in these cases. I think one is just in the technology and the inputs themselves.

Starting point is 00:26:15 There have been many, many, many years of building multimodal capabilities for image recognition and visual understanding. So our models are able to segment the background of your experience and put attention on the correct parts of the object. Say you were to ask, hey, I want, like, the little tree behind me on the ledge, or a little, you know, plant. Well, what's a ledge, and what does "behind you" mean? And what is the bottom shelf versus the middle shelf? How does the model know which part of the shelf to look at? Our models understand that really well, and uniquely well.

Starting point is 00:26:49 Then once you select that region, you might say, hey, replace that plant, I want a better one. Okay, well, what visual imagery have other people clicked on and used, and what has been inspirational and helpful for those people in those journeys? Whereas I could probably find you a janky plant on the web that technically is a plant, but without ranking, or understanding whether people found it useful in the past for other plant searches, you might not know that this is actually a really helpful plant that lots of people have found and clicked on and enjoyed when looking for, you know, office decor plants, right?

Starting point is 00:27:22 Which is something that, you know, I think Google might more intuitively be able to offer, given the people that come to Google to search for these kinds of things. I want to shift the conversation to ads and the monetization model, because in my mind, this breaks completely when everyone has an AI agent that represents them on the internet, that does all their shopping for them, that has access to their wallet, spends everything for them. How does this mental model, or I guess business model, break in Google Search? If you're not pitching adverts to human eyeballs and trying to get their attention, how does it work with AI agents? Yeah, I mean, I think there's a lot of unknowns here. This is a very fast-moving space. But I think one thing to mention, and I think I mentioned it before, is that people are still asking, at scale, the kinds of questions they've been asking. And so I

Starting point is 00:28:12 view this as, you know, right now, people's ability to do more with agents. And so this feels like, you know, you can't spend an hour, let's say. I was recently looking at buying a safe. I have some documents, and my bank closed my safety deposit box and said they don't offer that anymore. I was like, that sucks. I probably should put all these in a safe somewhere, right? And it's kind of annoying to go to the bank. Too much information, but this is the story. So it's actually really complicated to buy a safe. So I used our deep research product, Deep Search, and it looked at hundreds and hundreds of places, and it created this incredible guide to safes. And it was like, there are different things around moisture, different implications for insurance.

Starting point is 00:28:55 I would never have spent time doing that. But now that I did it, it has all of these links. It has reviews I've read. It theoretically has opportunities for me to go buy those safes, where otherwise I probably would have put this chore off indefinitely and never done it. And those all create new opportunities, not just for discovery, but for monetization and other things down the road. And then obviously,

Starting point is 00:29:17 if you're talking about agentic tasks where you never need to show anything to the user, theoretically, I don't know, on some infinite timeline, a model knows me so well that a perfect safe would just somehow show up at my house.

Starting point is 00:29:30 I don't know if I totally believe that's ever true, but let's say it is. I mean, I think things will just evolve in ways we don't totally understand. Yeah, it sounds like the shopping experience actually becomes really... and so we delve more into knowing what you want, whether it's purchasing a safe or buying a new shirt for an occasion that's coming up.

Starting point is 00:29:48 And it feeds Google this additional information. Do you see the ad model evolving in any way, or staying where it is right now? I think the ad model is definitely going to evolve, because the format evolves. And typically, if history repeats itself, ads are information. And it's actually really helpful information and content, and it's also a way for people to discover new businesses and services. And so when there was a shift to mobile, a new set of formats came up for mobile. When there was a shift to video and short-form video, there were new types of ad formats for videos

Starting point is 00:30:20 and they're taller and they're more authentic, and there are people talking about products, and it feels great. And so I think in the AI world, we'll see something similar. And in an agentic world, you might see something similar again. And it'll feel more natural to the format of, hey, you're just kind of talking and here's some information, and by the way, here's a deal you might want to know about, which I think you're starting to see some experiments around. But I mean, we have to address the elephant in the room, which is that this is a lot of power for Google to hold, right? So how do you think about treading that line between responding to a user's prompt in a way that's helpful and factual and also giving sponsored content or a sponsored product embedded in that response? Yeah, I mean, I think this is something we've had to do at Google for most of its existence.

Starting point is 00:31:10 I mean, people already come to Google for these kinds of tasks, and there are ads on a page with results as well. And I think the principles stay the same. One, we have an honest results policy. Ads will not affect the core experience of anything you see. So in AI, it's no different. It will not affect ranking. An advertiser can't change, you know, the organic reply of what the AI is recommending you.

Starting point is 00:31:32 Now, that doesn't mean you can't insert opportunities to discover new things, but those things need to be labeled really transparently to show the user, hey, this is something that you might want to know about, this is an advertisement, in exactly the way it works today on Search. So I think the principles don't change. You just have to rethink them foundationally every time there's a major shift in how people consume information. You saw that with video, we saw that with mobile, and I think people will see that again in this more conversational paradigm that we're seeing. Do you have any ideas of what that form factor is or how that exists? Because I guess the perception

Starting point is 00:32:05 is that if these AI Overviews trigger fewer clicks for some of the publishers, then surely there needs to be some sort of monetization or controls. Are there ideas you're considering implementing, things that should exist? I mean, we're running experiments right now. I don't think anyone... I think this is a learning exercise. But the principles are similar, which is that if you search for information, you should be able to find it, go deeper, and have control and transparency over what you're seeing.

Starting point is 00:32:32 And I think we've started to experiment, particularly with different AI surfaces. So with AI Overviews and AI Mode, there are experiments now with advertising across those experiences to learn what could work well there, but I'd say those are still in the early days. And then I think ultimately what we find is that if you

Starting point is 00:32:49 search and you see an AI Overview, those pages largely monetize very similarly to ones that don't have AI in them. And so you kind of get to a point where you're searching for something, and once you're down the funnel of, like, I'm looking for this ketchup removal

Starting point is 00:33:04 thing, and I just need to know how to blot it in this way, it turns out I was very unlikely to want to go buy a product in that moment. I probably just wanted to know how to deal with this in the next two seconds, because I've got an active situation on the couch that I need to deal with. And so you'll also learn what the moments are that are going to be most helpful for people to discover new things, and go from there. The other thing I'll just say overall is that you mentioned links, and how to encourage and understand how people can

Starting point is 00:33:34 discover the web. This is absolutely essential and something that we take as a foundational design principle for everything we do. I think Google and Search care more about the web than arguably any company, any product out there. And so one thing our models do uniquely is that they actually use and understand all these search signals. So they know, for a given question, which websites are really useful. And so our approach here is not only to provide helpful links alongside, but also to embed them. So as you're reading, you can click and go deeper on anything that you see. And what we're finding is that people do click, and they indeed want to go deeper.

Starting point is 00:34:10 The paradigm's kind of changing: people want context first. They want to get a sense, the gist of things, and then they want to click in. So say I'm trying to get a credit card or buy a mattress. I ultimately probably want to read what the experts are going to say about something and read a whole article. But I'm going to get a little bit of superficial information first, and then I'm going to go read. Let's say I'm going to read what people say, probably on some social media

Starting point is 00:34:33 threads. I'm going to read what experts say, paid professionals who spend a lot of time analyzing this stuff, and then I'm going to purchase. And that's what we see. But I think our job is to make those connections possible. And our hope is that with AI, because it's largely incremental to what we see in search, there are new opportunities to connect you to new services and new websites that you wouldn't have found, because the AI is also doing broader searching than what you would do. And so the hope is that it also promotes discovery long term. Okay. And then, yeah, that gets to the positive-sum pie, where there's just significantly more search and curiosity as, I think, a lot of people perceive it becoming easier to unlock the answers to that curiosity. So there's one question that I had.

Starting point is 00:35:15 We had Logan Kilpatrick on the show, who is part of the Gemini team. Amazing guest, amazing episode. I suggest everyone go find that and listen to it after you're done with this one. But what it unlocked for me was a behind-the-scenes look at what the Gemini team does at Google. And it reminds me of this Manhattan Project-type thing, where there's this small subset of people working on this really intelligent AI, but there's a slight disconnect, where it's kind of a separate thing that interfaces with Google and Search.

Starting point is 00:35:42 So there's Google, but there's also Gemini. And I guess what I'm curious about is what that relationship is like between Google and the Gemini team, and how you guys work to integrate these products together, because there's Gemini's AI Studio and then there's Google Search.

Starting point is 00:35:56 So what's a good way for myself and for people who are listening to compartmentalize and see where the synergies lie between those two entities? Yeah, I mean, we work incredibly closely with the Google DeepMind Gemini teams. The way to think about it is there are these foundational models

Starting point is 00:36:11 that are increasingly able to understand any question and help find information about it or generate information about it. And there are people working on the frontier of what that looks like in many ways, and those are many of those folks. But then how that is brought to life in products people use every day and love

Starting point is 00:36:28 is really about the product teams. And so Google Search obviously is one of the largest, if not the largest, ways that people interact with AI even today. And we work extremely closely with them; you can think of it as helping really push the frontier, particularly for how models are using information, and we'll work closely to bring those models into Search and customize them and make them work

Starting point is 00:36:52 really well for all the things we just talked about. People are getting bedroom inspiration. They're taking photos of things. They're asking about closing times. And what will happen is the modeling team will think about it more in terms of capabilities. Let's say you want the capability for the model to use a tool, or to have reasoning so that the model can think a little bit more. So someone will go research that and add that capability.

Starting point is 00:37:15 And then, from a search perspective, one of the tools it's using is something like Finance, so it can make a real-time query to look up financial information. So right now, if you go to AI Mode and say, compare the last six months of these two stocks, and you name two stocks, it'll actually use Google Finance as a tool and make a request for live information

Starting point is 00:37:38 and historical data, and then it'll plot that information in a graph. Now, that uses Gemini as a model, so it has this foundational ability to understand what the question is that you were asking, but then it has the search ability to use all of these search tools, which is really cool,

Starting point is 00:37:56 and then it can generate that kind of a response for you. Just a follow-up question on that: can you walk us through what it was like integrating Google Search as it was before with an LLM? Presumably there was some friction that you ran up against, whether it was combining data sets and stuff,

Starting point is 00:38:11 I'm curious what that looked like. I don't think there was too much friction per se. I think the main thing is, when you're adding a model into the mix, it just has to be done in a very specific way, because models have different tendencies in how they respond to information than the rest of the search stack

Starting point is 00:38:30 that has been built out. But they work together harmoniously now in the search environment. And so some questions produce these AI responses and other ones have different AI enhancements. But I'm not sure if there was a more specific kind of tension you were alluding to. No, no. I was just curious whether there was a massive kind of uplift,

Starting point is 00:38:49 whether that was developmental or on taste-making for users, that you ran up against. But it sounds like there wasn't much. Another kind of wild-card question that I had, Robbie: I like the personalization of Google Search. One that I wasn't even aware of: if I typed in someone's name, for example a celebrity, it would come up with their net worth or something. And I would show my friends and say, see, this is the most searched thing on Google.

Starting point is 00:39:16 And they would be like, well, not quite. It's kind of looking at the cookies from what you've searched before, and maybe it's adjusting its preferences to what you might want to search. On that theme, how far down the rabbit hole can we go when it comes to personalization of Google AI search for someone, right? Is there a world where I'm not just discovering new websites or apps or experiences on the internet, but it is highly tailored and personalized to other datasets that are unique to me, right? It knows my shopping preferences, kind of

Starting point is 00:39:48 knows how much money I have in the bank. It kind of knows that I have an event coming up in a couple of weeks' time. How does that world look for you? Yeah, I mean, first of all, there's Google overall, and then there are the questions that people typically use AI for. I think for Google overall, there are plenty of questions out there that aren't great to personalize. And so we think a lot about the differential value of personalization. If you ask how tall the Empire State Building is, it's just a factual piece of information. Maybe there's some personalization on the source, like if you really like certain facts from specific providers. But many things are not great to personalize.

Starting point is 00:40:21 And I think there's value in just having this universal place you go to see what things are showing up for a given question. That said, for many questions, it's the opposite. It's almost weird not to personalize. If you say, what kind of jeans should I get, it's like, well, I don't know, what kind of jeans would you like? For me, that's going to be super different than if I were to grab a random person in the world and try to get them some jeans, right? And it turns out that in these AI experiences, like AI Overviews and AI Mode, people have a lot more of these advice-seeking, recommendation questions.

Starting point is 00:41:00 They want to know where to eat for dinner. They want to know where to travel with their family. It's in this more subjective camp: it kind of depends. And so we think there's a huge opportunity for our AI to know you better and then to be uniquely helpful because of that knowledge. And, you know, one of the things we talked about at I/O

Starting point is 00:41:17 is how the AI can get a better understanding of you through connected services like Gmail, so that over time it could really know, wow, you tend to like these kinds of products or brands, here's another one that just came out from them. And how much more useful that would be than just generically showing you, I don't know, a list of whatever the top 10 selling jeans brands are right now.

Starting point is 00:41:41 And so that is, I think, very much the vision of building something that can be really knowledgeable for you specifically. But I think there's a nuance there in what you personalize. Our thought on this is that the user should always see which parts of the information are being shown because of their interests, purchases, or things they've done in the past, things it thinks they might like, versus things that it's just suggesting generally. I think people want to intuitively understand when they're being personalized, when information's made for them, versus when it's something that everyone would see if they were to ask this question.

Starting point is 00:42:15 And this is kind of the wisdom of the crowds, so to speak, represented in a Google search page. As we get closer to the end of the show, I love ending on a more optimistic note. So I want to talk about the future vision, the future that you're excited about as someone who builds products that billions of people use. It was funny, just today I saw a notification that Gemini rolled out on my TV, and now I have it on my car or on my phone. And then I have it on some Google watches and some earbuds. And what about cars? What about things that are sitting on my tabletop? What I'm curious to ask you is: what does the ideal day look like for you as a product designer, where your product slots into someone's life, when search is everywhere and rarely typed? What does the final form look like? And how does that change the way people go about their day-to-day lives? Yeah, I think it's really not a final form. It's more of a multi-form. I don't know if that's a word, but that's kind of how I think about it. Understood. Yeah. It's like there's not a single form; I think it's adaptive. And so the way I think about it is, I love thinking about these multi-day journeys where people have needs that are kind of pending and they're working on them over time. I think a lot about people buying a couch for their apartment, let's say. And it's not just an easy thing. You might be on your computer doing some research, and you want to find cool couches for an apartment in New York, let's say. If you use AI Mode today and you ask about it, you'll see a visual grid of couches.

Starting point is 00:43:35 And you might actually click on a few of those, and it might recommend some things that you might like. Okay, great. You're kind of like, oh, those are kind of cool, I'm going to think about it. Then you're going for a walk and you walk by, you know, a furniture store

Starting point is 00:43:47 or you're at a friend's house, and you see something that strikes you. And then you go to your app, back to your thread. You upload a photo of it and you're like, oh, actually this is the thing that's super awesome, I actually want stuff that's like this. Great, here are some things like that.

Starting point is 00:44:01 All right. Put that away for another time. Blah, blah, blah, blah. Then let's say you're driving, and it pops into your head that this other color was actually the one you wanted. You go live and you say, hey, remember I was talking about couches? I actually really like this color, and I'm mostly focused on these two colors.

Starting point is 00:44:17 Send me some recommendations for those. Great, got it. Boom. And then it does that. And then a few days after that, maybe you get a push alert that there's a deal on one of the exact ones you were considering. There's a sale. I don't know.

Starting point is 00:44:32 It's, what, Cyber Monday coming up, Black Friday, all of the above, big shopping season. Maybe one of the things you were loving was available. These are all ways that Google, across modes and across different aspects of your life, is now being incredibly helpful to you for this need. And I think that's more of how I think of the future of search than any one specific feature or single form factor.

Starting point is 00:44:57 Well, we are just about coming to the top of the hour. Robbie, thank you so much for taking time out of your busy day to chat with us. That was a fascinating conversation. I think a lot of people in life just kind of take for granted what Google search has brought for them and kind of like

Starting point is 00:45:13 the way that this thing evolves and the way that it permeates every facet of our life, especially when it goes fully multimodal. It's super important to understand. So thank you for taking us through that journey. Limitless listeners, if you enjoyed this show,

Starting point is 00:45:27 please give it a thumbs up. We know that a bunch of you aren't subscribed to us. So we need you to get on that button and kind of focus on that, please. If you enjoy the show and if you enjoyed any of the episodes that you've listened to so far this week, please give us a five-star rating

Starting point is 00:45:41 and we will see you on the next one. Robbie, thank you again for joining us. Thank you for having me.
