Limitless Podcast - How Multimodal AI Is Changing Search | Google’s VP of Product Robby Stein

Episode Date: November 26, 2025

Google Search used to be “type a few words, get 10 blue links.” In this episode of Limitless, VP of Product Robby Stein joins Josh and Ejaaz to unpack how AI is quietly blowing up that mental model, turning search into a multimodal, conversational system that can see, listen, and even act on your behalf.

Robby explains why AI is “expansionary” rather than cannibalistic, how Gemini-powered models now issue their own Google searches and query live graphs behind the scenes, and why visual search via Lens is one of the fastest-growing ways people interact with the web. Along the way, he shares lessons from shipping Instagram Stories, digs into agents that can book your restaurant table or research the right safe for your documents, and talks candidly about ads, publisher traffic, and the responsibility that comes with sitting at the internet’s front door.

LIMITLESS HQ: LISTEN & FOLLOW HERE
https://limitless.bankless.com/
https://x.com/LimitlessFT

TIMESTAMPS
0:00 Intro
0:56 AI-Driven Evolution of Search
3:26 From Blue Links to AI Overviews
7:58 Conversational Search as the New Mental Model
10:58 Agents, Reservations & Doing Things For You
14:15 Is AI Cannibalizing Search, or Growing It?
16:52 Hardware, Glasses & Where Search Lives
18:40 Under the Hood of Gemini-Powered Search
21:46 Instagram Stories, Product Taste & Don’t Flip the UX Table
26:15 Google’s Unique Edge vs. Other AIs
28:14 Monetization and Advertising in AI Search
30:25 Ad Principles and Experiments
32:56 Integration of Gemini and Search
36:04 How Gemini & Google Search Actually Work Together
39:58 Personalized Recommendations and Transparency
43:40 Future Vision of Search Across Devices
46:21 Closing Thoughts

RESOURCES
Robby Stein
https://x.com/rmstein

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:03 Since seemingly the beginning of time, search has come in the form of Google and their glorious 10 blue links. Ask any question in the world, and the search engine goes off aggregating the world's knowledge and presenting it to you nearly instantly, in what can only really be described as magic. But now the interface is shifting. It's shifting from this black box to multimodality and things like Google Lens, where you now have sensors that engage with the real world and can give you search in many dimensions: voice, audio, visual, all of the things. And I've been curious about this. I know you've been really curious about this.
Starting point is 00:00:33 And thankfully, another person has been really interested in this, and this is Robby Stein, who is now on the show. Welcome, Robby. Robby is the VP of product at Google Search. So you seem like the absolute perfect person to discuss this topic with us. And I think the place I want to start is, what does the new version of 10 blue links look like? Is search going to continue to be the same? And kind of, what is the plan for Google Search moving forward in a world of AI? Yeah, well, first of all, thanks for having me on the show. Really fun to be here and to have this conversation. I think ultimately in this AI moment, it truly is expansionary. We talk about the fact that it's doing more for people than ever before. So when you think about what the future of search looks like, it actually starts from what it is today.
Starting point is 00:01:12 Because the everyday need of someone grabbing a quick phone number, being able to pay a bill online, find a direct link somewhere. It doesn't really go away, but it's that Google search can do so much more. So I think what you're finding is that the search experience has enhanced and it's become a lot more powerful. And now there's AI experiences that start to show up. It's a little preview at the top of the page where AI could be helpful. You can imagine a world that someone asks a really hard question. You get more and more of those. And by tapping into there, you're now having this more conversational version of CERC.
Starting point is 00:01:42 So the first big theme is that you can go and have an AI-driven experience and one that's a back and forth. You have refinement. You have follow questions. You have more curiosity. And that's a different paradigm. But it really makes sense for a specific kind of question. You're just kind of browsing, you don't really know what you're trying to ask. Actually, a lot of the core search experience is really optimized for that kind of experience.
Starting point is 00:02:03 And that might be the best, which is why you don't see AI for every single query. But I think the first big change you're going to observe is that conversationality. I think the second is multimodal. I think in the past, AI and search in general has been more constrained to a web experience that feels more like a web page. You type text in, you get a page back. But increasingly, it's just this kind of ambient, knowledgeable AI that imagine something that has been encoded to understand as much of the world's information as possible,
Starting point is 00:02:35 as much of the web and what parts of the web are most helpful for every single given question, and it's in this brain. You can talk to it. One way to do it is through text, through the web interface I just mentioned, and you can have a whole conversation. The other way, it could be live. So in our apps right now, you can have a live conversation while you're driving with the exact same model
Starting point is 00:02:52 and just start learning about the world and just talk to you. It's kind of unbelievable. But the other one is images. And we see a 70% increase in the amount of visual searching year over year on Google. It's actually one of the fastest ways people are searching. Because people just want to take their camera out and ask questions about what they're seeing or ask questions about what's on their device.
Starting point is 00:03:09 So you kind of move from this world of a page to a world where you can just ask about what you're observing. And it can kind of fit the form factor that you need given your life, whether you're going for a walk, you're driving, or you're just on your couch on your phone. Google's there with you. I love this description of it's kind of like this ambient brain that's trying to break through the confines of whatever medium it's been chained to. And originally, you know, it's this 10 links, you know, the homepage of Google, the search ranking. This is something that's pretty nostalgic to me, right? And so if it's the doorstep to the internet, anything that evolves with that is super important.
Starting point is 00:03:46 And one thing I've been asking myself is, as AI has become popular through LLMs, through chat, GPD, through Google Gemini, seeing this search page evolve and get to where it is today, I would love to understand the timeline and decision points that made us or made you get to that final medium. So the way I think about it is we've started with the search engine and we get a ranked search page of 10 blue links, right, ranked from most popular to least popular. And then we had Google Gemini appear, this LLM chat interface. And then this LLM chat interface got access to web search. And then we had Google search powered by Gemini, where we started seeing more of these. AI overviews, and now we even have like Google agents that are doing autonomous work behind the
Starting point is 00:04:29 scenes. Can you briefly walk us through this timeline and decision points of how we got here? Yeah, I mean, I think that there's a couple of pieces in between actually that are helpful to clarify. I mean, I think Google a long time ago had, you know, blue links as this prototypical search page that everyone knows and loves. But actually, it's evolved a lot over the years since then. We think about universal search and how if you ask for local information, maps information shows up, or you ask about visual questions. You get a universal for image search, and you'll see related images.
Starting point is 00:04:58 Or you've asked about trending information. You'll get a top stories unit with articles at the top, right? And then you also have very specific questions someone might ask where we would feature one web result with an extra large snippet at the top of the page. That might be like, hey, how many? I don't know, you've got a very specific question that's lots of people ask.
Starting point is 00:05:18 And we just kind of highlight that. So each of those, I think, have been fairly large evolutions to the experience. And I think off the back of that is where AI overviews came. Because we had these rich experiences. You asked about whether you get weather information, right, at the top of the page, where right at the top, you could get more specific information when you had a specific question. And so a natural extension of that is how AI could unlock the ability to really ask these harder, longer questions with natural language.
Starting point is 00:05:44 So you could ask a very specific question. Even if there's not one specific web page that has that information, you could provide this AI overview. And so that was the first big move in evolution of what we saw. And the next one was, okay, well, once you had that, we actually observed that people were trying to get it to trigger more and show up. So people would actually put the word AI in the search box. And so, and then the other thing was they weren't done there. They wanted to ask follow-up questions, and there was no easy way to do that. And so really that led to this idea around AI mode, which is a way to have this conversational experience with search. And it's increasingly been, you know, easy to get to from AI overview. So you
Starting point is 00:06:18 search. You get the little AI preview at the top. You expand it and you can go deeper in AI mode and have a poll. You could have 10, 20 back and forth conversation with Google now. And so all of that was kind of the main arc of how it's evolved. And then I think it just kind of ascends this kind of curve of complexity and and kind of user need. We're like, okay, what's the next thing the user's trying to do? You just keep asking that question and how else we can be helpful. So for example, let's say you want to find a restaurant for date night in San Francisco this Friday at 7 o'clock specifically. Like it's possible that you could put that into Google and we get an AI overview and you would go back and forth with AI mode until you, you know, like got to some list of great restaurants. But then you're not going to Google just to figure out a restaurant.
Starting point is 00:07:04 You actually want the reservation is the ultimate end to that journey. And so your next question is, okay, well, what could we do agentically to actually book that table for you or help, you know, we actually implement. now in his live, a way for an agent to browse talk and open table and the web for reservations and then bring back in the list, not only great restaurants that use, you know, analysis and reasoning to present that list for you, but then in that list, times that the table's available, which is really, really magical. And so hopefully that tells a little bit of the story of how we went, how kind of evolved the experience from one that felt like a page to one that felt like an interactive AI conversation and one that could increasingly do things for you and with you.
Starting point is 00:07:48 That interactive conversation is exactly what I think this new world looks like, but I'm kind of struggling to get my head around what that mental leap looks like, right? So I'm used to search. I'm used to tapping keys on my keyboard, typing words, and getting the kind of information that I want, and then kind of scrolling, clicking, and getting to an action. What does the multimodal version of this look like? And maybe we could even do it in the form of like a mental model. So like if classic search was kind of like type to links,
Starting point is 00:08:19 then what is it when my camera, voice, screenshots, and TV are kind of like the way you interface. So like what does that analogy look like? Yeah. So I guess the first thing to say is I don't think people want to dramatically change their mental model of Google. They want to basically think, and this is my personal view, and particularly as a user, how I feel,
Starting point is 00:08:38 is in general you kind of want to just say, I have a question. I want to ask Google this question. You put it into Google. And right now, the main way that happens is text, but is very quickly growing to be multimodal, visual voice, et cetera. But they're all just different modes of the same root need, which is, I have a question.
Starting point is 00:08:55 It could be a question about something you're looking at. You could take a screenshot on social media and say, I wonder what this outfit is and how I could buy it. Could be a tree you're walking past. You take a picture of you want to know what that tree is. But it's a question, you know, nevertheless. And so you take your question, you put it into Google. And then you kind of decide,
Starting point is 00:09:12 What's the easiest way to do it? Well, if you're on your computer on your desktop, or maybe you have a homework question, you'd copy and paste it, or you'd ask a question, or you'd just type it. If you're out on the about, on your phone,
Starting point is 00:09:21 and you'd take a picture. You happen to be on an Android device and you're looking on your phone, you use Circle to Search. People are already doing this at enormous scale. It's like a billion people a month using lens, using these products where you can take a photo.
Starting point is 00:09:33 So I think this is a pretty commonly understood pattern at this point. And then it just goes to Google. And from the user's perspective, they don't really want, I think, to be bothered, but too much more than that. They just want to get their question into Google.
Starting point is 00:09:44 And then Google should do the work to give you the best possible information. And so if you ask a question that's really basic and you want to browse about it, like let's say there's a new starting quarterback you've never heard of, you put the person's name in. Like sure, maybe you want a quick description, but you actually want to browse typically that people search like that. You want to see photos of the person, quick knowledge panel. You want to see their like recent posts on social media.
Starting point is 00:10:08 You know, you want to see articles that have written about them. in many cases, an AI response is actually not great for that kind of a question that's like a browsier kind of need. But if you ask a really specific question, boom, you're going to get AI right at the top with an overview and a way to dive in and have them back and forth. And so you kind of rely on Google to always give you the best format given the question. And if it's predictable, then you kind of know, okay, if you ask for something inspirational and imagery, you get images, right?
Starting point is 00:10:34 You get visual stuff. If you ask a specific question about knowledge, you get AI. If you ask a question about a person, you see photos. of that person you get his description of people. So that's how I think about it. And I think what we hear from users. So you're talking about the form factor a lot, which something I'm interested in,
Starting point is 00:10:48 because I think a lot of the times when people talk about it, ourselves included of AI, we kind of imagine, well, there's a chat bot, or in terms of search, there's a chat bot, and then there's a text box. And there's not really much in between. And I guess what I'm curious about is the way you see the final form of this kind of evolving over time, because you describe a text box with Google,
Starting point is 00:11:08 and then we have the multi-modality with cameras, And when I hear this, it kind of reminds me of the early computer where we kind of started with just text-based stuff, which was better than punch cards. And then we evolved to a graphical user interface. And then we evolved to web browsers. And there was this kind of natural evolution that wasn't obvious, but in hindsight, it was so clear. And I'm curious if you have any ideas on where that evolution goes from here because I know
Starting point is 00:11:28 you pioneered Google lens, which is amazing. I use the product all the time. It's my favorite way of engaging with the world because it just feels like this magical wand that I could look at anything and get answers from. And I'm curious where you see that kind of progressing to over time. Yeah, well, I mean, you were talking, I mean, earlier you asked me about what's the pattern, like what's the mental model? And I guess in hearing you re-articulate it, the most succinct way of saying it is it's the mental model of conversation. And so if you think about, you know, when you have a conversation with something or a person, then you have a text field, you have a way to upload photos. You can go live with that person and see them right now, right? You can have like a live video feed, you know, chat with that person. They're all just different modes of accessing that person. And I do think. think that technology is moving in a direction where it's as simple to communicate with as talking to a person. I mean, it accepts all of the modes and all the ways of discussing with
Starting point is 00:12:17 the person too. Now, it just turns out that most people for most needs don't want need to have a long conversation. Like they're kind of single shot moments in time. It's like you're sitting at your computer. You're like, oh, I just totally forgot. I need to call this place. You just Google really quick phone number, whatever. Call them. Right. And so you don't need, I don't need to have like a 20 minute conversation with the thing. But if you think about in that little moment, you were kind of communicating with Google, you're expressing a need, you're getting information back. And so I think increasingly that is the right way to think about it from a mental model perspective. And so if you think about where things are going, you can kind of just articulate, you know, what else, at least for Google,
Starting point is 00:12:58 we think a lot about informational needs. So like what other kind of informational task that we could do to be helpful for you could be new modes, like if new modes come up, that could be useful. But it could also be doing more for you given a certain question. The same way you might text someone to help you with something and say, hey, can you do me a favor and scan all this stuff and tell me what I should know about it? Or, you know, can you, you know, book this event? I'm trying to get tickets to this game and it's complicated. You know, you might even hire like someone to help you in some cases get the perfect ticket for a Super Bowl game or some, you know, hard to get ticket. You know, maybe an agent could go do that work for you. And on the back end, it's almost like getting texting a person and the person
Starting point is 00:13:37 and getting back to you saying, give me like a few minutes. I'm going to look into this and then they give you, you know, some result. Actually, these two tickets are available and they're next to each other and they're at this price point, except that the AI can do things that, you know, most people, you know, couldn't pay for or would be really hard for a person to go do because it would require you to search like hundreds of things to get back to you. So I think that's kind of a helpful way of thinking about it. To that last point, the searching hundreds of things, we asked our audience what they wanted
Starting point is 00:14:01 to ask you. And one of the questions is, how is Google going to survive when the perceived thing is that well, AI is kind of cannibalizing search and one has an ad model, one doesn't. And I kind of want to ask you, because I have a feeling what you're going to say, I want you to help me falsify this zero thumb theory or zero sum theory as it relates to chat bots. And kind of like, I know I've heard you say and I've heard just Google as a whole say that AI overviews and AI as a whole, it kind of makes people search more. And in a way we're seeing Jevin's paradox where as there becomes more data, there becomes more of an inkling to search more to create more queries. And I kind of want you to help help us understand. And that phenomenon, I guess, where in the case that there is AI, well, there's actually a lot more search happening. The number of searches don't go down. They actually go up. Yeah. So I think AI is an expansionary thing.
Starting point is 00:14:47 It's like people had all these questions and they could ask so much more, but they didn't because there were limitations. And so I think the best example of that is something like lens and with AI, which is the best way, right? So you can now take a picture of your bookshelf and say, I've given these books, what should I read next? or you could take a screenshot of, you know, a celebrity outfit or something and say, where could I buy this jacket or something? It's possible someone would try to put that kind of a question into Google 10 years ago plus, but it would be pretty hard to do that. In the same way, I would be pretty hard for someone to type in like a 20-sentence question.
Starting point is 00:15:24 It's like, I'm going on this trip. I have a kid. The kid has this allergy. They're like, this need. I have to go to a hotel. Hotel needs to be far. You just like couldn't do that. And so people would just kind of give up.
Starting point is 00:15:34 not do it. And so you get growth when you're unlocking new needs for people, and that's what we're seeing. So I think the best way I can summarize it is the everyday need I just mentioned for people getting fast, efficient information from search isn't really changing. But now you can ask technology so many more questions, and that's where the growth is coming from. You know, and we talked about this, like, you know, AI overviews, you know, we're seeing growth. You know, we talk about where AI overview shows up. It's around when people have a more specific question, that they put like a longer, There's more specific question. They put that into Google.
Starting point is 00:16:05 They get this AI response. Those kinds of questions are up about 10% in large markets like the U.S., which at Google scale is a pretty enormous kind of growth number year over year. And then these kind of visual searches I mentioned are up 70% year over year. So huge growth in these areas where the market's expanding the fastest, and that's exactly what we're seeing. But obviously, the pie is growing, and there's lots and lots of opportunities for people to get information from lots of people.
Starting point is 00:16:30 And that's exciting. I'm curious. How much importance do you place on consumer hardware devices when you're kind of thinking about building out this vision? You know, Meta's been attempting to, you know, revamp the form factor with glasses. Open AI is rumored to be building their own consumer device. Goodness knows what that looks like. And, you know, obviously Google values their partnership with Apple being the primary search function through Safari. Is this something you consider at all? Is this something that Google kind of wants to own and kind of like form themselves? or is it just kind of like, let's see what happens and search will be integrated everywhere? I mean, I think search is so ubiquitous that we think of it as a service that should just be accessible to any device or context that someone needs help in,
Starting point is 00:17:14 whether you're on your phone and you have a question on what you're looking at, I'm taking a picture of something. Like, Google wants to, we want people to get the value of Google in whatever that context looks like. I personally am a big fan of the kind of saying, the best camera is the one in your pocket.
Starting point is 00:17:30 I think it's just like a really, apt point. And so like there's lots of cool new things that will probably be created. They'll probably take time. But really, there's just the convenience of what people use every day is important. You know, people rely on certain technologies, certain ways of asking questions. Some are their camera. Like they have a camera. They have it in their pocket and they're constantly taking photos. And so if you can be close to that experience, great. That means that you can be one tap away from sending your photo to Google, let's say, and asking a question what you're looking at. And I think that's how a lot of people want to interact with technology.
Starting point is 00:18:02 And obviously there's an exciting breakout hardware category. That would be something that I think searches historically has always wanted to be a part of every new, exciting new way that people are interacting with technology. Robbie, the number one question we get from viewers of our and listeners of our show is this sounds magical. How does this work, though? Help me understand what's happening under the hood. I would like to point that question towards Gemini's search and Google.
Starting point is 00:18:30 search in general with the AI overviews and everything you've explained so far, can you give us maybe a high level breakdown as to what's happening under the hood? So at Google, what we've done with AI is we've really trying to create the world's most knowledgeable AI. So one that really is connected to the vast information that obviously sits at Google, but also around the web. It's about connecting and understanding the world. And that's unique. And so we have an opportunity to have an AI that, you know, obviously there's, there's, say, billions of products in the shopping catalog, hundreds of millions of places that businesses are updating with local information every day. There's trillion facts in the knowledge base updated all the time.
Starting point is 00:19:08 There's information about live finance data, sports, travel information, live prices for flights, all that stuff. Like, we want to be able to make that easily and quickly accessible to anyone. And then you obviously have the vastness of the web that we want to connect you to. And so the models that we've built, there's an AI model that, is kind of based on Gemini's foundational model, which is the one that kind of understands, is a large language model that understands,
Starting point is 00:19:31 you know, natural language and multimodal questions and he's able to generate responses, is able to also understand all that knowledge. And so you can ask a question, and what will happen on the back end is the AI model will start to actually generate Google searches to start researching that. And given the complexity of the question,
Starting point is 00:19:49 may actually spend time thinking and reasoning and doing research. And so if you ask a question about, what kinds of sunglasses you should get and learning more about polarized versus not and its benefits, there might be dozens of questions connected to that question. And what would happen under the hood is that the model is actually searching.
Starting point is 00:20:07 It's issuing a bunch of Google searches as if a person would. And it's potentially using APIs like the shopping graph to do research as well. It would then retrieve all of the relevant information and then because of all of the search knowledge and signals that are available in search, kind of a good understanding of what information
Starting point is 00:20:24 is great for a given question, all of that is brought back into the model, and the model reasons about it and generates a response with links to dive in, learn more, potentially buy the thing you're looking for, and continue your journey. And that all happens, you know,
Starting point is 00:20:40 through this AI experience. And so this is largely previewed in this AI overview. So if you just put a hard question, right now, you go, how do I get catch-up stains out of a light white couch? Put that into Google.
Starting point is 00:20:51 You're very likely to get a little AI preview at the top, AI overviews, and if you were to expand it and click AI mode, you can have a whole back and forth with that. And that's the model that's doing that. Blot the excess catch up immediately, Robbie, and apply a solution of coal water and moldish. I knew that was my problem. I love to confess something, which is that I'm a bit of a fanboy and I didn't quite realize until recently because you had a stint at Instagram. And you just mentioned a little earlier, the best camera is the one in your pocket. I love photography. I
Starting point is 00:21:24 I love taking photos, and particularly I love Instagram stories because it's my favorite way of sharing content in the world on the internet. And what I found out is that you were the guy responsible for implementing that at Instagram, which was amazing. It's such a great product application and probably the single reason why I still use Instagram. That's great to hear. A lot of people worked on making that successful, but I was very privileged to have a chance to work on that as well.
Starting point is 00:21:46 Yes. Oh, it's such a great feature. So what I kind of wanted to ask you about that is how you think about implementing features like that, kind of as it relates to search, because I imagine this wasn't an original idea. Famously, Snapchat kind of had it first, but then Instagram implemented it what I believe to be much better, and that's the one that I use. And I remember hearing you on another interview kind of describing the way you reasoned through, which is like, on Snapchat, I couldn't upload my own photos. I had to use the inline camera. And that was just a poor experience.
Starting point is 00:22:15 But on Instagram, I could upload my own photos that I thought were much more beautiful, and I much prefer that. So I guess I'm kind of wondering the thought process that was behind that and how you apply that to companies like Google, where now you're developing product for this new AI technology. Yeah, interestingly, I feel like there's a lot of similarities in terms of what to learn about product building from that experience. And I think, you know, the main one is, is that if you have a product that's beloved and used by lots and lots of people, you don't want to dramatically upend that on people because there's a natural, well-worn path that people are traveling every day, billions of times a day. And you don't want to just show up one day
Starting point is 00:22:47 and just have it feel like upside-down world. Like, that's just not a service to anyone and will create a bunch of problems. Now that said, if you're building in a space where your need for your product is directly connected to what people already come for, in this case, it's information at Google. But people just want to do more for it. There's a natural opportunity to expand what you can do for people in the way that people came to Instagram to share through photos. And it turns out there was a potentially even better way to do that for friends through stories because it was this low pressure kind of ephemeral format and allowed for this, you to get a DM and have a fun conversation with your friends and feel connected, and that whole system really worked well. But it didn't replace
Starting point is 00:23:23 Instagram. It became an additional way that Instagram could help you. And I think of AI and search in the same way. People come to Google every day for information billions and billions of times a day. And actually people have tried typing in crazy stuff into search, even before AI existed, really. And before you couldn't really do much, you know, I might even get to the end of the search results page and say, we couldn't find anything. But now you can really help with almost anything. And so that feels like a natural thing to do. But from the same learning, you have to really design for the needs of your user. And so in the same way for search, like, you know, remember when you asked a question about
Starting point is 00:23:58 what was happening today and models used to be like, oh, I don't know, I was trained up until a year ago or something. It always seemed crazy to me that you couldn't get information within 100 milliseconds of what was happening in the world, just because of the way technology evolved. But now, you know, this has evolved to be very different. Particularly at Google, it's finding information on a near real-time basis across all of our knowledge. So that's something I think we can do uniquely well.
Starting point is 00:24:23 Another example is around visual and inspiration. People come to search all the time. They search for images. Image search is a huge search engine in itself. People look for design. They want wallpapers. They want uplighting ideas. They want to redecorate their kids' bedrooms.
Starting point is 00:24:38 And they browse for these images. And if you ask AI these kinds of questions, it'll, like, describe in text how to design a bedroom, which I always thought was really weird. And so now with visual AI mode, you can ask, help me design, you know, my daughter's bedroom, looking for ideas, looking for inspiration. It could be about, you know, anything. You could be shopping for fashion, dresses. And the AI mode will actually go find inspirational images. And then you can have a multi-turn conversation.
Starting point is 00:25:02 You could say, actually, I want, you know, maximalist dark tones and a super brooding theme. It will, like, know what that means and, using a lot of our Lens technology for imagery, go change the whole grid from something airy, light, and Californian to this dark lodge vibe. It knows what that means visually. And I think these are ways that Google, based on what Google users need, can add unique value to the world, you know, versus just trying to implement another kind of general purpose chatbot, which isn't what our intention is. I'm curious to understand what the advantage is that's uniquely Google's, because to that
Starting point is 00:25:38 example, the reason I'm using a virtual background, I have nothing on my walls. I'd love some assistance on that. And I understand Google's good for that. But I'd love to kind of understand why Google is uniquely good at that. Because if I ask another model, if I ask xAI or Grok, they'll actually go and search the internet, which I assume is mostly indexed by Google. Is there a unique advantage to Google being Google versus having to query against Google? Like the unique data set, the unique kind of profile and indexing that Google does,
Starting point is 00:26:05 that separates you from a lot of other companies in the same space? Yeah, I mean, there's a bunch of things that I think allow us to be really uniquely helpful in these cases. I mean, I think one is just in the technology and the inputs itself. Like there's been many, many, many years of building multimodal capabilities for image recognition and visual understanding. So our models are able to segment the background of your experience, put attention on the correct parts of the object. If you were to say, like, hey, I want the little tree behind me on the ledge, or a little, you know, plant. Well, what's a ledge, and what does behind you mean?
Starting point is 00:26:39 and what is the bottom shelf versus the middle shelf? How does the model know which part of the shelf to look at? Our models really understand that really well and uniquely well. Then once you select that region, now you might say, replace that plant. Like, I want a better one. Okay, well, what visual imagery have other people clicked on and used and what has been inspirational and helpful for those people in those journeys,
Starting point is 00:27:02 whereas I could probably find you like a janky plant that technically is a plant on the web. But without ranking, or understanding if people found that useful in the past for other plant searches, you might not know that this is actually a really helpful plant that lots of people have found and clicked on and enjoyed when looking for, you know, office decor plants, right? Which is something that I think Google might more intuitively be able to offer, given the people that come to Google to search for these kinds of things. I want to shift the conversation to ads, the monetization model, because in my mind, this breaks completely when everyone has
Starting point is 00:27:39 an AI agent that represents them on the internet, that does all their shopping for them, that has access to their wallet, spends everything for them. How does this mental model, or I guess business model, break in Google search? If you're not pitching adverts to human eyeballs and trying to get their attention,
Starting point is 00:27:56 how does it work with AI agents? Yeah, I mean, I think there's a lot of unknowns here. This is a very fast moving space. But one thing to mention, which I think I mentioned before, is that people are still asking, at scale, the kinds of questions that they were asking. And so I view this as, you know, right now,
Starting point is 00:28:15 people's ability to do more with agents. And so this feels like, you know, the stuff you can't spend an hour on, let's say. Like I was recently trying to find out, like I was looking at buying a safe. Like I have some documents, and my bank closed my safety deposit box and said they don't offer that anymore. I was like, that sucks. I probably should put all these in a safe somewhere, right? And it's kind of annoying to go to the bank.
Starting point is 00:28:36 Too much information, but this is the story. So it's actually really complicated to buy a safe. So I used our deep research product, Deep Search. And it looked at like hundreds and hundreds of various places. And it created this incredible guide to safes. And it was like, there's different things around moisture, different implications on insurance.
Starting point is 00:28:55 Like, I would have never spent time doing that. But now that I did it, it has all of these links. It has reviews I'm going to, I've read. It has opportunities theoretically for me to go buy those safes in ways that I probably would just put this chore off indefinitely. and like never do. And those all create new opportunities, not just for discovery, but for monetization and other things down the road. And then obviously if there's, if you're talking about agentic tasks where you never need to show anything to the user, like theoretically, I don't know,
Starting point is 00:29:23 in some infinite timeline and a model knows me so well, a safe would just show up in my house that's like perfect somehow. I don't know if I totally believe that that's ever true, but let's say it is. I mean, I think things will just evolve in ways we don't totally understand. Yeah, it's, It sounds like the shopping experience actually becomes richer. And so we delve more into knowing what you want, whether it's purchasing a safe or buying a new shirt for an occasion that's coming up. And it feeds Google kind of like this additional information. Do you see the ad model evolving in any way or kind of staying where it is right now?
Starting point is 00:29:57 I think the ad model is definitely going to evolve, because the format evolves. And typically, if history repeats itself, you know, ads are information. And it's actually really helpful information and content. And it's also a way for people to discover new businesses and services. And so when there was a shift to mobile, there was a new set of formats that came up for mobile. When there was a shift to video and short form video, there was a new type of ad format for videos. And they're taller and they're more authentic and there are people talking about products and it feels great. And so I think in the AI world, we'll see something similar.
Starting point is 00:30:27 And in an agentic world, you might see something similar again. And it'll feel more natural to the format of, hey, you're just kind of talking and here's some information. And by the way, here's maybe a deal you might want to know about, which I think you're starting to see some experiments around. But I mean, we have to address the elephant in the room, which is like, this is a lot of power for Google to hold, right? So like, how do you think about treading that line between, you know, responding to a user prompt in a way that's helpful and factual, and also kind of giving sponsored content or sponsored products embedded into that response?
Starting point is 00:31:06 Yeah, I mean, I think this is something we've had to do for most of the existence of Google. I mean, people already come to Google for these kinds of tasks and there's ads, you know, on a page with results as well. And I think the principles stay the same. It's like, one, we have an honest results policy. Like, ads will not affect the core experience of anything you see. So in AI, it's no different.
Starting point is 00:31:25 Like, it will not affect ranking. An advertiser can't change, you know, the organic reply of what the AI is recommending you. And now it doesn't mean you can't insert opportunities to discover new things, but then those things need to be labeled really transparently to show the user, hey, this is something that you might want to know about. This is an advertisement in the same way, that exact way that it works today on search. So I think the principles don't change. You have to kind of rethink them foundationally every time there's a major move
Starting point is 00:31:51 in how people consume information. I think you saw that with video, we saw that with mobile, and I think people will see that again in this kind of more conversational-based paradigm that we're seeing. Do you have any ideas of what that form factor is or how that exists? Because I guess the perception is that if these AI overviews trigger fewer clicks for some of the publishers, then surely there needs to be some sort of monetization or controls. Do you have any ideas that you're considering implementing of things that should exist? I mean, I think we're running experiments right now.
Starting point is 00:32:18 I don't think anyone knows; I think this is a learning exercise. But the principles are similar, which is that if you search for information, you should be able to find it, go deeper, and have control and transparency over what you're seeing. And I think we've started to experiment, particularly in different AI surfaces. So with AI Overviews and AI Mode, there's experiments now with advertising across those experiences
Starting point is 00:32:41 to learn about what could work well there, but I'd say those are still in the earlier days. And then I think ultimately what we find is that if you search and you see an AI Overview, you know, those pages largely monetize very similarly to ones that don't have AI in them.
Starting point is 00:32:56 And so you kind of get to a point where you're searching for something. And once you're down the funnel of, like, I'm looking for this ketchup removal thing and I just need to know how to blot it in this way, it turns out I was very unlikely to want to go buy a product in that moment. I probably just wanted to know how to deal with this in the next two seconds, because I've got an active situation on the couch that I need to deal with.
Starting point is 00:33:20 And so you learn, you'll also learn what the moments are that are going to be most helpful for people to discover new things from there. The other thing I'll just say overall is that you mentioned links and how people can discover the web. This is absolutely essential and something that we take as a foundational design principle in everything we do. I think Google and Search, you know, care more about the web than arguably any company and any product out there. And so one thing our models do uniquely is they actually
Starting point is 00:33:50 use and understand all these search signals. So they know for a given question what websites are really useful. And so, you know, our approach here is not only to provide helpful links along the side, but also to embed them. So as you're reading, you can click and go deeper for anything that you see. And what we're finding is that people do click and they indeed want to go deeper. The paradigm's kind of changing,
Starting point is 00:34:11 where people want context first. They kind of want to get a sense, the gist of things, and then they want to click in. So say I'm trying to get a credit card or buy a mattress. Like, I ultimately probably want to read what the experts are going to say about something and read a whole article. But I'm going to get a little bit of superficial information first.
Starting point is 00:34:27 And then I'm going to go read, let's say, what people say on some social media threads. I'm going to read what experts say, like paid professionals who spend a lot of time analyzing this stuff, and then I'm going to purchase. And that's what we see. But I think our job is to make those connections possible. And our hope is that AI, because it's largely incremental to what we see in search,
Starting point is 00:34:50 there are new opportunities to connect you to new services and new websites that you wouldn't have found, because the AI is also doing broader searching than what you would do. And so the hope is that you also can promote discovery long term as well. Okay. And then, yeah, that gets to the positive sum pie where there's just significantly more search and curiosity than I think a lot of people perceive, as it becomes easier to unlock the answers to that curiosity. So there's one question that I had.
Starting point is 00:35:15 We actually had Logan Kilpatrick on the show, who is part of the Gemini team. Amazing guest, amazing episode. I suggest everyone go find that and listen to it after you're done with this one. But what it unlocked for me was kind of the behind the scenes look at what that team does, what the Gemini team does at Google. And it kind of reminds me of this Manhattan Project type thing where there's this small subset of people working on this really intelligent AI, but there's a slight disconnect where it's kind of under a separate thing,
Starting point is 00:35:39 and it interfaces with Google and search. So there's Google, but there's also Gemini. And I guess what I'm curious about is what that relationship is like between Google and the Gemini team and how you guys work to integrate these products together, because there's Gemini's AI Studio and then there's Google Search. So what's a good way for myself and for people who are listening to kind of compartmentalize and see where the synergies lie between those two entities? Yeah. I mean, we work incredibly closely with the Google DeepMind and Gemini teams.
Starting point is 00:36:09 The way to think about it is there's these foundational models that are increasingly able to understand any question and help find information about it or generate information about it. And there's people working on the frontier of what that looks like in many ways. But then how that is brought to life with products people use every day and love is really around the product teams. And so Google Search obviously is one of the largest, if not the largest, ways that people interact with AI even today. And we work extremely closely and kind of think of it as helping really push the frontier, particularly for how models are using information. And we'll work closely to bring those models into search and customize them and make them work really well for all the things we just talked about. People are trying to get bedroom inspiration. They're taking photos of things. They're asking about closing times. And what will happen is the modeling team will think about it more
Starting point is 00:37:02 as capabilities. Like, let's say you want the capability for the model to use a tool, or to have reasoning so that the model can think a little bit more. So someone will go research that and add that capability. And then, you know, from a search perspective, one of the tools it's using is something like Finance. And so it can make a real-time query to look up financial information. Right now, if you ask about any two stocks, you go to AI Mode and you say, compare the last six months of these two stocks, put them in there. It'll actually use Google Finance as a tool and make a request for live information and historical data.
Starting point is 00:37:40 And then it'll plot that information on a graph. And that uses Gemini as a model. So it has this foundational ability to, like, understand what the question is that you were asking, but then it has the search ability to use all of these search tools, which is really cool. And then it can generate that kind of a response for you. Can I ask a follow-up question on that? Can you walk us through what it was like integrating Google search as it was before with an LLM? Like presumably there was some friction that you ran up against where there was, like, combining datasets and stuff. I'm curious what that
Starting point is 00:38:12 looked like. I don't think there was too much friction per se. I mean, I think the main thing is when you're adding a model into the mix, it has to be done in a very specific way, because models have different tendencies in terms of how they respond to information than the rest of the search stack that has been built out. But they work together now,
Starting point is 00:38:33 you know, harmoniously in the search environment. And so some questions kind of produce these AI responses and other ones kind of have different AI enhancements. But I'm not sure if there was a more specific kind of tension you were alluding to. No, no. I was just curious whether there was massive kind of uplift, whether that was developmental or on kind of like taste making for users that you kind of like ran up against. But it sounds like there wasn't much.
Starting point is 00:38:57 Another kind of wild card question that I had, Robbie, is I like the personalization of using Google search. One that I wasn't even aware of was if I would type in someone's name, for example a celebrity's, and it would come up with their net worth or something. And I would show my friends, I would say, see, this is the most searched thing
Starting point is 00:39:34 Like, is there a world where I'm not just kind of like discovering new websites or apps or experiences on the internet, but it is highly tailored and personalized to other data sets that are kind of like unique to me, right? It knows my shopping preferences, kind of knows how much money I have in the bank. It kind of knows that I have an event coming up in a couple of weeks' time. How does that world look for you? Yeah, I mean, I think first of all, there's Google overall, and then there's the questions that people typically use for AI.
Starting point is 00:39:59 I think for Google overall, there's plenty of questions out there that aren't great to be personalized. And so we think about a lot, the differential value in being personalized. If you ask about how tall the, you know, Empire State Building is, like, it's kind of just like a factual piece of information. Maybe you want to, maybe there's some personalization on the source, like, if you really like certain facts or something from specific providers, but, you know, many things are not great.
Starting point is 00:40:20 And I think that there's a value in just having this kind of universal place you go to, seeing what things are showing up for a given question. That said, I think for many questions, it's the opposite. It's almost weird not to personalize it. Like if you say, what kind of jeans should I get? It's like, well, I don't know. Like, what kind of jeans would you? Like, for me, that's going to be super different than, you know, if I were to grab a
Starting point is 00:40:45 random person in the world and try to get them some jeans, right? And so I think it turns out that in these AI experiences, like AI Overviews and AI Mode, people have a lot more questions that are these kinds of advice-seeking, recommendation questions. They want to know where to eat dinner. They want to know where to travel with their family. It's kind of in this more subjective camp. It kind of depends. And so we think there's a huge opportunity for our AI to know you better and then to be uniquely helpful because of that knowledge. And one of the things we talked about at I/O was how the AI can get a better understanding of you through connected services like Gmail, so that over time it could really know, wow, you tend to like these kinds of products or
Starting point is 00:41:26 brands. Here's another one that just came out from them. And how much more useful that would be than just generically showing you, I don't know, just whatever the top 10 selling jeans brands are right now, a list of that. And so that is, I think, very much the vision of building something that can be really knowledgeable for you specifically. But I think there's a nuance there in what you personalize. And I think our thought on this is that the user is always seeing what parts of information are being shown because of your interests or purchases or things that you've done in the past or things that it might think you might like, versus things that it's just
Starting point is 00:42:04 suggesting generally. So I think people want to intuitively understand when they're being personalized, when information is made for them, versus when it's something that everyone would see if they were to ask this question. And this is kind of the wisdom of the crowds, so to speak, represented in a Google search page. As we get closer to the end of the show, I love ending on a more optimistic note. So I want to talk about the future vision, the future that you're excited about as someone who builds products that billions of people use. So it was funny. Just today, I saw a notification that Gemini rolled out on my TV, and now I have it on my car or on my phone. And then I have it like on
Starting point is 00:42:38 some Google watches and some earbuds. And what about cars? What about things that are sitting on my tabletop? What I'm curious to ask you is, what does the ideal day look like for you as a product designer, where your product slots into this, for someone when search is kind of everywhere and rarely typed? Like, what does the final form look like? And how does that change the way people go about their day to day lives? Yeah, I think that really it's not a final form. It's more of like a multi-form. I don't know if that's a word, but that's kind of how I think about it. Okay, I understood. Yeah. It's kind of like there's not a single form. I think
Starting point is 00:43:14 it's adaptive. And so the way that you think about it is, I love thinking about these journeys where they're multi-day. People have needs that are kind of pending and they're kind of working on them over time. You know, I think a lot about people buying a couch for their apartment, let's say. And it's not just like an easy thing. Like you might be on your computer, you're doing some research, and you want to just find cool couches for, like, an apartment in New York, let's say. If you use AI mode today and you ask about it, you'll see a visual grid of couches and you might actually click on a few of those and it might recommend you some things that you might like. Okay, great. You kind of like, oh, those are kind of cool. I'm going to think about it. Then you're going for a walk and you walk by, you know, like a furniture store and you see something
Starting point is 00:43:47 that strikes you, or you're at a friend's house. And then you go to your app, you go back to your thread. You upload a photo of it and you're like, oh, actually this is the thing that's super awesome. Like, I actually want stuff that's like this. Great. Thanks. Here's some things like that. All right. Put that away, another time. Blah, blah, blah. And let's say you're driving
Starting point is 00:44:19 and it like pops into your head that this other color was actually the one you wanted. You go live and you say, hey, remember I was kind of talking about couches? I actually like this color and I'm mostly focused on these two colors. Send me some recommendations for those. Great, got it. Boom. And then it does that. And then a few days after that, maybe you get a push alert that's like, there's a deal: one of the ones you were considering is on sale. I don't know. It's, what, Cyber Monday coming up?
Starting point is 00:44:34 And Black Friday, all of the above, big shopping season. Maybe one of the things you were loving was available. These are all ways that Google, now across modes and across different aspects of your life, can be incredibly helpful to you for this need. And I think that's more of how I think of the future of search than any one specific feature or kind of single form factor. Well, we are just about coming to the top of the hour.
Starting point is 00:45:01 Robbie, thank you so much for taking time out of your busy day to chat with us. That was a fascinating conversation. I think a lot of people in life just kind of take for granted what Google search has brought for them and kind of like the way that this thing evolves and the way that it kind of like permeates
Starting point is 00:45:17 every facet of our life, especially when it goes fully multimodal. It's super important to understand. So thank you for taking us through that journey. Limitless listeners, if you enjoyed this show, please give it a thumbs up. We know that a bunch of you aren't subscribed to us.
Starting point is 00:45:31 So we need you to get on that button and kind of focus on that, please. If you enjoy the show and if you enjoyed any of the episodes that you've listened to so far this week, please give us a five-star rating and we will see you on the next one. Robbie, thank you again for joining us.
Starting point is 00:45:46 Thank you for having me.
