Limitless: An AI Podcast - Announcing Google's Secret New AI Model With The Person Who Built It | Logan Kilpatrick

Starting point is 00:00:00 The best image generation and editing model in the world. It's scary how realistic this stuff is. Veo 3 has kind of like killed the VFX studio. And this is, I think, principally enabled by vibe coding. My hope is that it actually ends up creating more opportunity for the experts and the specialists. How much of the tools that you build do you find are built with vibe coding? I'm like, almost 85% of everything that I do is vibe-coded. I remember when I first booted up a PC and I just had access to all these different wonderful applications all within one suite.

Starting point is 00:00:28 This kind of feels like that moment for AI. Gemini is feeling faster, but it's also feeling better, and it's also getting cheaper. What's happening behind the scenes? We crossed a quadrillion tokens, which comes after a trillion, if you're not... I haven't thought about numbers higher than a trillion before. It's what comes after a trillion. And there's no slowdown in sight. We have an incredibly exciting episode today because we are joined by Logan Kilpatrick.

Starting point is 00:00:55 Logan is the product lead working on the Gemini platform at Google DeepMind. We have an exciting announcement to break right here today with Logan, which is the announcement of a model that we previously knew as Nano Banana. The reality is this is a brand new image generation model coming out of Google, and you can access it today. So Logan, tell us about this brand new model and what we need to be excited about. Yeah, for people who are not chronically online and seeing all the tweets and everything like that, part of the excitement has been... over the last, I think, like six months, we've seen the emergence of like native image generation and editing models. Historically,

Starting point is 00:01:29 you would see models that could actually do a really good job of generating images. They usually tend to be very beautiful, aesthetic images. The challenge was like, how do you actually use these things in practice to do a lot of stuff? That's where this editing capability is really helpful. And then so we started to see these models that can actually edit images. If you provided an image and then prompted it, it would actually change that image. What's really interesting, though, is this fusion of those two capabilities with the actual base intelligence of the Gemini model.

Starting point is 00:02:02 And there's a lot of really cool ways in which this manifests itself. And we'll look at some examples of this. But it's this benefit of the world knowledge. The model is like smart. So as you ask it to do things and as you ask it to make changes, it doesn't just like take what you're saying at face value. It takes what you're saying in the context of its understanding of the world and its understanding of physics,

Starting point is 00:02:23 its understanding of light and all this other stuff. And it makes those changes. So it's not just blindly making, you know, edits or generations, they're actually like grounded in reality and in context in which that's useful. And we can look at some examples of this. My favorite thing is, uh, is actually this editing capability. So this is in AI Studio and we'll have a link somewhere, uh, hopefully in the show notes that will let us do this. My friend, Amar, um, who is on our team and drives all of our design stuff, um, built this and it's called Past Forward. And what you can do is you can put in an

Starting point is 00:02:55 image of yourself and it'll regenerate a version of yourself in this sort of like Polaroid-esque vibe following all the different trends from the last 10 or 20, 30 years. So if you look at this example, this is me from the 1950s, and I'm sure I have a picture of my dad from the 1950s somewhere or my grandpa who looks somewhat similar to that. Here's me in the 1980s, which I love. Here's me. Some of these facial expressions are also different. Like you're showing your teeth more in some, and then it's a smirk in others. That's super cool.

Starting point is 00:03:32 I like this sweater. I actually have a sweater that almost looks exactly like this 1970s one, though I don't like my hair in this 1970s one. Same with the 2000s. So one of the cool things about this new model and one of the features I think folks are going to be most excited about is this character consistency, which is as you took the original image

Starting point is 00:03:50 and you made the translation to this 1950s image, it actually looks like me still, which is really cool. So there's lots of these really interesting use cases. I think we'll go out with a sports card demo where you can sort of turn yourself into a figurine sports card, which is really cool. So lots of really interesting examples like this. And another thing you'll notice is actually the speed. And this is where the underlying model is not... the code name was Nano Banana. The actual model is built on Gemini 2.5 Flash, which is our workhorse model.

Starting point is 00:04:25 It's super fast. It's super efficient. It's relatively priced in the market, which is awesome. So you can actually use it at scale. And yeah, so this model behind the scenes, for developers or people who want to build with it, is Gemini 2.5 Flash Image, which is awesome. So this is a use case that I love. And it's a ton of fun. You can do this in the Gemini app or in AI Studio. I mean, as you said, the character consistency, just from these examples, is like, astounding. I need to give a round of applause. This has been my biggest issue when I'm generating images of myself. Genuinely. And Josh and I are early users of, you know, Midjourney V1, OpenAI's image generator as well.

Starting point is 00:05:10 And one of our pet peeves was it just couldn't do the most simplistic things, right? We could just say, hey, keep this photo and portrait of me exactly the same. Can you show me what I would look like in a different hairstyle or me holding a bottle of Coca-Cola instead of this martini? And it just could not do that, right? Just simple video, like photo editing. Can you give us a bit of a background as to what Google did to be able to achieve this? Because, you know, I've been racking my head around like, why other AI companies couldn't

Starting point is 00:05:39 do this? Like, what's happening behind the scenes? Can you give us a bit of insight? Yeah, that's a good question. I think this actually goes back to, and I'll share another example in a second as well, But I think this goes back to this story of what happens when you build a model that has the fusion of all these capabilities together. And I was actually just, this is a sort of parallel example to this, but it's another example of why building a unified model to do all this stuff and not having a separate model that doesn't have world knowledge and all these other capabilities is useful. The same thing is actually true on video.

Starting point is 00:06:10 Like part of the story and we haven't, we have a bunch of stuff coming that sort of tells this a little bit more elegantly than I will right now. But part of the story of like Veo 3 having this really state-of-the-art video generation capabilities, if folks have seen this, is that the Gemini models themselves have this state-of-the-art video understanding capabilities. And a very similar context actually on the image side, which is since the original Gemini model, we've like, with the exception of probably a couple of months in that like two and a half year time horizon, have had state-of-the-art image understanding capabilities. And I think there is this like capability transfer, which is really interesting as you go to do the generation step.

Starting point is 00:06:53 And if you can fuse those two things together in the same model, you end up just being able to do things that other models aren't able to do. And this was part of the bet originally that like, why build Gemini to be the original Gemini, Gemini 1.0 model was built to be natively multimodal. It was built to be natively multimodal because the belief at the time, I think this is turning out to be true, is that that's on the path to AGI, is that you combine these capabilities together and, like, similar to what humans are able to do, we have this fusion of all

Starting point is 00:07:23 these capabilities in a single entity, just like these models should be able to do. Wow. So if I were to distill what you just said, here, Logan, the way you've trained Gemini 2.5 or all future Google Gemini models is it's in a very multimodal fashion. So you're basically, it gets smarter in one particular facet, which trains itself or has transferable capabilities to other facets, whether it's image generation, video generation or even text LLMs to some

Starting point is 00:07:53 extent. I just think that's fascinating. I'm curious. I have one question for you, which I want to hear your take on. How are you going to surface this to the regular consumer, right? Because right now, you provide all of these capabilities through an amazing suite, you know,

Starting point is 00:08:09 called Google AI Studio. But if I want to use this in, say, an Instagram app or my random photo imaging editing app, is this something that could be easily proved to someone or sourced or do we need to go via some other route right now? Let me just diverge really quickly, which is if any of the researchers who I work with are watching this, they will tell me, they'll make sure that I note that capability transfer that we just talked about. You oftentimes don't get that out of the box.

Starting point is 00:08:38 So there is some emergence where like you get a little bit of that. You do have to do like there's like real true research and engineering work that has to happen to make sure that that capability fusion happens. It's not often that you just like make the model really good at one thing and then it translates. Oftentimes actually it's like it has a negative effect, which is as you make the models really good at code, for example, you trade that off against some other, you know, creative writing as a random example of this. So you have to do a lot of like,

Starting point is 00:09:08 active research and engineering work to make sure that you don't lose a capability as you make another one better. But then ultimately, they benefit, if you can make them on the same level, they benefit from this interleaved capability together. To answer the question about like, where is this going to be available? The Gemini app is the place that like for by and large, most people should be going to. So if you go to Gemini.com, there'll be sort of a landing page experience that showcases this new model and makes it really easy and you can put in all your images and do tons of fun stuff like the example that I was showing. If you're a developer and you want to build something with this, in AI Studio, we have this build tab.

Starting point is 00:09:44 And that's what we were just looking at as an example of one of the applets that's available in the build tab. The general essence is that all of these applets can be forked and remixed and edited and modified so that you can keep doing all the things that you want to do with the AI capability built in. So it'll continue to be powered by the same model. It'll do all that stuff, which is awesome. So there's lots of cool fusion capabilities that we have with this. Same thing with this other example that we were looking at. So if you want to go outside of this environment, we have an API.

Starting point is 00:10:17 You could go and build whatever. So if your website is AIPhotos.com or whatever, you could go and build with the Gemini API, use the new Gemini 2.5 Flash Image model to do a bunch of this stuff, which is awesome. Awesome. So while this is baking, I noticed you had another tab open, which means maybe there's another demo that you were prepared to share. There is another demo. This one I actually haven't tried yet.

Starting point is 00:10:39 But it's this idea of like, how can you take a photo editing experience and make it super, super simple? So I'll grab an image. Actually, we'll take this picture, which is a picture of Demis and I. Legends. We'll put an anime filter on it.

Starting point is 00:10:55 And we'll see. And so this is a completely vibe-coded UI experience and all the code behind the scenes is vibe-coded as well. And we'll see how well this works with Demis and I. How much of the tools that you build do you find are built with vibe coding instead of just hard coding software? Are you writing a lot of this as vibe coded through the Gemini model? I think sometimes you're able to do some of the stuff completely vibe coded.

Starting point is 00:11:20 It depends on how specific you want to be. I'm like, almost 85% of everything that I do is vibe coded. Somebody else on my team built this one, so I don't want to misrepresent the work. It could have all been human programmed because we have an incredible set of engineers. The general idea is how can you make this? Oh, interesting. How can you make this Photoshop-like experience? Let's go 90.

Starting point is 00:11:44 Do you all have suggestions? What would a good filter for this be? I don't know. Oh, man. Yeah, like perhaps going back to the last example, maybe like a 90s film or an 80s film grain. All right. And I guess while we wait for that to load,

Starting point is 00:11:56 is there a simple way that you would describe Nano Banana or this new image model to just the average person on the street who's, oh, look, there we go. We have the film grain. Okay, so what we're watching for the people who are listening, you're retouching, you can retouch parts of the image, you could crop, adjust, there are filters to be applied. I'm just clicking through buttons, to be honest,

Starting point is 00:12:14 I've never done it before. So it's been live demo day one. This is the exploration you're going to get to do as a user as you play around. Logan is vibe editing. That's what's happening. Yeah, he's experimenting. Vibe editing, which is fun. I love it.

Starting point is 00:12:27 That's a great. It's a great way. And the cool thing, again, is like, what I love about this experience is, as you're going through, oh, interesting. This one's like giving me edited outline. Oh, yeah, a little outline. This is helpful for our thumbnail generation. We do a lot of this stuff.

Starting point is 00:12:40 Let's see if I can remove the background as well. Let's see. This removes the background. This is going to be trouble because this is a big feature that we use for a lot of our imagery. Hopefully, come on. Oh, nice. Oh, done. Nicely done.

Starting point is 00:12:54 For those of you who are listening, he's typed in, put me in the Library of Congress. So we're going to hopefully see Logan. Yeah, the context on that image was that Demis and I were in the library of the DeepMind office. Oh, nice. Yeah, so that's the Library of Congress reference in my mind. But yeah, so much that you can do. Again, what I love about this experience is that as you go around and play with this stuff, if you want to modify this experience, you can do so on the left-hand side.

Starting point is 00:13:24 If you say, actually, here are these five editing features that I really care about, the model will go and rewrite the code, and then it'll still be attached to this new 2.5 Flash Image model so you can do all these types of cool stuff. This experience is something that I'm really excited about that we've been pushing on. Yeah, this is amazing because I myself, I do photography a lot. I was a photographer in my past life and I rely very heavily on Photoshop and Lightroom for editing, which is a very manual process. And they have these smart tools, but they're not quite like this. I mean, this saves a tremendous amount of time if I could just say, hey, realign, straighten the image, remove the background, add a filter.

Starting point is 00:14:01 I think the plain English version of this makes it really approachable, but also way faster. Yeah, it is, it is crazy fast. I think about this all the time. Like, there's definitely cases where you want to go deep with whatever the pro tool is. I think there's actually something interesting, like on the near horizon that our team has thought a lot about, which is how you can have this experience and how you can sort of in a generative UI capacity, have the experience sort of subtly expose additional detail to users. And I think about this, like, if you're a new Photoshop user as an example and you show up, like the chance that you're going to use all of the bells and whistles is zero. Like you want like the three things. I want to remove a background. I want

Starting point is 00:14:44 to crop something, whatever it is. Don't actually show all of these bells and whistles. I think the exciting thing about like the progress on coding models is that in the future... the challenge with doing this in the present, rather, is that software is deterministic. You have to build software, and to build the sort of like modified version of that software for all of these different like skill sets and use cases is extremely expensive. It's not feasible. It doesn't scale to production environments. But if you can have this generative UI capability where like the model sort of knows and as you talk to the model it realizes, oh, you might actually benefit from these other things, it can create the code to do that on the fly and expose them to you,

Starting point is 00:15:24 which is really interesting. So I think there's lots of stuff that is going to be possible as the models keep getting better. This is amazing. So the TLDR on this new announcement, how would I, if I were to go explain to my friend what this does, why this is special, how would you kind of sell it to me? The best image generation and editing model in the world, 2.5 Flash Image or Nano Banana, whichever you prefer, is the model that can do this. And I think there's so many creative use cases where you're actually bounded by the creative tool. And I feel like this is one of these examples to me where it's like I feel like I'm 10x more creative. I was literally helping my friend yesterday doing a bunch of iterations on his LinkedIn

Starting point is 00:16:08 picture because it was like, you know, the background was slightly weird or something like that. I did like 15 iterations and now he's got a great new LinkedIn background, which is awesome. So like there's so many like actual practical use cases where you, and I literally just like built a custom tool on the fly vibe coding in order to solve that use case, which was a ton of fun. Yeah, this is so cool. Okay. So now, so this model, Nano Banana, Gemini 2.5 Flash Image, it's out today. So we'll link that in the description for people who want to try it out.

Starting point is 00:16:39 I think one of my complaints for the longest time and I've mentioned this on the show a few times is a lot of times when I'm engaging with this incredible form of intelligence, I just have a text box. And it's up to me to kind of pull the creativity out of my own mind. And I don't get a lot of help along the way. But one of the things that you spend your time in is this thing called Google AI Studio. And I've used AI Studio a lot because it solves a problem for me that was annoying, which is just the blank text box. It kind of has a lot of prompts. It has a lot of helpers. It has a lot of guidance into helping me extract value out of the model. So what I'd love for you to do for people who aren't familiar, Logan, is just kind of explain to everyone what Google AI Studio is

Starting point is 00:17:13 and why it's so important and why it's so great. Yeah, I love this, Josh. I appreciate that you like using AI Studio. It is a labor of love. Lots of people across Google have put in a ton of time to make progress on this. I really want to show, so I'll make a caveat, which is we have this entirely redesigned AI Studio experience that's coming very soon. I won't spoil it in this episode because it's like half-baked right now, and I wish I could show. And I think actually some of the features that you might see in this UI

Starting point is 00:17:44 might be slightly different at launch time than what you see here. So take this with a grain of salt. We've got a bunch of new stuff coming. And I think actually it should help with this problem that you're describing, which is as you show up to a bunch of these tools today, the onus is really on you as a user

Starting point is 00:17:59 to try to figure out what's capable, what all the different models are capable of, what even are all the different models, like all of that stuff. So at the high level, like we built AI Studio for this like AI builder audience. If you want to take AI models and actually build something with them and not just, you know, chat to AI models, this is the product that was built for you. We have a way to in this like chat UI experience, sort of play with the different

Starting point is 00:18:27 capabilities of the model, feel what's possible. What is Gemini good at? What's it not good at? What are the different tools it has access to? But as you go into AI Studio, you'll see something that looks like this, you know, we're highlighting a bunch of the new capabilities that we have right now, this URL context tool, which is really great for information retrieval, this native speech generation capability, which is really cool. Folks have used NotebookLM, and you want to build

Starting point is 00:18:51 a NotebookLM-like experience. We have an API for people who want to build something like that and we have this live audio-to-audio dialogue experience where you can share screen with the model and talk to it and it can see the things that you see and engage with it. Of course, we have our native image generation and editing model, the old version 2.0 Flash, now the new version, 2.5 Flash, and lots of other stuff that's available as you sort of experience what these models are capable of. So really, this playground experience is one version. We have this chat prompt. On the left hand side, we have this stream. This is where you can talk to Gemini and sort of share your screen. And actually, you can like show it things on the webcam and be like, what's this?

Starting point is 00:19:33 How do I use this thing? You can do this on mobile as well, which is really cool. We have this generative media experience where, like, if you want to build things with, we have a music model, we have Veo, which is our video generation model, we have all the text to speech stuff, which is really cool. Because I overwhelm people with so much stuff that you can do in AI Studio. The sort of key thread of all this is we build AI Studio to showcase a bunch of these capabilities. And everything you see in AI Studio has an underlying sort of API and developer experience. So if you want to build something like any of these experiences, all of this is possible. There's like no Google secret magic that's happening pretty much anywhere in AI Studio. It's all things that you could build as someone using a vibe coding product or by hand writing the code.

Starting point is 00:20:19 You could build all these things and even more. And that is the perfect segue to this build tab where we're trying to also actually help you get started building a bunch of stuff. So you can use these templates that we have. You can use a bunch of the suggestions. You can look through our gallery of different stuff. And we're really in this experience trying to help you build AI powered apps, which we think is something that folks are really, really excited about. And we'll have much more to share around all the AI app building stuff in the near future.

Starting point is 00:20:50 Awesome. Thanks for the rundown. So as I'm looking at this, I'm wondering, who do you think this is for? What type of person should come to AI Studio and tinker around here? Yeah. So I think, you know, historically, and so you'll see a little bit of this transition, if you play around the product where there's some interesting edges. We were originally focused on building for developers, so it was built. And there is like a part

Starting point is 00:21:11 of the experience, which like is tied to the Gemini API, which tends to be used mostly by developers. So if you go to dashboard, you can see all your API keys and check your usage and billing and things like that. By and large, though, I think the really cool opportunity of what's happening right now is this transition of like who is creating software. And this is, I think, principally enabled by vibe coding. And because of that, like we've re-centered ourselves to be really focused on this AI builder persona,

Starting point is 00:21:39 which is like people who want to build things using AI tools. Also people who are trying to build AI experiences, we think is going to be the like market that creates value for the world. So if you're excited about all the things that you're seeing, if you want to build things, AI Studio is very much like a builder first platform. If you're just looking for like a great everyday AI assistant product, you want to get help on coding questions or homework or life advice or all that type of stuff.

Starting point is 00:22:09 The Gemini app is the right place for this. It's very much like a DAU type of product where like you come back and it has memory and personalization and all this other stuff, which makes it really great as like an assistant to help you in your life versus AI Studio. The artifact is like, we help you create something and then you go put that thing into the world in some way. And you don't necessarily need to come back and use it every day. You use it whenever you want to build something.

Starting point is 00:22:37 It's funny. I'm dating myself a bit here, but I remember when I first booted up a PC and I loaded up Microsoft Office, and I just had access to all these different wonderful applications that were at the time super new, all within one suite. This kind of feels like that moment for AI. And you might not take that as a compliment because it's a completely different company, but it was what I built my childhood off of and my fascination with computers.

Starting point is 00:23:03 So I appreciate this and I love that it's this massively like cohesive experience. But kind of zooming out, Logan, I was thinking a lot about Google AI and what that means to me personally. I have to say it's the only company that I think of beyond an LLM. And what I mean by that is when I think of Google AI, I don't just think of Gemini. I think of the amazing image gen stuff that you have. I think of the amazing video outputs that you guys have. I think of the text to voice generation that you just demoed and all those kinds of things. I remember seeing this advert that appeared on my timeline.

Starting point is 00:23:43 And I remember thinking, wow, this must be the new GTA. Then I was like, no, no, that's Florida, that's Miami. Nope, people are doing wild stuff. That's an alien. Hang on a second, this can't be real. And then I learned that it was a Google Veo 3 generation of an advert for Kalshi, which is like this prediction markets situation. And I remember thinking, how on earth have we got to AI-generated video that is this high quality and this high fidelity? I think in my mind, Veo 3 has kind of like killed the VFX studio.

Starting point is 00:24:17 It's kind of killed a lot of Hollywood production studios as well. Give me a breakdown and insight into how you built or how you guys built Veo 3 and what that means for the future of movie, video production, and more. Yeah, that's a great question. I think there's something really interesting along these threads and not to push back on the notion that it's killing Hollywood because I think there is like, I think it's an interesting conversation. The way that I have seen this play out, and the great example of this, that folks have

Starting point is 00:24:52 seen Flow, which is our sort of like creative video tool. And if you're using Veo and you want to sort of get the most out of Veo, Flow is the tool to do that. If you see lots of like the creators who are building, you know, minute long videos using Veo and it's like this really cohesive story and it has like a clear visual identity similar to what you'd get from like a, probably not the extent of a Hollywood production, but like somebody thoughtfully choreographing a film, Flow is the product to do that. And actually interesting, like, Flow was built in conjunction with filmmakers. And I think that's actually like there is, and I feel this way about vibe coding as well. And it's this thought experiment that I'm always running through in my head, which is, you know,

Starting point is 00:25:37 yes, I think AI is like raising the bar for everyone, or it's raising the floor for everyone, where, like, now everyone can create. What does that mean for people who have expertise? And I think in most cases, what it means is actually the value of your expertise continues to go up. And like, this is my personal bet. And I don't know how much this tracks to like everyone else's worldview. My personal bet is that expertise in the world where the floor is lifted for everyone across all these dimensions is actually more important because there was something about, and I think like video production is a great example for me because I would never have been able to make a video. Like it's not in the cards. Like for my skill set, my creative ability, my financial ability,

Starting point is 00:26:19 like I will never be able to make a video. I can make things with Veo. And now I'm like a little bit closer to imagining like, okay, if I'm serious about this, I need to go out and like actually engage with people. And it's like sort of, it's like, whet my appetite in a way that I don't think I would... It was just like too far away.

Starting point is 00:26:39 And I think software is another example where vibe coding, if you were to pull a random person off the street and you start talking to them about coding and C++ and deploying stuff, and all this, they're like, brain turns off, not interested. I don't want to learn to code. That's not cool. It's not fun. It sounds horrible.

Starting point is 00:26:57 And then vibe coding rolls around. It's like, oh, wait, I can actually build stuff. And like, I don't really need to understand all of the details. But there's still a limit to what I can build. And who is actually well positioned to help me take the next step? Like I, you know, vibe code something. And I'm like, this is awesome. I share it with my friends.

Starting point is 00:27:16 They all love it. I want to, you know, go build a business. around this thing that I vibe coded, there's still a software engineer that needs to help make that thing actually happen. So if anything, it's like, it's increasing this, I mean, on the software side,

Starting point is 00:27:30 there's this infinite demand for software, and it's increasing the total addressable market of like what software engineers need to help people build. I think there'll be something similar on the video side. You know, there will be downsides to AI technology in some ways. I think there is like, as the technology shift happens, there is some amount of disruption that's taking

Starting point is 00:27:49 place and like someone's workflow is being disrupted. But I do think there's this really interesting thread to pull on, which is my hope is that it actually ends up creating more opportunity for the experts and the specialists. So it sounds like you're not saying VFX studio teams are going to be replaced by software engineers, but rather that team in itself will become more adept at using these AI tools and products to kind of enhance their own skill set beyond what it is today, is that right? Yeah, yeah. And I think we've seen this already play out in some ways, which is interesting.

Starting point is 00:28:23 I think like code is like a little bit wider distribution than perhaps the VFX. And VFX is also a space that I'm less familiar with personally. But yeah, I think this is likely what is going to play out if I had to guess and bet. Can you help us understand how a product like Veo 3 gets used beyond just like the major Hollywood production stuff, right? Because I've seen a bunch of these videos now, and I'll be honest with you, Logan, it's scary how realistic this stuff is, right? It's like from a high-quality AAA game demo, all the way to something that is shot like an A24 film, you know, the scenes, the cuts,

Starting point is 00:29:08 the changes. I think it's awesome. I'm wondering whether that goes beyond entertainment in any way. Do you have any thoughts or ideas there? Yeah, that is interesting. I think one of the ones that is like related to, it's sort of one skip away from video generation itself, which was Genie, which was our sort of world simulation work that was happening. I think if folks haven't seen this, go look up Genie 3 and you can see a video.

Starting point is 00:29:33 It's mind-blowing. It's like a fully playable game world simulation. You can like prompt on the go and this environment will change. You can control it on your keyboard similar to a game. I think that work translates actually really well to robotics, which is cool. So as you, like one of the, if folks aren't familiar with this, like one of the principal reasons we don't just have robots walking around everywhere.

Starting point is 00:29:55 And the reason why we have LLMs that can actually do lots of useful stuff is it's this data problem, which is like there's lots of, you know, text data and other data that's like representative of the intelligence of humans and all this stuff that's available. There's actually not a lot of data that is useful for making robotics work. And I think Veo could be part of, or like generally that sort of segment of video generation and this like physics understanding and all that other stuff, I think could be really helpful in actually making the long tail of robotics use cases work. And then I can finally have a robot that will fold my laundry so that I don't need to spend my time doing that. That's my like outside of entertainment bet as far as like where that use case ends up creating value in the world. With Veo 3, the goal is to enable humans to become a better version of themselves, a 10x, 100x better version of themselves using these different tools. So in the example of a VFX studio, you can now kind of like create much better movies.

Starting point is 00:30:56 How does that apply for Genie 3 exactly, right? You gave the example of like being able to create simulated environments, but that's to train these robots. That's to train these models. What about us? What about the flesh humans that are out there? Can you give us some examples about where this might be applied or used?

Starting point is 00:31:14 Yeah, that's a good example. I mean, the robot answer is like the robots will be there to help us, which is nice. So hopefully there's a bunch of stuff that you don't want to do that you'll be able to get your robot to do. Or there's like industries that are like dangerous for humans to operate in where it's like if you can sort of do that simulation without needing to collect a bunch of human data to do those things, I could see that being super valuable. I think my initial reaction to the genie use case,

Starting point is 00:31:44 like I could see lots of, actually the two that come to mind, like one, entertainment I think will be cool. Humans want to be entertained. It's a story as old as time. I think there will be some entertainment value of a product experience like Genie. I think the other one is actually back to a bunch of use cases where you'd actually want robotics to be able to do some of that work, but the robot product experience, like, isn't actually there yet.

Starting point is 00:32:15 This could be things like, you know, mining or, like, heavy industries, things like that where, like, there's actually, like, a safety aspect of, like, how can you do these, like, realistic simulation training experiences in order to make sure that, like, you don't have to, like, physically put yourself in harm's way in order to, like, understand the bounds or, like, the failure cases, like, disaster recovery, things like that where you don't want to have to show up at a hurricane the first time to like really understand what the environment could be like. And like being able to do those types of simulations is interesting. And building software deterministically to solve that

Starting point is 00:32:53 problem would actually be really difficult and expensive and like probably isn't a, you know, a large market that lots of companies are going to go after. But if you have this model that has really great world knowledge, you can throw all these random variables at it and like sort of do that type of like training and simulation. So yeah, it's a perhaps an interesting use case. I don't know if there's actually a plan to use it for things like that, but those are things that come to mind. This is something I've been dying to ask you about because this is something that I've

Starting point is 00:33:21 been fascinated by. When I watched the Genie 3 demo for the first time, it just kind of shattered my perception of where we were at because you see it work and I saw this great demo where someone was painting the wall. We actually filmed an entire episode about this and it retained all of the information. And one theme, as I'm hearing you describe these things, as I'm hearing you describe Veo 3, Genie 3, you're building this like deep understanding of the physical world. And I can't help but notice this trend.

Starting point is 00:33:46 Like, you are just starting to understand the world more and more. And I could see this when it comes to making games as an example where like a lot of people were using Genie 3 to just make these like, not necessarily games, but virtual worlds that you could walk around and interact with. And I'm wondering if you could just kind of share the long-term reasoning why. Because clearly there's a reason, there's a lot of value to it. Is it from being able to create maybe artificial data for robots? If you can emulate the physical world, you can create data to train these robots.

Starting point is 00:34:11 Is it because it creates great experiences? Like perhaps we'll have AAA design studios using Genie 5 to make AAA games like Grand Theft Auto. I'm curious, the reasoning behind this like urge to understand the physical world and emulate it even. I had a conversation with Demis about this, who's our CEO at DeepMind and someone who's been pushing on this for a long time. I think a lot of this goes back to like there's two dimensions. It goes back to like the original ethos of like why DeepMind was created and a bunch of the work, the initial work that was happening in DeepMind around reinforcement learning. If folks haven't seen this, like one of the challenges of like again making AI work is that you need this like flywheel

Starting point is 00:34:51 of like continuing to like iterate and you need a reward function, which is like what is the actual outcome that you're trying to achieve. And the thing that's interesting about these like simulated environments is it's really easy to have like a constrained world, and it's really easy to also, or not, maybe really easy is overly ambitious, it's possible to define a simple reward function and then actually infinitely scale this up. And the opposite example of this, if folks saw, there was some like work a very long time ago, and this is like in the AI weeds, but there was this like physical robotic hand that could manipulate a Rubik's cube. And they were using AI to like help try to solve this Rubik's cube. And the, again, the analogy

Starting point is 00:35:45 of why this, of why Genie and some of this work is so interesting is if you were to go and try to like, hey, we need all the data to go and try to make this little hand, physical robotic hand, be able to do this. It's actually really challenging to scale that up. You need to go and build a bunch of hands. You need to like, what happens when the Rubik's cube drops? You need to have some system to like go and pick it back up and you just like go through the long tail of this stuff. The hand probably can't run 24 hours a day. Like there's all these challenges with getting the like data in that environment to scale up. And these virtual environments don't have this problem, which is if you can emulate and like self-driving cars is another example of this.

Starting point is 00:36:25 Like again, for folks who aren't familiar, lots of, you know, there's lots of real world data that's involved in self-driving cars, there's also lots of simulated environments where they've built simulations of the world, and this is how they can get like a thousand X scale up of this like data understanding is by having these simulated environments. Robotics will be exactly the same.
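
A toy sketch of the idea described above: a constrained simulated world, a simple reward function, and as many cheap episodes as you care to run. Every name and number here (run_episode, slip_prob, the ten stairs) is a hypothetical stand-in for illustration; this is not how Genie, or any DeepMind system, is implemented.

```python
import random

def run_episode(rng, stairs=10, slip_prob=0.2):
    """Simulate one attempt to walk down a staircase.

    Reward function: 1.0 if the simulated robot reaches the bottom,
    0.0 if it slips on any step. Nothing physical breaks on a fall.
    """
    for _ in range(stairs):
        if rng.random() < slip_prob:
            return 0.0
    return 1.0

def estimate_success_rate(episodes, slip_prob, seed=0):
    """Average reward over many simulated episodes (cheap to scale up)."""
    rng = random.Random(seed)
    total = sum(run_episode(rng, slip_prob=slip_prob) for _ in range(episodes))
    return total / episodes

if __name__ == "__main__":
    # A thousand simulated falls cost nothing but compute.
    print(estimate_success_rate(episodes=1000, slip_prob=0.2))
```

The point of the sketch is only the shape of the loop: because the world and the reward are both defined in software, the number of episodes is limited by compute rather than by how many physical robots you can build and reset.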

Starting point is 00:36:46 If you want robotics to work, it's almost 100% true that you're going to have to have these simulated environments where the robot can fall down the stairs a thousand times, and that's okay because it's a simulated environment and it's not actually going to fall down your stairs. So I think Genie is, there is definitely like an entertainment aspect to it. I think it's more so going to be useful for this like simulated environment to help us

Starting point is 00:37:12 not have to do things in the real world but still have like a really good proxy of what will happen in the real world when we do them. That's pretty funny. I spent the weekend watching the world robot Olympics and there were some very real fails and crashes of these robots, which is pretty funny. Okay, so when I think of Genie, I think that it blows my mind

Starting point is 00:37:37 because I still can't get my head around how it predicts what I'm going to look at. I remember seeing this demo of someone just taking a simple video of them walking, and it was like a rainy day on a gravel path, and they stuck that into Genie 3, and they could look down and see their reflection in the puddles.

Starting point is 00:37:58 So the physics was astoundingly accurate and astute. Can you give us a basic breakdown of how this works? Is this like a real engine, game engine, like happening in the background? Or is there something more deeper happening? Like, help us understand. My intuition and we can gut check this with folks on the research side to make sure that I'm not fabricating my intuition. But if folks have an intuition as far as like how next token prediction works, which is at some given, like if you're looking through it,

Starting point is 00:38:28 a sentence of text. For each word in that sentence, there's a distribution between like zero and one, basically, of like how likely that word was to be the next word in the sequence. And if you like look through, this is like the basic principle of LLMs. This is why you get like the, you know, if you were to ask the same question multiple times, the LLM will inherently perhaps give you a different answer. And that's why like small changes in the inputs to LLMs actually change this because, like, again, it's this distribution. So, like, if you make one letter difference, it perhaps, like, puts you on a, like, a branching trajectory that looks very different than what the original output that you got

Starting point is 00:39:14 from the model. Similar, similar, like, rough approximation of this, just, like, much more computationally difficult. And I think they use a bunch of architectural differences that sort of, it's not truly next token prediction that's happening for the like pixels colors bunch of other things yeah exactly yeah so it's like you can like roughly map the mental model of like as the as a model looks down or as like the figure looks down in some in some environment like again it has all this like context of the state of the world but then it also knows like what are the pixels that are preceding it etc etc it like loosely

Starting point is 00:39:53 is doing this like next pixel prediction, you could sort of approximate what's happening at the Genie level, which is an interesting way to think about it. So, Ejiz, one of the things you were mentioning was that it's happening much faster, right? And it's happening presumably much cheaper because now I heard this crazy stat. You're at like 500 trillion tokens per month that is being pushed out by Gemini. It's unbelievable. And I want to get into the kind of infrastructure that enables this because Gemini is feeling faster, but it's also feeling better. And it's also getting cheaper. And earlier in the show, you mentioned you have a TPU behind you. I understand TPUs are part of this solution. And I want you to kind of

Starting point is 00:40:34 just walk us through how this is happening. How are we getting these quality improvements across the board? And what type of hardware or software is enabling that to happen? I think like one, you have to give credit to like all of these infrastructure teams across Google that are making this happen. If you think, and I think about this a lot, like, what is Google's differentiated advantage? What does our expertise lend us well to do in the ecosystem? What are the things we shouldn't do because of that? What are the things we should do because of that, is something I think about as somebody who builds products. One of the things that I always come back to is our infrastructure. And like, the thing Google has been able to do time and time again is scale up

Starting point is 00:41:12 multiple products to billions of users, have them work with high reliability, et cetera, et cetera. And that's like a uniquely difficult problem. It's an even more difficult problem to do in the age of AI where the software is not deterministic. The sort of compute footprint required to do these things is really difficult. The models are a little bit tricky and finicky to work with sometimes. So again, like our infrastructure teams have done an incredible job making that scale up. I think the stat was at I/O 2024, we were doing roughly 50 trillion tokens a month. At I/O 2025, I think it was like 480 trillion tokens a month, if I remember

Starting point is 00:41:55 correctly. And just a month or two later, and this was in the conversation I had with Demis, we crossed a quadrillion tokens, which comes after a trillion, if you're not, if you haven't thought about numbers higher than a trillion before. It's what comes after a trillion. And there's no slowdown in sight. And like, I think this is just a great reminder of like so many of these AI, like, markets and product ecosystems are still so early. And there's this massive expansion. I think about in my own life, like, how much AI do I really have in my life helping me, like, not really that much on the margin? It's like, you know, maybe tens of millions of tokens a month maximum. And like, you think about a future where there's like billions of

Starting point is 00:42:40 tokens being spent on a monthly basis in order to help you and whatever you're doing in your professional life, in your work, in your personal life, whatever it is. We're still so early. And TPUs are a core part of that because it allows us to, like, control every layer of the hardware and software delivery all the way to the actual, like, silicon that the model is running on. And we can do a bunch of optimizations and customizations that other people can't do because they don't actually control the hardware itself. And there's some good examples of the things that this enables.

Starting point is 00:43:14 One of them is, you know, we've been at the Pareto Frontier from a cost performance perspective for a very long time. And again, if folks aren't familiar, the Pareto Frontier is this like tradeoff of costs and intelligence and you want to be on the highest intelligence, lowest cost. And we've been sitting on that for, you know, basically the entirety of the Gemini life cycle so far, which is really important.
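
To make the Pareto frontier idea concrete, here is a minimal sketch. The model names, costs, and scores are invented purely for illustration and do not describe any real model; "score" stands in for whatever intelligence benchmark you prefer.

```python
def pareto_frontier(models):
    """Return the names of models that are not dominated.

    Each entry is (name, cost, score). A model is dominated when some
    other model is at least as cheap AND at least as capable, and is
    strictly better on at least one of the two.
    """
    frontier = []
    for name, cost, score in models:
        dominated = any(
            (c <= cost and s >= score) and (c < cost or s > score)
            for _, c, s in models
        )
        if not dominated:
            frontier.append(name)
    return frontier

if __name__ == "__main__":
    # Hypothetical (name, cost per unit of usage, benchmark score) triples.
    models = [("A", 1.0, 60), ("B", 2.0, 80), ("C", 3.0, 75), ("D", 5.0, 90)]
    print(pareto_frontier(models))  # ['A', 'B', 'D']
```

In this made-up data, C drops out because B is both cheaper and scores higher; A, B, and D each represent a trade-off that no other model beats on both axes at once.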

Starting point is 00:43:37 So people get a ton of value from the Gemini models. Another example of this is long context. Again, if folks are familiar, there's a limit on like how many tokens you can pass to a model at a given time. Gemini's had a million or two million token context windows since the initial launch of Gemini, which has been awesome. And there's a bunch of research showing

Starting point is 00:43:58 we could scale that all the way up to 10 million if we wanted to. And that is like a core infrastructure enabled thing. Like research, there's a lot of like really important research to make that work and make that possible. But it's also really difficult on the infrastructure side. And you have to be willing to do that work and pay that price. And it's a beautiful outcome for us because we have the infrastructure.

Starting point is 00:44:19 teams that have the expertise to do this. Okay, Logan, one quadrillion tokens. That's a big number. We need to talk about this for a little bit because that is an outrageously, mind-bendingly big number. And when I hear you say that number, I think I'm reminded of Jevons paradox. For people who don't know, it's increased technological efficiency in using a resource, which can lead to higher total consumption of that resource. So clearly, with these cool new TPUs, this vertically integrated stack you've built, you are able to generate tokens much more cheaply and produce a lot more of them, hence the one quadrillion tokens. Do you see this trend continuing? Is there going to be a continued need to just produce more tokens? Or will it eventually be a battle to produce smarter tokens?
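
For a sense of scale, the monthly token figures quoted earlier in the conversation can be turned into growth factors. The inputs are the approximate numbers as stated from memory in the episode (roughly 50 trillion at I/O 2024, about 480 trillion at I/O 2025, and a quadrillion a month or two later), so treat the results as rough.

```python
TRILLION = 10**12
QUADRILLION = 10**15  # a thousand trillion

# Approximate monthly token counts as quoted in the conversation.
io_2024 = 50 * TRILLION
io_2025 = 480 * TRILLION
later_2025 = 1 * QUADRILLION

def growth_factor(before, after):
    """How many times larger `after` is than `before`."""
    return after / before

if __name__ == "__main__":
    print(growth_factor(io_2024, io_2025))      # 9.6
    print(growth_factor(io_2025, later_2025))   # a bit over 2
    print(growth_factor(io_2024, later_2025))   # 20.0
```

On those figures, monthly volume grew a little under 10x in one year and about 20x from I/O 2024 to the quadrillion mark.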

Starting point is 00:44:58 I guess the question I'm asking is, is the quality of the token more important than the amount of the tokens? And do you see a limit in which the quantity of the tokens starts to like kind of go off of a cliff in terms of how valuable it is? Yeah, I could buy that story. And some of this is, and it's something that's actually super top of mind for our teams on the like Gemini model side, is around this whole idea of like thinking efficiency, which is like ideally you want to get to the best answer using the least amount of thoughts possible. Same thing with humans. Like ideally, like the example of like you're taking a test. You want, you know, the shortest number of mental hops possible to get you to the answer of whatever the question was, is ideally what you want. You don't want to have to just like think for an hour to answer one question.

Starting point is 00:45:43 And there's a bunch of odd parallels in that world to like models and humans doing this approach. So I do think thinking efficiency is top of mind. You don't want to just like use tokens for the sake of tokens. I think even if we were to like 10x reduce the number of tokens required, which would be like awesome and would be like a great innovation. The models are like much more token efficient. I do think there's like a pretty low ceiling to how far that will be able to go specifically because of this like next token prediction paradigm of like how the models actually approach solving problems using like the token as a unit.

Starting point is 00:46:27 So it's not clear to me that you'll be able to just like, you know, a thousand X reduce the amount of tokens required to solve a problem. I think it probably looks much more like 10x or something like that. And then there'll be a 10x reduction in the number of tokens required to solve a problem. And there'll be a 10,000x increase in the total amount of AI and sort of token consumption in the world. So I think you probably, even if we made that reduction happen, I think the graph still looks like it's going up and to the right for the most part. It still keeps going. There is no wall. We have virtual data to train models on. We have tons of new tokens coming into play. There's another question I wanted to ask, which is

Starting point is 00:47:04 just a personal question for you, which is a feature that, because I find when a lot of people leave comments on the show and they talk about their experience with AI. A lot of them are just using like ChatGPT on their app or they have Grok on their phone. And I think Gemini kind of has some underrated features that don't quite get enough attention. So what I'd like for you to do is maybe just highlight one or two of the features you shipped recently that you think are criminally underrated. What should people try out that you think not enough people are using? I think the one that continues to surprise me the most is deep research. I think deep research is just like the North Star for building an AI product experience.

Starting point is 00:47:42 And if folks aren't familiar with this, so you can show up with, yeah, it's so, you can show up with like a pretty ill-defined question that's like very open and vague. And the model will traverse essentially across the internet, hundreds or thousands of different web pages, try to accumulate enough context,

Starting point is 00:48:02 and then come back to you with, uh, initially, basically like a research report, could be like a 40 page report in some cases that I've seen. You might hear a 40-page report and say, that's not very useful to me because I'm not going to read 40 pages. And I'd say, you and me are exactly the same because I'm not reading 40 pages either. There's a beautiful feature. Again, if you've used NotebookLM, this audio overviews feature,

Starting point is 00:48:25 the same thing actually exists inside of the Gemini app with deep research, where you can just press that button and then get like a, you know, 10, 15-minute podcast that sort of goes through and explains all the different research that's happened. You can, you know, listen to that on your commute or something like that or on a walk, not need to read 40 pages, which is awesome. The part of this that makes it such an interesting experience to me is, I don't know if other people have felt this before, but most AI products, back to that, Josh, that blank slate problem or that like empty chat box problem, you as the user of the product have to put in so much work in order to get useful stuff.

Starting point is 00:49:03 I talk to people all the time who are like, yeah, I use these models and like, they're just not useful for me. And like actually what's happening behind the scenes is the models are super capable. They're really useful. It just requires that you give the models enough context. And I think deep research, there's this new emerging like prompt engineering 2.0 is this context engineering problem where it's like, how do you get in the right information so that the model can make a decision on behalf of a user?

Starting point is 00:49:27 And I think deep research is this really nice balance of going and doing this context engineering for you, bringing all that context into the window of the model. and then being able to answer what your original question was. And principally showing you this like proof of work up front. I think about this proof of work concept in AI all the time, which is I have so much more trust in deep research because as soon as I kick off that query, it's like, boom, it's already at like 50 web pages.

Starting point is 00:49:55 I'm like, great, because I was never going to visit 50 web pages. Like there's pretty much nothing that I'm researching. I could be going and buying a car and I'm going to go and look at less than 50 web pages for that thing or a house. I'm looking at less than 50 web pages. Like, I'm just, it's not in the cards. This is maybe personal to me and other people are doing more research. I don't know.

Starting point is 00:50:15 But so automatically I'm like in awe with how much more work this thing is doing. I think this is, again, this is the North Star from an AI product experience standpoint. And there's so few products that have like made that experience work. And just every time I go back to deep research, I'm reminded of this and that team crushed it. And it's not just deep research from an LLM context that is so fascinating about Google AI, you guys have created some of the most fascinating tools to advance science. And I don't think you guys get enough

Starting point is 00:50:52 flowers for what you guys have built. Some of my favorites, AlphaFold 3 is crazy. So, you know, this is this model that can predict what certain molecular structures are going to look like. And this could be applied to so many different industries, the most obvious being drug design, creating cheaper, more effective, curable drugs for a variety of different diseases. Then I was thinking about that random model that you guys launched,

Starting point is 00:51:20 where apparently we could translate what dolphins were saying to us and vice versa. Kind of stepping back from all of these examples, can you help me understand what is Google's obsession with AI and science and why you think it's such an important area to focus on. Are we at a point now where we can advance science, you know, to infinity? Or where are we right now?

Starting point is 00:51:42 Are we at our ChatGPT moment or do we have more to go? I'll start with a couple of cheeky answers, which Demis, who is the only, you know, Foundation Model Lab CEO to have a Nobel Prize in the science domain, which is for him chemistry, had this comment, which is actually really true. There's lots of people talking about this like impact of AI on science and humanity. And there's very few, if not only one, being DeepMind, research lab that's like actually doing the science work. And I think it's this like great example of like DeepMind.

Starting point is 00:52:20 It's just being in the like culture and DNA of like Demis as a scientist, all of these folks around DeepMind are scientists and they like want to push the science and push what's possible in this future of discovery using our models. And I was in London a couple of weeks ago, meeting with Pushmeet, who leads our science team, and hearing about sort of like the breadth of the science that's happening and how like DolphinGemma is like a great,

Starting point is 00:52:46 like kind of like funny example, because it's not super applicable in a lot of cases, but it's interesting to think about AlphaFold. Like if folks haven't watched the movie, The Thinking Game, it's about sort of the early days of Google DeepMind and they're talking about like folding proteins and why this is such an interesting space.

Starting point is 00:53:09 And I'm not a scientist, but to hit on the point really quickly of like why AlphaFold is so interesting, the historical context is humans to fold a single protein would take many humans, millions of dollars, and it would take on the order of like five years in order to fold a single protein. The original impetus and why Demis won the Nobel Prize for this in chemistry was because DeepMind was able to figure out using reinforcement learning and other techniques.

Starting point is 00:53:43 They folded every protein in the known universe. Millions of proteins, released them publicly, made them available to everyone. And it, like, you know, dramatically accelerated the advancement of like human medicine and a bunch of other domains and disciplines. And now actually with Isomorphic Labs, which is part of DeepMind, actually pursuing some of the breakthroughs that they found and actually doing drug discovery and things like that.

Starting point is 00:54:10 So, like, overnight, you see that hundreds of thousands of human years and hundreds of millions of dollars of, like, research and development costs saved through a single innovation. And I think we're going to continue to see that, like, acceleration of new stuff happening. A recent example of this, AlphaEarth, which was our sort of like geospatial model that came out

Starting point is 00:54:36 and being able to like fuse together all of this, Google Earth Engine with AI and this understanding of the world. Like it's just so much cool science and so much is possible when you sort of layer on the AI capability in all these disciplines. So I think to answer the question, I think we're going to see this acceleration of science progress. I think DeepMind's going to continue to be at the forefront of this, which is really exciting.

Starting point is 00:55:02 And the cool thing for even for people who aren't in science is all of that innovation and the research breakthroughs that happen, it feeds back to the mainline Gemini model. Like we had a bunch of research work about doing proofs for math. And it's like, that's not very interesting at the face value. But like that research fuels back into the mainline Gemini model. It makes it better at reasoning. It makes it better able to like understand these like really long and difficult

Starting point is 00:55:30 problems, which then benefits like every, like, agent use case that exists because the models are better at reasoning through all these, like, difficult problem domains. So there is this like really cool research to reality, science to like practical impact flywheel that happens at DeepMind. As a former biologist, this warms my heart. This is amazing to see this get applied at such scale. Okay, we can't talk about Google AI without talking about search. And this is your bread and butter, right? However, I've personally noticed a trend shift in my habits. I've used a computer for decades now, and I've always used Google Search to find things, Google Chrome, whatever it might be. But I've now started to cheat on this feature. I have started using LLMs directly to do all my

Starting point is 00:56:19 searching for me, to get all my sources for me. And you've got to be thinking about this, Logan, right? Is this eating the search business? Is this aiding the search business? Or are we creating a whole different form factor here? What are your thoughts? There's an interesting form factor discussion. I think on one hand, the AI sort of answer market is definitely distinctly different, it feels like, than the search market to a certain degree. Like, I think we've seen lots of AI products reach hundreds of millions of users,

Starting point is 00:56:48 and search continues to be a great business, and there's billions of people using it and all that stuff. There's also this interesting question, which is, like, what's the obligation of Google in this moment? of this platform shift and all this innovation that's happening. And as somebody who doesn't work on search, but is a fan of all the work that's happening inside of Google and has empathy for folks building these products, it is really interesting. And my perspective has always been that search actually has this,

Starting point is 00:57:18 you know, as the front door to the internet, has this stewardship position that makes it so that they actually can't disrupt themselves for the right reasons at the same pace that, that sort of, you know, small players in the market are able to do. And my assertion has always been that, like, actually, this is the best thing for the world. The best thing for the world and for the internet and for this entire economy that Google has enabled through the internet and bringing people to websites and all this stuff doesn't benefit by, like, you know, day one of the LLM revolution happening all of a sudden.

Starting point is 00:57:53 it's like a fully LLM-powered search product and, like, feels and looks completely different. Not only, I think, would that throw users who are still trying to figure out, like, how do I use this technology? What is the way that I should be engaging with it? What are the things that it works well for and it doesn't work for? Not only would it throw those people into a bad place from, like, a user journey perspective, but I think it also has impacts on like people who rely on Google from a business perspective. So I think you've seen this sort of like gradual transition and like lots of shots on goal and lots of experiments on the search side.

Starting point is 00:58:30 And I think we're now getting to the place where they have confidence that they could do this in a way that is going to be super positive for the ecosystem and is going to create lots of value for people who are going and using these products. Like, the understanding of AI technology has increased, the adoption has increased, and the models have gotten better and hallucinations have gone down and all this stuff. And I think there'll be also some like uniquely search things that like only search can do. I've spent a bunch of time with folks on the search team like Robbie Stein as an example who leads all the AI stuff in search.

Starting point is 00:59:06 And there's all of this infrastructure that search has built, which as you think about this age of AI where the cost of generating content, which actually looks somewhat plausible, has basically gone to zero. Like it's very easy to do that. Great search is actually, like, this premium, is like more important than ever. There's going to be a million X or a thousand X or whatever the X number of like growth in content on the internet. How do you actually get people to the most relevant content from people who have authority, who have, you know, expertise and all this stuff? It's a really difficult

Starting point is 00:59:43 problem. And it's like, it is the problem of the decade that like search has been solving for the last 20 years and is now a more important problem than ever. So I've never been more excited for the search team. And I think they've never had a bigger challenge ahead of them as they try to figure out how to make these internet scale systems that they build continue to scale up to solve this next generation of problems while also becoming this frontier AI product experience where billions of people are experiencing AI for the first time in a different way than they've done.

Starting point is 01:00:15 There's so many, there's so many interesting use cases too, even around like image search is a great example of this new sort of, it's like one of the fastest growing ways in which people are using search now is showing up with an image and asking questions about it. And just like the way, the way people had to traditionally use search

Starting point is 01:00:36 has already changed. It's like different than it was five years ago or even two years ago. I think we're gonna continue to see that happen. I think search, the product you see today, will evolve to have things like multi-line text input fields as sort of user questions change and all that stuff.

Starting point is 01:00:55 So there's so much cool stuff on the horizon for search that I'm really excited about. Yeah, as I'm hearing you describe all of these cool new things, particularly funneling into a single model. Like the science breakthroughs are unbelievable. And I think that's what gets me personally really excited, like Ejaaz, is this is actually going to help people. Like this is going to make a difference in people's lives. Right now it's a productive thing.

Starting point is 01:01:16 It's a fun thing. It's a creative thing. There's a lot of tools. But then there's also the science part. And a lot of this all funneling down to one amazing model, I think it leaves us in a really exciting place to wrap up this conversation. So Logan, thank you so much for coming and sharing all of this, sharing the news about the new model, sharing all of the updates and progress that you're making everywhere else.

Starting point is 01:01:33 I really enjoyed the conversation. For you, you also have a podcast called Around the Prompt. Is there anything you want to leave listeners with to go check it out or to check out the new AI Studio or the new AI model? Let us know what you have interesting going on in your life. I love seeing feedback about AI Studio. So if you have things that don't work that you wish worked, even for both of you, please send them to me.

Starting point is 01:01:53 Would love to make things better. For the new model as well, I think this is still early days of what this model is going to be capable of. So if folks have feedback on like edge cases or use cases that don't work well, please reach out to our team, send us examples on X or Twitter or the like. Would love to help make some of those use cases come to life.

Starting point is 01:02:16 And I appreciate both of you for all the thoughtful questions and for the conversation. This was a ton of fun. We got to do it again sometime. Awesome. Yeah, we'd love to have you anytime. Please come and join us. We really enjoyed the conversation.

Starting point is 01:02:27 So thank you so much for watching. For the people who enjoyed, please don't forget to like, share it with your friends and do all of the good things. And we'll be back again for another episode soon. Thank you so much. I have a fun little bonus for those of you still listening all the way at the end. The real fans. When we were first going to record with Logan, we actually had no idea that he would break the exclusive news of Nanobanana on our show.

Starting point is 01:02:49 It was super cool. So we wanted to kind of restructure the episode to prioritize that at the front. We did record a separate intro where I said, hey, Google makes some really good stuff. In fact, you guys have an 80-something percent chance of being the best model in the world by the end of this month. Can you explain to us why? Why Google is so amazing at what they do? And this was the answer to that question. So here's a nice little nugget for the end to take you out of the episode. Thanks for listening. I really hope you enjoyed and we'll see you guys in the next. My general worldview of like why Google is in such a good place for AI right now,

Starting point is 01:03:18 there's many layers of this depending on sort of what vantage point you want to look at. I think on one hand it's like, I think search is this like incredible part of this story, which I think people have historically looked at Google Search as this legacy Google product. And I think search is going through this transition and, like, actually just announced earlier today as we're recording this that AI mode is rolling out to 180-plus countries, English supported right now and hopefully other languages in the future. And is a great example of AI overviews, and AI overviews sort of double-clicking into AI mode, being this product that actually like for many people around, for billions of

Starting point is 01:04:01 people around the world is the first AI product experience that they actually touch. And I think there's like something really interesting where like Google has been on this mission of like deploying AI and like, you know, some naysayers on Twitter will be like, you know, Google created the transformer and then did nothing with it. And it's actually very far from the truth, which is that the transformer, which is the architecture that powers language models and Gemini, has been powering the search experience for the last like seven years.

Starting point is 01:04:33 The product experience maybe looks slightly different today than it did then. But Google's been an AI first company for as long as I can remember. Basically, as long as AI has existed, that's been the case. And now we're seeing more and more of these product surfaces, like, become these frontier AI products as sort of Google builds the infrastructure to make that the case. I think people also forget, like, it's not easy logistically to deploy AI to billions of people around the world. And now as you look at like, I think Google has like five or six billion-plus user products. So the challenges of like even just making a small AI product work today, if anyone's played around with stuff or tried vibe coding something, like, it's not easy. Doing that at the billion user scale is also very difficult.

Starting point is 01:05:19 So I continue to be more and more bullish, and like part of the thing that allows us to do that billion user scale deployment is the whole infrastructure story. Like if you're watching on video, I don't know if you can see, but I have a couple of TPUs sitting behind me. Yeah, and like that TPU advantage, which is our sort of equivalent to GPUs, is something that I think is going to continue to play out. So there's so many things that I get excited about, and the future is looking very bright.

