Limitless Podcast - Announcing Google's Secret New AI Model With The Person Who Built It | Logan Kilpatrick
Episode Date: August 26, 2025

In this episode, we speak with Logan Kilpatrick from Google DeepMind about the innovative Gemini AI model and its advanced image generation and editing capabilities. Logan discusses Gemini's... processing power, efficiencies from Google's TPUs, and how its AI Studio enhances creative productivity. We also explore future applications in robotics and scientific advancements, along with AI's impact on search technology.

------
🌌 LIMITLESS HQ: LISTEN & FOLLOW HERE ⬇️
https://limitless.bankless.com/
https://x.com/LimitlessFT

------
TIMESTAMPS
0:00 The Rise of AI in Image Generation
0:50 Announcing the New Model
1:14 Image Generation and Editing
4:44 Use Cases
5:42 Behind the Scenes
6:59 The Future of Video Production
8:23 AI Accessibility
10:16 Google AI Studio and API
11:13 Live Demo
15:31 Multimodal AI Models
16:36 AI Advancements in Science
17:35 A Revolutionary Tool
19:51 The Intersection of AI and Search
28:32 Expertise in AI
37:19 Simulations and Robotics
41:41 Scaling AI Infrastructure
44:25 Understanding Token Consumption
50:58 AI Applications in Drug Design
54:57 Accelerating Scientific Progress
1:03:33 Google's Journey in AI Development

------
RESOURCES
Logan: https://x.com/OfficialLoganK
Gemini Models: https://deepmind.google/models/gemini/
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
The best image generation and editing model in the world.
It's scary how realistic this stuff is.
Veo 3 has kind of like killed the VFX studio.
And this is, I think, principally enabled by vibe coding.
My hope is that it actually ends up creating more opportunity for the experts and the specialists.
How much of the tools that you build do you find are built with vibe coding?
Almost 85% of everything that I do is vibe-coded.
I remember when I first booted up a PC and I just had access to all these different wonderful applications all within one suite.
This kind of feels like that moment for AI.
Gemini is feeling faster, but it's also feeling better, and it's also getting cheaper.
What's happening behind the scenes?
We crossed a quadrillion tokens, which comes after a trillion, if you're not familiar.
I haven't thought about numbers higher than a trillion before.
It's what comes after a trillion.
And there's no slowdown in sight.
We have an incredibly exciting episode today because we are joined by Logan Kilpatrick.
Logan is the product lead working on the Gemini platform at Google DeepMind.
We have an exciting announcement to break right here today with Logan, which is the announcement of a model that we previously knew as nano-banana. The reality is this is a brand new image generation model coming out of Google,
and you can access it today. So Logan, tell us about this brand new model and what we need to
be excited about. Yeah, for people who are not chronically online and seeing all the tweets and
everything like that, part of the excitement has been, and over the last, I think, like, six
months, we've seen the emergence of, like, native image generation and editing models. Historically, you would see models that could actually do a really good job of generating images.
They usually tend to be very, like, beautiful, aesthetic images.
The challenge was, like, how do you actually use these things in practice to do a lot of stuff?
That's where this editing capability is really helpful.
And then so we started to see these models that can actually edit images.
If you could provide an image and then prompt it, it would actually change that image.
What's really interesting, though, is this fusion of those two capabilities,
with the actual base intelligence of the Gemini model.
And there's a lot of really cool ways in which this like manifests itself.
And we'll look at some examples of this,
but it's this benefit of the world knowledge.
The model is like smart.
So as you ask it to do things and as you ask it to make changes,
it doesn't just like take what you're saying at face value.
It takes what you're saying in the context of its understanding of the world.
And its understanding of physics, its understanding of light and all this other stuff.
And it makes those changes.
So it's not just blindly making edits or generations; they're actually grounded in reality and the context in which that's useful. And we can look at some examples of this. My favorite thing is actually this editing capability.
So this is in AI Studio, and we'll have a link somewhere, hopefully in the show notes, that will let us do this. My friend Amar, who is on our team and drives all of our design stuff, built this and it's called Past Forward. And what you can do is you can put in an image of yourself and it'll
regenerate a version of yourself in this sort of like Polaroid-esque vibe following all the different
trends from the last 10 or 20, 30 years. So if you look at this example, this is from me from the
1950s and I'm sure I have a picture of my dad from the 1950s somewhere or my grandpa who
looks somewhat similar to that. Here's me in the 1980s, which I love. Here's me. Some of these facial
expressions are also different. Like you're showing your teeth more in some and then it's a smirk
in others. That's super cool. I like this sweater. I actually have a sweater that almost looks exactly
like this 1970s one, though. I don't like my hair in this 1970s one. Same with the 2000s. So one of the
cool things about this new model and one of the features I think folks are going to be most excited
about is this character consistency, which is as you took the original image and you made the
translation to this 1950s image, it actually looks like
me still, which is really cool. So there's lots of these really interesting use cases. I think we'll
go out with a sports card demo where you can sort of turn yourself into a, you know, a figurine sports
card, which is really cool. So lots of really interesting examples like this. And another thing you'll
notice is actually the speed. And this is where the underlying model comes in: the code name was nano-banana, but the actual model is built on Gemini 2.5 Flash, which is our workhorse model.
It's super fast.
It's super efficient.
It's competitively priced in the market, which is awesome.
So you can actually use it at scale.
And yeah, so behind the scenes this model, for developers and people who want to build with it, is Gemini 2.5 Flash Image, which is awesome.
So this is a use case that I love.
And it's a ton of fun.
You can do this in the Gemini app or in AI Studio.
I mean, as you said,
the character consistency
just from these examples
is like astounding.
I need to give a round of applause.
This has been my biggest issue
when I'm generating images of myself.
Genuinely.
And Josh and I are early users of Midjourney v1 and OpenAI's image generator as well.
And one of our pet peeves
was it just couldn't do
the most simplistic things, right?
We could just say, hey, keep this photo
and portrait of me exactly the same,
but can you show me
what I would look like
in a different hairstyle?
or me holding a bottle of Coca-Cola instead of this martini.
And it just could not do that, right?
Just simple video, like photo editing.
Can you give us a bit of a background as to what Google did to be able to achieve this?
Because, you know, I've been racking my brain around, like, why other AI companies couldn't do this?
Like, what's happening behind the scenes?
Can you give us a bit of insight?
Yeah, that's a good question.
I think this actually goes back to, and I'll share another example in a second as well.
But I think this goes back to the story of what happens when you build a model that has the fusion of all these capabilities together.
And I was actually just, this is a sort of parallel example to this, but it's another example of why building a unified model to do all this stuff and not having a separate model that doesn't have world knowledge and all these other capabilities is useful.
The same thing is actually true on video.
Like part of the story and we haven't, we have a bunch of stuff coming that sort of tells this a little bit more elegantly than I will right now.
But part of the story of, like, Veo 3 having these really state-of-the-art video generation capabilities, if folks have seen this, is that the Gemini models themselves have these state-of-the-art video understanding capabilities.
Oh.
And a very similar context, actually, on the image side, which is since the original Gemini model, we've like, with the exception of probably a couple of months in that like two-and-a-half-year time horizon, have had state-of-the-art image understanding capabilities.
And I think there is this like capability transfer, which is really interesting as you go to do the generation step.
And if you can fuse those two things together in the same model, you end up just being able to do things that other models aren't able to do.
And this was part of the original bet of, like, why build Gemini the way we did: the original Gemini 1.0 model was built to be natively multimodal.
It was built to be natively multimodal because the belief at the time, and I think this is turning out to be true, is that like that's on the path to AGI.
that you combine these capabilities together,
and like similar to what humans are able to do,
we have this fusion of all these capabilities
in a single entity, just like these models
should be able to do.
Wow.
So if I were to distill what you just said,
here, Logan, the way you've trained Gemini 2.5
or all future Google Gemini models is it's in a very multimodal fashion.
So basically, it gets smarter in one particular facet, which trains itself or has transferable capabilities to other facets,
whether it's image generation, video generation,
or even text LLMs to some extent.
I just think that's fascinating.
I'm curious.
I have one question for you,
which I want to hear your take on.
How are you going to surface this to the regular consumer, right?
Because right now, you provide all of these capabilities
through an amazing suite, you know, called Google AI Studio.
But if I wanted to use this in, say, an Instagram app
or my random photo editing app, is this something that could be easily plugged into or accessed, or do we need to go via some other route right now?
Let me just diverge really quickly, which is if any of the researchers who I work with are watching this,
they will tell me, they'll make sure that I note that capability transfer that we just talked about.
You oftentimes don't get that out of the box.
So there is some, like, there is some emergence where like you get a little bit of that.
You do have to do, like, there's real, true research and engineering work that has to happen to make sure that that capability fusion happens.
It's not often that you just make the model really good
at one thing and then it translates.
Oftentimes, actually, it has a negative effect,
which is as you make the models really good at code,
for example, you trade that off against something else, like creative writing,
as a random example of this.
So you have to do a lot of active research and engineering work
to make sure that you don't lose a capability
as you make another one better.
But then ultimately, they benefit,
if you can make them on the same level,
they benefit from this interleaved capability together.
To answer the question about, like, where is this going to be available?
The Gemini app is the place that, like, for by and large, most people should be going to.
So if you go to gemini.google.com, there'll be sort of a landing page experience that showcases this new model
and makes it really easy and you can put in all your images and do tons of fun stuff,
like the example that I was showing.
If you're a developer and you want to build something with this, in AI Studio, we have this build tab.
And that's what we were just looking at as an example of one of the applets that's available in the build tab.
The general essence is that all of these applets can be forked and remixed and edited and modified so that you can keep doing all the things that you want to do with the AI capability built in.
So it'll continue to be powered by the same model.
It'll do all that stuff, which is awesome.
So there's lots of cool fusion capabilities that we have with this.
Same thing with this other example that we were looking at.
If you want to go outside of this environment, we have an API.
You could go and build whatever.
So if your website is AIPhotos.com or whatever, you could go and build with the Gemini API,
use the new Gemini 2.5 Flash image model to do a bunch of this stuff, which is awesome.
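For anyone curious what that looks like in practice, here is a minimal sketch of calling the image model through the Gemini API, assuming the google-genai Python SDK and the gemini-2.5-flash-image-preview model id (the exact id and prompt are illustrative; check the current docs):

    from io import BytesIO
    from PIL import Image
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

    # Send a text prompt plus a source photo; the model returns an edited image.
    prompt = "Put me in front of the Library of Congress, keep my face unchanged"
    source = Image.open("photo.png")

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # assumed model id
        contents=[prompt, source],
    )

    # Image bytes come back as inline data parts alongside any text parts.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save("edited.png")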
Awesome.
So while this is baking, I noticed you had another tab open, which means maybe there's another demo
that you were prepared to share.
There is another demo.
This one I actually haven't tried yet.
But it's this idea of, like, how can you take a photo editing experience and make it super, super simple?
So I'll grab an image.
Actually, we'll take this picture, which is a picture of Demis and I.
Legends.
We'll put an anime filter on it.
And we'll see.
And so this is a completely vibe-coded UI experience,
and all the code behind the scenes is vibe-coded as well.
And we'll see how well this works with Demis and I.
How much of the tools that you build do you find are built with vibe coding instead of just hard-coding software? Are you writing a lot of this as vibe-coded through the Gemini model?
I think sometimes you're able to do some of the stuff completely vibe-coded. It depends on how specific you want to get. I'd say almost 85% of everything that I do is vibe-coded. Somebody else
on my team built this one, so I don't want to misrepresent
the work. It could have all been
human programmed because we have
an incredible set of engineers. The general
idea is how can you
make this? Oh, interesting. How can you
make this Photoshop-like experience?
Let's go.
or do you all have suggestions?
What would a good filter for this be?
I don't know.
Oh, man.
Yeah, like perhaps going back to the last example,
maybe like a 90s film or an 80s film grain.
All right.
And I guess while we wait for that to load,
is there a simple way that you would describe nano-banana
or this new image model to just the average person on the street who's,
oh look, there we go.
We have the film grain.
Okay, so what we're watching for the people who are listening,
you're retouching, you can retouch parts of the image.
You could crop, adjust, there are filters to be applied.
I'm just clicking through a button.
To be honest, I've never done that before.
So it's been fun playing around.
This is the exploration you're going to get to do as a user as you play around.
Logan is vibe editing.
That's what's happening.
Yeah.
He's experimenting.
Vibe editing, which is fun.
I love it.
That's a great way to put it.
And the cool thing, again, is like what I love about this experience is as you're going through,
oh, interesting.
This one's, like, giving me an edited outline.
Oh, yeah, a little outline.
This is helpful for our thumbnail generation.
We do a lot of this stuff.
Let's see if I can remove the background as well.
Let's see.
This should remove the background.
This is going to be trouble because this is a big feature that we use for a lot of our imagery.
Hopefully. Come on.
Oh, nice.
Oh, done.
Nicely done.
For those of you who are listening, he's typed in, put me in the Library of Congress.
So we're going to hopefully see Logan.
Yeah, the context on that image was that Demis and I were in the library of the DeepMind office.
Oh, nice.
Yeah, so that's the Library of Congress reference in my mind.
But yeah, so much that you can do, again, what I love about this experience is that as you go around and play with this stuff, if you want to modify this experience, you can do so on the left-hand side.
If you say, actually, here are these five editing features that I really care about.
The model will go and rewrite the code, and then it'll still be attached to this new 2.5 flash image model.
So you can do all these types of cool stuff.
This experience is something that I'm really excited about that we've been pushing on.
Yeah, this is amazing because I myself, I do photography a lot. I was a photographer in my past life and I rely very heavily on Photoshop and Lightroom for editing, which is a very manual process. And they have these smart tools, but they're not quite like this. I mean, this saves a tremendous amount of time if I could just say, hey, realign, restrain the image, remove the background, add a filter. I think the plain English version of this makes it really approachable, but also way faster.
Yeah, it is, it is crazy fast. I think about this all the time. Like there's definitely cases where you want to, uh,
go deep with whatever the pro tool is.
I think there's actually something interesting
like on the near horizon that our team has thought a lot about,
which is how you can have this experience
and how you can sort of in a generative UI capacity
have the experience sort of subtly expose additional detail to users.
And I think about this, like if you're a new,
you know, Photoshop user as an example and you show up,
like the chance that you're going to use all of the bells and whistles is zero.
You want the three things.
I want to remove a background.
I want to crop something, whatever it is.
Don't actually show all of these bells and whistles.
I think the exciting thing about the progress on coding models is that in the future,
the challenge with doing this in the present, rather, is that software is deterministic.
Having to build the sort of, like, modified version of that software for all of these different skill sets and use cases is extremely expensive. It's not feasible. It doesn't scale to a production environment. But if you can have this generative UI capability where, like, the model sort of
knows and as you talk to the model, it realizes, oh, you might actually benefit from these other
things. It can create the code to do that on the fly and expose them to you, which is really
interesting. So I think there's lots of stuff that is going to be possible as the models keep getting better.
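To make the generative-UI idea concrete, here is a rough sketch of the pattern being described: ask the model to emit a small piece of UI code for just the controls a particular user needs, then render it. The prompt, file name, and model id are illustrative assumptions, not the actual AI Studio implementation:

    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

    # Describe what this particular user actually needs, and have the model
    # write the extra controls on the fly instead of shipping every feature to everyone.
    user_need = "I only ever crop photos and remove backgrounds"
    prompt = (
        "Generate a self-contained HTML snippet with buttons for exactly the photo-editing "
        f"features this user cares about: {user_need}. Return only the HTML."
    )

    response = client.models.generate_content(
        model="gemini-2.5-flash",  # assumed model id
        contents=prompt,
    )

    # Drop the generated controls into the page; a real app would sandbox and review this.
    with open("custom_controls.html", "w") as f:
        f.write(response.text)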
This is amazing. So the TLDR on this new announcement, how would I, if I were to go
explain to my friend what this does, why this is special, how would you kind of sell it to me?
The best image generation and editing model in the world, 2.5 Flash Image or nano-banana, whichever you
prefer, is the model that can do this. And I think there's so many creative use cases where
you're actually bounded by the creative tool. And I feel like this is one of these examples to me
where it's like I feel like I'm 10x more creative. I was literally helping my friend yesterday
doing a bunch of iterations on his LinkedIn picture because it was like, you know, the
background was slightly weird or something like that.
And we were just like, I did like 15 iterations.
And now he's got a great new LinkedIn background, which is awesome.
So like there's so many like actual practical use cases where you, and I literally just like
built a custom tool on the fly vibe coding in order to solve that use case, which was a ton of fun.
Yeah, this is so cool.
Okay.
So now, this model, nano-banana, is Gemini 2.5 Flash Image.
And it's out today.
So we'll link that in the description for people who want to try it out.
I think one of my complaints for the longest time, and I've mentioned this on the show a few times, is a lot of times when I'm engaging with this incredible form of intelligence, I just have a text box. And it's up to me to kind of pull the
creativity out of my own mind. And I don't get a lot of help along the way. But one of the things that
you spend your time in is this thing called Google AI Studio. And I've used AI Studio a lot because
it solves a problem for me that was annoying, which is just the blank text box. It kind of has a lot
of prompts. It has a lot of helpers. It has a lot of guidance into helping me extract value out of the
model. So what I'd love for you to do for people who aren't familiar, Logan, is just kind of
explain to everyone what Google AI Studio is, why it's so important, and why it's so great.
Yeah, I love this, Josh.
I appreciate that you like using AI Studio.
It is a labor of love.
Lots of people across Google have put in a ton of time to make progress on this.
I really want to show, so I'll make a caveat, which is we have this entirely redesigned
AI Studio experience that's coming very soon.
I won't spoil it in this episode because it's like half-baked right now, and I wish I could show it.
And I think actually some of the features
that you might see in this UI
might be slightly different at launch time
than what you see here.
So take this with a grain of salt.
We've got a bunch of new stuff coming.
And I think actually it should help with this problem
that you're describing,
which is as you show up to a bunch of these tools today,
the onus is really on you as a user
to try to figure out what's capable,
what all the different models are capable of,
what even are all the different models,
like all of that stuff.
So at the high level,
like we built AI Studio for this like AI builder audience.
If you wanna take AI models
and actually build something with them
and not just chat to AI models,
this is the product that was built for you.
We have a way to in this like chat UI experience,
sort of play with the different capabilities of the model,
feel what's possible, what is Gemini good at,
what's it not good at, what are the different tools
it has access to.
But as you go into AI Studio,
you'll see something that looks like this.
You know, we're highlighting a bunch of
the new capabilities that we have right now, this URL context tool, which is really great for
information retrieval, this native speech generation capability, which is really cool.
If folks have used NotebookLM and you want to build a NotebookLM-like experience.
We have an API for people who want to build something like that, and we have this live audio-to-audio
dialogue experience where you can share a screen with the model and talk to it, and it can see the
things that you see and engage with it. Of course, we have our native image generation and editing
model, the old version 2.0 flash, now the new version, 2.5 flash, and lots of other stuff
that's available as you sort of experience what these models are capable of. So really, this
playground experience is one version. We have this chat prompt. On the left hand side, we have this
stream. This is where you can talk to Gemini and sort of share your screen. And actually, you can
like show it things on the webcam and be like, what's this? How do I use this thing? You can do
this on mobile as well, which is really cool. We have this generative media experience.
where like if you want to build things with,
we have a music model, we have Veo,
which is our video generation model,
we have all the text to speech stuff,
which is really cool.
As I overwhelm people with so much stuff
that you can do in AI Studio,
the sort of key thread of all this is we built AI Studio
to showcase a bunch of these capabilities,
and everything you see in AI Studio
has an underlying sort of API and developer experience.
So if you want to build something like any of these experiences,
all of this is possible.
There's like no Google
secret magic that's happening pretty much anywhere in AI Studio.
It's all things that you could build as someone using a vibe coding product or,
you know, by hand writing the code.
You could build all these things and even more.
And that is the perfect segue to this build tab where we're trying to help also,
you know, actually help you get started building a bunch of stuff.
So you can use these templates that we have.
You can use a bunch of the suggestions.
You can look through our gallery of different stuff.
And we're really in this experience trying to help you build AI powered apps,
which we think is something that folks are really, really excited about.
And we'll have much more to share around all the AI app building stuff in the near future.
Awesome.
Thanks for the rundown.
So as I'm looking at this, I'm wondering, who do you think this is for?
What type of person should come to AI Studio and tinker around here?
Yeah.
So I think, you know, historically, and you'll see a little bit of this transition if you play around with the product, where there's some interesting edges, is we were originally focused on building for developers.
So it was built.
And there is like a part of the experience,
which like is tied to the Gemini API,
which tends to be used mostly by developers.
So if you go to dashboard,
you can see all your API keys and check your usage and billing and things like that.
By and large though,
I think the really cool opportunity of what's happening right now is
this transition of like who is creating software.
And this is I think principally enabled by vibe coding.
And because of that, like we've re-centered
ourselves to be really focused on this AI builder persona, which is like people who want to build
things using AI tools. Also, people who are trying to build AI experiences, which we think is going to be, like, the market that creates value for the world. So if you're excited about all the things that
you're seeing, if you want to build things, AI studio is very much like a builder first platform.
If you're just looking for like a great everyday AI assistant product, you, you know, want to get help
on coding questions or homework or life advice or all that type of stuff, the Gemini app is the right
place for this. It's very much like a DAU (daily active user) type of product where, like, you come back and it has
memory and personalization and all this other stuff, which makes it really great as like an assistant
to help you in your life versus AI Studio. The artifact is like, we help you create something
and then you go put that thing into the world in some way. And you don't necessarily need to come back and use it every day. You use it whenever you want to build something.
It's funny. I'm dating myself a bit here, but I remember when I first booted up a PC and I loaded
up Microsoft Office and I just had access to all these different wonderful applications that were
at the time super new, all within one suite. This kind of feels like that moment for AI. And you might
not take that as a compliment because it's a completely different company, but it was what I built
my childhood fascination with computers off of. So I appreciate this and I love that it's this massively, like, cohesive experience. But kind of zooming out, Logan, I was thinking a lot
about Google AI and what that means to me personally. I have to say it's the only company
where I think beyond an LLM. And what I mean by that is when I think of Google AI, I don't just
think of Gemini. I think of the amazing image gen stuff that you have.
I think of the amazing video outputs that you guys have.
I think of the text to voice generation that you just demoed and all those kinds of things.
I remember seeing this advert that appeared on my timeline.
And I remember thinking, wow, this must be the new GTA.
Then I was like, no, no, that's Florida.
That's Miami.
Nope, people are doing wild stuff.
That's an alien.
Hang on a second.
This can't be real.
And then I learned that it was a Google Veo 3 generation of an advert for Kalshi, which is like this prediction markets situation.
And I remember thinking,
how on earth have we got to AI-generated video
that is this high quality and this high fidelity?
I think in my mind, Veo 3 has kind of like killed the VFX studio.
It's kind of killed a lot of Hollywood production studios as well.
Give me a breakdown and insight into how you built
or how you guys built Veo 3, and what that means for the future of movie and video production, and more.
Yeah, that's a great question.
I think there's something really interesting along these threads
and not to push back on the notion that it's killing Hollywood.
Because I think there is like, I think it's an interesting conversation.
The way that I have seen this play out,
and the great example of this is, folks have seen Flow, which is our sort of, like, creative video tool. And if you're using Veo and you want to sort of get the most out of Veo, Flow is the tool to do that.
If you see lots of like the creators who are building, you know,
minute-long videos using Veo and it's like this really cohesive story
and it has like a clear visual identity,
similar to what you'd get from like a probably not the extent of a Hollywood production,
but like somebody thoughtfully choreographing a film,
Flow is the product to do that. And actually, interestingly, Flow was built in conjunction with filmmakers. And I think that's actually, like, there is, and I feel this way about vibe coding as well.
And it's this thought experiment that I'm always running through in my head, which is, you know,
yes, I think AI is, like, raising the bar for everyone, or it's raising the floor for everyone.
We're like, now everyone can create. What does that mean for people who have expertise? And I think
in most cases, what it means is actually the value of your expertise continues to go up. And like,
this is my personal bet. And I don't know how much this tracks to like everyone else's
worldview. My personal bet is that expertise in the world where the floor is lifted for everyone
across all these dimensions is actually more important because there was something about,
and I think like video production is a great example for me because I would never have been
able to make a video. Like it's not in the cards. Like for my skill set, my creative ability,
my financial ability, like I will never be able to make a video. I can make things with Veo.
And now I'm like a little bit closer to imagining like, okay, if I'm serious about this, I need to go out and like actually engage with people.
And it's, like, sort of whet my appetite in a way that I don't think it would have otherwise. It was just, like, too far away before.
And I think software is another example where vibe coding.
If you were to pull a random person off the street and you start talking to them about coding and C++ and deploying stuff and all this, their, like, brain turns off, not interested.
I don't want to learn to code.
That's not cool.
It's not fun.
It sounds horrible.
And then vibe coding rolls around.
It's like, oh, wait, I can actually build stuff.
And like, yeah, I don't really need to understand all the details.
But there's still a limit to what I can build.
And who is actually well positioned to help me take the next step?
Like I, you know, vibe code something.
And I'm like, this is awesome.
I share it with my friends.
They all love it.
I want to, you know, go build a business around this thing that I vibe-coded. There's still a software engineer that needs to help make that thing actually happen.
So if anything, it's like, it's increasing this,
I mean, on the software side,
there's this infinite demand for software,
and it's increasing the total addressable market
of like what software engineers need to help people build.
I think it'll be something similar on the video side.
You know, there will be downsides to AI technology in some ways.
I think there is, like, as the technology shift happens,
there is some amount of disruption that's taking place
and like someone's workflow is being disrupted.
But I do think there's this really interesting thread to pull on,
which is my hope is that it actually ends up creating more opportunity
for the experts and the specialists.
So it sounds like you're not saying VFX studio teams are going to be replaced
by software engineers,
but rather that team in itself will become more adept at using these AI tools
and products to kind of enhance their own skill set beyond what it is today.
Is that right?
Yeah, yeah.
And I think we've seen this already play out in,
some ways, which is interesting.
I think, like, code has a little bit wider distribution than perhaps VFX, and VFX is also a space that I'm less familiar with personally.
But yeah, I think this is likely what is going to play out if I had to guess and bet.
Can you help us understand how a product like Veo 3 gets used beyond just like the major
Hollywood production stuff, right?
Because I've seen a bunch of these videos now and I'll,
I'll be honest with you, Logan.
It's scary how realistic this stuff is, right?
It's like from a high-quality AAA game demo
all the way to something that is shot like an A24 film,
you know, the scenes, the cuts, the changes.
I think it's awesome.
I'm wondering whether that goes beyond entertainment in any way.
Do you have any thoughts or ideas there?
Yeah, that is interesting.
I think one of the ones that is like related to,
it's sort of one skip away from video generation itself,
which was Genie,
which was our sort of world simulation work that was happening.
I think if folks haven't seen this,
go look up Genie 3 and you can see a video.
It's mind-blowing.
It's like a fully playable game world simulation.
You can like prompt on the go and this environment will change.
You can control it on your keyboard similar to a game.
I think that work translates actually really well to robotics,
which is cool.
So, like, if folks aren't familiar with this, one of the principal reasons we don't just have robots walking around everywhere, and the reason why we have LLMs that can actually do lots of useful stuff, is this data problem, which is, like, there's lots of text data and other data
that's like representative of the intelligence of humans and all this stuff that's available.
There's actually not a lot of data that is useful for making robotics work. And I think Veo could be part of, or, like, generally that sort of segment of Veo,
video generation and this like physics understanding and all that other stuff, I think could be
really helpful in actually making the long tail of robotics use cases work.
Then I can finally have a robot that will fold my laundry so that I don't need to spend
my time doing that.
That's my like outside of entertainment bet as far as like where that use case ends up creating
value in the world.
With Veo 3, the goal is to enable humans to become a better version of themselves, a 10x, 100x better
version of themselves using these different tools. So in the example of a VFX studio,
you can now kind of like create much better movies. How does that apply for Genie 3 exactly,
right? You gave the example of like being able to create simulated environments,
but that's to train these robots. That's to train these models. What about us? What about
the flesh humans that are out there? Can you give us some examples about where this might be
applied or used? Yeah, that's a good example. I mean, the robot answer is like the robots will be
there to help us, which is nice.
So hopefully there's a bunch of stuff that you don't want to do that you'll be able to get your robot to do.
Or there's like industries that are like dangerous for humans to operate in where it's like
if you can sort of do that simulation without needing to collect a bunch of human data to do
those things, I could see that being super valuable.
I think my initial reaction to the Genie use case, like, I could see lots of... Actually, the two
that come to mind
is like one entertainment
I think will be cool.
Humans want to be entertained.
It's a story as old as time.
I think there will be some entertainment value
of a product experience like Genie.
I think the other one is actually
back to a bunch of use cases
where you'd actually want robotics
to be able to do some of that work
but where the robot product experience, like, isn't actually there yet.
This could be things like,
you know,
mining or, like, heavy industries, things like that, where there's actually, like, a safety aspect of, like,
how can you do these like realistic simulation training experiences in order to make sure
that like you don't have to like physically put yourself in harm's way in order to like
understand the bounds or like the failure cases like disaster recovery, things like that
where you don't want to have to show up at a hurricane for the first time to, like,
really understand what the environment could be like.
and being able to do those types of simulations is interesting,
and building software deterministically to solve that problem
would actually be really difficult and expensive
and probably isn't a large market
that lots of companies are going to go after.
But if you have this model that has really great world knowledge,
you can throw all these random variables at it
and sort of do that type of training and simulation.
So yeah, it's perhaps an interesting use case.
I don't know if there's actually a plan to use it for things like that,
but those are things that come to mind.
This is something I've been dying to ask you about
because this is something that I've been fascinated by.
When I watched the Genie 3 demo for the first time,
it just kind of shattered my perception of where we were at
because you see it at work,
and I saw this great demo where someone was painting the wall.
We actually filmed an entire episode about this,
and it retained all of the information.
And one theme, as I'm hearing you describe these things,
as I'm hearing you describe V-O-3, Genie-3,
you're building this deep understanding of the physical world, and I can't help but notice this trend: you are just starting to understand the world more and more,
and I could see this when it comes to making games
as an example where a lot of people were using Genie 3
to just make these like not necessarily games
but virtual worlds that you can walk around and interact with
and I'm wondering if you could just kind of share
the long-term reasoning why
because clearly there's a reason, there's a lot of value to it.
Is it from being able to create maybe artificial data for robots?
If you can emulate the physical world,
you can create data to train these robots?
Is it because it creates great experiences?
Like perhaps we'll have AAA design studios using Genie 5 to make AAA games like Grand Theft Auto.
I'm curious, the reasoning behind this, like, urge to understand the physical world and
emulate it even.
I had a conversation with Demis about this who's our CEO at DeepMind and someone who's
been pushing on this for a long time.
I think a lot of this goes back to, like, there's two dimensions.
It goes back to like the original ethos of like why DeepMind was created and a bunch of
the work, the initial work that was happening in DeepMind around reinforcement learning.
If folks haven't seen this, like, one of the challenges of, like, again, making AI work is that you need this, like, flywheel of, like, continuing to, like, iterate.
And you need a reward function, which is, like, what is the actual outcome that you're trying to achieve?
And the thing that's interesting about these, like, simulated environments is it's really easy to have, like, a constrained world.
and it's really easy to also,
or not maybe really easy is overly ambitious.
It's possible to define a simple reward function
and then actually infinitely scale this up.
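As a toy illustration of that idea, a hand-rolled sketch under simplified assumptions (not any DeepMind system): a simulated environment plus a simple reward function lets you run as many episodes as you like at no physical cost.

    import random

    # Toy simulated world: an agent on a short 1-D track tries to reach GOAL.
    GOAL = 5

    def step(position, action):
        """Apply an action (-1 or +1); return (new_position, reward, done)."""
        position = max(0, min(GOAL, position + action))
        done = position == GOAL
        reward = 1.0 if done else -0.01  # simple reward: reach the goal quickly
        return position, reward, done

    def rollout(policy, max_steps=50):
        position, total = 0, 0.0
        for _ in range(max_steps):
            position, reward, done = step(position, policy(position))
            total += reward
            if done:
                break
        return total

    def random_policy(position):
        return random.choice([-1, 1])

    # Because the world is simulated, scaling up data collection is just a bigger loop.
    returns = [rollout(random_policy) for _ in range(10_000)]
    print(sum(returns) / len(returns))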
And the opposite example of this,
if folks saw it, there was some work a very long time ago,
and this is like in the AI weeds,
but there was this, like, physical robotic hand that could manipulate a Rubik's Cube.
And they were using AI to like,
help try to solve this Rubik's Cube.
And again, the analogy of why Genie and some of this work is so interesting is,
if you were to go and try to like, hey, we need all the data to go and try to make this
little hand, physical robotic hand, be able to do this, it's actually really challenging
to scale that up.
You need to go and build a bunch of hands.
You need to like, what happens when the Rubik's Cube drops?
You need to have some system to like go and pick it back up.
And you just like go through the long tail of this stuff.
The hand probably can't run 24 hours a day.
Like there's all these challenges with getting the data in that environment to scale up.
And these virtual environments don't have this problem,
which is if you can emulate and like self-driving cars is another example of this.
Like again, for folks who aren't familiar, lots of, you know,
there's lots of real world data that's involved in self-driving cars.
There's also lots of simulated environments where they've built simulations of the world.
And this is how they can get, like, a thousand-x scale-up of this data understanding, by having these simulated environments. Robotics will be exactly the same. If you want robotics to work, it's almost 100% true
that you're going to have to have these simulated environments where the robot can fall down
the stairs a thousand times. And that's okay because it's a simulated environment and it's
not actually going to fall down your stairs. So I think with Genie, there is definitely, like, an entertainment aspect to it. I think it's more so going to be useful for this, like, simulated environment to help us not have to do things in the real world, but still have, like, a really good proxy of what will happen in the real world when we do them.
That's pretty funny. I spent the weekend watching the World Robot Olympics, and there were some very real fails and crashes of these robots, which is pretty funny. Okay, so when I think of Genie, I think that it blows my
mind because I still can't get my head around how it predicts what I'm going to look at.
I remember seeing this demo of someone just taking a simple video of them walking.
And it was like a rainy day on a gravel path.
And they stuck that into Genie 3 and they could look down and see their reflection in the
puddle.
So the physics was astoundingly accurate and astute.
Can you give us a basic breakdown of how this works?
Is this like a real game engine happening in the background, or is there something deeper happening?
Like, help us understand.
My intuition, and we can gut check this with folks on the research side to make sure that I'm not fabricating my intuition.
But if folks have an intuition as far as like how next token prediction works, which is at some given, like if you're looking through a sentence of text, for each word in that sentence, there's a distribution between like zero and one, basically, of like how likely that word was to be the next word in the sequence.
And if you, like, look through, this is the basic principle of LLMs. This is why, you know, if you were to ask the same question multiple times, the LLM will inherently perhaps give you a different answer.
And that's why small changes in the inputs to LLMs actually change this,
because, like, again, it's this distribution.
So, like, if you make one letter difference,
it perhaps, like, puts you on a, like, a branching trajectory that looks very different
than the original output that you got from the model.
A similar, like, rough approximation of this is happening, just much more computationally difficult. And I think they use a bunch of architectural differences, so it's not truly next-token prediction that's happening for the sort of world modeling.
Like pixels, colors, a bunch of other things, yeah.
Exactly.
Yeah.
So you can, like, roughly map the mental model of, as the model looks down, or as the figure looks down in some environment, like, again, it has all this context of the state of the world. But then it also knows, like, what are the pixels that are preceding it, et cetera, et cetera. You could loosely approximate what's happening at the Genie level as this kind of next-pixel prediction, which is an interesting way to think about it.
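A tiny numerical sketch of that next-token idea (toy vocabulary and made-up logits, not Gemini's actual values): the model produces a distribution over possible next tokens, and sampling from it is what makes repeated runs, or slightly changed inputs, branch onto different continuations.

    import numpy as np

    # Toy vocabulary and the logits a "model" might assign to the next token.
    vocab = ["cat", "dog", "banana", "car"]
    logits = np.array([2.0, 1.5, 0.2, -1.0])

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    probs = softmax(logits)  # probability of each candidate next token
    print(dict(zip(vocab, probs.round(3))))

    # Sampling explains why the same prompt can yield different answers,
    # and why a small input change (which shifts the logits) can branch the whole output.
    rng = np.random.default_rng(0)
    print([rng.choice(vocab, p=probs) for _ in range(3)])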
So, Ejaaz, one of the things you were mentioning was that it's happening much
faster, right? And it's happening presumably much cheaper because now I heard this crazy stat.
You're at like 500 trillion, hundreds of trillions of tokens per month that is being pushed out by
Gemini. It's unbelievable. And I want to get into the kind of infrastructure that enables this,
because Gemini is feeling faster, but it's also feeling better, and it's also getting cheaper.
And behind you earlier in the show, you mentioned, you have a TPU. I understand TPUs are part of this
solution. And I want you to kind of just walk us through how this is happening. How are we getting
these quality and improvements across the board? And what type of hardware or software is enabling that
to happen? I think, like, one, you have to give credit to like all of these infrastructure teams across
Google that are making this happen.
If you think, and I think about this a lot, like, what is Google's differentiated advantage?
What does our expertise lend us well to do in the ecosystem?
What are the things we shouldn't do because of that?
What are the things we should do because of that? It's something I think about as somebody who builds products.
One of the things that I always come back to is our infrastructure.
And like the thing Google has been able to do time and time again is scale up multiple
products to billions of users, have them work with high reliability, et cetera, et cetera.
And that's like a uniquely difficult problem.
It's a even more difficult problem to do in the age of AI where like the software is
not deterministic.
The sort of compute footprint required to do these things is really difficult.
The models are a little bit tricky and finicky to work with sometimes.
So again, like our infrastructure teams have done an incredible job making that scale up.
I think the stat was, at I/O 2024 we were doing roughly 50 trillion tokens a month. At I/O 2025, I think it was like 480 trillion tokens a month, if I remember correctly. And just a month or two later, and this was in the conversation I had with Demis, we crossed a quadrillion tokens, which comes after a trillion, if you're not familiar. If you haven't thought about numbers higher than a trillion before, it's what comes after a trillion.
And there's no slowdown in sight. And like, I think this is just a great reminder of like,
So many of these AI, like, markets and product ecosystems is still so early.
And there's this massive expansion.
I think about in my own life, like, how much AI do I really have in my life helping me,
like, not really that much on the margin?
It's like, you know, maybe tens of millions of tokens a month maximum.
And, like, you think about a future where there's, like, billions of tokens being spent
on a monthly basis in order to help you and whatever you're doing in your professional life
in your work and your personal life,
whatever it is,
we're still so early.
And TPUs are a core part of that
because it allows us to, like,
control every layer of the hardware and software delivery
all the way to the actual, like, silicon
that the model is running on.
And we can do a bunch of optimizations and customizations
that other people can't do
because they don't actually control the hardware itself.
And there's some good examples of the things that this enables.
One of them,
is, you know, we've been at the Pareto Frontier from a cost performance perspective for a very
long time. And again, if folks aren't familiar, the Pareto Frontier is this like tradeoff of
costs and intelligence and you want to be on the highest intelligence, lowest cost. And we've
been sitting on that for, you know, basically the entirety of the Gemini life cycle so far, which is
really important. So people get a ton of value from the Gemini models.
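For readers who want the Pareto idea spelled out, here is a small sketch with made-up (cost, quality) numbers: a model sits on the frontier if no other model is both cheaper and at least as good.

    # Hypothetical (cost per million tokens, benchmark score) pairs.
    models = {
        "model_a": (0.10, 62.0),
        "model_b": (0.30, 71.0),
        "model_c": (0.35, 68.0),  # dominated: pricier than model_b and lower-scoring
        "model_d": (1.20, 80.0),
    }

    def pareto_frontier(entries):
        """Keep models for which no other model is cheaper-or-equal AND better-or-equal."""
        frontier = []
        for name, (cost, score) in entries.items():
            dominated = any(
                o_cost <= cost and o_score >= score and (o_cost, o_score) != (cost, score)
                for o_name, (o_cost, o_score) in entries.items()
                if o_name != name
            )
            if not dominated:
                frontier.append(name)
        return frontier

    print(pareto_frontier(models))  # model_c drops out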
Another example of this is long context. Again, if folks aren't familiar, there's a limit on how many tokens you can pass to a model at a given time.
Gemini's had a million or two million token context windows
since the initial launch of Gemini,
which has been awesome.
And there's a bunch of research showing
we could scale that all the way up to 10 million
if we wanted to.
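If you want to see how close a given input is to those limits, the Gemini API exposes token counting; a minimal sketch assuming the google-genai Python SDK (model id and file name are illustrative):

    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

    # Count tokens before sending a large document into the context window.
    with open("big_document.txt") as f:
        text = f.read()

    result = client.models.count_tokens(
        model="gemini-2.5-flash",  # assumed model id
        contents=text,
    )
    print(result.total_tokens)  # compare against the model's context window limit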
And that is like a core infrastructure enabled thing.
Like research, there's a lot of like really important research
to make that work and make that possible.
But it's also really difficult on the infrastructure side
and you have to be willing to do that work and pay that price.
And it's a beautiful outcome for us because we have the infrastructure teams that have the
expertise to do this.
Okay.
Logan, one quadrillion tokens.
That's a big number.
We need to talk about this for a little bit because that is an outrageously, mind-bendingly big number.
And when I hear you say that number, I think I'm reminded of Jevons paradox. For people who don't know, it's when increased technological efficiency in using a resource leads to higher total consumption of that resource.
So clearly, with these cool new TPUs, this vertically integrated stack you've built, you are
able to generate tokens much more cheaply and produce a lot more of them.
Hence, the one quadrillion tokens.
Do you see this trend continuing?
Is there going to be a continued need to just produce more tokens?
Or will it eventually be a battle to produce smarter tokens?
I guess the question I'm asking, is the quality of the token more important than the
amount of the tokens?
And do you see a limit in which the quantity of the tokens starts to like kind of go off
of a cliff in terms of how valuable it is?
Yeah, I could buy that story. And some of this, and it's something that's actually super top of mind for our teams
on the like Gemini model side is around this whole idea of like thinking efficiency,
which is, like, ideally you want to get to the best answer using the smallest amount of thinking possible. Same thing with humans. Like, ideally, the example is you're taking a test: you want, you know, the shortest number of mental hops possible to get you to the answer of whatever the question was. That's ideally what you want. You don't want to have to just, like,
think for an hour to answer one question.
And there's a bunch of odd parallels in that world to like models and humans doing this approach.
So I do think thinking efficiency is top of mind.
You don't want to just like use tokens for the sake of tokens.
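The API already exposes a knob along these lines; a minimal sketch assuming the google-genai Python SDK's thinking budget option (model id and parameter values are illustrative, worth double-checking against the current docs):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

    # Cap how many tokens the model may spend "thinking" before it answers.
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # assumed model id
        contents="What's the shortest path to the answer here?",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=256)  # small budget
        ),
    )
    print(response.text)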
I think even if we were to, like, 10x reduce the number of tokens required, which would be, like, awesome and would be a great innovation, and the models were much more token efficient,
I do think there's like a pretty low ceiling to how far that will be able to like,
go specifically because of this like next token prediction paradigm of like how the models actually
approach solving problems using, like, the token as a unit. So it's not clear to me that you'll
be able to just like, you know, a thousand X reduce the amount of tokens required to solve a problem.
I think it probably looks much more like 10x or something like that. And then there'll be a 10x
reduction in the number of tokens required to solve a problem. And there'll be a 10,000 X increase
in the total amount of AI and sort of token consumption in the world. So I think you probably,
even if we made that reduction happen, I think the graph still looks like it's going up into the
right for the most part. It still keeps going. There is no wall. We have virtual data to train models on.
We have tons of new tokens coming into play. There's another question I wanted to ask, which is
just a personal question for you, which is about a feature. Because I find when a lot of people
leave comments on the show and they talk about their experience with AI, a lot of them are
just using, like, ChatGPT in their app or they have Grok on their phone.
And I think Gemini kind of has some underrated features that don't quite get enough attention.
So what I'd like for you to do is maybe just highlight one or two of the features you shipped
recently that you think is criminally underrated.
What should people try out that you think not enough people are using?
I think the one that continues to surprise me the most is deep research.
I think deep research is just, like, the North Star for building an AI product experience.
And if folks aren't familiar with this,
you can show up, yeah, so you can show up with, like, a pretty ill-defined question
that's like very open and vague.
And the model will traverse,
essentially across the internet,
hundreds or thousands of different web pages,
try to accumulate enough context,
and then come back to you with initially,
basically like a research report,
could be like a 40-page report in some cases that I've seen.
You might hear a 40-page report
and say that's not very useful to me because I'm not going to read 40 pages. And I'd say
you and me are exactly the same because I'm not reading 40 pages either. There's a beautiful
feature. Again, if you've used NotebookLM, this Audio Overviews feature, the same thing actually
exists inside of the Gemini app with deep research, which you can just press that button and then
get like a 10, 15 minute podcast that sort of goes through and explains all the different research
that's happened. You can listen to that on your commute or something like that or on a walk,
not need to read 40 pages, which is awesome.
The part of this that makes it such an interesting experience to me is,
I don't know if other people have felt this before,
but most AI products, back to that, Josh,
that blank slate problem or that like empty chat box problem,
you as the user of the product have to put in so much work
in order to get useful stuff.
I talk to people all the time who are like, yeah,
I use these models and they're just not useful for me.
And, like, actually what's happening behind the scenes is the models are super capable, they're really useful.
It just requires that you give the models enough context.
And I think with deep research, there's this new emerging, like, prompt engineering 2.0, this
context engineering problem where it's like, how do you get in the right information so that
the model can make a decision on behalf of a user?
And I think deep research is this really nice balance of going and doing this context engineering
for you, bringing all that context into the window of the model, and then being able to
answer what your original question was.
And principally, showing you this like proof of work up front.
I think about this proof of work concept in AI all the time, which is I have so much more
trust in deep research because as soon as I kick off that query, it's like, boom, it's already
at like 50 web pages.
I'm like, great, because I was never going to visit 50 web pages.
Like, there's pretty much nothing that I'm researching.
I could be going and buying a car and I'm going to go and look at less than 50 web pages for
that thing or a house.
I'm looking at less than 50 web pages.
Like, I'm just, it's not in the cards.
This is maybe personal to me and other people are doing more research.
I don't know, but so automatically I'm like in awe with how much more work this thing is doing.
I think, again, this is the North Star from an AI product experience standpoint.
And there's so few products that have like made that experience work.
And it just every time I go back to deep research, I'm reminded of this and that team crushed it.
And it's not just deep research from an LLM context that is so fascinating about Google AI, you guys have created some of
the most fascinating tools to advance science. And I don't think you guys get enough flowers
for what you guys have built. Some of my favorites, AlphaFold 3 is crazy. So, you know, this is this
model that can predict what certain molecular structures are going to look like. And this could be
applied to so many different industries, the most obvious being drug design, creating cheaper,
more effective, curable drugs for a variety of different diseases. Then I was thinking about that
random model that you guys launched, where apparently we could translate what dolphins were saying
to us and vice versa. Kind of stepping back from all of these examples, can you help me understand
what is Google's obsession with AI and science and why do you think it's such an important area to focus on? Are we at a point now where we can advance science, you know,
to infinity or where are we right now? Are we at our ChatGPT moment or do we have more to go?
I'll start with a couple of cheeky answers. Demis, who is the only, you know, foundation model lab CEO to have a Nobel Prize in a science domain, which for him is chemistry, had this comment, which is actually really true. There's lots of people talking about this,
like impact of AI on science and humanity. And there's very few, if not, only one being deep mind
research lab that's like actually doing the science work. And I think it's this like great
example of like deep mind. It's just being in the like culture and DNA of like Demis as a scientist,
all of these folks around deep mind are scientists and they like want to push the science and
push what's possible in this future of discovery using our model.
And I was in London a couple of weeks ago,
meeting with Pushmeet, who leads our science team,
and hearing about sort of the breadth of the science that's happening.
DolphinGemma is, like, a great,
kind of funny example,
because it's not super applicable in a lot of cases,
but it's interesting to think about.
AlphaFold, like, if folks haven't watched the movie
The Thinking Game, it's about sort of the early days
of Google DeepMind.
And they're talking
about, like, folding proteins and why this is such an interesting space.
And I'm not a scientist, but to hit on the point really quickly of
why AlphaFold is so interesting:
the historical context is that to fold a single protein would take many humans,
millions of dollars, and on the order of, like, five years for a single
protein. The original impetus, and why Demis won the Nobel Prize for
this in chemistry, was because DeepMind was able to figure it out using reinforcement learning and
other techniques. They folded every protein in the known universe, millions of proteins,
released them publicly, made them available to everyone. And it, you know,
dramatically accelerated the advancement of human medicine and a bunch of other domains and
disciplines. And now, actually, with Isomorphic Labs, which is part of DeepMind, they're actually
pursuing some of the breakthroughs that they found and actually doing drug discovery and
things like that.
So like overnight, you see that hundreds of thousands of human years and hundreds of millions
of dollars of like research and development costs saved through a single innovation.
And I think we're going to continue to see that like acceleration of new stuff happening.
A recent example of this is AlphaEarth, which was our sort of, like, geospatial model that came out, being able to fuse together the Google Earth Engine with AI and this understanding of the world.
There's just so much cool science, and so much is possible, when you sort of layer the AI capability onto all of these disciplines.
So I think, to answer the question, we're going to see this accessibility
of science progress, and I think DeepMind's going to continue to be at the forefront of this,
which is really exciting. And the cool thing, even for people who aren't in science,
is that all of that innovation and the research breakthroughs that happen feed back into the
mainline Gemini model. Like, we had a bunch of research work about doing proofs for math. And it's
like, oh, that's not very interesting at face value. But that research fuels back into
the mainline Gemini model. It makes it better at research.
It makes it better able to understand these really long and difficult problems,
which then benefits, like, every agent use case that exists, because the models are better at reasoning
through all these difficult problem domains. So there is this really cool research-to-reality,
science-to-practical-impact flywheel that happens at DeepMind.
As a former biologist, this warms my heart. It's amazing to see this get applied at such
scale. Okay, we can't talk about Google AI without
talking about search. This is your bread and butter, right? However, I've personally noticed
a trend shift in my habits. I've used a computer for decades now, and I've always used Google
Search to find things, Google Chrome, whatever it might be. But I've now started to cheat on this
feature. I have started using LLMs directly to do all my searching for me, to get all my
sources for me. And you've got to be thinking about this, Logan, right? Is this eating the
search business, is this aiding the search business? Or are we creating a whole different form factor
here? What are your thoughts? There's an interesting form factor discussion. I think on one hand,
the AI sort of answer market is definitely distinctly different, it feels like, than the search
market to a certain degree. Like, I think we've seen lots of AI products reach hundreds of millions
of users and, you know, search continues to be a great business and there's billions of people
using it and all that stuff. There's also this interesting question, which is like, what's
the obligation of Google in this moment of this platform shift and all this innovation that's
happening. And as somebody who doesn't work on search, but is a fan of all the work that's
happening inside of Google and has empathy for folks building these products, it is really
interesting. And like my perspective has always been that search, as the
front door to the internet, has this stewardship position that makes it so that they actually
can't disrupt themselves, for the right reasons, at the same pace that, you know,
smaller players in the market are able to. And my assertion has always been that, actually,
this is the best thing for the world. The world, and the internet, and
this entire economy that Google has enabled through the internet, bringing people to
websites and all this stuff, doesn't benefit from, you know, day one of the LLM
revolution, search all of a sudden becoming
a fully LLM-powered product that feels and looks completely different.
Not only would that throw off users who are still trying to figure out,
like, how do I use this technology?
What is the way that I should be engaging with it?
What are the things that it works well for and doesn't work for?
Not only would it throw those people into a bad place
from a user journey perspective, but I think it also has impacts on people who rely on Google
from a business perspective.
So I think you've seen this sort of like gradual transition and like lots of shots on goal
and lots of experiments happening on the search side.
And I think we're now getting to the place where like they have confidence that they
could do this in a way that is going to be super positive for the ecosystem and is going
to create lots of value for people who are going and using these products.
Like the understanding of AI technology has increased the adoption and the models have gotten
better and hallucinations have gone down and all this stuff. And I think there'll be also some
like uniquely search things that like only search can do. And I've spent a bunch of time with
folks on the search team like Robbie Stein as an example who leads all the AI stuff in search.
And there's all of this infrastructure that search has built. As you think about this age
of AI, where the cost of generating content that actually looks somewhat plausible has basically
gone to zero,
like, it's very easy to do that,
great search is actually at a premium, more important than ever.
There's going to be a million X or a thousand X or whatever-X growth
in content on the internet.
How do you actually get people to the most relevant content, from people who have authority,
who have, you know, expertise and all this stuff?
It's a really difficult problem.
And it is like, it is the problem of the decade that like search has been solving for the last 20 years and is now a more important problem than ever.
So I've never been more excited for the search team.
And I think they've never had a bigger challenge ahead of them as they try to figure out how to make these internet-scale systems that they've built continue to scale up to solve this next generation of problems, while also becoming this frontier AI product experience where billions of people are experiencing AI for the first time in a different way than they have before.
There are so many interesting use cases too. Image search is a great example of this new sort of behavior:
one of the fastest growing ways in which people are using search now is showing up with an
image and asking questions about it. The way people had traditionally
used search has already changed; it's different than it was five years ago or even two years ago.
I think we're going to continue to see that happen. I think search, as the product you see
today, will evolve to have things like multi-line text input fields as sort of user questions
change and all that stuff. So there's so much cool stuff
on the horizon for search that I'm really excited about.
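As a quick aside, the image-as-query pattern described here is straightforward to try against the Gemini API. Below is a minimal sketch assuming the google-genai Python SDK; the model name, image path, and placeholder API key are illustrative and not tied to the model announced in this episode.

```python
# pip install google-genai pillow
# Minimal example of asking a question about an image (image-as-query).
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key for illustration

image = Image.open("photo_of_unknown_plant.jpg")  # hypothetical local image

response = client.models.generate_content(
    model="gemini-2.5-flash",  # illustrative model name
    contents=[image, "What is this, and what should I search for to learn more?"],
)
print(response.text)
```

It is the same behavioral shift described above: the query starts from a picture rather than a typed keyword string.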
Yeah, as I'm hearing you describe all of these cool new things, particularly
how they all funnel into a single model.
Like, the science breakthroughs are unbelievable.
And I think that's what gets me personally really excited, like Ejaaz, is this is actually
going to help people.
Like, this is going to make a difference in people's lives.
Right now, it's a productive thing, it's a fun thing, it's a creative thing.
There's a lot of tools.
But then there's also the science part.
And a lot of this all funneling down to one amazing model, I think it leaves us in a really
exciting place to wrap up this conversation.
So, Logan, thank you so much for coming and sharing all of this, sharing the news about the new model, sharing all of the updates and progress that you're making everywhere else.
I really enjoyed the conversation.
As for you, you also have a podcast called Around the Prompt.
Is there anything you want to leave listeners with to go check it out or to check out the new AI Studio or the new AI model?
Let us know what you have interesting going on in your life.
I love seeing feedback about AI Studio.
So if you have things that don't work that you wish worked, even from both of you, please send them to me.
I would love to make things better.
For the new model as well,
I think this is still early days of what this model is going to be capable of.
So if folks have feedback on edge cases or use cases that don't work well,
please reach out to our team,
send us examples on X or Twitter or the like,
and I would love to help make some of those use cases come to life.
And I appreciate both of you for all the thoughtful questions and for the conversation.
This was a ton of fun.
We got to do it again sometime.
Awesome. Yeah, I would love to. Anytime. Please come and join us. We really enjoy the conversation.
So thank you so much for watching. For the people who enjoyed it, please don't forget to like, share it with your friends, and do all of the good things.
And we'll be back again for another episode soon. Thank you so much.
I have a fun little bonus. For those of you still listening all the way at the end, the real fans.
When we were first going to record with Logan, we actually had no idea that he would break the exclusive news of Nanobanana on our show.
It was super cool. So we wanted to kind of restructure the episode to prioritize that at the front.
But we did record a separate intro where I said, hey, Google makes some really good stuff.
In fact, you guys have an 80-something percent chance of having the best model in the world by the end of this month.
Can you explain to us why? Why Google is so amazing at what they do?
And this was the answer to that question. So here's a nice little nugget for the end to take you out of the episode.
Thanks for listening. I really hope you enjoyed, and we'll see you guys in the next.
My general worldview of why Google is in such a good place for AI right now: there are many layers to this, depending on sort of what vantage point you want to look at.
I think, on one hand, search is this, like, incredible part of the story,
even though people have historically looked at Google Search as this legacy Google product.
And I think search is going through this transition. Actually, just today as we're recording this, it was
announced that AI Mode is rolling out to 180-plus countries,
English supported right now and hopefully other languages in the future.
And it's a great example of AI Overviews, and AI Overviews
sort of double-clicking into AI Mode, being this product that, for many people,
for billions of people around the world, is the first AI product experience that they actually
touch. And I think there's something really interesting where Google has been on this
mission of deploying AI. And, you know, some naysayers on
Twitter will be like, you know, Google created the transformer and then did nothing with it.
And it's actually very far from the truth:
the transformer,
which is the architecture that powers language models
and Gemini, has been powering that search experience
for the last, like, seven years.
The product experience maybe looks slightly different today
than it did then.
But Google's been an AI first company for as long as I can remember.
Basically as long as AI has existed, that's been the case.
And now we're seeing more and more of these product surfaces
like become these frontier AI products as sort of Google builds the infrastructure to make
that the case.
I think people also forget like it's not easy logistically to deploy AI to billions of people
around the world.
And now, as you look at it, I think Google has like five or six products with a billion-plus users.
So the challenge of even just making a small AI product work today, if anyone's played
around with stuff or tried vibe coding something, like, it's not easy.
Doing that at the billion user scale is also very difficult.
So I continue to be more and more bullish.
And part of the thing that allows us to do that billion user scale deployment is the whole infrastructure story.
Like if you're watching on video, I don't know if you can see, but I have a couple of TPUs sitting behind me.
And like that TPU advantage, TPUs being our sort of equivalent to GPUs, is something that I think is going to continue to play out.
So there's so many things that I get excited about.
And the future is looking very bright.
