Hard Fork - Google's Gemini 3 Is Here: A Special Early Look

Episode Date: November 18, 2025

Google’s much anticipated new large language model Gemini 3 begins rolling out today. We’ll tell you what we learned from an early product briefing and bring you our conversation with Google executives Demis Hassabis and Josh Woodward, just ahead of the launch.

Guests:
Demis Hassabis, chief executive and co-founder of Google DeepMind
Josh Woodward, vice president of Google Labs and Google Gemini

Additional Reading: The Man Who ‘A.G.I.-Pilled’ Google

We want to hear from you. Email us at hardfork@nytimes.com. Find “Hard Fork” on YouTube and TikTok. Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify. You can also subscribe via your favorite podcast app here: https://www.nytimes.com/activate-access/audio?source=podcatcher. For more podcasts and narrated articles, download The New York Times app at nytimes.com/app.

Transcript
Starting point is 00:00:00 Casey, we have a special emergency podcast episode today about the launch of Gemini 3. Yes, Kevin, hotly awaited, much discussed among AI nerds here in Silicon Valley. We are finally about to get our hands on the genuine article. Yeah, so normally we wouldn't break our Friday publication schedule to publish a special episode just about a new model coming out from one of the big AI companies. They're releasing models all the time. But there are a couple reasons that we thought it was worth doing this this week to talk about this model, Gemini 3, in particular.
Starting point is 00:00:56 The first is that we got some time with Demis Hassabis and Josh Woodward, two of the leading AI executives at Google. Demis, of course, is the CEO of Google DeepMind, which is their in-house AI lab. And Josh Woodward is the VP of the Gemini team and some other stuff there at Google. So we were excited to talk to them and ask them about this big new model release. But I think there are a couple other reasons we were interested in doing this as well. Yeah, I mean, one big thing, Kevin, is just that maybe more than other model releases, this one seems to have the attention of Google's competitors. We're hearing a lot of whispers from folks who work at other AI labs that, hmm, it seems like Gemini 3 has managed
Starting point is 00:01:38 to figure some things out in a way that may be bad for their businesses. And I think around the AI industry, there's sort of this feeling that Google, which kind of struggled in AI for a couple years there, they had the launch of Bard and the first versions of Gemini, which had some issues. And I think they were seen as sort of catching up to the state of the art. And now I think the question is like, is this kind of them taking their crown back? So we'll get into all that with Demis and Josh. But let's just talk Casey about what we know about Gemini 3. They held a briefing early this week and told us a little bit about the new model and what it can do. So what did we learn about Gemini 3? Yeah. Well, so in terms of what it can do,
Starting point is 00:02:19 which is always the most interesting to me, Google shared a few different things. One, in addition to saying all the things you would expect, like, it's better at coding and it's better at vibe coding. It also is going to do some new things around generating interfaces for you when you ask it a question. So nowadays, you ask most chatbots a question. It'll spit back and answer in text. Maybe it shows you an image. According to the Google folks, Gemini 3 is just going to start building custom interfaces for you. So they showed an example where somebody wanted to learn about Vincent Van Gogh, the painter.
Starting point is 00:02:51 and Gemini 3 just sort of like coded up an interactive tutorial that had all sorts of like images and interactive elements. They showed another example that involved building a mortgage calculator for buying a home over a million dollars which is the lowest amount of money that anyone
Starting point is 00:03:05 at Google can imagine spending on a home. So these are the kinds of things that you can expect to find in Gemini 3, Kevin. Yeah. So I would say the theme of the briefing and of the materials that Google shared ahead of the Gemini 3 launch was this is just kind of better than
Starting point is 00:03:21 their last model, Gemini 2.5 Pro, in basically all respect. Some of the benchmarks that caught my attention, one was this benchmark test called Humanity's Last Exam, which is sort of a very hard, interdisciplinary exam that consists of a bunch of questions, like basically a graduate student or PhD level. And their previous model, Gemini 2.5 Pro, got about a 21.6% on that test, and Gemini 3 Pro gets a 37.5% on that test. That's basically the story of all of these benchmarks. They gave more than a dozen examples of various benchmarks where the new model just beats the old one handily.
Starting point is 00:04:01 And to a lot of people, I think that may not matter. Most people who are using Google's AI products are probably not out there trying to solve novel problems in physics. But their basic pitch for this is just like, this is a state-of-the-art model. anything that you could do with ChatchipT or Claude or even the older versions of Gemini, you can do better with Gemini 3 Pro. They also talked about testing what they're calling the Gemini agent,
Starting point is 00:04:26 which is going to be able to do one thing in particular that I've been waiting for somebody to do forever, which is look through your inbox, understand its contents, propose replies, kind of organize, like emails together, and really sort of help you get your inbox under control in a way that I personally have never been able to. So we basically only saw a few animated gifts about that, but that will definitely be one of the first things that I try when I get my hands on Gemini 3.
Starting point is 00:04:55 Yeah, and they are not, we should say, rolling this out to everyone right away. It's going to be available this week for users in the Gemini app and also in the AI mode, which is sort of the tab off to the side of the main Google search engine. It will also be available for developers
Starting point is 00:05:12 in various products. But they're not sort of saying when this will come to things like the Gemini integrations in Google Docs or Gmail, these very popular things that are used by billions of people a day. But I thought it was interesting that they have brought this model to Google search, albeit in this AI mode that's not sort of the main search bar. That to me suggests that they feel like they can serve this model cheaply enough to make it potentially something the billions of people could use and that that would not melt their. their servers and incur billions of dollars of costs. Yeah, so far they say that the usage keeps going up for AI overviews, and every quarter they continue to make more money. So it seems to be working out for them, not working out for the rest of the web, but it's working out well for Google. Yeah, but I think that's like, obviously
Starting point is 00:06:01 Google's big advantage here over their competitors is that, you know, they have products that are used by billions of people a day, and they can kind of shove Gemini three into those products over time and just get more and more usage and get more data and use that to improve their models. Yeah, which is why we always tell students when they ask us for advice, step one, build an illegal monopoly. Yes. And speaking of students, the other notable announcement that Google is making this week is that they are giving all U.S. college students a year of free access to a paid version of Gemini, which is, I think, a smart move. I feel a little gross about it, like essentially
Starting point is 00:06:40 telling students, hey, why don't you use this to maybe do some of your homework, maybe help you with your exams. We'll give you the first hit for free. Yeah, you know, I was also struck during the briefing that we had this morning that I believe three different people use the phrase learn anything. This seems like it has become a very prominent plank of Google's messaging is they're presenting Gemini as a learning tool, which maybe is just sort of a euphemism for a do-your homework tool. I don't know. Yes. Okay. So that is what we know about Gemini 3. We will be doing our own testing and reviewing of Gemini 3 once it is fully out on Tuesday. But for now, we wanted to just give you the basics and also bring you our interview
Starting point is 00:07:21 with Demis Sibbis and Josh Woodward of Google Deep Mind. And before we get to that, we should obviously make our AI disclosures. I work for the New York Times Company, which is doing OpenAI and Microsoft over the training of large language models. And my boyfriend works in Anthropic. Demis and Josh, welcome to Hard Fork. Great to be here. Thank you. So two years ago, Sundar Pichai told us that Bard, rest in peace, was a souped-up civic that was in a race with more powerful cars. What kind of car is Gemini 3?
Starting point is 00:07:58 That's a good one. Demis, do you want to take it? Well, I hope it's a bit faster than a Honda Civic. You know, I don't really think of it in terms of cars. and maybe it's one of those cool drag racers. Yeah, so people are really excited about this model. We have been hearing from folks that have been sort of early testing it. Obviously, you guys have shown off a lot of the benchmarks, very impressive.
Starting point is 00:08:22 What can Gemini do on a concrete level that previous AI models couldn't? Well, I'll jump in maybe a couple of things that stand out. One, we're starting to see this model really excel on reasoning and being able to think many steps. at the same time. Sometimes models in the past would lose their train of thought, lose track.
Starting point is 00:08:44 This one's way better at that. The other thing you'll see tomorrow as well is all kinds of new generative interfaces. This is our best model yet at being able to create new types of interfaces. It gives people really a custom sort of design and sort of answer to their questions. And then maybe the third thing I would say
Starting point is 00:09:03 is we've put a lot of investment in coding itself. And so a lot of the coding, examples you'll see some new products coming out like Google anti-gravity will also kind of showcase that there's been some discussion that for average users the chat use case can feel solved that sort of average users of products like Gemini kind of almost can't even think of a question to ask it that will generate something that feels meaningfully different from what they were able to get in the last model to what extent does that feel true to you in Gemini and to what extent do you think average folks are really going to notice a difference?
Starting point is 00:09:40 Yeah, one of the things, I guess we're seeing in some of the testing and Demis, feel free to chime in too, is I think these are really, for us, this is a model that it's more concise, it's more expressive, it starts to present information in a way that's must easier to understand. And I think for most people, that's going to be a big immediate effect. And then I think what starts to get interesting is how these models start to interact with other types. of information. So we talk a lot about how students are going to be able to learn with this model, or even how this model can connect to other types of data you might have in other Google products with your permission. These are the ways I think we're starting to show kind of it's going beyond just the standard text kind of Q&A back and forth. Yeah, I think I'd add to that just like, you know, it's general reliability on things.
Starting point is 00:10:30 It's incredibly, you know, you'll notice that when you use it. I think also we work quite hard on the persona, which we call it internally, like the style of it. I think it's more succinct. I think it's more to the point. It's helpful. I feel like it's got a better style about it. I find it more pleasant to brainstorm with and use. And then I think, you know, I think there are various things where there's almost a step change. I feel like it's crossed a sort of threshold of usefulness on things like vibe coding. I've been getting back into my games programming. I'm going to, I've got to set myself some projects over Christmas on that because I feel like it's actually got to a point where it's incredibly useful and capable on front end and things like this
Starting point is 00:11:11 that perhaps previous versions weren't so good at. Demis, the last time we had you on the show in May, you said that you think we're five to ten years away from AGI and that there might be a few significant breakthroughs needed between here and there. Has Gemini III in observing how good it is changed any of those timelines or does it incorporate any of those breakthroughs that you thought would be necessary? No, I think it's, I think it's sort of dead on track if you, if you see what I mean. I think we're really happy with this progress.
Starting point is 00:11:41 I think it's an absolutely amazing model and is right on track of what I was expecting and the trajectory we've been on, actually, for the last couple of years since the beginning of Gemini, which I think's been the fastest progress of anybody in the industry. And I think we're going to continue doing that trajectory. And we expect that to continue. But on top of that, I still think there'll be one or two more things that are required to really get the consistency across the board that you'd expect from a general intelligence and also improvements still on reasoning, on memory, and perhaps things like world model ideas
Starting point is 00:12:16 that you also know we're working on with Simmer and Jeannie. They will build on top of Gemini, but extend it in various ways. And I think some of those ideas are going to be required as well to fully solve physical intelligence and things like that. So I'm both are true. I'm really happy with the progress of Gemini 3, I think people are going to be pretty pleasantly surprised, but it's on track of what we were expecting the progress to be. And I think that means still five to 10 years with one or two more perhaps breakthroughs required. You mentioned Gemini 3's style. There's been a lot of discussion recently about AI companions, the relationships people are developing with them. How do you think about Gemini 3's personality and what kind of
Starting point is 00:12:59 relationship do you want users to have with it? I would say in the app itself, we see it on the team a lot as almost like a tool, or it's something you're using to kind of work through and kind of cut through your day. And so whether it's kind of, if it's helping on different types of questions you have or helping you create things, that's really where we see it really kind of excelling and kind of the direction we want to see it. I think if you zoom out, if you look at Gemini or some of our other projects like Notebook L.M. Or flow, we're really kind of trying to think through how does AI really be this superpower, kind of super tool in your toolbox that you can use, whether it's for writing or researching or creating films or whatnot. And so that's really more where we're
Starting point is 00:13:42 focused. I think over time, we're really interested on the team to be able to track things like how many tasks did we help you complete in your day. That's a new type of metric that I think we get excited about and sort of a way that the original sort of Google search worked, you would come to it, you would sort of try to get an answer or sent to a page and sort of move on from there. Well, that all sounds very good and responsible, but I'm wondering about all the viral engagement you're leaving on the table by not making this thing an erotic companion. Big oversight. No comment. Some of your competitors have been very nervous in the days and weeks leading up to Gemini 3, I think they've started hearing the same rumblings that we have about
Starting point is 00:14:26 this model being quite good. And maybe the narrative shifting from sort of Google playing catch-up in AI to now sort of being on top of the race, or at least in a leadership position there. Do you feel like Google is ahead in the AI race right now? Look, it's a, as you guys know, very well, it's a ferocious, you know, competitive environment, probably the most competitive there's ever been. So one can never, you know, It's almost really the only important thing is your rate of progress, right, from where you are. And that's what we're focusing on, and we're very happy about that. I mean, I don't really see it as a sort of like, you know, we were back in the lead or something like that.
Starting point is 00:15:03 We've always pioneered the research part of this. I think it's like getting into our groove in making sure that downstream reflected in all of our products. And I think we're really getting into our stride there. I think you saw that actually last I would say. And we're getting better at better at that. like with GDM being sort of the engine room of Google. And, of course, there's a Gemini app, there's Notebook LM, these AI first products, but there's also powering up all these amazing existing Google products,
Starting point is 00:15:31 whether that's Maps, YouTube, Android, you know, search, of course, with AI first features and actually in some cases, reimagining things from an AI first perspective with, you know, often Gemini under the hood. And that's going amazingly well. And I think we're only midway through that evolution, But it's very exciting to see how much value and excitement our users are getting when they see each of those new features and, you know, for example, workspace and Gmail and so on. There's almost almost endless possibilities there.
Starting point is 00:16:00 So we're really excited about that as well as all of these AI first products that we're also imagining and prototyping. We had a historian on the show last week who was using an unreleased Google model in AI Studio. and it had sort of blown his mind with how it was able to transcribe these very old documents and reason correctly about, you know, what kind of, you know, what was the measurements of the sugar in this sort of 1800s fur trade in Canada? Do you think you can tell us once and for all, was this man using Gemini 3? Not sure about that one. Okay. I will say the model is, though, quite amazing at making these connections. And I don't know if the historian was using kind of photos of old documents.
Starting point is 00:16:45 documents or diaries or whatnot. Yes, that's what he was doing. Yeah, that most suddenly was. Okay, it's very good at this. And, you know, someone like me has pretty poor handwriting. You can take a page of notes, and it'll kind of take that and run with it with no problem, no sweat. You mentioned that on this call that you're going to be integrating this into search in the AI mode that sort of is a side tab on the main Google search engine. Does that mean that you found a way to serve this model more efficiently and cheaply than previous models?
Starting point is 00:17:13 I think we're always on the cut. I think I feel like the thing we do really well, apart from the overall performance of our models and getting better and better at that is the efficiency of our models. And the distillation techniques and many, many other techniques that we sort of created and pioneered
Starting point is 00:17:29 that we're now putting to use. Obviously, it's necessary for us because we have extreme use cases of things like AI overviews and others that we have to serve billions of users. And then, of course, some of our cloud customer, enterprise customers,
Starting point is 00:17:43 appreciate that efficiency, cost efficiency, too. So we've always tried to be on this Pareto frontier of cost to performance. And wherever you want to be on that frontier, if you value performance most, or if you value cost the most, then there'll be one of the models in the model family for you. So, of course, we're only announcing pro today, but we are also working on the other family of models for the 3.0 era. So you'll see a lot more about that, but pretty soon. It seems like every time we see the release of a new frontier model, we get to revisit the discussion about scaling laws. And are we beginning to see diminishing returns? And I can predict a few Twitter accounts that we'll probably have something to say about this over the next few days.
Starting point is 00:18:27 So I thought I would just sort of ask you, before we have that discourse, how are you guys thinking about that in relation to Gemini 3? Yeah, we're very happy with the progress Gemini 3 represents over 2.5. So I would say, sort of actually referencing to what we discussed earlier, that the progress is basically what we're expecting and on track, and we're really pleased with it. But that's not to say that it's like there is some kind of diminishing returns. People when they hear diminishing returns, they think of is it zero or exponential, right? But there's also in between. So they can be diminishing, it's not going to like exponentially double with every era, but it's still well worth doing, right? And an extremely good return on that investment.
Starting point is 00:19:13 So I think we're in that era. And then, you know, as I said, my suspicion is, although we'll see, is that still one or two more breakthroughs are required, research breakthroughs are required to get all the way to AGI. But in the meantime, you're going to obviously need as scaled a possible versions of these foundation models, multimoder foundation models that we're building today and still seeing great progress on. Which of the many benchmarks that you showed off today do you feel like is going to matter most to the average user? Oh, that's a good question.
Starting point is 00:19:42 I think most people don't look at the benchmarks as closely as we do, but the benchmarks are always a proxy, right? So you look at something like cracking the 1500 ELO on LM Arena. That's great, but what really matters is kind of the user satisfaction in the products, too. And I think what's been encouraging to us is these are still moving in the same direction. They're good proxies for each other. And so ultimately, I think we'll put out all the benchmarks and we're very proud of them. And they represent amazing progress.
Starting point is 00:20:14 But you also have to be able to translate that into product experiences that matter. And so we try to do both with every one of these releases. Any new dangerous capabilities or safety concerns that come with the increased power of the model? I think, well, we've done, we've taken quite a long. time on this model to, because it's, it's frontier and, you know, has some new capabilities and it's very capable, as you can see from the benchmarks. And as Josh said, we don't, we don't, you know, we make sure to not over index internally on those benchmarks. They're just a proxy for overall performance. And that's why we care about them across the board. And then ultimately
Starting point is 00:20:51 how our users experience them. But we spend a lot of time on testing, safety testing, all the different dimensions with the safety institutes and also external testers that we work with as well, as well as, of course, doing a ton of internal testing. So I would say this is our most thoroughly tested model so far. Do you want to mention any of those sort of new capabilities that popped up, whether or not it was for a safety thing? Was there something in there where you thought, okay, yeah, we definitely need to make sure we're sending this to a bunch of external researchers? Yeah, well, look, it's just making sure we've worked really hard on things like tool call usage and function calling and these kinds of things. Obviously, there's
Starting point is 00:21:28 super important for coding capabilities and developers want that and so on. And it's very important in general for reasoning. But it also makes them more capable for for riskier things, too, like cyber. So we have to be, you know, we have to be sort of doubly cautious as we improve those dimensions for all the good use cases that we're continually checking on all those kinds of measures that they can't be, they can't be misused. Are we in an AI bubble? I think it's two binary a question, I would say. I think, I mean, my view on this, this is just strictly my own opinion, is that there are some parts of the AI industry that are probably in a bubble.
Starting point is 00:22:12 You know, if you look at like seed investment rounds being multi-10 billion dollar rounds with basically nothing, it seems, I mean, there's talented teams, but it seems like that might be the first signs of some kind of bubble. On the other hand, I think there's a lot of amazing work and value to, at least from our perspective that we see, that not only are there all the new product areas, so Gemini app, notebook LM, but thinking more forward robotics, gaming. I mean, there's incredible uses of, and not just Gemini, but some of our other models, Jeannie. You can imagine my old gamespaying background. You know, I'm itching to think about what could be done there. And drug discovery, we're doing an isomorphic and Waymo.
Starting point is 00:22:53 And so there's all these new greenfield areas. They're going to take a while to mature into a massive multi-hundred billion dollar businesses. But I think that there's actually potential for half a dozen to a dozen there that I think Alphabet will be involved with, which I'm really excited about. But also immediate returns, we got, of course, the engine room, you know, this is the engine room part of Google, where we're pushing this into all of these incredible, you know, multi-billing user products that people use every day. And there's just almost, we have so many ideas, it's just about execution, like how would you reorganize workspace around that, Android, YouTube. There's just so much potential there. And I think a lot of that will also bring in near term revenue and direct returns while we're also investing in the future, not to speak of, you know, cloud revenue and TPUs and all of that, which I think is also going to be huge. So I feel really good about where we are as alphabet.
Starting point is 00:23:51 whether or not there's a bubble or not. I think our job is to be winning in both cases, right? If there's no bubble and things carry on, then we're going to take advantage of that opportunity. But if there is some sort of bubble and there's a retrenchment, I think we'll also be best place to take advantage of that scenario as well. All right. Let's imagine it's Thanksgiving coming up and it's the Bay Area. And one of our listeners, you know, changes the subject from politics, which is upsetting everyone to AI. Give people something. something to be excited about, and someone say, hey, I heard Gemini 3 just came out. Like, what can it actually do?
Starting point is 00:24:26 What's the example that you would have our listeners show their friends, whether it's on their phone and their laptop, to be, get a load of this and save Thanksgiving? Yeah, I don't know if it'll save Thanksgiving, but it could probably provide some laughs. You know, our imagery models in Gemini are still best in the world. So what we would, I would say, grab your phone, can be, you know, iPhone, Android, doesn't matter, pull it out. you can take a selfie, put yourself in it and edit it. People are still doing that at huge amounts, and it's great fun.
Starting point is 00:24:58 And then I think you can then show off any kind of other capabilities in the new Gemini three alongside it. But this is what we're seeing, people are kind of coming for a lot of these interesting use cases and then starting to try other parts of the app too. You heard it here. Nanobanano will save Thanksgiving dinner. Gentlemen, thank you. It's great to talk, and thanks for making the time.
Starting point is 00:25:18 Appreciate it. Thanks for having us. Thank you all. Thanks, guys. Hard Fork is produced by Whitney Jones and Rachel Cohn. We're edited by Jen Poyant. Today's show is engineered by Chris Wood. Original music by Diane Wong, Rowan Nemistow, and Dan Powell.
Starting point is 00:26:03 Video production by Soria Roque, Pat Gunther, and Chris Schott. You can watch this full episode on YouTube at YouTube.com slash hardfork. Special thanks to Paula Schumann, Pueh Wing-Tam, Dahlia Hadad, and Jeffrey Miranda. You can email us, as always, at hardfork at NYTimes.com. You know, I'm sorry.
