Hard Fork - Google's Gemini 3 Is Here: A Special Early Look
Episode Date: November 18, 2025
Google's much anticipated new large language model Gemini 3 begins rolling out today. We'll tell you what we learned from an early product briefing and bring you our conversation with Google executives Demis Hassabis and Josh Woodward, just ahead of the launch.
Guests: Demis Hassabis, chief executive and co-founder of Google DeepMind; Josh Woodward, vice president of Google Labs and Google Gemini
Additional Reading: The Man Who 'A.G.I.-Pilled' Google
We want to hear from you. Email us at hardfork@nytimes.com. Find "Hard Fork" on YouTube and TikTok. Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify. You can also subscribe via your favorite podcast app here: https://www.nytimes.com/activate-access/audio?source=podcatcher. For more podcasts and narrated articles, download The New York Times app at nytimes.com/app.
Transcript
Casey, we have a special emergency podcast episode today about the launch of Gemini 3.
Yes, Kevin, hotly awaited, much discussed among AI nerds here in Silicon Valley.
We are finally about to get our hands on the genuine article.
Yeah, so normally we wouldn't break our Friday publication schedule to publish a special episode
just about a new model coming out from one of the big AI companies.
They're releasing models all the time.
But there are a couple reasons that we thought it was worth doing this this week to talk about
this model, Gemini 3, in particular.
The first is that we got some time with Demis Hassabis and Josh Woodward, two of the leading AI executives at Google.
Demis, of course, is the CEO of Google DeepMind, which is their in-house AI lab.
And Josh Woodward is the VP of the Gemini team and some other stuff there at Google.
So we were excited to talk to them and ask them about this big new model release.
But I think there are a couple other reasons we were interested in doing this as well.
Yeah, I mean, one big thing, Kevin, is just that maybe more than other
model releases, this one seems to have the attention of Google's competitors. We're hearing a lot
of whispers from folks who work at other AI labs that, hmm, it seems like Gemini 3 has managed
to figure some things out in a way that may be bad for their businesses. And I think around
the AI industry, there's sort of this feeling that Google, which kind of struggled in AI for
a couple years there, they had the launch of Bard and the first versions of Gemini, which had
some issues. And I think they were seen as sort of catching up to the state of the art. And
now I think the question is like, is this kind of them taking their crown back? So we'll get
into all that with Demis and Josh. But let's just talk, Casey, about what we know about Gemini
3. They held a briefing early this week and told us a little bit about the new model and what
it can do. So what did we learn about Gemini 3? Yeah. Well, so in terms of what it can do,
which is always the most interesting to me, Google shared a few different things.
One, in addition to saying all the things you would expect, like it's better at coding and it's better at vibe coding,
it also is going to do some new things around generating interfaces for you when you ask it a question.
So nowadays, you ask most chatbots a question.
It'll spit back an answer in text.
Maybe it shows you an image.
According to the Google folks, Gemini 3 is just going to start building custom interfaces for you.
So they showed an example where somebody wanted to learn about Vincent van Gogh, the painter,
and Gemini 3 just sort of like
coded up an interactive tutorial
that had all sorts of like images
and interactive elements.
They showed another example
that involved building a mortgage calculator
for buying a home over a million dollars
which is the lowest amount of money that anyone
at Google can imagine spending on a home.
So these are the kinds of things
that you can expect to find in Gemini 3, Kevin.
Yeah.
So I would say the theme of the briefing
and of the materials that Google shared
ahead of the Gemini 3 launch was
this is just kind of better than
their last model, Gemini 2.5 Pro, in basically all respects. Some of the benchmarks that caught
my attention, one was this benchmark test called Humanity's Last Exam, which is sort of a very
hard, interdisciplinary exam that consists of a bunch of questions, like basically a graduate
student or PhD level. And their previous model, Gemini 2.5 Pro, got about a 21.6% on that test,
and Gemini 3 Pro gets a 37.5% on that test.
That's basically the story of all of these benchmarks.
They gave more than a dozen examples of various benchmarks
where the new model just beats the old one handily.
And to a lot of people, I think that may not matter.
Most people who are using Google's AI products
are probably not out there trying to solve novel problems in physics.
But their basic pitch for this is just like,
this is a state-of-the-art model.
Anything that you could do with ChatGPT or Claude or even the older versions of Gemini,
you can do better with Gemini 3 Pro.
They also talked about testing what they're calling the Gemini agent,
which is going to be able to do one thing in particular that I've been waiting for somebody to do forever,
which is look through your inbox, understand its contents, propose replies,
kind of organize, like emails together,
and really sort of help you get your inbox under control in a way
that I personally have never been able to.
So we basically only saw a few animated GIFs about that,
but that will definitely be one of the first things
that I try when I get my hands on Gemini 3.
Yeah, and they are not,
we should say, rolling this out to everyone right away.
It's going to be available this week
for users in the Gemini app
and also in the AI mode,
which is sort of the tab off to the side
of the main Google search engine.
It will also be available for developers
in various products.
But they're not sort of saying when this will come to things like the Gemini integrations in Google Docs or Gmail, these very popular things that are used by billions of people a day.
But I thought it was interesting that they have brought this model to Google search, albeit in this AI mode that's not sort of the main search bar.
That to me suggests that they feel like they can serve this model cheaply enough to make it potentially something that billions of people could use, and that that would not melt
their servers and incur billions of dollars of costs.
Yeah, so far they say that the usage keeps going up for AI Overviews, and every quarter they
continue to make more money. So it seems to be working out for them, not working out for the
rest of the web, but it's working out well for Google. Yeah, but I think that's like, obviously
Google's big advantage here over their competitors is that, you know, they have products
that are used by billions of people a day, and they can kind of shove Gemini 3 into those
products over time and just get more and more usage and get more data and use that to improve
their models. Yeah, which is why we always tell students when they ask us for advice, step one,
build an illegal monopoly.
Yes. And speaking of students, the other notable announcement that Google is making this week is
that they are giving all U.S. college students a year of free access to a paid version of
Gemini, which is, I think, a smart move. I feel a little gross about it, like essentially
telling students, hey, why don't you use this to maybe do some of your homework, maybe help you
with your exams. We'll give you the first hit for free. Yeah, you know, I was also struck during
the briefing that we had this morning that I believe three different people used the phrase
"learn anything." This seems like it has become a very prominent plank of Google's messaging:
they're presenting Gemini as a learning tool, which maybe is just sort of a euphemism for a
do-your-homework tool. I don't know. Yes. Okay. So that is what we know about
Gemini 3. We will be doing our own testing and reviewing of Gemini 3 once it is fully out
on Tuesday. But for now, we wanted to just give you the basics and also bring you our interview
with Demis Hassabis and Josh Woodward of Google DeepMind. And before we get to that, we should
obviously make our AI disclosures. I work for the New York Times Company, which is suing OpenAI
and Microsoft over the training of large language models. And my boyfriend works at Anthropic.
Demis and Josh, welcome to Hard Fork.
Great to be here.
Thank you.
So two years ago, Sundar Pichai told us that Bard, rest in peace, was a souped-up Civic that was in a race with more powerful cars.
What kind of car is Gemini 3?
That's a good one.
Demis, do you want to take it?
Well, I hope it's a bit faster than a Honda Civic.
You know, I don't really think of it in terms of cars,
and maybe it's one of those cool drag racers.
Yeah, so people are really excited about this model.
We have been hearing from folks who have been sort of early testing it.
Obviously, you guys have shown off a lot of the benchmarks, very impressive.
What can Gemini do on a concrete level that previous AI models couldn't?
Well, I'll jump in. Maybe a couple of things stand out.
One, we're starting to see this model really excel at reasoning
and being able to think many steps at the same time.
Sometimes models in the past
would lose their train of thought,
lose track.
This one's way better at that.
The other thing you'll see tomorrow as well
is all kinds of new generative interfaces.
This is our best model yet
at being able to create new types of interfaces.
It gives people a really custom sort of design
and sort of an answer to their questions.
And then maybe the third thing I would say
is we've put a lot of investment in coding itself.
And so a lot of the coding examples you'll see, and some new products coming out like Google Antigravity, will also kind of showcase that.
There's been some discussion that for average users, the chat use case can feel solved: that sort of average users of products like Gemini almost can't even think of a question to ask it that will generate something that feels meaningfully different from what they were able to get in the last model. To what extent does that feel true to you with Gemini 3, and to what extent do you think average folks are really going to notice a difference?
Yeah, one of the things, I guess, we're seeing in some of the testing, and Demis, feel free to chime in too, is I think, for us, this is a model that's more concise, it's more expressive, it starts to present information in a way that's much easier to understand.
And I think for most people, that's going to be a big immediate effect.
And then I think what starts to get interesting is how these models start to interact with other types
of information. So we talk a lot about how students are going to be able to learn with this
model, or even how this model can connect to other types of data you might have in other
Google products, with your permission. These are the ways I think we're starting to show how
it's going beyond just the standard text kind of Q&A back and forth.
Yeah, I think I'd add to that just, like, you know, its general reliability on things.
It's incredible, you know, you'll notice that when you use it. I think also we worked quite hard
on the persona, as we call it internally, like the style of it. I think it's more succinct.
I think it's more to the point. It's helpful. I feel like it's got a better style about it.
I find it more pleasant to brainstorm with and use. And then I think, you know, I think there
are various things where there's almost a step change. I feel like it's crossed a sort of
threshold of usefulness on things like vibe coding. I've been getting back into my games programming.
I've got to set myself some projects over Christmas on that because I feel like
it's actually got to a point where it's incredibly useful and capable on front end and things like this
that perhaps previous versions weren't so good at.
Demis, the last time we had you on the show in May,
you said that you think we're five to ten years away from AGI
and that there might be a few significant breakthroughs needed between here and there.
Has observing how good Gemini 3 is changed any of those timelines,
or does it incorporate any of those breakthroughs that you thought would be necessary?
No, I think it's sort of dead on track, if you see what I mean.
I think we're really happy with this progress.
I think it's an absolutely amazing model and is right on track of what I was expecting
and the trajectory we've been on, actually, for the last couple of years since the beginning
of Gemini, which I think's been the fastest progress of anybody in the industry.
And I think we're going to continue on that trajectory.
And we expect that to continue.
But on top of that, I still think there'll be one or two more things that are required
to really get the consistency across the board that you'd expect from a general intelligence
and also improvements still on reasoning, on memory, and perhaps things like world model ideas
that you also know we're working on with SIMA and Genie. They will build on top of Gemini,
but extend it in various ways. And I think some of those ideas are going to be required as well
to fully solve physical intelligence and things like that. So both are true. I'm really happy
with the progress of Gemini 3, I think people are going to be pretty pleasantly surprised,
but it's on track with what we were expecting the progress to be. And I think that means still
five to 10 years, with one or two more breakthroughs perhaps required. You mentioned Gemini 3's
style. There's been a lot of discussion recently about AI companions and the relationships people
are developing with them. How do you think about Gemini 3's personality and what kind of
relationship do you want users to have with it? I would say in the app itself, we see it on the
team a lot as almost like a tool, or it's something you're using to kind of work through and kind of
cut through your day. And so whether it's helping on different types of questions you
have or helping you create things, that's really where we see it excelling and kind
of the direction we want to see it go. I think if you zoom out, if you look at Gemini or some of our
other projects like NotebookLM or Flow, we're really kind of trying to think through how
AI can really be this superpower, kind of super tool in your toolbox that you can use, whether it's
for writing or researching or creating films or whatnot. And so that's really more where we're
focused. I think over time, we're really interested, on the team, in being able to track things like
how many tasks did we help you complete in your day. That's a new type of metric that I think
we get excited about, and it's sort of the way the original Google search worked: you would
come to it, you would sort of try to get an answer or be sent to a page, and sort of move on from
there. Well, that all sounds very good and responsible, but I'm wondering about all the viral
engagement you're leaving on the table by not making this thing an erotic companion. Big
oversight. No comment. Some of your competitors have been very nervous in the days and weeks
leading up to Gemini 3. I think they've started hearing the same rumblings that we have about
this model being quite good. And maybe the narrative shifting from sort of Google playing catch-up
in AI to now sort of being on top of the race, or at least in a leadership position there.
Do you feel like Google is ahead in the AI race right now?
Look, as you guys know very well, it's a ferocious, you know, competitive environment,
probably the most competitive there's ever been. So one can never, you know...
Almost really, the only important thing is your rate of progress, right, from where you are.
And that's what we're focusing on, and we're very happy about that.
I mean, I don't really see it as a sort of like, you know, we were back in the lead or something like that.
We've always pioneered the research part of this.
I think it's like getting into our groove in making sure that, downstream, that's reflected in all of our products.
And I think we're really getting into our stride there.
I think you saw that actually last year, I would say.
And we're getting better and better at that,
like with GDM being sort of the engine room of Google.
And, of course, there's the Gemini app, there's NotebookLM, these AI-first products,
but there's also powering up all these amazing existing Google products,
whether that's Maps, YouTube, Android, you know, search, of course,
with AI first features and actually in some cases,
reimagining things from an AI first perspective with, you know,
often Gemini under the hood.
And that's going amazingly well.
And I think we're only midway through that evolution.
But it's very exciting to see how much value and excitement our users are getting when they see each of those new features in, for example, Workspace and Gmail and so on.
There are almost endless possibilities there.
So we're really excited about that as well as all of these AI-first products that we're also imagining and prototyping.
We had a historian on the show last week who was using an unreleased Google model in AI Studio,
and it had sort of blown his mind with how it was able to transcribe these very old documents
and reason correctly about, you know, what were the measurements of the sugar
in this sort of 1800s fur trade in Canada. Do you think you can tell us once and for all,
was this man using Gemini 3?
Not sure about that one. Okay. I will say the model is, though, quite amazing at making
these connections. And I don't know if the historian was using kind of photos of old
documents or diaries or whatnot.
Yes, that's what he was doing.
Yeah, that most certainly was.
Okay, it's very good at this.
And, you know, someone like me has pretty poor handwriting.
You can take a page of notes, and it'll kind of take that and run with it with no problem, no sweat.
You mentioned on this call that you're going to be integrating this into search, in the AI mode that is sort of a side tab on the main Google search engine.
Does that mean that you found a way to serve this model more efficiently and cheaply than previous models?
I think we're always on the cutting edge there.
I feel like the thing we do really well,
apart from the overall performance of our models
and getting better and better at that,
is the efficiency of our models.
And the distillation techniques
and many, many other techniques
that we sort of created and pioneered
that we're now putting to use.
Obviously, it's necessary for us
because we have extreme use cases
of things like AI Overviews and others
that we have to serve to billions of users.
And then, of course,
some of our cloud customers,
enterprise customers,
appreciate that efficiency, cost efficiency, too. So we've always tried to be on this Pareto
frontier of cost to performance. And wherever you want to be on that frontier, if you value
performance most, or if you value cost the most, then there'll be one of the models in the
model family for you. So, of course, we're only announcing Pro today, but we are also working
on the other family of models for the 3.0 era. So you'll see a lot more about that pretty
soon. It seems like every time we see the release of a new frontier model, we get to revisit the
discussion about scaling laws. And are we beginning to see diminishing returns? And I can predict a few
Twitter accounts that will probably have something to say about this over the next few days.
So I thought I would just sort of ask you, before we have that discourse, how are you guys thinking
about that in relation to Gemini 3? Yeah, we're very happy with the progress Gemini 3 represents
over 2.5. So I would say, sort of actually referencing what we discussed earlier, that
the progress is basically what we're expecting and on track, and we're really pleased with it.
But that's not to say that there's, like, some kind of diminishing returns. People, when
they hear diminishing returns, they think, is it zero or exponential, right? But there's also
in between. So they can be diminishing, it's not going to, like, exponentially double with every
era, but it's still well worth doing, right? And an extremely good return on that investment.
So I think we're in that era. And then, you know, as I said, my suspicion, although we'll
see, is that still one or two more breakthroughs are required, research breakthroughs are
required to get all the way to AGI. But in the meantime, you're going to obviously need as scaled-up
as possible versions of these foundation models, multimodal foundation models that we're building
today and still seeing great progress on.
Which of the many benchmarks that you showed off today
do you feel like is going to matter most to the average user?
Oh, that's a good question.
I think most people don't look at the benchmarks as closely as we do,
but the benchmarks are always a proxy, right?
So you look at something like cracking 1500 Elo on LMArena.
That's great, but what really matters is kind of the user satisfaction in the products, too.
And I think what's been encouraging to us is these are still moving in the same direction.
They're good proxies for each other.
And so ultimately, I think we'll put out all the benchmarks and we're very proud of them.
And they represent amazing progress.
But you also have to be able to translate that into product experiences that matter.
And so we try to do both with every one of these releases.
Any new dangerous capabilities or safety concerns that come with the increased power of the model?
I think, well, we've taken quite a long
time on this model, because it's frontier and, you know, has some new capabilities
and it's very capable, as you can see from the benchmarks. And as Josh said,
you know, we make sure not to over-index internally on those benchmarks. They're just a proxy
for overall performance. And that's why we care about them across the board. And then ultimately
how our users experience them. But we spend a lot of time on testing, safety testing, all the
different dimensions with the safety institutes and also external testers that we work with
as well, as well as, of course, doing a ton of internal testing. So I would say this is our most
thoroughly tested model so far. Do you want to mention any of those sort of new capabilities
that popped up, whether or not it was for a safety thing? Was there something in there where you
thought, okay, yeah, we definitely need to make sure we're sending this to a bunch of external
researchers? Yeah, well, look, it's just making sure. We've worked really hard on things like
tool call usage and function calling and these kinds of things. Obviously, these are
super important for coding capabilities, and developers want that and so on. And it's very important
in general for reasoning. But it also makes them more capable for riskier things, too, like
cyber. So we have to be, you know, sort of doubly cautious as we improve those
dimensions for all the good use cases, continually checking on all those kinds of
measures so that they can't be misused. Are we in an AI bubble?
I think it's too binary a question, I would say.
I think, I mean, my view on this, this is just strictly my own opinion, is that there are
some parts of the AI industry that are probably in a bubble.
You know, if you look at, like, seed investment rounds being multi-ten-billion-dollar rounds
with basically nothing, it seems, I mean, there are talented teams, but it seems like that might
be the first signs of some kind of bubble. On the other hand, I think there's a lot of
amazing work and value too, at least from our perspective, that we see: not only are there
all the new product areas, so the Gemini app, NotebookLM, but, thinking further forward, robotics,
gaming. I mean, there are incredible uses of not just Gemini but some of our other
models, like Genie. You can imagine, with my old games-playing background, you know, I'm itching to think about
what could be done there. And drug discovery, which we're doing at Isomorphic, and Waymo.
And so there's all these new greenfield areas.
They're going to take a while to mature into massive multi-hundred-billion-dollar businesses.
But I think that there's actually potential for half a dozen to a dozen there that I think
Alphabet will be involved with, which I'm really excited about.
But also immediate returns: we've got, of course, the engine room, you know, this is the
engine room part of Google, where we're pushing this into all of these incredible, you know,
multi-billion-user products that people use every day.
And we have so many ideas, it's just about execution, like how would you reorganize Workspace around that, or Android, or YouTube. There's just so much potential there. And I think a lot of that will also bring in near-term revenue and direct returns while we're also investing in the future, not to speak of, you know, cloud revenue and TPUs and all of that, which I think is also going to be huge. So I feel really good about where we are as Alphabet,
whether there's a bubble or not. I think our job is to be winning in both cases,
right? If there's no bubble and things carry on, then we're going to take advantage of that
opportunity. But if there is some sort of bubble and there's a retrenchment, I think we'll also
be best placed to take advantage of that scenario as well. All right. Let's imagine it's Thanksgiving
coming up and it's the Bay Area. And one of our listeners, you know, changes the subject from
politics, which is upsetting everyone, to AI. Give people
something to be excited about, and someone says, hey, I heard Gemini 3 just came out.
Like, what can it actually do?
What's the example that you would have our listeners show their friends, whether it's on
their phone or their laptop, to say, get a load of this, and save Thanksgiving?
Yeah, I don't know if it'll save Thanksgiving, but it could probably provide some laughs.
You know, our imagery models in Gemini are still the best in the world.
So what I would say is, grab your phone, can be, you know, iPhone, Android, doesn't matter,
pull it out.
You can take a selfie, put yourself in it, and edit it.
People are still doing that in huge amounts, and it's great fun.
And then I think you can show off any kind of other capabilities in the new Gemini 3 alongside it.
But this is what we're seeing: people are kind of coming for a lot of these interesting
use cases and then starting to try other parts of the app too.
You heard it here.
Nano Banana will save Thanksgiving dinner.
Gentlemen, thank you.
It's great to talk, and thanks for making the time.
Appreciate it.
Thanks for having us.
Thank you all.
Thanks, guys.
Hard Fork is produced by Whitney Jones and Rachel Cohn.
We're edited by Jen Poyant.
Today's show is engineered by Chris Wood.
Original music by Diane Wong, Rowan Niemisto, and Dan Powell.
Video production by Soria Roque, Pat Gunther, and Chris Schott.
You can watch this full episode on YouTube at YouTube.com slash hardfork.
Special thanks to Paula Szuchman, Pui-Wing Tam, Dalia Haddad, and Jeffrey Miranda.
You can email us, as always, at hardfork at NYTimes.com.