Big Technology Podcast - Google DeepMind CTO: Advancing AI Frontier, New Reasoning Methods, Video Generation’s Potential
Episode Date: May 20, 2025
Koray Kavukcuoglu is the Chief Technology Officer of Google DeepMind. Kavukcuoglu joins Big Technology to discuss how his team is pushing the frontier of AI research inside Google as the company's Google I/O developer event gets underway. Tune in to hear Kavukcuoglu break down the value of brute scale versus novel techniques and how the new inference-time “DeepThink” mode could supercharge reasoning. We also cover Veo 3’s sound-synced video generation, the open-source-versus-proprietary debate, and what a ten-percent jump in model quality might unlock for users everywhere.
Transcript
What's going on in the heart of Google's AI research operation?
We'll find out with Google DeepMind's chief technology officer right after this.
Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation of the tech world and beyond.
We have a great show for you today, a bonus show just as Google's I/O news hits the wire.
We have so much to talk about, including what's going on with the company, what it's announced today, but also what is happening in the research effort underlying
it all. And we have a great guest for you. Joining us today is Koray Kavukcuoglu. He is the chief
technology officer of Google DeepMind. We're going to speak with Koray today. And then tomorrow you'll
hear from DeepMind CEO Demis Hassabis. Koray, great to see you. Welcome to the show.
Thank you very much. Folks, by the way, if you're watching on video,
Koray and I are in two separate conference rooms in Google's, I don't know, it's a pretty cool
new building that they have. It's called what, Gradient Wave or something?
We call it the Gradient Canopy.
Gradient Canopy.
Anyway, we're here.
And I wanted to ask you a question that we've been asking on the show a lot, which is
the scale question.
Now, Google has a tremendous amount of compute at its disposal.
And so you basically have the option.
Is it scale that you want to throw at these models, or is it new techniques?
So let me just ask it to you as plainly as I can.
Is scale the star right now or is it a supporting actor in terms of trying to get models to
the next step?
It's a good question, I think also the way you framed it, because it is definitely an
important factor.
The way I'd like to think about this is it's rare that in any research problem, you would
have a dimension that pretty confidently would give you improvements, right?
Of course, like with maybe diminishing returns, but most of the time with research, it's always
like that.
So, like, when we think about our research right now, in the case of generative AI,
models, right? Scale is definitely one of those, but it's one of those things that are equally
important with other things. When we are thinking about our architectures, like the architectural
elements, the algorithms that we put in there, that make up the model, right, they are as
important as the scale. We, of course, analyze and understand as with scale, how do these different
architectures, different algorithms become more and more effective? That's an important part, because
you know that you are putting more computational capacity. And like you want to make sure that
you research the kinds of architectures and algorithms that pay off the best under that kind
of scaling property. But as I said, that's not the only one. Data is really important. I think
it is as critical as any other thing. The algorithms, architectures, modules that we put into
the system are important. Understanding their properties with data, with more compute,
that is as important, right?
And then, of course, inference time techniques
is as important as well, right?
Because now that you have a particular architecture,
a particular model,
you can multiply its reasoning capabilities
by making sure that you can use that model
over and over again through different techniques at inference time.
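As a rough illustration of what reusing the same model repeatedly at inference time can look like, here is a minimal sketch of one well-known technique from the literature, self-consistency sampling: sample several completions and keep the most common answer. The `model` callable and the voting scheme are hypothetical stand-ins for illustration, not Google's actual method or API.

```python
from collections import Counter
import random

def self_consistency(model, prompt, n_samples=8):
    """Query the same model several times with sampling enabled and
    return the most common final answer. `model` is a hypothetical
    callable that returns one sampled completion per call."""
    answers = [model(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Toy usage with a fake "model" that guesses among a few answers.
fake_model = lambda prompt: random.choice(["42", "42", "42", "41"])
print(self_consistency(fake_model, "What is 6 * 7?"))
```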
You know, to me it's both hopeful and puzzling
to hear about all the different techniques
to make these models better.
And I'll explain that.
It's hopeful because it seems like we're definitely going to see a lot of improvement
from where the models are today.
And the models are already pretty good.
The thing that's puzzling to me is that the idea with scale was that there was effectively
limitless potential in making these AI models bigger.
And you said the words, diminishing returns.
And we've heard that from you and basically everybody working on this problem.
And it's no secret, right?
that right now we've been waiting forever for GPT-5.
Meta had some problems with Llama.
Anthropic has been trying to tell us there's a new Claude Opus model coming out forever.
We haven't seen it.
So clearly a lot of the research houses, maybe with the exception of Google, are struggling
with what you get from when you make the models bigger.
And so I just want to ask you about that.
I mean, it seems like it's nice that there are all these techniques, but again,
thinking about this one technique that was supposed to have limitless potential, is that a disappointment
for the generative AI field overall, if that's not going to be the case?
Yeah, I really don't think about it that way, because we have been able to push the capabilities
of the models quite effectively, right?
I think, in a way, the whole scale discussion starts from the scaling laws, right?
Like, scaling laws explain the performance of the models under both data and compute and number of parameters, right?
And, like, researching all three in combination is the important thing.
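For reference, a commonly cited form of such a scaling law from the research literature (a Chinchilla-style parameterization offered here only as an outside illustration, not a formula quoted in this conversation) writes the expected loss in terms of parameters and data:

$$ L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad C \approx 6ND $$

where N is the parameter count, D the number of training tokens, C the training compute in FLOPs, E the irreducible loss, and A, B, α, β are fitted constants.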
And when I look at the kind of progress that we are getting from that general technology, I think it is still improving.
What I think is important is to make sure that there is a broad spectrum of research.
that is going on across the board.
And rather than thinking about scaling only in one dimension,
there's actually many different ways to think about it.
And investing in those,
and we can see the returns that I think across the field,
really, not just here at Google,
but across the field, many different models
are improving with quite significant steps, right?
So I think as a field, the progress has been quite stellar.
I think it's very exciting.
And in Google, we are very excited about the progress that we have been having with Gemini models.
Going from 1.5 to 2 to 2.5, I think we had a very steady progress, very steady improvement in the capabilities of models,
both in the spectrum of the capabilities that we have, but also at the quality level for each capability as well, right?
So I think what I'm excited about is we are pushing the frontier all the time, and we see returns in many research directions and many different dimensions of research directions.
And I'm excited that there's actually, I think there is, there's a lot more progress to do.
And there's a lot more progress that needs to happen for reaching AGI as well.
We had Yann LeCun on the show a couple of weeks ago.
You worked in Yann's lab.
Yann emphatically stated there is no way the AI industry is going to reach human-level intelligence,
which is his term for AGI, just by scaling up LLMs.
Do you agree?
Well, I mean, I think that's a hypothesis, right?
That might turn out to be true or not.
But also, I don't think that there's any research lab that is trying to only do scaling up the LLM.
So, like, I don't know if anyone is actually trying to negate that hypothesis or not.
I mean, we are not.
From my point of view, we are investing in such a broad spectrum of research that I think
that is what is necessary.
And clearly, I think, like many of the researchers that I talk to and me myself, I think
that there are a lot more critical elements that need to be invented, right?
So there are critical innovations on our path to AGI that we need to get through.
That's why we are still looking at this as a very ambitious research problem.
And I think it is important to keep that kind of critical thinking in mind.
With any research problem, you always try to look at multiple different hypotheses,
try to look at many different solutions.
A research problem this ambitious is probably the most important problem that we are working on in our lifetimes, right?
It is maybe the hardest problem that we are working on as a research problem.
I think having that really ambitious research agenda and portfolio and making investments
in many different directions is the important thing.
From my point of view, what is important is defining where the goal is, that our goal is
AGI; our goal is not to build AGI in a particular way. What's important is to build AGI in the
right way, so that it is positively impactful, so that building on it we can bring a huge amount
of benefits to the world. That's why we are trying to research AGI. That's why we are trying
to build AGI. Like, AGI in itself, sometimes it might come across as a goal in itself.
The real goal is the fact that if we do that, then we can bring huge benefit to all of society, all of the world, right?
That's the goal.
So, like, with that responsibility, of course, it's not very important to me whether that particular hypothesis turns out to be true or not.
What is important is we reach that with doing a very ambitious research, by pursuing a very ambitious research agenda and building a very strong understanding.
of the field of intelligence.
Okay, so let's get to a little bit of that research agenda.
One of the announcements that you're making at I/O, which is this week,
and which, when this airs, will just have been made,
is that there's a new product called DeepThink that you're releasing,
which is relying on reasoning, or as you put it, test-time compute.
I think I have that right in terms of what the product is going to look like.
How effective has including reasoning in these models been in advancing them?
I mean, would you say, when you think about all the different techniques that you've
discussed so far today, scaling included, what sort of a magnitude improvement are you
seeing by using reasoning? And talk a little bit about DeepThink.
Okay, I mean, first of all, DeepThink is not a separate product.
It is a mode that we are enabling in our 2.5 Pro model
so that it can spend a lot more time during inference time
to think, to build hypotheses.
And the important thing is to build parallel hypotheses
rather than a single chain of thought.
It can build parallel ones.
And then can reason over multiple of those,
build a hypothesis, build an understanding over those,
and then continue building those parallel chains of thought.
But this one thinks a little bit longer than your traditional reasoning model?
It will – I mean, in the current setup, yes, it takes longer, because, like, understanding those parallel thoughts and building those parallel thoughts, it's all a much longer process.
But, like, one thing that we are also positioning it as right now is
research, right?
Like, we are sharing some initial research results.
We are excited about it.
We are excited about the technique that what it enables,
what it can actually enable in terms of new capabilities
and new performance levels.
But it's early days, and that's why we are only sharing it right now.
We are going to start sharing with safety researchers
and some trusted testers,
because we want to also understand the kinds of problems
that people want to solve with it
and the kinds of new capabilities it brings
and how we should train it the way that we want to train.
So it is early days on that,
but it is like what I think is an exciting research direction
that we found in the inference time thinking model space.
Yeah, so can you talk about what precisely it does differently
than traditional reasoning models?
Like, the current reasoning, thinking models,
most of the time, at least speaking from our research point of view,
build a single chain of thought, right?
And then as you build a single chain of thought and as the model continues to attend to its
chain of thought, it builds a better understanding of what response it wants to give you.
It can alternate between different hypotheses, reflect on what it has done before.
Now, of course, like one, if you think about just also in a visual kind of space, one kind
of scalability that you can bring onto the table is can you have multiple parallel chains
of thoughts so that you can actually analyze different hypotheses in parallel, and then you will have
more capacity exploring different kinds of hypotheses, and then you can look at, you can compare
those, and then you can eliminate the ones, or you can continue pursuing, and you can sort
of expand on particular ones. It's a very intuitive process in a way, but of course it is more involved.
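To make the contrast with a single chain of thought concrete, here is a minimal sketch of that kind of parallel exploration with comparison and pruning; it is an illustrative outline only, not DeepMind's DeepThink implementation, and `model` and `score` are assumed stand-in callables.

```python
def parallel_thinking(model, score, prompt, n_chains=4, n_rounds=3, keep=2):
    """Illustrative parallel chain-of-thought search.
    `model(prompt, chain)` is assumed to extend one reasoning chain by a step;
    `score(chain)` is assumed to rate how promising a chain looks.
    Both are hypothetical stand-ins, not a real API."""
    chains = [[] for _ in range(n_chains)]  # start several hypotheses in parallel
    for _ in range(n_rounds):
        # extend every chain by one thinking step
        chains = [chain + [model(prompt, chain)] for chain in chains]
        # compare the parallel hypotheses and keep the most promising ones
        chains.sort(key=score, reverse=True)
        survivors = chains[:keep]
        # re-expand the survivors so capacity stays on the good hypotheses
        chains = [list(c) for c in survivors for _ in range(n_chains // keep)]
    return max(chains, key=score)  # the answer comes from the best surviving chain
```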
I just want to cap this segment by asking you about the pace of improvement of
models. Like, I'm just going to use OpenAI's scheme just to give an example. The progress,
this is something that everybody who comes on this show says, the progress of going from, like,
GPT-3 to GPT-4 was undeniable. GPT-4 to 4.5, less of a leap. So I want to ask you, just in terms
of the velocity of improvement, if that's the right way to put it, are we coming
back down to Earth a little bit right now?
Again, when I look at our model family, right, going from Gemini 1 to 1.5 to 2 to 2.5, I'm very excited
about the pace that we have.
When I look at the capabilities that we keep adding, right, like we have always designed Gemini
models to be multimodal from the beginning, right?
That was our ambition because we want to build AGI, we want to make sure that we
have models that can fulfill the capabilities that we expect from a general intelligence.
So multi-modality was key from the beginning. And we have been, as the versions have been progressing,
we have been adding that natural multi-modality more and more and more. And when I look at the
pace of improvement in our reasoning capabilities, like lately we have added the thinking
capabilities, and I think with 2.5 pro, we wanted to make a big leap in our reasoning capabilities,
our coding capabilities.
And I think one of the critical things is we are bringing all these together in one single model family.
And that is actually one of the catalyzers of improvement and improvement at pace as well.
It's harder, but we find that creating a single model that can understand the world,
and then you can ask questions about, oh, can you code me this sort of like a simulation of a tree growing,
and then it can do it.
Right? That requires understanding of a lot of the things, not just how to code, because, like, again, we are trying to bring these models to be useful, to be usable by a very broad audience.
And I think our pace has been really reflective of the research investments that we have been doing across the board.
So no velocity slowdown is what I'm hearing from you.
Let me just put it this way: I'm very excited about everything that we have been doing as Gemini progresses,
and research is getting more and more exciting.
Of course, for us, folks who are doing research, it is really good.
Okay, so I want to ask you, you know, you're on the model side.
I want to ask you, basically, sometimes we debate on the show what the value is of improving models.
So let me just, like, put a thought experiment to you.
What do you think the value of improving these models by 10% would get us?
The question there is, like, how do we define 10%?
right? Like that is where the value is defined already.
One of the important things about doing research and improving the models is quantifying progress.
We use many different ways to quantify progress.
And not every one of them is linear and not every one of them is linear with the same slope.
So when we say improving by 10%, if we can improve by 10%
its understanding in math, right?
Understanding of really highly complex reasoning problems.
I think that is a huge improvement, because that would actually indicate that the general knowledge and the capabilities of the models
have expanded a lot, right?
And you would expect that that would make the model a lot more applicable to a broader range of problems.
And what about if you improved the model by like 50%, what would that get you?
Is your product team like saying there are things that we can build if this model was just like 50% better?
Again, I think like we work with product teams a lot, right?
Like that's actually taking a step back.
That's a quite important thing for me.
Thinking about AGI as a goal, I think that also goes through working with the product teams.
Because it is important that when we are building AGI, it's a research problem.
We are doing research.
But the most critical thing is that we actually understand what kind of problems to solve,
what kind of domains to evolve these models in, from the users.
So that user feedback and that knowledge from the interaction with the users is actually quite critical.
So when our product teams tell us, okay, here is an area that we want to improve on,
then that is actually quite important feedback for us
that we can then turn into metrics and pursue those.
As you ask, I mean, as we increase the capabilities of the model
across, I think what is important is across a broad range of metrics,
which I think we have been seeing in Gemini, as I said,
from like 1.5 to 2.5, right?
You can see the capability increases across the models.
A lot more people can actually use the models in their daily lives, to help them either learn something new or to help them solve an issue that they see.
But that's the goal, right?
Like at the end of the day, again, like the reason we build this technology is to build something that is helpful.
And the products are a critical aspect of how we measure and how we understand what is helpful and what is not.
And as we increase more in that, I think that's our main ambition.
That's great.
Let's take a concrete example that, again, the company Google is releasing today,
talking about today, which is Veo 3.
So this is your video generation model.
And I think we've really seen an unbelievable acceleration in terms of what these models can do
from the first generation to second generation to the third.
And for listeners and viewers, what Google is doing now is not only are you able to generate scenes,
you're able to generate them with sound
and having watched one of these videos
or a couple of them I can tell you the sound matches
and then there's this other crazy product that
Google's putting out I think it's called Flow
where you could just extend the scene that you've generated
and storyboard out like your own
basically short film
so I'd love to hear your perspective on how this happened
and is this like
you know, I kind of asked you what we get at
10%, 50%, but is this kind of that perfect example of the model getting better, producing
something that goes from, you know, that's a fun little video to like, oh, I can really use this
now.
Yes.
I think the main difference, the main progress going from Veo 2 to Veo 3, and from Veo 1 to Veo 2, it was a lot more
about understanding the physics and the dynamics of the world.
With Veo 2, I think for the first time we could comfortably say that for many, many cases, the model has understood the dynamics of the world well.
That's very important, right?
To be able to have a model that can generate scenes and complex scenes where there's dynamic environment happening and also there's interactions of objects happening.
I remember one of the things that was quite viral was, like, cutting the tomato, where the video generated by Veo 2 was so precise that it looked completely realistic: a person slicing tomatoes, and the dynamics there, not just any single object, like how the hand moves, but also the interaction between different objects, the blade, the tomato, how the slice falls down and everything.
it was very precise, right?
So that interactive element was important.
Understanding the dynamics is about not just understanding the dynamics
of a particular single object,
but it's also multiple objects interacting with each other,
which is much, much more complex.
So I think there we had a big jump.
With Veo 3, I think we are doing another jump in that aspect,
but I see the sound as an orthogonal, a new capability that is coming in.
Of course, our real world, we have multiple senses,
and vision and sound go hand in hand, right?
Like they are perfectly correlated.
We perceive them all at the same time, and they complement each other.
So to be able to have a model that understands that interactivity, that complementarity,
and being able to generate scenes and videos that can generate both at the same time,
I think that speaks to the new, like the capability level of the model.
And like the quality, I think, like this is the first step.
There are very impressive examples, and there are examples that fall a little bit short of what you would say, okay, this is really natural.
But like I think this is an exciting step in terms of expanding that capability.
And as you said, I think I'm excited to see how like this kind of technology can be useful, right?
Like you just said that, oh, it is becoming useful.
I think that is great to hear, right?
Like that, like now this is a technology that can be built on.
And I think Flow is an experiment in that direction,
to give it to users so that people can experiment and build something with it.
Yeah, you prompt a scene, and then it creates a scene,
then you prompt the next scene, and you can continue to have a story flow,
which is a good name for it.
All right, this next question comes to me from a pretty smart AI researcher.
They basically talked about how there's this basic,
there's a tension between open source and proprietary.
and of course we have companies like Google that's building, you know, obviously "Attention Is All
You Need," the transformer, came from Google. Now Google's building proprietary models. We saw DeepSeek
push the state of the art forward, you could argue. So this person wanted to know, and I think it's a
really good question: is coordination possible between open source and proprietary?
Maybe we see OpenAI doing their new open-source model, or teasing it.
Or should each sort of side try to get its own part of the market?
What do you think?
I think, like, I want to say a couple of things, right?
Like, first and foremost, again, like, take a step back.
There's a lot of research that went into building this technology, right?
Like, of course, like in the last two, three years, I think it became so
accessible and so general that people are using it in their daily lives. But there's a long history
of research that built up to this point. Right. So, like, as a research lab, Google, and
like before, of course, there was DeepMind and Google Brain, two separate labs that were working in
tandem on different aspects. And many of the technologies that we see today have been built as
research prototypes, right, as research ideas, and have been published in papers, as you said,
transformers, the most critical technology that is underlying things. And then models like
AlphaGo, right, AlphaFold, all of these kinds of things, all these research ideas have been
evolving into building the knowledge space that we have right now. All that research, I think
publications and open-sourcing all those, have been a critical element, because we were
really in the exploratory space at those times. Nowadays, I think,
Like, the other thing that we always need to remember is, actually, we have at Google, we have our Gemma models, right?
Those are the open-weights models, just like the Llama open-weights models.
We have the Gemma open-weights models.
The reason for us to do those is also that there's a different community of developers and users who want to interact with those models,
who actually need that kind of being able to download those weights into their own environment and use that and build
with that. So I feel like it's not an either-or. I think there are different kinds of use cases
and communities that actually benefit from different kinds of models. But what is most important
is at the end of the day, in the path towards AGI, of course, it's important that we are being
conscious about what we enable with the technologies that we develop. So when we develop our frontier
technologies, we choose to develop them under the Gemini umbrella, which are not open-weights
models, because we want to also make sure that we can be responsible in the way that they
are used as well. Right? But at the end of the day, what really matters is the research
that goes into building the technology and doing that research and pushing the frontier of the
technology and building it the right way with the positive impact. And I think it can happen
both in open-weight ecosystem or in the closed system.
But when I think about all the sort of the umbrella of things that we are trying to do,
we have quite ambitious goals, building AGI and doing it the right way with the positive impact.
That's how we develop our Gemini models.
Okay, I have like 30 seconds left with you.
You're the chief technology officer, so are you a fan of vibe coding?
Yes, exactly.
I find it really exciting, right?
I mean, because what it does is all of a sudden
it enables a lot of people who do not necessarily
have that coding background
to build applications.
It's a whole new world that is opening, right?
Like you can actually say, oh, I want an application like this
and then you see it.
You can imagine what kinds of things could be possible
in the space of learning, right?
You want to learn about something.
You can have a textual representation,
but you can ask the model to build you an application that explains certain concepts to you, and it would do it, right?
And this is the beginning, right?
Like, some things it does well, some things it doesn't do well.
But I find it really exciting.
This is the kinds of things that the technology brings.
All of a sudden, like the whole space of building applications, the whole space of building dynamic, interactive applications,
becomes accessible to a much broader community and set of people.
Koray, great to see you.
Thank you so much for coming on the show.
Yeah, thank you very much.
Thanks for inviting me, Alex.
Definitely.
We'll have to do it again in person some time.
All right, everybody.
Thank you for listening.
We'll have Demis Hassabis on,
the CEO of Google DeepMind, tomorrow.
And so we invite you to join us then.
We'll see you next time on Big Technology Podcast.