Big Technology Podcast - Google DeepMind CTO: Advancing AI Frontier, New Reasoning Methods, Video Generation’s Potential

Episode Date: May 20, 2025

Koray Kavukcuoglu is the Chief Technology Officer of Google DeepMind. Kavukcuoglu joins Big Technology to discuss how his team is pushing the frontier of AI research inside Google as the company's Google I/O developer event gets underway. Tune in to hear Kavukcuoglu break down the value of brute scale versus novel techniques and how the new inference-time “DeepThink” mode could supercharge reasoning. We also cover Veo 3’s sound-synced video generation, the open-source-versus-proprietary debate, and what a ten-percent jump in model quality might unlock for users everywhere.

Transcript
Starting point is 00:00:00 What's going on in the heart of Google's AI research operation? We'll find out with Google DeepMind's chief technology officer right after this. Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversations of the tech world and beyond. We have a great show for you today, a bonus show just as Google's I/O news hits the wire. We have so much to talk about, including what's going on with the company, what it's announced today, but also what is happening in the research effort underlying it all. And we have a great guest for you. Joining us today is Koray Kavukcuoglu. He is the chief technology officer of DeepMind. We're going to speak with Koray today. And then tomorrow you'll hear from DeepMind CEO Demis Hassabis. Koray, great to see you. Welcome to the show.
Starting point is 00:00:45 Thank you very much. Folks, by the way, if you're watching on video, Koray and I are in two separate conference rooms in Google's, I don't know, it's a pretty cool new building that they have. It's called what, Gradient Wave or something? We call it the Gradient Canopy. Gradient Canopy. Anyway, we're here. And I wanted to ask you a question that we've been asking on the show a lot, which is the scale question.
Starting point is 00:01:07 Now, Google has a tremendous amount of compute at your disposal. And so you basically have the option. Is it scale that you want to throw at these models, or is it new techniques? So let me just ask it to you as plainly as I can. Is scale the star right now, or is it a supporting actor in terms of trying to get models to the next step? It's a good question, I think also the way you framed it, because it is definitely an important factor.
Starting point is 00:01:36 The way I'd like to think about this is: it's rare that in any research problem you would have a dimension that pretty confidently would give you improvements, right? Of course, maybe with diminishing returns, but most of the time with research, it's always like that. So when we think about our research right now, in the case of generative AI models, right? Scale is definitely one of those, but it's one of those things that are equally important with other things. When we are thinking about our architectures, like the architectural elements, the algorithms that we put in there, that make up the model, right, they are as
Starting point is 00:02:16 important as the scale. We, of course, analyze and understand, as we scale, how these different architectures and different algorithms become more and more effective. That's an important part, because you know that you are putting in more computational capacity, and you want to make sure that you research the kinds of architectures and algorithms that pay off the best under that kind of scaling property. But as I said, that's not the only one. Data is really important. I think it is as critical as any other thing. The algorithms, architectures, and modules that we put into the system are important. Understanding their properties with data, with more compute, that is as important, right?
Starting point is 00:03:00 And then, of course, inference-time techniques are as important as well, right? Because now that you have a particular architecture, a particular model, you can multiply its reasoning capabilities by making sure that you can use that model over and over again through different techniques at inference time. You know, to me it's both hopeful and puzzling
Starting point is 00:03:25 to hear about all the different techniques to make these models better. And I'll explain that. It's hopeful because it seems like we're definitely going to see a lot of improvement from where the models are today. And the models are already pretty good. The thing that's puzzling to me is that the idea with scale was that there was effectively limitless potential in making these AI models bigger.
Starting point is 00:03:49 And you said the words, diminishing returns. And we've heard that from you and basically everybody working on this problem. And it's no secret, right, that right now we've been waiting forever for GPT-5. Meta had some problems with Llama. Anthropic has been trying to tell us there's a new Claude Opus model coming out forever. We haven't seen it. So clearly a lot of the research houses, maybe with the exception of Google, are struggling
Starting point is 00:04:15 with what you get when you make the models bigger. And so I just want to ask you about that. I mean, it seems like it's nice that there are all these techniques, but again, thinking about this one technique that was supposed to have limitless potential, is that a disappointment for the generative AI field overall, if that's not going to be the case? Yeah, I really don't think about it that way, because we have been able to push the capabilities of the models quite effectively, right? I think, in a way, the whole scale discussion starts from the scaling laws, right?
Starting point is 00:04:50 Like, scaling laws explain the performance of the models under data, compute, and number of parameters, right? And, like, researching all three in combination is the important thing. And when I look at the kind of progress that we are getting from that general technology, I think it is still improving. What I think is important is to make sure that there is a broad spectrum of research that is going on across the board. And rather than thinking about scaling only in one dimension, there are actually many different ways to think about it.
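For readers who want to see what a scaling law actually looks like, the best-known published form is the one fitted in the Chinchilla paper (Hoffmann et al., 2022). This is background from the literature, not a formula Kavukcuoglu gives here, and not necessarily the one Gemini training uses:

```latex
% One published scaling-law form (Chinchilla; Hoffmann et al., 2022):
%   N = number of parameters, D = number of training tokens,
%   E, A, B, alpha, beta = constants fitted to training runs.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% "Researching all three in combination" then means choosing N and D to
% minimize L under a fixed compute budget, roughly C \approx 6ND for
% transformer training.
```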
Starting point is 00:05:33 And investing in those, we can see the returns, I think, across the field, really, not just here at Google: many different models are improving with quite significant steps, right? So I think as a field, the progress has been quite stellar. I think it's very exciting. And at Google, we are very excited about the progress that we have been having with Gemini models. Going from 1.5 to 2 to 2.5, I think we had very steady progress, a very steady improvement in the capabilities of the models,
Starting point is 00:06:10 both in the spectrum of the capabilities that we have, but also at the quality level for each capability as well, right? So I think what I'm excited about is we are pushing the frontier all the time, and we see returns in many research directions and many different dimensions of research directions. And I'm excited that there's actually a lot more progress to do. And there's a lot more progress that needs to happen for reaching AGI as well. We had Yann LeCun on the show a couple of weeks ago. You worked in Yann's lab. Yann emphatically stated there is no way the AI industry is going to reach human-level intelligence, which is his term for AGI, just by scaling up LLMs.
Starting point is 00:06:57 Do you agree? Well, I mean, I think that's a hypothesis, right? That might turn out to be true or not. But also, I don't think that there's any research lab that is trying to only scale up LLMs. So, like, I don't know if anyone is actually trying to negate that hypothesis or not. I mean, we are not. From my point of view, we are investing in such a broad spectrum of research that I think that is what is necessary.
Starting point is 00:07:25 And clearly, I think, like many of the researchers that I talk to, and me myself, I think that there are a lot more critical elements that need to be invented, right? So there are critical innovations on our path to AGI that we need to get through. That's why we are still looking at this as a very ambitious research problem. And I think it is important to keep that kind of critical thinking in mind. With any research problem, you always try to look at multiple different hypotheses, try to look at many different solutions.
Starting point is 00:08:03 A research problem this ambitious, probably the most important problem that we are working on in our lifetimes, right? It is the hardest problem, maybe, that we will work on as a research problem. I think having that really ambitious research agenda and portfolio, and making investments in many different directions, is the important thing. From my point of view, what is important is defining where the goal is: our goal is AGI; our goal is not to build AGI in a particular way. What's important is to build AGI in the
Starting point is 00:08:46 right way, in a way that is positively impactful, that we can build on, so that we can bring a huge amount of benefit to the world. That's why we are trying to research AGI. That's why we are trying to build AGI. Sometimes AGI might come across as a goal in itself. The real goal is the fact that if we do that, then we can hugely benefit all of society, all of the world, right? That's the goal. So, with that responsibility, it's not very important to me whether that particular hypothesis turns out to be true or not. What is important is that we reach it by doing very ambitious research, by pursuing a very ambitious research agenda and building a very strong understanding of the field of intelligence.
Starting point is 00:09:38 Okay, so let's get to a little bit of that research agenda. One of the announcements that you're making at I/O, which is this week, so when this airs, it will just have been made, is that there's a new product called DeepThink that you're releasing, which relies on reasoning, or as you put it, test-time compute. I think I have that right in terms of what the product is going to look like. How effective has including reasoning in these models been in advancing them? I mean, would you say, when you think about all the different techniques that you've
Starting point is 00:10:12 discussed so far today, scaling included, what sort of magnitude improvement are you seeing by using reasoning? And talk a little bit about DeepThink. Okay, I mean, first of all, DeepThink is not necessarily, it's not a separate product. It is a mode that we are enabling in our 2.5 Pro model so that it can spend a lot more time during inference to think, to build hypotheses. And the important thing is to build parallel hypotheses rather than a single chain of thought.
Starting point is 00:10:49 It can build parallel ones, and then it can reason over multiple of those, build a hypothesis, build an understanding over those, and then continue building those parallel chains of thought. But this one thinks a little bit longer than your traditional reasoning model? It will – I mean, in the current setup, yes, it takes longer, because, like, understanding those parallel thoughts and building those parallel thoughts is a much longer process. But, like, one thing that we are also positioning it as right now is research, right?
Starting point is 00:11:31 Like, we are sharing some initial research results. We are excited about it. We are excited about the technique and what it can actually enable in terms of new capabilities and new performance levels. But it's early days, and that's why we are only sharing it in a limited way right now. We are going to start sharing it with safety researchers and some trusted testers,
Starting point is 00:11:56 because we want to also understand the kinds of problems that people want to solve with it, the kinds of new capabilities it brings, and how we should train it the way that we want to train it. So it is early days on that, but it is what I think is an exciting research direction that we found in the inference-time thinking model space. Yeah, so can you talk about what precisely it does differently
Starting point is 00:12:20 than traditional reasoning models? Like, the current reasoning and thinking models, most of the time, at least speaking from our research point of view, build a single chain of thought, right? And then as you build a single chain of thought, and as the model continues to attend to its chain of thought, it builds a better understanding of what response it wants to give you. It can alternate between different hypotheses, reflect on what it has done before. Now, of course, if you think about it in a visual kind of space, one kind
Starting point is 00:12:53 of scalability that you can bring onto the table is: can you have multiple parallel chains of thought, so that you can actually analyze different hypotheses in parallel? Then you will have more capacity for exploring different kinds of hypotheses, and then you can compare those, and then you can eliminate some, or you can continue pursuing and expand on particular ones. It's a very intuitive process in a way, but of course it is more involved.
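To make "parallel chains of thought" concrete, here is a minimal sketch in the spirit of self-consistency sampling: draw several independent chains and keep the most common final answer. This is an illustration under stated assumptions, not how DeepThink works; `sample` and `extract_answer` are hypothetical stand-ins you would wire to a real model, and the process Kavukcuoglu describes compares, prunes, and extends chains rather than just voting at the end.

```python
from collections import Counter
from typing import Callable

def single_chain(question: str,
                 sample: Callable[[str], str],
                 extract_answer: Callable[[str], str]) -> str:
    """Traditional reasoning: one sampled chain of thought, one answer."""
    chain = sample(f"{question}\nLet's think step by step.")
    return extract_answer(chain)

def parallel_chains(question: str,
                    sample: Callable[[str], str],
                    extract_answer: Callable[[str], str],
                    n: int = 8) -> str:
    """Spend more inference-time compute: sample n independent chains of
    thought, then select among the resulting hypotheses. The selection rule
    here is plain majority voting; a richer system could compare chains
    mid-stream, drop weak ones, and expand promising ones."""
    chains = [sample(f"{question}\nLet's think step by step.") for _ in range(n)]
    answers = [extract_answer(c) for c in chains]
    return Counter(answers).most_common(1)[0][0]
```

The only knob this adds is n: more chains means more inference compute and more hypotheses vetted before answering, which is exactly the trade-off described above.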
Starting point is 00:13:41 I just want to cap this segment by asking you about the pace of improvement of models. I'm just going to use the OpenAI schema to give an example. The progress of going from GPT-3 to GPT-4, and this is something that everybody who comes on this show says, was undeniable. GPT-4 to 4.5, less of a leap. So I want to ask you, just in terms of the velocity of improvement, if that's the right way to put it: are we coming back down to Earth a little bit right now? Again, when I look at our model family, right, going from Gemini 1 to 1.5 to 2 to 2.5, I'm very excited about the pace that we have. When I look at the capabilities that we keep adding, right, like we have always designed Gemini models to be multimodal from the beginning, right? That was our ambition because we want to build AGI; we want to make sure that we have
Starting point is 00:14:24 models that can fulfill the capabilities that we expect from a general intelligence. So multimodality was key from the beginning. And as the versions have been progressing, we have been adding that natural multimodality more and more. And when I look at the pace of improvement in our reasoning capabilities, like lately we have added the thinking capabilities, and I think with 2.5 Pro, we wanted to make a big leap in our reasoning capabilities, our coding capabilities. And I think one of the critical things is we are bringing all these together in one single model family. And that is actually one of the catalysts of improvement, and improvement at pace, as well.
Starting point is 00:15:07 It's harder, but we find that creating a single model that can understand the world, and then you can ask questions about, oh, can you code me a simulation of a tree growing, and then it can do it, right? That requires understanding of a lot of things, not just how to code, because, again, we are trying to bring these models to be useful, to be usable by a very broad audience. And I think our pace has been really reflective of the research investments that we have been doing across the board. So no velocity slowdown is what I'm hearing from you. Let me just put it this way: I'm very excited about everything that we have been doing as Gemini progresses, and research is getting more and more exciting.
Starting point is 00:15:55 Of course, for us, folks who are doing research, it is really good. Okay, so I want to ask you, you know, you're on the model side. Sometimes we debate on the show what the value is of improving models. So let me just, like, put a thought experiment to you. What do you think improving these models by 10% would get us? The question there is, like, how do we define 10%, right? Like, that is where the value is defined already. One of the important things about doing research and improving the models is quantifying progress.
Starting point is 00:16:35 We use many different ways to quantify progress. And not every one of them is linear, and not every one of them is linear with the same slope. So when we say improving by 10%: if we can improve the model's understanding of math by 10%, right? Its understanding of really highly complex reasoning problems. I think that is a huge improvement, because that would indicate that the general knowledge and the capabilities of the model have expanded a lot, right?
Starting point is 00:17:11 And you would expect that that would make the model a lot more applicable to a broader range of problems. And what about if you improved the model by, like, 50%? What would that get you? Is your product team saying there are things that we can build if this model was just, like, 50% better? Again, we work with product teams a lot, right? That's actually, taking a step back, a quite important thing for me. Thinking about AGI as a goal, I think that also goes through working with the product teams. Because it is important that, when we are building AGI, it's a research problem.
Starting point is 00:17:53 We are doing research. But the most critical thing is that we actually understand, from the users, what kinds of problems to solve and what kinds of domains to evolve these models in. So that user feedback and that knowledge from the interaction with the users is actually quite critical. So when our product teams tell us, okay, here is an area that we want to improve on, that is actually quite important feedback for us, which we can then turn into metrics and pursue. As you ask, I mean, as we increase the capabilities of the model
Starting point is 00:18:30 across, I think what is important is across a broad range of metrics, which I think we have been seeing in Gemini, as I said, from 1.5 to 2.5, right? You can see the capability increases across the model. A lot more people can actually use the models in their daily life, to help them either learn something new or solve an issue that they see. But that's the goal, right? Like, at the end of the day, again, the reason we build this technology is to build something that is helpful. And the products are a critical aspect of how we measure and how we understand what is helpful and what is not.
Starting point is 00:19:12 And as we keep increasing that, I think that's our main ambition. That's great. Let's take a concrete example that, again, the company Google is releasing today, talking about today, which is Veo 3. So this is your video generation model. And I think we've really seen an unbelievable acceleration in terms of what these models can do from the first generation to the second generation to the third. And for listeners and viewers, what Google is doing now is not only are you able to generate scenes,
Starting point is 00:19:42 you're able to generate them with sound. And having watched one of these videos, or a couple of them, I can tell you the sound matches. And then there's this other crazy product that Google's putting out, I think it's called Flow, where you can just extend the scene that you've generated and storyboard out basically your own short film.
Starting point is 00:20:04 So I'd love to hear your perspective on how this happened. And, you know, I kind of asked you what we get at 10%, 50%, but is this kind of that perfect example of the model getting better, producing something that goes from, you know, a fun little video to, like, oh, I can really use this now? Yes. I think the main difference, the main progress going from Veo 2 to Veo 3, and from Veo 1 to Veo 2, it was a lot more
Starting point is 00:20:37 about understanding the physics and the dynamics of the world. With Veo 2, I think for the first time we could comfortably say that, for many, many cases, the model has understood the dynamics of the world well. That's very important, right? To be able to have a model that can generate scenes, and complex scenes, where there's a dynamic environment and there are interactions of objects happening. I remember one of the things that went quite viral was cutting the tomato, where the video generated by Veo 2 was so precise that it looked realistic: a person slicing tomatoes, and the dynamics there, not just how any single object, like the hand, moves, but also the interaction between different objects, the blade, the tomato, how the slice falls down and everything. It was very precise, right? So that interactive element was important. Understanding the dynamics is about not just understanding the dynamics
Starting point is 00:21:45 of a particular single object, but also multiple objects interacting with each other, which is much, much more complex. So I think there we had a big jump. With Veo 3, I think we are doing another jump in that aspect, but I see the sound as an orthogonal, new capability that is coming in. Of course, in our real world, we have multiple senses, and vision and sound go hand in hand, right?
Starting point is 00:22:10 Like they are perfectly correlated. We perceive them at the same time, and they complement each other. So to be able to have a model that understands that interactivity, that complementarity, and that can generate scenes and videos with both at the same time, I think that speaks to the new capability level of the model. And on the quality, I think this is the first step. There are very impressive examples, and there are examples that fall a little bit short of what you would say, okay, this is really natural. But I think this is an exciting step in terms of expanding that capability.
Starting point is 00:22:50 And as you said, I think I'm excited to see how this kind of technology can be useful, right? Like you just said that, oh, it is becoming useful. I think that is great to hear, right? Like now this is a technology that can be built on. And I think Flow is an experiment in that direction, to give it to users so that people can experiment and build something with it. Yeah, you prompt a scene, and then it creates a scene, then you prompt the next scene, and you can continue to have a story flow,
Starting point is 00:23:22 which is a good name for it. All right, this next question comes to me from a pretty smart AI researcher. They basically talked about how there's a tension between open source and proprietary. And of course we have companies like Google, you know, obviously "Attention Is All You Need," the transformer, came from Google. Now Google's building proprietary models. We saw DeepSeek push the state of the art forward, you could argue. So this person wanted to know, and I think it's a really good question: is there coordination possible between open source and proprietary?
Starting point is 00:24:04 Maybe we see OpenAI doing their new open-source model, or teasing it. Or should each side try to get its own part of the market? What do you think? I think, like, I want to say a couple of things, right? Like, first and foremost, again, take a step back. There's a lot of research that went into building this technology, right? Like, of course, in the last two, three years, I think it became so accessible and so general that people are using it in their daily lives. But there's a long history
Starting point is 00:24:39 of research that built up to this point. Right. So, like, as a research lab, Google, and before, of course, there were DeepMind and Google Brain, two separate labs that were working in tandem on different aspects. And many of the technologies that we see today have been built as research prototypes, right, as research ideas, and have been published in papers, as you said, transformers, the most critical technology that is underlying all of this. And then models like AlphaGo, right, AlphaFold, all of these kinds of things, all these research ideas have been evolving into building the knowledge space that we have right now. All that research, I think the publications and open-sourcing all of those, has been a critical element, because we were really in the exploratory phase at those times. Nowadays, I think,
Starting point is 00:25:26 the other thing that we always need to remember is that, actually, at Google we have our Gemma models, right? Those are open-weights models, just like the Llama open-weights models. We have the Gemma open-weights models. The reason we do those is also that there's a different community of developers and users who want to interact with those models, who actually need to be able to download those weights into their own environment and use them and build with them. So I feel like it's not an either-or. I think there are different kinds of use cases and communities that actually benefit from different kinds of models. But what is most important
Starting point is 00:26:15 is, at the end of the day, on the path towards AGI, of course, it's important that we are being conscious about what we enable with the technologies that we develop. So when we develop our frontier technologies, we choose to develop them under the Gemini umbrella, which are not open-weights models, because we want to make sure that we can be responsible in the way that they are used as well, right? But at the end of the day, what really matters is the research that goes into building the technology, and doing that research, pushing the frontier of the technology, and building it the right way, with positive impact. And I think it can happen both in the open-weights ecosystem and in the closed system.
Starting point is 00:27:02 But when I think about the whole umbrella of things that we are trying to do, we have quite ambitious goals: building AGI and doing it the right way, with positive impact. That's how we develop our Gemini models. Okay, I have like 30 seconds left with you. You're the chief technology officer; are you a fan of vibe coding? Yes, exactly. I find it really exciting, right? I mean, because what it does is all of a sudden
Starting point is 00:27:32 it enables a lot of people who do not necessarily have that coding background to build applications. It's a whole new world that is opening, right? Like you can actually say, oh, I want an application like this, and then you see it. You can imagine what kinds of things could be possible in the space of learning, right?
Starting point is 00:27:52 You want to learn about something. You can have a textual representation, but you can also ask the model to build you an application that explains certain concepts to you, and it would do it, right? And this is the beginning, right? Like some things it does well, some things it doesn't do well. But I find it really exciting. This is the kind of thing that the technology brings. All of a sudden, the whole space of building applications, the whole space of building dynamic, interactive applications,
Starting point is 00:28:22 becomes accessible to a much broader community and set of people. Koray, great to see you. Thank you so much for coming on the show. Yeah, thank you very much. Thanks for inviting me, Alex. Definitely. We'll have to do it again in person some time. All right, everybody.
Starting point is 00:28:39 Thank you for listening. We'll have Demis Hassabis, the CEO of Google DeepMind, on tomorrow. And so we invite you to join us then. We'll see you next time on Big Technology Podcast.
