No Priors: Artificial Intelligence | Technology | Startups - Google DeepMind's Vision for AI, Search and Gemini with Oriol Vinyals from Google DeepMind

Episode Date: August 1, 2024

In this episode of No Priors, hosts Sarah and Elad are joined by Oriol Vinyals, VP of Research, Deep Learning Team Lead, at Google DeepMind and Technical Co-lead of the Gemini project. Oriol shares insights from his career in machine learning, including leading the AlphaStar team and building competitive StarCraft agents. We talk about Google DeepMind, forming the Gemini project, and integrating AI technology throughout Google products. Oriol also discusses the advancements and challenges in long context LLMs, reasoning capabilities of models, and the future direction of AI research and applications. The episode concludes with a reflection on AGI timelines, the importance of specialized research, and advice for future generations in navigating the evolving landscape of AI. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @oriolvinyalsml Show Notes:  (00:00) Introduction to Oriol Vinyals (00:55) The Gemini Project and Its Impact (02:04) AI in Google Search and Chat Models (08:29) Infinite Context Length and Its Applications (14:42) Scaling AI and Reward Functions (31:55) The Future of General Models and Specialization (38:14) Reflections on AGI and Personal Insights (43:09) Will the Next Generation Study Computer Science? (45:37) Closing thoughts

Transcript
Starting point is 00:00:00 Hi, listeners, and welcome to No Priors. Today, we're talking to Oriol Vinyals, the VP of Research at Google DeepMind and Technical Co-lead for Gemini. His storied career in machine learning includes leading the AlphaStar team, which built a professionally competitive and pioneering StarCraft agent, all the way to today, and we're really excited to get his historical perspective on where we are in machine learning. Welcome to the show, Oriol. Yeah, amazing.
Starting point is 00:00:27 Thanks, Sarah, for the invitation. And likewise, thanks a lot for hosting me. Yeah, thanks for joining. Last year was an eventful year at Google and DeepMind. You know, how is that research effort organized now? And what do you think of as the mission internally? Yeah, so sure. I mean, I'm happy to obviously discuss the different phases
Starting point is 00:00:46 that research organizations have gone through in the last many years. But focusing on last year, two major events happened. One was that the Gemini project was formed, as a result of having two sort of parallel efforts on LLMs, mostly led by Google Brain and what we now call Legacy DeepMind. So earlier in the year, there was an effort to merge the two projects, and that's when sort of Jeff and I came together and brought the two teams together to create the very first Gemini model, which was eventually released later in the year. Then the second big event was to take all the organizations that were doing AI research and also form a singular organization. That's what today is called Google DeepMind, and it comes from Google Brain and Legacy
Starting point is 00:01:42 DeepMind coming again together under one roof. Obviously, Gemini being a very large and very important project within that organization. And really the goal of Gemini itself is to create an awesome core model to power the technology that, of course, LLMs today are powering all around the world, and we obviously expect this to only increase. How do you interact with the rest of the company and, like, Google as a business? And I feel like I have to ask you, does AI replace traditional search? So even running that from a research standpoint is super interesting, right?
Starting point is 00:02:17 There's two major centers, one in California, one in London, given the organizations that we come from. So that in itself is very interesting. In a way, we have the project running 24-7, which is helpful when you train these large models. And then you have to do a few things, right? One of the things we do, of course, is trying to build state-of-the-art technology, showing, from sort of a research perspective, knowing where the field is coming from and where it's going to, trying to really showcase from our own sort of intuitions and ambition what might come next, right?
Starting point is 00:02:53 might come next, right? So a prime example of these was, for example, the long context that we released earlier in the year, right? Millions, millions of tokens now have been able to be processed by our models. But then, of course, we also sort of take into consideration all the different needs, right, from the different products that we work with. Google has a lot of product areas. So we try to focus, of course, initially, especially to form the project. We try to focus on critical projects, and you see that very much by how Gemini is first surface to users or to enterprises, right? So obviously, cloud and enterprise is very important, developers as well.
Starting point is 00:03:37 Super cool to put these models in the hands of creative minds that are going to do things you didn't even anticipate these models could do. And then very important, formerly known as Bard, now the Gemini app, which is sort of the chatbot surface of our models. And then maybe the last very important piece indeed is search, which is trying to integrate, of course, this technology into their product, which, of course, has a lot of users.
Starting point is 00:04:03 So it's extremely exciting to think, well, the decisions you make at modeling eventually, and eventually means just maybe a couple of months after or so, will make it to the users that maybe are signing up for a beta, et cetera. So super exciting, and it's obviously connected. It's the core of the company, really, especially for the products that require very intelligent AI systems like the ones we're creating today. How do you think about the various types of use cases that fall under chat-based models versus search-based models? Because I remember I was
Starting point is 00:04:36 at Google many years ago at this point. And at the time, a lot of the different types of search queries were broken into different chunks. There's navigational queries. You're trying to get to some other site. And it just kind of helps direct you there. There were sort of strong intent commerce queries. You know, you could kind of break it down. There's medical queries. And so you can kind of map out the world of different types of things that the users are actually trying to do with search as the interface to get there. What do you think will move more towards chat and what do you think remains in the domain of more traditional search-based approaches? We might not know the answer. And like we're sort of experimenting in a way, right? So you can
Starting point is 00:05:11 kind of think an LLM first experience, like that's the chatbot, right? So in there, search has a role to play because it can be seen as a tool that enhances the answers of your chat experience, right? It provides citations, a bit more reliance. We know that, of course, language models can and do hallucinate. So there's that sort of LLM first point of view, which we are building up sort of from, it's kind of a more of a new product, And then search itself, which is obviously embracing sort of enhancing some of, as you said, different query types that could be useful to be sort of do a bit more like summaries, like AI summaries, but there's a lot more coming.
Starting point is 00:05:56 I think I/O actually showcased this year quite a good breadth of vision of what search is trying to do with language models, right? So some of it is not ready, but it's being tested and obviously, like, feedback is very important. But I think for now, it's hard to think of a convergence even, or that one sort of dominates the other. I think both right now seem useful in different ways. I as a user use both, certainly. But I think what is very clear, though, is that even if search is sort of the initial point where you have a query and you want to research something, it just feels like that experience will tremendously be enhanced by these
Starting point is 00:06:44 models. So the search product itself is trying to figure out how to integrate the LLM answers or their capabilities, reasoning capabilities, et cetera. So I think we're going to see a lot of that, and I kind of call that search integrating LLMs, and of course the vice versa is very obvious as well. And then as a core model project, like the Gemini project is, we have an open mind. We don't really need to, you know, decide one or the other. And one of the things of operating at the scale that Google operates is that, yes, we are the research team that builds the models, but then, of course, the PAs, the product areas, are the ones that are going to drive the strategy, of course, influenced by what the models can do and with input from many of us that have been dreaming about this world.
Starting point is 00:07:32 And it's kind of cool to also have the very experienced folks, right, sort of iterate with their users and their use cases. That makes a lot of sense. I mean, I think one of the things that's very understated or forgotten about Google is the degree to which it really was the world's first AI first company or ML first company as it used to be called, right? I mean, the entire product, both in search and ads, was very AI driven in the very earliest days. And part of that, I guess, is also inferring user intent through action and then algorithmically
Starting point is 00:08:01 adding that as an outcome. And so I guess to your point, you could continue to use search as a primary UI or interface or entry point. And then if it routes to more of a chat-based thing or a chat bubble could pop up or something else, it could just happen organically based on the type of intent that the user is exhibiting. And so I think you're raising really interesting points about the capabilities of Google and all the rest of it.
Starting point is 00:08:24 I guess related to that, what has been the most surprising thing about how users or companies have been interacting with Gemini? Yeah, there's quite a few, right? I guess maybe the ones that surprised me the most because initially I even thought this was just a number that you could report is the fact that, you know, infinite context length is coming eventually. So I thought, look, I mean, this is interesting, right? We come from a world where we had recording neural networks and LSTMs that actually had infinite memory, although it was not very capable, right? the models in practice, they never remember more than a few hundred words or so. So that was kind of first that we could make the context length so long and then seeing the
Starting point is 00:09:12 use cases just emerge even internally when we were first just trying the model. That, I don't know, that seems very trivial now in hindsight, but, you know, putting in a whole one-hour video and just asking anything, and it feels superhuman, right? You just literally put the video in, and after 10 seconds, 30 seconds, I mean, it does take some time to process the context, but you can just ask anything, right? And I mean, thinking of computer vision as a field, or video question answering, some of these data sets that we come from, I mean, they all seem dwarfed compared to the capability that was in our hands.
Starting point is 00:09:48 And then we put it in the hands of developers, and we saw, like, I mean, amazing, obviously, demos and things that people could do. Even as mind-blowing as, you point the camera at the screen directly, and not even at the code, right? It's just the letters that appear on the screen, and you can debug like that, right? So you can imagine how future interfaces will be effortless. I mean, you just need to point the camera, ask a question, and you get answers. It's an interesting research problem, of course, but maybe it's not going to be that useful to now thinking, wow, that's amazing. Yet, it's not very mainstream either, right?
Starting point is 00:10:24 So we are kind of still trying to discover, right? What does this enable? We showed Project Astra where, of course, memory, you get the phone and you just interact with it as if it is an agent. Memory is very important, but it's still not very clear in a few years what this might be, although you could imagine, well, the whole web is in the context, or all your personal data, because it remains on your device, because it's the working memory of the model rather than the weights, and lots of applications, but still fairly early days, right?
Starting point is 00:10:57 So it surprised me, yet, of course, in a way, it hasn't taken off fully, gone fully mainstream, although it does feel magical when you start interacting and realizing you can ask anything, you get an answer like this without watching any of these movies or many books you can upload, etc. What do you think is the time frame for very long context windows really being in broad-based use? This is not a Google-specific question, but it seems
Starting point is 00:11:29 So I'm just a little bit curious in terms of the time frame for which it actually hits either a large enterprise or consumer use cases. And to your point, you can imagine ones where the company is doing something on behalf of the consumer by adding all the context from their device or from their interactions into the context window, or it could be an enterprise that uploads a large folder including a bunch of legal documents and then it gets incorporated into some query or something. If there is a compelling use case, the technology is not far away from being able to be deployed at scale. And of course, hardware is also being updated based on what, I guess, the research developments are, right?
Starting point is 00:12:05 So certainly, like, in one, two years, the context that is a commodity will definitely be enhanced by a factor of, I mean, I don't know, 10x or so everywhere. And then, I mean, extremely long context, I think it's going to be a motivational drive, definitely from a research perspective. And then deploying it at scale, some techniques, like many, have been explored already, like hierarchical memories and so on. And even RAG is pretty common, right? So we're going to combine this, and probably thanks to the use cases, I expect, I mean, order of one, two years, you might see, wow, we went another order of magnitude, from both state-of-the-art and, of course, what's considered a commodity. So I'm pretty certain about these. That's, again, modulo finding use cases that will be compelling to serve the model that
Starting point is 00:13:00 requires, I mean, more memory. There are some certain limitations, of course. But these will be figured out from a technological standpoint if there's the motivation, for sure. In the very quickly coming era of infinite context, like what is the relevance of retrieval architectures and more hierarchical memory? And, like, you know, I think you can make an argument for this being continually relevant just from an efficiency perspective. But, you know, how do you think about it? Yeah, I think definitely the efficiency argument to hierarchical memories to make context even longer make a lot of sense.
Starting point is 00:13:35 And even from just efficiency of learning and, of course, of retrieving the memory in a sort of course to fine manner like, like, you know, an intelligent being like ourselves might do makes sense. So I think that even the quality will motivate this sort of solution regardless. And we do have a lot of experience, of course, of retrieval-based methods, definitely at Google, and then combining them with neural-based methods, I think it's a matter of time and finessing the details and the use cases of how much the problem with, of course, retrieval-based methods is they tend to simplify things to say, hey, like this whole book is just a single vector, whereas if you just upload a whole book into Gemini and ask questions, it can really reason about every single word, right? So probably
Starting point is 00:14:27 finding the middle grounds for different use cases is needed, but I think to me it seems like a feature not a back that we do have a bit of a hybrid mode perhaps going into the future and research will be driven like this. How do you contextualize this moment in time? just in terms of, you know, what the biggest limitations are for current state-of-the-art LLMs and, like, what's worth working on. I mean, there's one reflection that even many years ago with friends who were kind of early, quote-unquote, in the game, they said, well, get ready, lots of brilliant people as this gets mainstream will enter the field.
Starting point is 00:15:04 And you certainly see this, right, with open sourcing and a bit of a random search, even, right? It's not like, I mean, you just selection bias, like someone does something random, but people actually want that and then that becomes sort of viral in a way. So I think there's the sheer size of the field that is one aspect that I think we were sort of anticipating, but to me that's one of the biggest changes that I've seen, that there's more brains, more different backgrounds coming into the field. And that is combined. I usually tend to assign credit with what has happened to, of course, the scale of data and compute, algorithmic advances that you can simplify,
Starting point is 00:15:49 but there's certainly been some that have been important in the last 10, 20 years or so. And then actually the accessibility, right? The software, the open sourcing efforts, those have been quite critical to then create these sort of exponentials or linear trends in log scale that we're seeing. Now, a bit more into sort of how I see the field from maybe, like, I tend to call maybe the 2000, let's say 10 to 20, like deep learning era. So what that era did, right, is it took a set of algorithms that were general, right? The algorithms are like stochastic rate in the sand, deep learning, neural networks, right,
Starting point is 00:16:31 reinforcement learning. And you could think of these are ingredients. They're common. You expand them over the years, but they're certainly the same. And then you just apply them to a domain, and you get extremely good at that domain, right? So we have, you know, mastering the game of Go, beating the ImageNet Challenge, becoming state-of-the-art speech recognition, a state-of-the-art image generations, right? So that decade is kind of the models themselves certainly are not general,
Starting point is 00:17:00 but the algorithms are general, right? you could just take the same algorithm and then change the dataset and, of course, tweak a couple of things and voila, you get like protein folding really enhanced, right, from the traditional methods. And I think then the greatest insight, of course, came from realizing that, and I think that's a lucky factor for us because it makes communication with these entities much easier. But modeling language turns out to be such a powerful abstraction for generality. So, of course, the GPT two paper especially posed that sort of, in the abstract, it says, look, you can solve every task, not one task, by modeling language. And then perfecting that, the whole field, of course, building up from a lot of years of research, created what is not just the algorithms are reusable, but actually the model now becomes general. So that's why I think, like, okay, HGI is getting closer, that, you know, we have powerful models first into, 2010-20, now we have general models.
Starting point is 00:18:02 And I think multimodality has been more recently another amazing breakthrough that these techniques expand not to language, but also to vision, sound, videos, etc. So that means that we have very powerful general models that from an AGI definition standpoint, it starts to tick many boxes. Reasoning capabilities of the models are there, but I don't think we've perfected sort of making the reasoning very crisp and accurate so that these models would not hallucinate or would not, you know, the model might solve an, you know, an Olympiad mathematical problem yet failed to then discern a very basic puzzle, right? And I think
Starting point is 00:18:45 what you do with the model, uh, this reasoning step, um, there's a lot of ideas and a lot of experience and a lot of algorithmic advances we've also done in the last few years over like surge and so on, but we haven't quite perfected this. And then, of course, the question is, you know, push the frontier and move forward in certain domains, at least. But yeah, we've come a long way. I think the investment and luckily these models are finding usefulness. So the resources that go into the training in the models, right now there's a good sort of feedback
Starting point is 00:19:20 loop, right, of revenue and then reinvesting certainly from the biggest players. So as a researcher, of course, I mean, we welcome that. What is the difference between having reasoning capability and having reasoning capability that's crisp and accurate? Sometimes the distinction between probabilistically solving something, right? So right now, you know, you get these models, they assign, they still assign probability mass over every sequence of, let's simplify and think of not multimodal, but words. So every single sequence of words, it will assign. a probability distribution over those. So then you, of course, are observing all the knowledge on the internet
Starting point is 00:20:00 and then sharpening those models around following instructions, being aligned with humans. But you still have this probability distribution that will assign non-zero probability to certain things that would not seem to be correct. Although in language, of course, there's so many ways to say the same thing correctly. So that's why these models shine in the end of the day. they are very efficient ways to integrate over all possible sequences, right? Now, it's possible that, let's say now you predict when it's a hard problem, right, a tricky question that requires deep knowledge, you might be at 95% accuracy,
Starting point is 00:20:41 but of course, these will create errors, even if it's small. This is deployed to the whole world, so certainly you will get to see the mistakes. So one thought would be like you just keep making the models larger, you keep improving the algorithms, and you're going to hit a point where the probability of a mistake vanishes. It's possible. We will obviously explore that. But to accelerate that sort of progress, you want to start sort of really exploring what's the reasoning the model has. and by making it more redundant, more logical,
Starting point is 00:21:22 by iterating more on these kind of ideas, you could imagine generating a very small program that runs slowly with the language model at the center and then getting that 99.9% faster. And of course, you're going to do both as an ambitious lab and so on. But the crisply means that these problems, probability of error diminishes. And of course, we can always put more compute power. We humans will make mistakes. We get tired, et cetera. But I mean, these models are powerful.
Starting point is 00:21:57 We can put more hardware that inference. So the hope is that in that sense, they become at least as good as humans. You know, one way to like frame this problem is that even deep mind or any large lab or, you know, the human race, we have some limited amount of compute toward this problem. right? And it's like there's there seems to have been a strong shift of how much of that compute should be scaled at training time versus at inference time with, you know, test time search or some of the techniques that you describe in system two. What is your prediction of like what that mixes of compute at training versus inference time compute, you know, let's say two or three years from now? What we're kind of all aiming to to discover is to make the video.
Starting point is 00:22:45 lesson from reached out and true, right? The bitter lesson states that you scale learning and use skill search, and then that's all you have to do as a computer scientist. I mean, it's controversial. I certainly like, simply, I'm a deep learning at heart, so I don't disagree with that. Certainly the learning scalability part has been tested and proven quite heavily recently. The search, at least when you do not have access to a perfect reward, which is this is the the current, I think, current problem we have in language, that is, you know, one of the key research
Starting point is 00:23:20 areas, right? How do you assign reward fastily to statements that, I mean, not even, again, you and I might agree to be true, right? I mean, is the sky blue? I mean, I don't know. At night, it's not, right? It's quite an interesting kind of point that assigning truth or one or zero, which is required in games is not so applicable here. Now, historically, if you look at AlphaGo, which actually followed quite closely the recipe of, you pre-trained your model on all human data, you then use RL to make it better,
Starting point is 00:23:55 and then you do some search at inference time, the compute there was very skewed for the middle step, the reinforcement learning step, right? So I don't even know the exact numbers, but certainly the majority of compute was spent on the training of the self-play, as we called it, at the time, self-improvement loop. And if you look at today, that's clearly not the case. Most compute is spent in pre-training, and in fact, you will see over and over that you overfit, so you have to actually stop the reinforcement learning process to overfit to these imperfect reward functions.
Starting point is 00:24:33 otherwise you start sort of doing a bit of adversarial search against a reward function that might be perfect because you have a data set of human preferences and all of a sudden the model might discover, hey, you can issue lots of emoticons and these reward function things, that's great. And clearly, like, we have a problem of reward functions not being as accurate as in the game of go or chess and whatnot. And then there's the third component, which is now you have your model trained. How much do you let it ponder, which in AlphaGo again, using the same example, well, the rules of course say, don't quote me, but let's say roughly a game must last
Starting point is 00:25:11 four hours of compute time from a few months' perspective. So we know we had quite a limited inference time because we obviously couldn't go over time. So of course, there was paralysis and so on involved, but the compute there was certainly not as big as the one you used to train. So to me that balance feels correct. Like some are on pre-trading, and here we're trying to learn every task. So certainly that's going to be, you know, let's say it can be as high as 50%, not as high as over 90 like today. And then the rest mostly on reinforcement learning or if we get access to good rewards, that would be kind of the next maybe piece of compute, but much bigger than it is today.
Starting point is 00:25:53 And then inference time, I mean, system two is slow. but it's not terribly slow. So unless you ask the model, I mean, soft protein folding, then probably you can go for a vacation for a month and the model will come back with a solution. I think a few seconds of compute is okay for inference. So that would be the rough.
Starting point is 00:26:15 So that's a small percentage probably compared to the amount you're spending training. Of course, you're serving many queries for billions of people. And then that means research is needed, especially on the middle bucket of reinforcement learning step that currently feels still reasonably in the early stages of research. How do you think about scaling the reward function beyond games once you really had superhuman performance of the model? I'm thinking of things like the older like Med Palm two models and things like that
Starting point is 00:26:42 where they outperformed human physicians in terms of output relative to physician expert panels. And then obviously you could then do some post-training with physician experts as the key. But at some point the machine will be better than that. And so how do you keep scaling reward functions? Traditionally, right? You just get it's supervised learning, right? Reward function means good or bad. So we can scale that process as much as we have so far.
Starting point is 00:27:10 I mean, obviously many, many players are realizing the power of human annotation. And in fact, deep learning comes thanks to the amazing effort of, like, Fei-Fei Li and her lab to label a data set of a million examples, right? So that way of scaling is one. But then I strongly believe that there might be a bootstrapping effect of the models that become better at judging their own outputs, right? And so maybe, and that really is probably maybe even the main hypothesis of reinforcement learning as I see it. I mean, I'm not a huge expert in RL. But if checking that something is correct is easier than creating the solution, then we're in business, because the language models
Starting point is 00:27:56 will be able to evaluate their own samples more accurate than to generate them. And then we have a sort of reinforcement learning loop because we can reinforce the ones that seem more promising and then the model gets better, right? So that, using the model itself as a reward, which incidentally uses language, which is already fuzzy, is one area that, I mean, I'm excited about.
Starting point is 00:28:20 There's a leaderboard of reward models. Some of them are, I think, the name that they use is maybe generative reward model. I think that area goes beyond this kind of need for specific task annotations. We might need specific task annotations. Then the question is, how many labels will we need? And the hope is that in the limit, maybe you need as many labels as the user will provide the system
Starting point is 00:28:45 when the user wants to teach it something new. Not to have used it, but a friend of mine used to use Nyquist-Channon sampling theorem as sort of a proxy for how much an intelligent person or machine can actually extrapolate the intelligence of something smarter than itself. And, you know, it feels like you're almost falling into some version of that where, you know, I think that theorem basically states that, you know, you have some frequency on a wave and you're sampling it and you need to sample above a certain rate to be able to actually reconstruct the wave, right?
Starting point is 00:29:19 And you could argue that that's some form of learning or intelligence or something else. And so you need to be smart enough to actually tell how smart you can be in some sense. Yeah, I love that analogy. My undergrad degree, I studied Nyquist theorem quite heavily, which is funny because we broke it so much, right? I mean, let's, okay, sorry, aside on Nyquist, but, you know, what it says roughly is like, look, I mean, if you want to, let's say, output certain resolution or frequency in the Fourier domain, you need to sample at half the frequency you want to reconstruct, right? Otherwise, the information is simply not there, and you can see if you take a sinusoid and you sample too little, you will not recover the original frequency.
Starting point is 00:30:03 But then look at this super resolution generative models, right? You input like a 32 by 32 pixel image. It feels in the details in a way that completely violates that principle. So I talked to my signal processing teachers about this. And of course, it is violating and it is inventing. It's hallucinating. But of course, it's cheating because it turns out that the world has certain structure that you can learn, which is essentially what all these generative models of images do.
Starting point is 00:30:33 Going back to your point, I mean, I agree. That's a bit of like this sort of argument that there's these emergence properties, right? At some point, the model might have the capability, let's say, to self-correct. Let's call it self-correction. And as soon as it hits that capability, you can see how, wow, bootstrapping now is trivial. And of course, it's not going to be that blanket across all domains. But certainly, we're going to see some of these. It's not going to be that dramatic.
Starting point is 00:31:02 It's going to be more like, hey, this one now emerge and so on. But, of course, once that capability is there, you need to have the algorithms to exploit it, especially these reward models that will be mostly driven by the model itself, right? And then how many labels? How are you going to correct the mistakes that it might still do? Those are very interesting questions. And even from a product standpoint, they're quite cool, right? Like, I mean, we all play with these models. I mean, if they fail, I mean, you just could say, look, I didn't like this and the models should adapt. And that's where long context also plays a bit into the equation, right?
Starting point is 00:31:41 You have long interactions, you finesse what the model is doing for you. That sounds good at priori. So how do you enable those capabilities and so on is one, one of the many kind of exciting future directions in the field? We have this progression of the field from general algorithms to increasingly general models. We'll see how far we get that in that vein. deep mind has also done really amazing work in particular domains right protein folding material science whatever where are there are we fully in the era of general models or are you know
Starting point is 00:32:19 is starting with language and going to audio video as such does that solve the rest for us or what what other data sets or domains do you think are unique that are not well represented in that corpus the way you expect is perfect but I guess there's there's a sort of a time and also what level of performance question here, right? So could current models do a seal the reasonable job at folding proteins, perhaps, right? They might even use tools and the internet, download a bit of a piece of software and figure it out. So one way I kind of characterize general models is they are, I mean, the level of performance is development, but they're like 20% good at everything.
Starting point is 00:33:06 Okay, so that's the level of performance. They reach, but it's general, which, I mean, it's powerful. So you want that 20% to keep going up relatively uniformly across the board. So you don't want to over-specialize the models or the research. But the world has very important challenges that are worth solving with specialization, right? So I think what I tell people in the team and around me is like, look, if the problem is then let's obviously specialize. Probably we can and we might see more on more bootstrapping
Starting point is 00:33:39 from Gemini and these models to then a specific solution that the model is going to be throw away except for maybe cracking protein folding or figuring out nuclear fusion or modeling the weather, right? Some of the projects that are currently active in the deep mine portfolio, well, you better choose the ones the ones that will matter because the timeline is such that, well, we'll get there sooner if we specialize on those domains and then perhaps eventually, like, the general model will
Starting point is 00:34:14 overtake, but I mean, that's just probably far away or further away. So when it's worth it, do it. And that's, I think, we're going to still see this hybrid mode. Although, again, more and more taking maybe Gemini and then doing something amazing in a more narrow domain. And I think that to me seems like a good situation to be in, because that directionality of taking a generalist model and doing something by fine-tuning it, there's going to be a loop back as well, right? Then you're going to do something amazing, let's say, in math, right? We recently did that. And then, well, the data or as a reward model or something,
Starting point is 00:34:54 that model will do will help the main model get better as well. So it's quite, synergistic thing to do as well. But if it's important for the world, it's okay. I'm okay with still task specialization. That's for sure. There is a vein of criticism of math and computer science as with games or any other constrained domain like this that, you know, it's, and tell me if you think this is like a really niche view, but that it's a dead end versus like general reasoning
Starting point is 00:35:25 advancement, even though it has all these attractive attributes like the, you ability to generate and self-validate in many ways. Like, how do you react to that criticism? There's the validity that, again, going back to the reward function sort of question, right? You want to create the most general model or agent or intelligence. Turns out that reward is never perfectly defined by the environment. Even if you go on or surviving, look, it is extremely complicated to compute a reward. So you could argue, like, look, when you do get these rewards from certainly somewhat artificial,
Starting point is 00:36:07 you know, interesting but artificial domains, that might not generalize to the real problem. And in that sense, it could be at that end. Then how you do the research is important, right? Let's use math as an example, right? We do have access to the reward. But even there, it's not that simple, right? you know, if, I mean, if I'm doing a simple calculation, okay, yeah, 4 plus 4 equal 8, that I can check, simple. But now you start thinking, well, prove this theorem. The proof is either correct
Starting point is 00:36:43 or not, but that starts to be more complex to get a crisp reward signal sometimes, unless you can formalize it, and then there might be back in the formalization process or in the maybe in the engine that checks the math underneath. And even then, I mean, if I say four plus four, and instead of saying eight, I say four plus four equals eight, is that correct or not, right? How do you check correctness? I mean, you start to interleap language, explanations. So I think by trying to solve these problems, not in a strictly, hey, the reward is sharp
Starting point is 00:37:19 and crisp. Did you win or did you not? but you start also saying, hey, the reward model just needs to understand what's correct or roughly correct and look at what you wrote and assess, then you might start generalizing and going away from what would be this niche world, which is reward is perfectly defined to, hey, like, what's truth, what's not true? I mean, that's quite complicated, right? So I think in that sense, depending how you attack the problem, it's definitely,
Starting point is 00:37:51 not at that end. And I think even because it's so hard to have a perfect reward when you interleaf language, I think even by accident, the field will move forward as we get to discover this more general reward functions, train them better, and they will themselves and push the models and hopefully some sort of self-improvement loop will appear more than it has so far. I think you already said you feel AI's within grasp in the next few years. Like what is your most contrarian take on this or contrarian under-discussed take on this or AI in general? I sort of didn't like the term AGI too much, which is funny because Shane is a co-founder had a lot of, I mean, incredible foresight, right? Shane, I just had a discussion with him about
Starting point is 00:38:41 AGI timelines recently. And I mean, in 2009, he predicted it's 2028. And that seems, again, depending on how you take the definition and how strict and what's the test. That was quite, you know, a long time ago that he claimed that. And I think that still metacalculus, et cetera, probably suggests the world agrees with that prediction, maybe a bit still pessimistic, rather optimistic his prediction versus the world estimate is at 2030 or something. The contrarian view would be, I'm not sure it matters that we achieved AI. I think it might not look like, hey, like it's going to be exactly like the cognitive task we can do and then we reach parity.
Starting point is 00:39:30 It's going to be a distribution of things that these models can do or can't do. And that to me feels like what's still worth pursuing, right? And I mean, you see plenty of examples of, as I was saying earlier, the model will crack this impossible puzzle in math. and then it will just contradict itself trivially somehow, right? So I think we need to be ready to not be too fixated on AGI, still fix the most, obviously, egregious errors. That's very important because I think they are something profound that is wrong in the models.
Starting point is 00:40:06 But I think it's not the exact goal probably. It's just, I think more about distributionally rather than the one point that before it wasn't and now we have it. And honestly, it's going to be also impossible to get agreement. So it's going to be quite a cloud of moments that people might feel it has happened now. But it doesn't matter because the models might be used for amazing things and products and research itself, bootstrapping research and science is one of the things. Of course, we're excited about.
Starting point is 00:40:38 Google DeepMind's mission has science very much present in the mission statement. That's what sort of motivates at least myself. So I don't really care maybe to build ATI in the strict definition. But I understand it. It's a good goal. And I think I appreciate to have a single number to aim for. But I think it's going to be hard to agree with everyone, as usual. Maybe one last one for you would just be, do you, it sounds, I'm guessing no, because it sounds like the mission continues quite a bit beyond that.
Starting point is 00:41:10 But do you live your life any differently, believing in 2028? I think I was reflecting on like several like I guess smartphones right so so yes with kids like how do you how do they do you present with the option of smartphones and I mean we don't have that many samples or data points but then I mean obviously like that's obsolete right there's there's these technologies I have my kids are young young so I don't get to the kind of you can try like this jamming I chat GPT, whatnot. But I think that is worrying in the more human being at that sense. I think you adapt as well sort of how you do with like scaling up, right? You know, scaling up is very important. As you progress in your career, you go from individual contributor, writing codes to helping others sort of figure out what their, you know, their path is. So I think the scaling up thanks to this technology is pretty, there's a huge opportunity. So I've been, of course, trying to use these to, you know, figure out what has happened in the endless chat rooms
Starting point is 00:42:23 that I'm in. I'm in London. So when I wake up, I mean, California has, like, given me, like, a lot of tokens and long context to process. So I think there's, of course, personally, you also try to figure out how to best use the technology to scale yourself. And I talk to quite a few people that are not that much into technology. And all I say is, look, try to figure out how you could collaborate or use it as a tool because I think that change is definitely coming, no matter if HGI or not. And I might self-applied. But then, yeah, from an intelligence creation standpoint, which is having kids,
Starting point is 00:43:01 that is much more complicated. Sadly, it's a bit zero-shot learning. So we'll see. I'll tell you next podcast, probably, how that's going. Can I ask you one more question? If you were to give advice to somebody with kids today, what do you think their children should study? They go to college in 10 years. What should they major in? What should they be doing to prepare for the future world? Honestly, from a studying perspective, I feel like there's always your passion element that cannot be sort of, I mean, in the early days, right, you did deep learning not because it was like the thing to do. I mean, it was just what many of us like to do. And I feel, personally, I cannot give advice that says, you know, find the top professions and choose based on that. So I'll modulate my answer saying, definitely, find which aspect, right, is your true
Starting point is 00:43:52 passion or, you know, you admire someone, role models, etc. Now, what I will say, and I've been saying this for actually quite a few years, is that project how that thing changes with AI, exploit that. Of course, not everyone will understand basics of math or technology, but to be honest, as I was saying, language is the driver. So it's not that hard to kind of understand how this works. I was talking to my sister, who is a teacher, and I mean, I just, look, she just used it to create like a summary of a kid's homework, uploading all the homework and, okay, you know, that way of thinking, that needs to basically go into quite a few professions. So I guess first order to be it, find your passion, follow that, and then second, really embrace
Starting point is 00:44:42 some of the tools that are present today. Of course, if you're into technology, computer science, then another very fruitful direction is to find the corners of the space that AI hasn't gotten into. Maybe you will train a specialized models because, again, it might be worth doing. There's still quite plenty of opportunities. I was just chatting to someone about climate management. modeling, weather modeling is kind of cracked with deep learning, but climate is quite different. We only have one sample, which is one planet. So it's pretty tricky. But then, yeah, think about
Starting point is 00:45:15 that as a kind of what areas could be enhanced. And of course, otherwise, if you like the technology, I think there's quite a lot of LLM research and related to be done for, I mean, five, ten years at least. So that's probably still a worthwhile. way to investigate. And if you go into research, there's definitely what else to do. Well, thanks for doing this, Oriol. It was a great conversation. Thanks for joining. Yeah, likewise. Great questions. And hopefully next time I'm in debate, we can we can do one where we're in 3D. You know,
Starting point is 00:46:06 Thank you.
