No Priors: Artificial Intelligence | Technology | Startups - What is Digital Life? with OpenAI Co-Founder & Chief Scientist Ilya Sutskever
Episode Date: November 2, 2023
Each iteration of ChatGPT has demonstrated remarkable step-function capabilities. But what's next? Ilya Sutskever, Co-Founder & Chief Scientist at OpenAI, joins Sarah Guo and Elad Gil to discuss the origins of OpenAI as a capped-profit company, early emergent behaviors of GPT models, the token scarcity issue, next frontiers of AI research, his argument for working on AI safety now, and the premise of Superalignment. Plus, how do we define digital life?
Ilya Sutskever is Co-founder and Chief Scientist of OpenAI. He leads research at OpenAI and is one of the architects behind the GPT models. He co-leads OpenAI's new "Superalignment" project, which tries to solve the alignment of superintelligences in 4 years. Prior to OpenAI, Ilya was co-inventor of AlexNet and Sequence to Sequence Learning. He earned his Ph.D. in Computer Science from the University of Toronto.
Show Links: Ilya Sutskever | LinkedIn
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @ilyasut
Show Notes:
(00:00) - Early Days of AI Research
(06:51) - Origins of OpenAI & Capped-Profit Structure
(13:46) - Emergent Behaviors of GPT Models
(17:55) - Model Scale Over Time & Reliability
(22:23) - Roles & Boundaries of Open Source in the AI Ecosystem
(28:22) - Comparing AI Systems to Biological & Human Intelligence
(30:52) - Definition of Digital Life
(32:59) - Superalignment & Creating Pro-Human AI
(39:01) - Accelerating & Decelerating Forces
Transcript
OpenAI, a company that we all know now, but only a year ago was 100 people, is changing the world.
Their research is leading the charge to AGI.
Since ChatGPT captured consumer attention last November, they show no signs of slowing down.
This week, Elad and I sit down with Ilya Sutskever, co-founder and chief scientist at OpenAI,
to discuss the state of AI research, where we'll hit limits,
the future of AGI, and what it's going to take to reach superalignment.
Ilya, welcome to No Priors.
Thank you.
It's good to be here.
Let's start at the beginning.
Pre-AlexNet, nothing in deep learning was really working.
And then given that environment, you guys took a very unique bet.
What motivated you to go in this direction?
Indeed, in those dark ages, AI was not an area where people had hope
and people were not accustomed to any kind of success at all.
And because there hasn't been any success, there was a lot of debate
and there were different schools of thoughts that had different arguments
about how machine learning and AI should be.
And you had people who were into knowledge representation
from a good old-fashioned AI.
You had people who were Bayesians and they liked Bayesian non-parametric methods.
You had people who liked graphical models,
and you had the people who liked neural networks.
Those people were marginalized because neural networks did not have the property that you could prove math theorems about them.
If you can't prove theorems about something, it means that your research isn't good.
That's how it has been.
But the reason why I gravitated to neural networks from the beginning is because it felt like those are small little brains.
And who cares if you can't prove any theorems about them?
Because we are training small little brains and maybe they'll become, maybe they'll do something one day.
And the reason that we were able to do
AlexNet when we did is because of
a combination of two factors,
three factors. The first factor is that
this was shortly after GPUs
started to be used in machine learning.
People kind of had an intuition that that's a good thing to do
but it wasn't like today
where people knew exactly what they needed GPUs for.
It was like, oh, let's play with those cool, fast computers
and see what you can do with them.
It was an especially good fit for neural networks
so that definitely helped us.
I was very fortunate in that I was able to realize that
the reason neural networks of the time weren't good
is because they were too small.
So like if you try to solve a vision task
with a neural network which has like a thousand neurons,
what can it do?
It can't do anything.
It doesn't matter how good your learning is
and everything else.
But if you have a much larger neural network,
it will do something unprecedented.
Well, what gave you the intuition to think that that was the case?
Because I think at the time it was reasonably contrarian to think that, despite, to your point,
you know, a lot of the human brain in some sense works that way or different, you know, biological neural circuits.
But I'm just curious, like, what gave you that intuition early on to think that this was a good direction?
I think, yeah, looking at the brain specifically, like, all those things follow very easily
if you allow yourself to accept the idea. Right now this idea is reasonably
well accepted; back then, people still talked about it, but they hadn't really accepted it
or internalized it. The idea that maybe an artificial neuron in some sense is not that different
from a biological neuron. So now whatever you imagine animals do with their brains, you could perhaps
assemble some artificial neural network of similar size.
Maybe if you train it, it will do something similar.
So there, so that leads you to start to imagine.
Okay, like you almost imagine the computation being done by the neural network.
You can almost think, like, if you have a high-resolution image and you have like one neuron
for like a large group of pixels, what can the neuron do?
There's just not much it can do. But if you have a lot of neurons, then they can actually
do something and compute something.
So I think it was considerations like this,
plus a technical realization.
The technical realization is that if you have a large training set that specifies the behavior
of the neural network, and the training set is large enough such that it can constrain
the large neural network sufficiently, and furthermore, if you have the algorithm to find
that neural network, because what we do is turn the training set into a neural network which satisfies the training set. Neural network
training can almost be seen as solving a neural equation.
Solving a neural equation where every data point is an equation and every parameter is a variable.
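(A minimal illustrative sketch of that picture, with notation that is ours rather than Ilya's: given a parametric model $f_\theta$ and a training set of pairs $(x_i, y_i)$, each data point contributes one "equation",
$$f_\theta(x_i) \approx y_i, \qquad i = 1, \dots, N,$$
in the variables $\theta$, and the system is solved approximately by minimizing a loss with gradient descent,
$$\theta \leftarrow \theta - \eta \, \nabla_\theta \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_\theta(x_i), y_i\big).$$
With enough data points relative to parameters, the training set constrains the network in the sense described here.)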
And so it was multiple things. The realization that the bigger neural network could do something
unprecedented, the realization that if you have a large data set together with the compute
to solve the neural equation, that's where gradient descent comes in. But it's not just gradient descent.
Gradient descent was around for a long time. It was certain technical insights about how to make
it work. Because back then the prevailing belief was, well, you can't train those neural nets to do
anything. It's all hopeless. So it wasn't just about the size. It was about, even if someone
did think, gosh, it would be cool to train a big neural net, they didn't have
the technical ability to turn this idea into reality.
You needed not only to code the neural net;
you needed to do a bunch of things right, and only then would it work.
And then another fortunate thing is that the person whom I worked with, Alex Krizhevsky,
he just discovered that he really loves GPUs, and he was perhaps one of the first
people who really mastered writing really performant code for the GPUs.
And that's why we were able to squeeze a lot of
performance out of two GPUs and produce something unprecedented.
So to sum up, it was multiple things.
The idea that a big neural network, in this case, a vision neural network, a convolutional
network with many layers, one that's much, much bigger than anything that's ever been
done before, could do something very unprecedented because the brain can see and the brain
is a large neural network. And we can see quickly, so our neurons don't have a lot of time.
Then the compute needed, the technical know-how
that in fact we can train
such neural networks. And it was not
at all widely distributed. Most people
in machine learning would not have been able
to train such a neural network even if they
wanted to. Did you guys have any
particular goal
from a size perspective?
Or was it just as
you know, and if that's biologically
inspired or where that number comes from or just as
large as we can go? Definitely as large
as we can go. Because keep in mind, I mean
we had a certain amount of
compute which we could usefully consume
and then what can it do?
Maybe if we think about just, like, the origin of OpenAI
and the goals of the organization,
what was the original goal and how's that evolved over time?
The goal did not evolve over time.
The tactic evolved over time.
So the goal of OpenAI from the very beginning
has been to make sure that artificial general intelligence
by which we mean
autonomous systems
AI that can actually
do most of the jobs and
activities and tasks that people do
benefits all of humanity
that was the goal from the beginning
the initial thinking
has been that maybe the best way to do it
is by just open-sourcing a lot of technology.
We also
attempted to do it as a non-profit.
It seemed very sensible: this is the goal,
and non-profit is the way to do it.
What changed?
At some point at OpenAI,
we realized
and we were perhaps among
the earliest to realize that
to make progress in AI for real
you need a lot of compute
now what does a lot mean
the appetite for compute is truly endless
as is now clearly seen,
but we realized that we would need a lot.
and
a non-profit
it wouldn't be the way to get there,
wouldn't be able to build a large cluster with a non-profit.
That's when we
converted into this unusual structure called capped-profit.
And to my knowledge, we are the only capped-profit company in the world.
But the idea is that investors put in some money,
but even if the company does incredibly well,
they don't get more than some multiplier
on top of their original investment.
And the reason to do this,
the reason why that makes sense: you know, there are arguments one could make against it as well, but the argument for it is that if you believe that the technology that we are building, AGI, could potentially be so capable as to do every single task that people do, does it mean that it might unemploy everyone?
Well, I don't know, but it's not impossible.
And if that's the case, it makes sense.
It would make a lot of sense if the company that built such a technology were not incentivized to make infinite profits.
I don't know if it will literally play out this way because of competition in AI.
So there will be multiple companies and I think that will have some unforeseen implications on the argument which I'm making.
But that was the thinking.
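(A purely illustrative sketch of the capped-profit mechanics, with hypothetical numbers that are not from the conversation: if the cap were a 100x multiplier, then
$$\text{maximum investor return} = 100 \times \text{investment}, \qquad \text{e.g. } 100 \times \$10\text{M} = \$1\text{B},$$
and, as the structure has been described publicly, any value created beyond the cap would flow to the controlling non-profit rather than to the investor.)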
I remember visiting the offices back when you were, I think, housed at YC or something or, you know, cohabited some space there.
And at the time, there was a suite of different efforts.
There was robotic arms that were being manipulated.
And then there was, you know, some video game related work, which was really cutting edge.
How did you think about how the research agenda evolved and what really drove it down this path of transformer-based models and other forms of learning?
So our thinking has been evolving over the years from when we started OpenAI.
And the first year we indeed did some of the more conventional machine learning work.
By conventional machine learning work, I mean, because the world has changed so much,
a lot of things which were known to everyone in 2016 or 2017 are completely and utterly forgotten.
It's like the Stone Age almost.
So in that Stone Age, the world of machine learning looked very different.
it was dramatically more academic.
The goals, values, and objectives were much more academic.
They were about discovering small bits of knowledge
and sharing them with the other researchers
and getting scientific recognition as a result.
And it's a very valid goal and it's very understandable.
I've been doing AI for 20 years now.
More than half of my time that I spent in AI was in that framework.
And so what do you do?
You write papers.
You share your small discoveries. Then came the realizations.
The first realization is, just at a high level,
it doesn't seem like it's the way to go for a dramatic impact.
And why is that?
Because if you imagine what an AGI should look like,
it has to be some kind of a big engineering project
that's using a lot of compute, right?
Even if you don't know how to build it,
you know what it should look like.
You know that this is the ideal you want to strive towards.
So you want to somehow move towards larger projects
as opposed to small projects.
So we attempted a first large project, where we trained a neural network to play a real-time strategy game as well as the best humans. That's the Dota 2 project, and it was driven by two people,
Jakub Pachocki and Greg Brockman. They really drove this project and made it a success.
And this was our first attempt at a large project.
But it wasn't quite the right formula for us, because the neural networks were
a little bit too small, and it was just a narrow domain, just a game. I mean, it's cool to play
a game. And we kept looking. And at some point we realized that, hey, if you train a large neural
network, a very, very large transformer to predict text better and better, something very surprising
will happen. This realization also arrived a little bit gradually. We were exploring generative
models. We were exploring ideas around next word prediction. Those are ideas also related to compression.
We were exploring them.
The transformer came out.
We got really excited.
We were like, this is the greatest thing.
We're going to do transformers now.
It's clearly superior to anything else before it.
We started doing transformers.
We did GPT1.
GPT1 started to show very interesting signs of life.
And that led us to doing GPT2.
And then ultimately, GPT3.
GPT3 really opened everyone else's eyes as well to, hey, this thing has a lot of traction.
There is one specific formula right now that everyone is doing.
And this formula is: train a larger and larger transformer on more and more data.
I mean, for me, the big wake-up moment, to your point, was the GPT2 to GPT3 transition,
where you saw such a big step function in capabilities.
And then obviously with GPT4, OpenAI published some really interesting research around
some of the different domains of knowledge or domains of expertise or chain of thought
or other things that the models can suddenly do in an emergent form.
What was the most surprising thing for you in terms of emergent behavior in these
models over time? You know, it's very hard to answer that question. It's very hard to answer
because I'm too close and I've seen it progress every step of the way. So as much as I'd like,
I find it very hard to answer that question. I think if I had to pick one, I think maybe the most
surprising thing for me is the whole thing works at all, you know? It's hard. I'm not sure I know
how to convey this, what I have in mind here, because if you see a lot of neural networks do
amazing things, well, obviously neural networks is the thing that works. But I have witnessed
personally what it's like to be in a world for many years where the neural networks not work
at all. And then to contrast that to where we are today, just the fact that they work and they do
these amazing things, I think maybe the most surprising, the most surprising, if I had to pick
one, it would be the fact that when I speak to it, I feel understood. Yeah, there's a really good
saying from, I'm trying to remember, maybe it's Arthur C. Clarke or one of the sci-fi authors,
which effectively says advanced technology is sometimes indistinguishable from magic.
Yeah, I'm fully in this camp. Yeah, yeah, it definitely feels like there's some magical moments
with some of these models now. Is there a way that you guys decide internally, given all of the
different capabilities you could pursue, how to continually choose the set of big projects?
You've sort of described that centralization and committing to certain research directions at scale
is really important to OpenAI's success. Given the breadth of opportunity now, what's the process
for deciding what's worth working on? I mean, I think there is some combination of bottom up
and top down, where we have some top-down ideas that we believe should work, but we're not 100%
sure. So we still need to have good top-down ideas. And there is a lot of bottom-up
exploration that's guided by those top-down ideas as well. And their combination is what
informs us as to what to do next. And if you think about those bottom, I mean, either
direction, top-down or bottom-up ideas, like clearly we have this dominant continue-to-scale-transformers
direction. Do you explore additional, like, architectural directions, or is that just
not relevant? It's certainly possible that various improvements can be found. I think improvements
can be found in all kinds of places, both small improvements and large improvements. I think the
way to think about it is that the current thing that's being done keeps getting better
as you keep on increasing the amount of compute and data that you put into it. So we have that
property: the bigger you make it, the better it gets. There is also the property that
different things get better by different amounts as you keep on improving, as you keep on
scaling them up.
So not only do we want to, of course, scale up what we are doing, we also want to keep scaling
up the best thing possible.
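(An illustrative aside, drawing on the published scaling-law literature rather than on anything said here: the "bigger is better" property is often summarized as a smooth power law in compute $C$,
$$L(C) \approx L_\infty + a \, C^{-b},$$
where $L$ is the loss, $L_\infty$ is an irreducible floor, and $a$, $b$ are constants fit to experiments; different capabilities improving with different effective exponents is one way to read "different things get better by different amounts.")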
What is a, I mean, you probably don't need to predict because you can see internally.
What do you think is improving most from a capability perspective in the current generation
of scale?
The best way for me to answer this question would be
to point to the models that are publicly available.
And you can see how they compare from this year to last year.
And the difference is quite significant.
I'm talking about not only the difference between,
let's say, GPT3 and GPT3.5, but then ChatGPT, and GPT4 with vision.
And you can just see for yourself.
It's easy to forget where things used to be.
But certainly the big way in which things are changing is that these models become more and more reliable.
Before, they were only very partly there.
Right now, they are mostly there, but there are still gaps.
And in the future, perhaps, these models will be there even more.
You could trust their answers.
They'll be more reliable.
They'll be able to do more tasks in general across the board.
And then another thing that they will do is that they'll have deeper insight.
As we train them, they gain more and more insight into the true nature of the human world.
And their insight will continue to deepen.
I was just going to ask about how that relates to sort of model scale over time, because a lot of people are really struck by the capabilities of the very large-scale models and the emergent behavior in terms of understanding of the world.
and then in parallel, as people incorporate some of these things into products, which is a very
different type of path, they often start worrying about inference costs going up with the scale
of the model, and therefore they're looking for smaller models that are fine-tuned.
But then, of course, you may lose some of the capabilities around some of the insights and
ability to reason.
And so I was curious in your thinking in terms of how all this evolves over the coming years.
I would actually point out that the main thing that's lost when you switch to the smaller
models is reliability.
I would argue that at this point, it is reliability that's the biggest bottleneck to these models being truly useful.
How are you defining reliability?
So it's like when you ask it a question that's not much harder than other questions that the model succeeds at,
then you'll have a very high degree of confidence that it will continue to succeed.
So I'll give you an example.
Let's suppose that I want to learn about some historical thing.
And I can ask, well, tell me what is the
prevailing opinion about this and about that, and I can keep asking questions.
And let's suppose it answered 20 of my questions correctly.
I really don't want the 21st question to have a gross mistake.
That's what I mean by reliability.
Or like, let's suppose I upload some documents, some financial documents.
Suppose they say something, I want you to do some analysis and to make some conclusion,
and I want to take action on the basis of this conclusion.
And it's like, it's not a super hard task.
And the model, these models clearly succeed on this task,
most of the time. But because they don't succeed all the time, and if it's a
consequential decision, I actually can't trust the model any of those times, and I have
to verify the answer somehow. So that's how I define reliability. It's very similar to the
self-driving situation, right? If you have a self-driving car and it, like, does things
mostly well, that's not good enough. The situation is not as extreme as with a self-driving car, but
that's what I mean by reliability. My perception of reliability is that, to your point, it goes
up with model scale, but it also goes up
if you fine-tune for specific
use cases or instances or
data sets. And so there is that trade-off
in terms of size versus
specialized fine-tuning
versus reliability.
So certainly people
who care about some specific application
have every incentive
to get the smallest model working
well enough.
I think that's true. It's undeniable.
I think anyone who cares about
a specific application will want the smallest model
for it. That's self-evident. I do think, though, that as models continue to get larger and better,
then they will unlock new and unprecedentedly valuable applications. So, yeah, the small models
will have their niche for the less interesting applications, which are still very useful,
and then the bigger models will be delivering on applications. Okay, let's pick an example.
Consider the task of producing good legal advice. It's really valuable if you can really trust the answer.
Maybe we need a much bigger model for it, but it justifies the cost.
There's been a lot of investment this year at the 7B size in particular, but also at the 13B and 34B sizes.
Do you think continued research at those scales is wasted?
No, of course not.
I mean, I think that in the kind of medium term, medium term by AI timescale anyway,
there will be an ecosystem, there will be different uses for different model sizes.
There will be plenty of people who are very excited, for whom the best 7B model is good enough,
they'll be very happy with it.
And then there will be plenty of very, very exciting and amazing applications for which it won't be enough.
I think that's all.
I mean, I think the big models will be better than the small models,
but not all applications will justify the
cost of a large model.
What do you think the role of open
source is in this ecosystem?
Well, open source is complicated. I'll describe
to you my mental picture.
I think that in the near term,
open source is just helping companies
produce something useful.
Like, let's see.
Why would one want to use an open
source model instead of a closed source
model that's hosted by some other company?
I mean, I think it's very valid
to want to be the final decider on the exact way in which you want your model to be used.
For you to make the decision of exactly how you want the model to be used and which use case you
wish to support.
And I think there's going to be a lot of demand for open source models.
And I think there will be quite a few companies that will use them.
And I'd imagine that will be the case in the near term.
I would say in the long run, I think the situation with open source models will become
more complicated and I'm not sure what the right answer is there. Right now, it's a little bit
difficult to imagine, so we need to put on our future hat, maybe a futurist hat. It's not too hard to get
into a sci-fi mode when you remember that we are talking to computers and they understand us.
But so far, these computers, these models, are actually not very competent. They can't do tasks at all.
I do think that there will come a day where the level of capability of models,
will be very high.
Like, in the end of the day,
intelligence is power.
Right now, these models,
their main impact,
I would say,
at least popular impact,
it's been primarily around
entertainment and,
like,
simple question answer.
So you talk to a model,
wow,
this is so cool,
you produce some images,
you had a conversation,
maybe you had some questions,
get answered.
But it's very different
from completing some large
and complicated task like,
what about if you had a model
which could autonomously start
and build a large tech company?
I think if these models were open source,
it would have difficult-to-predict consequences.
Like, we are quite far from these models right now,
and by quite far, I mean by AI timescale, but still,
like, this is not what you're talking about,
but the day will come when you have models which can do science autonomously,
like they'll deliver on big science projects.
And it becomes more complicated as to whether
it is desirable that models of such power should be open sourced.
I think the argument there is a lot less clear cut,
a lot less straightforward compared to the current level models,
which are very useful.
And I think it's fantastic that the current level models are being built.
So maybe I answered a slightly bigger question
rather than what is the role of open source models:
what's the deal with open source?
And the deal is: up to a certain capability,
it's great. But it's not difficult to
imagine models sufficiently powerful, which will be built, where it becomes a lot less obvious
as to the benefits of their being open source.
Is there a signal for you that we've reached that level or that we're approaching it?
Like, what's the boundary?
So I think figuring out this boundary very well is an urgent research project.
I think one of the things that helps is that the closed source models are
more capable than the open source models.
So the closed source models could be studied and so on.
And so you'd have some experience with a generation of closed source models.
And then you know, like, oh, these models' capabilities, it's fine.
There's no big deal there.
Then in like a couple of years, the open source models catch up.
And maybe a day will come and we're going to say, well, like these closed source models,
they're getting a little too drastic, and then some other approach is needed.
If we have our, you know, future hat on, maybe let's like think about like,
a several-year timeline. What are the limits you see, if any, in the near term in scaling?
Is it like data, token scarcity, cost of compute, architectural issues?
So the most near term limit to scaling is obviously data. This is well known. And some research
is required to address it. Without going into the details, I'll just say that the data limit can be
overcome and progress will continue.
One question I've heard people debate a little bit is the degree to which the transformer
based models can be applied to sort of the full set of areas that you'd need for AGI.
And if you look at the human brain, for example, you do have reasonably specialized systems
or, they're all neural networks, but specialized systems: the visual cortex versus, you know, areas of
higher thought, areas for empathy or other sort of aspects of everything from personality
to processing, do you think that the transformer architectures are the main thing that we'll just
keep going and get us there? Do you think we'll need other architectures over time?
So I think I understand precisely what you're saying, and I have two answers to this question.
The first is that, in my opinion, the best way to think about the question of architecture
is not in terms of a binary, is it enough? But how much effort, what will be the cost of
using this particular architecture
like at this point
I don't think anyone doubts
that the transformer architecture can do amazing
things but maybe something else
maybe some modification could
have some compute efficiency
benefits so it's better
to think about it in terms of compute efficiency
rather than in terms of can it
get there at all I think at this point
the answer is obviously yes
to the question about
well what about the human brain
with its brain regions I actually think
that the situation there is subtle and deceptive for the following reason.
So what I believe you alluded to is the fact that the human brain has known regions.
It has a speech perception region.
It has a speech production region.
It has an image region.
It has a face region.
It has all these regions.
It looks like it's specialized.
But you know what's interesting?
Sometimes there are cases where very young children have severe cases of epilepsy at a young
age. And the only way they figured out how to treat such children is by removing half of their
brain. Because it happens in such a young age, these children grow up to be pretty functional
adults. And they have all the same brain regions, but they are somehow compressed onto one
hemisphere. So maybe some information processing efficiency is lost. It's a very traumatic thing to
experience, but somehow all these brain regions rearrange themselves. There is another
experiment, which was done maybe 30 or 40 years ago, on ferrets. So the ferret is a small
animal. It's a pretty mean experiment. They took the optic nerve of the ferret, which comes from
its eye, and attached it to its auditory cortex. So now the inputs from the eye start to map
to the speech processing area of the brain. And then they recorded different neurons after
it had a few days of learning to see. And they found neurons in the auditory cortex, which were very
similar to the visual cortex, or vice versa. It was either they mapped the eye to the
auditory cortex or the ear to the visual cortex. But something like this has happened. These are
fairly well-known ideas in AI that the cortex of humans and animals are extremely uniform. And so
that further supports the idea. Like, you just need one big uniform architecture. It's all you need.
Yeah. In general, it seems like every biological system is reasonably lazy in terms of taking one
system and then reproducing it and then reusing it in different ways. And that's true of everything
from DNA encoding, you know, there's 20 amino acids in protein sequences. And so everything is
made out of the same 20 amino acids on through to, uh, to your point, sort of how you think
about tissue architectures. So it's remarkable that that carries over into the digital world as
well, depending on the architecture you use. I mean, the way I see it is that this is an indication
that, from a technological point of view, we are very much on the right track. Because you have all
these interesting analogies between human intelligence and biological intelligence and
artificial intelligence. We've got artificial neurons, biological neurons, unified brain architecture
for biological intelligence, unified neural network architecture for artificial intelligence.
At what point do you think we should start thinking about these systems of digital life?
I can answer that question. I think that will happen when those systems become reliable
in such a way as to be very
autonomous. Right now those systems
are clearly not autonomous.
They're inching there, but they're not.
And that makes them a lot less useful
too, because you can't ask it, hey, like, do my
homework or do my taxes or you see what I mean.
So the usefulness is greatly limited.
As the usefulness increases,
they will indeed become more like artificial life,
which also makes it more, I would argue,
trepidatious, right?
Like if you imagine actual artificial life,
these brains that are smarter than humans, you go, gosh, that's like, that seems pretty monumental.
Why is your definition based on autonomy? Because, you know, if you often look at the definition
of biological life, it has to do with reproductive capability, plus I guess some form of
autonomy, right? Like a virus isn't really necessarily considered alive much of the time,
right? But a bacteria is. And you could imagine situations where you have symbiotic relationships
or other things where something can't really quite function autonomously, but it's still considered
a life form. So I'm a little bit curious about autonomy being the definition versus some of these
other aspects. Well, I mean, definitions are chosen for our convenience and it's a matter of
debate. In my opinion, technology already has the reproduction, the reproductive function, right?
And if you look at, for example, I don't know if you've seen those images of the evolution
of cell phones and then smartphones over the past 25 years, you got this like what almost looks
like an evolutionary tree or the evolution of cars over the past century. So technology is already
reproducing using the minds of people who copy ideas from previous generation of technology.
So I claim that the reproduction is already there.
The autonomy piece, I claim, is not.
And indeed, I also agree that there is no autonomous reproduction.
But that would be, like, can you imagine if you have like autonomously reproducing AIs?
I actually think that that is a pretty traumatic and I would say quite a scary thing if you have
an autonomously reproducing AI, if it's also very capable.
Should we talk about super alignment?
Yeah, very much so.
Can you just sort of define it?
And then we were talking about what the boundary is for when you feel we need to begin to worry about these capabilities being in open source.
Like what is super alignment and like why invest in it now?
The answer to your question really depends to where you think AI is headed.
If you just try to imagine and look into the future, which is, of course, a very difficult thing to do, but let's try to do it anyway.
Where do we think things will be in five years or in ten years?
Progress has been really stunning over the past few years.
Maybe it will be a little bit slower.
But still, if you extrapolate this kind of progress, it will be in a very, very different place in five years, let alone ten years.
It doesn't seem implausible.
It doesn't seem at all implausible that we will have computers, data centers, that are much smarter than people.
And by smarter, I don't mean just have more memory or have more knowledge, but I also mean have deeper insight into the same subjects that we people are studying and looking into.
It means learn even faster than people.
Like, what could such AIs do?
I don't know.
Certainly, if such an AI were the basis of some artificial life, it would be, well, how do you even think about it?
If you have some very powerful data center that's also alive, in a sense, that's what you're talking about.
And when I imagine this world, my reaction is, gosh, this is very unpredictable what's going to happen, very unpredictable.
But there is a bare minimum which we can articulate: that if such very, very
intelligent, superintelligent data centers are being built at all, we want those data centers
to hold warm and positive feelings towards people, towards humanity. Because this is going to be
non-human life, in a sense. It could potentially be that. So I would want
any instance of such superintelligence to have warm feelings towards humanity. And so this is what
we are doing with the superalignment project: we're saying, hey, if you just allow yourself,
if you just accept that the progress that we've seen, maybe it will be slower, but it
will continue.
If you allow yourself that, then can you start doing productive work today to build the science
so that we will be able to handle the problem of controlling such future
superintelligences, of imprinting onto them a strong desire to be nice and kind to people.
Because those data centers, right, they'll be really quite powerful.
You know, there'll probably be many of them.
They will be very complicated.
But somehow, to the extent that they are autonomous, to the extent that they are agents,
to the extent that they are beings, I want them to be pro-social, pro-human.
That's the goal.
What do you think is the likelihood of that goal?
I mean, some of it, it feels like an outcome you can hopefully affect, right?
But are we likely to have pro-social AIs that we are friends with individually or, you know, as a species?
Well, I mean, friends, maybe. I think that that part is not necessary.
The friendship piece, I think, is optional.
but I do think that we want to have very prosocial AI.
I think it's possible.
I don't think it's guaranteed, but I think it's possible.
I think it's going to be possible
and the possibility of that will increase insofar
as more and more people allow themselves
to look into the future, into the five to 10 year future.
And just ask yourself, what do you expect AI to be able to do then?
How capable do you expect it to be then?
And I think that with each passing year, if indeed AI continues to improve and as people get to experience,
because right now we are talking, making arguments, but if you actually get to experience, oh gosh,
the AI from last year, which was really helpful, this year puts the previous one to shame.
And you go, okay.
And then one year later, and one year starting to do science, the AI software engineer is starting to get really quite
good let's say I think that you'll create a lot more desire in people for what you just described
for the future superintelligence to need be very pro-social you know I think there's going to be a lot
of disagreement it's going to be a lot of political questions but I think that as people see
AI actually getting better as people experience it the desire for the pro-social superintelligence
the humanity loving super
intelligence, you know, as much as it can be done
will increase. And on the scientific problem,
you know, I think right now it's still
an area that not that many people are working on.
Our AIs are getting powerful enough
where you can really start studying it productively.
We'll have some very exciting research to share soon.
But I would say that's the big picture situation here.
Just really, it really boils down to
look at what you've experienced
up until now, and ask yourself, like, is it slowing down? Will it slow down next year? We will see
and we'll experience it again and again. And I think what needs to be done
will keep becoming clearer. Do you think we're just on an accelerative path? Because I think
fundamentally, if you look at certain technology waves, they tend to inflect and then accelerate
versus decelerate. And so it really feels like we're in an acceleration phase right now versus
the deceleration phase.
Yeah.
I mean, we are, right now,
it is indeed the case
that we are in an acceleration phase.
You know, it's hard to say,
you know, multiple forces
will come into play.
Some forces are accelerating forces
and some forces are decelerating.
So, for example,
the cost and scale
are a decelerating force.
The fact that our data is finite
is a decelerating force,
to some degree,
at least.
I don't want to overstate.
Yeah, it's kind of
an asymptote, right, like at some point you hit it, but it's the standard S curve, right, or sigmoidal.
Well, with the data in particular, I just think it won't be, it just won't be an issue because we'll figure out something else.
But then you might argue, like, the size of the engineering project is a decelerating force, just the complexity of management.
On the other hand, the amount of investment is an accelerating force.
The amount of interest from people, from engineers, scientists is an accelerating force.
And I think there is one other accelerating force.
and that is the fact that biological evolution has been able to figure it out
and the fact that up until this point, progress in AI has had
this weird property that it's kind of been you know it's been very hard to execute on
but in some sense it's also been more straightforward than one would have expected
perhaps. Like, in some sense, I don't know much physics, but my
understanding is that if you want to make progress in quantum physics or something, you need to be really intelligent and spend many years in grad school studying how these things work. Whereas with AI, you have people come in, get up to speed quickly, and start making contributions quickly. It has a somehow different flavor. Somehow there's a lot of give to this particular area of research. I think this is also an accelerating force. How it will all play out remains to be seen.
Like, it may be that somehow the scale required, the engineering complexity will start to make it so that the rate of progress will start to slow down.
It will still continue, but maybe not as quick as we had before.
Or maybe the forces which are coming together to push it will be such that it will be as fast for maybe a few more years before it will start to slow down.
If at all. That would be my articulation here.
Ilya, this has been a great conversation.
Thanks for trying to answer.
Thank you so much for the conversation. I really enjoyed it.
Find us on Twitter.
at @NoPriorsPod.
Subscribe to our YouTube channel
if you want to see our faces,
follow the show on Apple Podcasts, Spotify,
or wherever you listen.
That way you get a new episode every week.
And sign up for emails
or find transcripts for every episode
at no-priors.com.