No Priors: Artificial Intelligence | Technology | Startups - The bot Cicero can collaborate, scheme and build trust with humans. What does this mean for the next frontier of AI? With Noam Brown, Research Scientist at Meta

Episode Date: February 2, 2023

AI can beat top players in chess, poker, and, now, Diplomacy. In November 2022, a bot named Cicero demonstrated mastery in this game, which requires natural language negotiation and cooperation with humans. In short, Cicero can lie, scheme, build trust, pass as human, and ally with humans. So what does that mean for the future of AGI? This week's guest is research scientist Noam Brown. He co-created Cicero on the Meta Fundamental AI Research Team, and is considered one of the smartest engineers and researchers working in AI today. Co-hosts Sarah Guo and Elad Gil talk to Noam about why all research should be high risk, high reward, the timeline until we have AGI agents negotiating with humans, why scaling isn't the only path to breakthroughs in AI, and whether the Turing Test is still relevant.

Show Links:
More about Noam Brown
Read the research article about Cicero (Diplomacy) published in Science.
Read the research article about Libratus (heads-up poker) published in Science.
Read the research article about Pluribus (multiplayer poker) published in Science.
Watch the AlphaGo Documentary.
Read "How Smart Are the Robots Getting?" by New York Times reporter Cade Metz.
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Polynoamial

Show Notes:
[01:43] - What sparked Noam's interest in researching AI that could defeat games
[06:00] - How AlexNet and AlphaGo changed the landscape of AI research
[08:09] - Why Noam chose Diplomacy as the next game to work on after poker
[09:51] - What Diplomacy is and why the game was so challenging for an AI bot
[14:50] - Algorithmic breakthroughs and the significance of AI bots that win in No-Limit Texas Hold'em poker
[23:29] - The Nash equilibrium and optimal play in poker
[24:53] - How Cicero interacted with humans
[27:58] - The relevance and usefulness of the Turing Test
[31:05] - The data set used to train Cicero
[31:54] - Bottlenecks for AI researchers and challenges with scaling
[40:10] - The next frontier in researching games for AI
[42:55] - Domains that humans will still dominate and applications for AI bots in the real world
[48:13] - Reasoning challenges with AI

Transcript
Starting point is 00:00:00 We were trying to think of what would be the hardest game to make an AI for. We landed on diplomacy. The idea that you could have an AI that negotiates in natural language with humans and strategizes with them, it really just felt like science fiction. I'm really glad that we aimed high at that point. I was a little afraid to do that, to be honest. It's a high risk thing to aim for, but all research is high risk, high reward, or at least it should be. We maybe get like two orders of magnitude more scaling and then we have a big
Starting point is 00:00:34 problem. Do you train a hundred billion dollar model? This is why I'm interested in the reasoning direction. This is the No Priors podcast. I'm Sarah Guo. I'm Elad Gil. We invest in, advise, and help start technology companies. In this podcast, we're talking with the leading founders and researchers in AI about the biggest questions. This week on the podcast, we're welcoming Noam Brown. Noam's a research scientist on the Meta Fundamental AI Research team. Noam co-created the first AIs to defeat top humans in two different types of poker. He also recently did an important project called Cicero.
Starting point is 00:01:15 In this podcast, we'll dig into how this AI works, what makes for great AI research and engineering, and how AI games tie into AGI. Noam is considered one of the smartest engineers and researchers in AI. His work has deep implications for how humanity and AI co-evolve. The new bot Cicero can lie, can scheme. It can read a human's intentions and build trust. Cicero demonstrates these skills by performing better than the average human at a classic game called Diplomacy. Noam, welcome to No Priors. Thank you for having me.
Starting point is 00:01:42 Thanks a lot for joining. So, you know, I think in the world today when a lot of people think about AI, they think about it as basically: you put a couple words into a prompt and then you get out an image. Or you have ChatGPT summarize James Burnham's professional managerial class for you in a rhyming essay in the voice of a cat or something. And I think you've pushed in really interesting directions that are very different in some ways from what a lot of people have focused on. And you've been more focused on game theoretic actors interacting with humans and with each other. And in parallel, you're kind of known, as Sarah mentioned, as sort of one of these true 10x engineers and researchers pushing the boundaries of AI. And so I'm sort of curious, like what first sparked your interest in games and researching AI to defeat games like poker and diplomacy? Well, I think, you know, my journey is a bit non-traditional.
Starting point is 00:02:24 I mean, I started out in finance, actually. So towards the end of my undergrad career and also like right after undergrad, I worked in algorithm trading for a couple of years. And I kind of realized that while it's, it's fun and it's, you know, exciting, it's kind of like a game. You know, you get a score at the end of the day, which is how much money you've made or lost. It's not really the most fulfilling thing that I want to do with my life. And so I decided that I wanted to do research.
Starting point is 00:02:46 And it wasn't really clear to me in what area. I was originally planning to do economics, actually. And so I went to the Federal Reserve. I worked there for two years. honestly, I wanted to figure out how to structure financial markets better to encourage more pro-social behavior. And so in the process, I became interested in game theory, and I thought I wanted to pursue a PhD, like in economics, focused on game theory. And two things happened. So first of all, I became a bit jaded with the pace of progress in economics, because if you come up with an idea, you have
Starting point is 00:03:15 to get it passed through legislation, and it's a very long process. And computer science is much more exciting in that way because you can just build something. You don't really need permission to do it. And then the other thing I figured out was that a lot of the most exciting work in game theory was actually happening in computer science. It wasn't happening in economics. And so I applied for grad schools with the intention of studying algorithmic game theory in a computer science department. And when I got to grad school, there was conveniently a professor that was looking for somebody to do research on AI for poker. And I thought this was like the perfect intersection of everything that I wanted to do. I was interested in game theory.
Starting point is 00:03:47 I was interested in making something, interested in AI. I had played poker when I was in high school and college and, no, never for high stakes, but always just kind of interested in the strategy of the game. I actually tried to make a poker bot when I was an undergrad, and it did terribly, but it was a lot of fun. And so to be able to do that, you know, for research in grad school, I thought this was like the perfect thing for me to work on. And also, I felt like there was an opportunity here because it felt doable. And I kind of recognize that if you succeed in making an AI that can play poker, you're going to learn
Starting point is 00:04:20 really valuable things along the way. And that could have like major implications for the future. That's really cool. And did you have a specific end goal of your work when you started it? Was there just interest? In other words, some, you know, you talk to a lot of people in the field and they say, oh, our end goal is the AGI and it's always been. And I think sometimes that's sort of invented later as sort of an interesting story for what they're doing. Did you view this as just doing primary research and it's just personal interest? Did you view it as like there's a path leading to agents that function on behalf of people or was there some other sort of driving motivator? Well, so I started grad school in 2012. And
Starting point is 00:04:50 And it was a very different time in 2012. The idea of AGI was really science fiction. There were some people that were serious about it, but very few. The majority opinion was that AI was, if anything, it was kind of a dead field. I actually remember, like, emailing a professor and having this conversation where I was like, look, I'm really interested in AI, but I'm kind of worried to pursue a PhD in this because I get the impression that it's just a dead field. And I'm not, I'm worried if I'll be able to get a job afterwards. So conveniently, like a couple of years into grad school, things change pretty drastically. and I happen to be in the right place at the right time, I think is really fortunate in that respect.
Starting point is 00:05:25 So the original intention wasn't to pursue AGI. The original intention was, you know, you learn interesting things about AI and game theory and you build slowly, and it was really only a couple years into grad school that it became clear that the pace of progress was quite dramatic. Was there a specific moment that really drove that home for you? I know for some people they mentioned, oh, AlexNet came out or, oh, you know, some of the early gang work felt like a wake-up call. I'm just sort of curious that there's a specific technology or paper or something else that came out.
Starting point is 00:05:54 Or was it just kind of a continuum? I think it was a slow drip. I mean, I think for me, especially, it was the AlphaGo moment. You know, like when you see that, it's just very clear. I mean, Alex Net, too. I mean, before I started grad school, actually, I took a computer vision class and they were talking about like, you know, SIFT and all this stuff. And then you get something like AlexNet and it just like throws all that out the window. And it's just like mind boggling how effective that could be.
Starting point is 00:06:17 NEM, can you explain actually like why AlphaGo is so important? and like just size of search space and how you might contrast that to previous games. Yeah, so a big milestone in AI was Deep Blue beating Gary Kasparov and chess in 1997. And that was a big deal. It's kind of downplayed today, I think, in like by a lot of machine learning researchers,
Starting point is 00:06:34 but we learned a lot from that. We learned that scale really does work. And in that case, it wasn't scaling, you know, training in neural nets, it was scaling search. But the techniques that were used in Deep Blue, they didn't work in a game like go because the pattern matching was just not there. A big challenge in Go was figuring out, like, how do you even evaluate the state of a board? How do you tell who's winning? In chess, it's like difficult, but you can kind of write a function, but you can handcraft the function to estimate that, right? You calculate, oh, each piece is worth this many points and you add it together and you can kind of get a sense of who's winning and who's losing. And in Go, that's just almost impossible to do by hand. It's essentially too big to do that. It's too big. It's too subtle. It's just too complicated. And there's too much nuance. And if you ask, you know, the difference is also if you ask the human, you know, who's winning, they could tell you who's winning, but they couldn't tell you why.
Starting point is 00:07:20 And so, you know, one of the things that people assumed is that, you know, humans are just better at pattern matching. And to have an AI come along and, like, demonstrate that it can do this pattern matching better than a human can. And even if it's in this constrained game, that was a big deal. And I think that was a wake-up call to a lot of people, not just me, but I think across the world. I remember as a former, like, go nerd, just trying to understand the moves that AlphaGo made to try to figure out how to play better because it was like such a mind-blowing moment. Yeah. And, you know, if any of your listeners hadn't seen the AlphaGo documentary, I highly recommend watching it. You can, I think it's on Netflix or YouTube. And you can see just how significant this was to a lot of the world when you watch that. How did you end up choosing diplomacy as the next thing to work on after poker? There's obviously like a wide space of a variety of different types of games. And so what drove your selection criteria there? And how did you think about choosing that as the next sort of interesting research problem? So basically what happened, we succeeded in poker, and when we were trying to pick the next direction, it became clear that AI was progressing very quickly, like much quicker than
Starting point is 00:08:27 I think a lot of people appreciated. And there were a lot of conversations about, like, what should the next benchmark be? A lot of people were throwing around these games, like Hanabi was one, somebody was talking about like Werewolf or Settlers of Catan, these kinds of things, and I just felt like, you know, this was 2019, and in 2019, you had GPT-2 come out, which was just mind-blowing. And then you also had DeepMind beating grandmasters in StarCraft 2. You had OpenAI beating human experts in Dota 2. And that was just after like a couple years of work of research and engineering. And to then like go to a game like Settlers of Catan, it just felt
Starting point is 00:09:01 like too easy. Like you could just take a team of five people, spend a year on that and you'd have it cracked. And so we wanted to pick something that would be truly impressive, like that would require fundamentally new techniques in order to succeed, not just scaling up something that already exists. And we were trying to think of what would be the hardest game to make an AI for. And we landed on diplomacy. The idea that you could have an AI that negotiates in natural language with humans and strategizes with them, it really just felt like science fiction. And even in 2019, knowing all this success that was happening in AI, it still felt like science fiction. And so that's why we aimed for it. And I think that was the right call. I mean, I'm really glad that we
Starting point is 00:09:39 aimed high at that point. I was a little afraid to do that, to be honest. It's a high risk thing to aim for, but all research is high-risk, high-reward, or at least it should be. Do you want to give a quick minute overview of diplomacy so people can understand what it is and why the research was such a breakthrough? Yeah, diplomacy is this game. It was developed in the 50s. It was actually developed by this guy who saw what happened in World War I and kind of viewed this as a diplomatic failure. And so he wanted to create this game that would teach people how to be better diplomats, essentially. And so it takes place at the onset of World War I. There's seven powers that you can play
Starting point is 00:10:13 England, France, Germany, Italy, Russia, Turkey, and Austria, Hungary. And you engage in these, like, complex negotiations every turn. And your goal is to try to control as much of the map as possible. And the way you win is by controlling a majority of the map. It's kind of like hunger games, where even though only one person can win at the end of the day, there's still this, like, incentive to be able to work together, especially early on, because you can both benefit and have a better chance of winning in the end if you work together. And so you have these, like, really complex negotiations that happen.
Starting point is 00:10:42 and all the communication is done in private. So unlike a game like risk, for example, or Siddler's a Catan where like all the negotiation is done in front of everybody else, in diplomacy, you will actually like pull somebody aside, go into a corner, like scheme about who you're going to attack together this turn, who's going to support who. And then after you've negotiated with everybody, you write down what your moves are for the turn. And so then all the moves are read off at the same time.
Starting point is 00:11:07 And you can see if people like actually follow through on their promises about like helping you, or maybe they lied to you and they're just going to attack you this turn. So it has like some elements of risk, poker, and survivor because there's this big trust component. And that's really the essence of the game. Like, can you build trust with others? Because the only way to succeed in this game is by working together, even though you always have an incentive to attack somebody and grow at their expense. So yeah, that's the game. It's been around for a long time. Like I said, since the 50s. It was JFK and Kissinger's favorite game. There's research for this game from an AI angle going back to the 80s, but the idea that you could play this game in natural language with humans and beat them was just complete science fiction until a few years ago, like it was still science fiction, but we at least thought it was worth pursuing it.
Starting point is 00:11:55 And research really took off in 2019 when researchers started using deep learning to make big bots for this game that could play the non-language version. So there's no communication. You just write down your moves and you kind of have to communicate non-verbally through the actions that you take. we were doing research on this deep mind was doing research on this and then also university of Montreal and a couple other places as well and there was a lot of interest and progress but we decided to take the risky bet of just like jumping to the end point and instead of taking an incremental approach aiming for full natural language diplomacy and I'm glad that we aim for that it seems like one of the pretty amazing things about what you all did is you basically created bots that other people that humans thought were other people and therefore they have
Starting point is 00:12:39 had to learn how to collaborate with each other, how to sometimes lie or deceive, how to sometimes think through multiple moves from a game theoretic perspective. And so it's a radically different thing than playing chess or playing go against another person and then just having almost a probabilistic tree of moves or something. Yeah, you run into this like human element. You really have to understand the human elements. And what's really interesting about diplomacy, aside from just the natural language component, is that it really is the first major game AI breakthrough in a game that involves cooperation. And that's really important because, you know, at the end of the day, when we make these AIs to play chess and go, we're not developing them with a purpose of beating humans
Starting point is 00:13:16 of games. We want to, you know, have them be useful in the real world. And if you want to have these AIs be useful in the real world, then they have to understand how to cooperate with humans as well. A lot and I were talking about Centaur play and whether or not that would persist as an idea at all given. Like, we've accepted that AIs are going to win games at this point. But I think, like, you know, the idea that AIs are going to take action by cooperating with humans, that needs to be a core capability seems obvious. And perhaps this is the making myself feel better story, but I am hopeful that that is a human skill that remains quite important, being able to cooperate with AIs. Well, from what I hear, Centaur play is like, AIs have gotten so strong in games like chess
Starting point is 00:14:00 that it's not clear if the human is really adding that much these days. That's what I told Sarah, too. Yeah, it's kind of a depressing thought. Yeah, I'm crying. I get it. I get it. I accept it. Yeah. I think the humans are still useful in a game like Go. Because like the AIs are super strong, but they will also sometimes, like a few times in each game, make these like really weird blunders. And in diplomacy, I think, yeah, it's super helpful to have like an experienced human in addition to the AI. Though like, you know, eventually I'd imagine that these systems become so strong that like it kind of goes the way of chess, where the humans are just kind of adding a marginal difference at the end. Yeah, I'm actually just, you know, wondering how long that window is for humans and Centaur play in the game of life, right? But it's okay. I got it. Elad was right. Hopefully, yeah, hopefully forever, but we know, we'll see. Yeah, so do you mind explaining the work that you've done in poker and some of the breakthroughs that you made there as well? Yeah, my PhD research was really focused on how do you get an AI to beat top humans in
Starting point is 00:14:51 well? Yeah, my PhD research was really focused on how do you get an AI to beat top humans in the game of No Limit Texas Holden Poker? Specifically during my PhD, it was on heads up, no limit Texas, hold on poker. That's two-player poker. And this was a longstanding challenge problem. Actually, if you go back, to the original papers written on game theory by John Nash, the only application that's discussed in the paper is poker. He actually analyzes this simple three-player poker game in the paper and works out the Nash equilibrium by hand. And then actually at the end, he says, like,
Starting point is 00:15:22 oh, yeah, it'd be really interesting to analyze a much more complex poker game using this approach. So I'm glad we finally got a chance to do that, you know, 60 years later. And it's interesting, I think, especially after AlphaGo, this became a very popular problem because after AlphaGo, there was a big question of like, okay, well, AIs can now beat humans at chess,
Starting point is 00:15:44 they can beat humans a go, what can't they do? And the big thing that they couldn't do was be able to reason about hidden information, be able to understand that, okay, this other player knows things that I don't know and I know things that they don't know. And being able to overcome that problem
Starting point is 00:15:59 in a strategic setting was a big unanswered question. And yeah, so that was the focus of my research for basically my whole grad school experience. And there were a few different research labs that were working on this. And what would happen is every year we would all make a poker bot
Starting point is 00:16:13 and we would play them against each other in this competition called the annual computer poker competition. And we actually won. So like basically what happened is when I started my PhD, there had already been like some progress in AI for poker.
Starting point is 00:16:28 And so the competition really turned into a competition of scaling. There's about, like, 2.5 billion different hands that you could have on the river, like the last round of poker in Texas Hold'em. And what we would do is cluster those hands together using K-means clustering and, like, treat similar hands identically. And that allows you to compute a policy for poker because now instead of having to worry
Starting point is 00:16:51 about 2.5 billion hands and like having to come up with a policy for each one of those, you can now like bucket them together and now you have like 5,000 buckets or something and you can actually compute a policy for that many buckets. And so this was like before neural nets. That's why we were doing this like this K-means clustering thing instead of deep neural nets. But you can kind of think of it as like the number of buckets that you have is kind of like the number of parameters that you have in your network. And so in grad school, it kind of turned into a competition of scaling. How many buckets could you have in your bot?
Starting point is 00:17:21 And first year it was like 5,000 buckets. Then we got up to 30,000 buckets and then 90,000 buckets. Every year we would have these bigger and bigger models. We would train them for longer, parallelize them, and they would always beat the previous year's model. And in 2014, we actually won the annual computer poker competition, and after that, we decided to take our bot and play it against expert human players.
Starting point is 00:17:44 And so this was the first what was called the Brains v. AI poker competition where we invited these four top heads-up, no-limit Texas Holden Poker Pros, and we had them play 80,000 hands of poker against our bot. And the bot actually lost by a pretty sizable margin. And it occurred to me during this competition, that the way the humans were approaching the game was actually very different from how our bot was approaching it. So we would train our bot for like two months leading up to this competition, you know, on a thousand CPUs.
Starting point is 00:18:14 But then when it came time to actually play the game, it would act instantly. And the humans would do something different. Like, you know, obviously they would practice ahead of time, they would develop an intuition for the game. But when they were playing the game against the bot and they were in a difficult spot, they would sit there and they would think. think. And sometimes it was like five seconds, sometimes it was like a minute, but they would think that would allow them to come up this better solution. And it occurred to me that this might be like something that we're missing from our bot. And so I did this analysis after the competition to figure out, okay, if we were to add this search, this planning algorithm that would come up with
Starting point is 00:18:49 a better strategy when it's actually in the hand, how much better could it do? And the answer was it improved the performance by about 100,000 X. It was the equivalent of scaling the model, like scaling the number of parameters, scaling the training by 100,000 X. Now, the three years of my PhD at that point, I had managed to scale things by about 100x. And, you know, that's like quite good.
Starting point is 00:19:15 I was very proud of that. But when I saw that result, it made me appreciate that everything I had done in my PhD up until that point was just a footnote compared to adding search and scaling search. And so for the next year, I just worked basically nonstop, like 100-hour weeks, trying to scale up search through as much
Starting point is 00:19:32 competition at the problem at inference time as possible. And then we did another competition in January 2017 where we played against four top expert poker players again, $200,000 in prize money to incentivize them to play their best. And this time, we completely crushed them. Poker players were literally telling us they did not think it was possible to beat expert poker players by that kind of margin. Yeah. And so that's the story of like, you know, my grad school experience, working on poker AI. That was for two-player poker. We ended up after that working on multiplayer poker, on six-player poker. Again, the big breakthrough there was that we developed a more scalable search technique. So instead of always having to search to the end of the game, it could search just
Starting point is 00:20:11 a couple moves ahead. And what was really interesting there is the bot, we did another competition, the bot won, and that bot cost under $150 to train if you were to run it on like a cloud computing service. And I think that shows that this wasn't just a matter of scaling compute. It really was an algorithmic breakthrough. And this kind of result would have been doable 20 years ago if people knew the approach to take. If you look at a lot of other games, those sorts of big shifts in performance from a bot relative to people than shifts how people play, right? They learn from the bot or they adapt their game from watching games at the bots play. How did that play out in terms of poker? Yeah, that's a great question. So, you know, the competition, it was really interesting because, you know, so kind of like as a last minute thing, we added this ability. The way the bot works, we give it different bet sizes that it can use. The game that we were playing, there's $20,000 chips, $100, $200 blinds, actually. And so it can bet any amount it wants from like $100 up to $20,000. And so there's not much value in like being able to bet both $5,000 and $5,000. And so we would discretize that action space.
Starting point is 00:21:21 to constrain it to like only considering a few different options. And so there's a question of like, okay, well, what sizes do you give it the choice between? And, you know, towards the end, when we were developing the spot, like, we just had room for extra computation. And so we just like threw in some extra sizes, like 4x the pot, 10x the pot. Like, it doesn't cost that much more. So why not just give it the option? I didn't think it would actually use those sizes. And then during the competition, it actually ended up using those sizes a lot.
Starting point is 00:21:46 And it would sometimes bet, like, you know, $20,000 into a $100 pot. which was completely unheard of in professional poker play. And, you know, I was a little worried about this because I thought it was a mistake at first. And I think the players that we were playing against also thought it was a mistake at first. But then they found that they kept ending up in these, like, really tricky situations. And, you know, they would just really struggle with, like, whether to call or fold. And that's how you know you're playing good poker. If you see the other person, like, really struggling with the decision, that is a sign that you're doing something right.
Starting point is 00:22:17 And at the end, they told us, like, yeah, that's the one thing that we're going to, try to incorporate into our own play, adding these, like, what are called overbets into our strategy. Typically, the strategy was like, oh, you bet between a quarter of the size of the pot and one times the pot. And now in professional poker play, it's actually really, I want to say common, but it's part of the strategy to bet sometimes like 5x the pot, 10x the pot. If you can pull it off in the right way, it can be a very powerful strategy. And I should also say, like, the way professional poker players train now, they all use bots to assist them.
Starting point is 00:22:53 It's a lot like chess where you play the game and then you have a bot analyze your play at the afterwards and see like, okay, did you make mistakes? Where did you make mistakes? How could you do better next time? The game really has been demystified and become a lot like chess. I kind of describe poker as essentially high-dimensional chess.
Starting point is 00:23:12 It's like chess where you have to reason about like a probability distribution over actions instead of just like discrete actions. Yeah, it's really interesting because I don't think people really believe there was fully optimal play in poker before. Like they understood the probability distribution. But if you're playing live poker, like there's social cues, right? And social play.
Starting point is 00:23:33 And that has clearly been swept out. Not as an activity of, like, enjoyment, but in terms of a strategy that actually wins. Yeah, I think that's surprising to a lot of people, this idea that there is an optimal way to play poker. You know, there's this thing called the Nash equilibrium where, if you're playing that strategy, you'll never lose. It guarantees that in the long run, you will not lose in expectation. And the reason for that is because, like, if you're playing against somebody else that's also playing the Nash equilibrium, like obviously you can't both win. One of you is going to lose or you're going to tie. And so in expectation, if you're both playing the Nash equilibrium, you're going to end up tying. But in practice, what ends up happening is, if you're playing the Nash equilibrium in a complicated game like poker, the other person is going to make these small mistakes over time. And every mistake that they make is money in your pocket. And so you just play the Nash equilibrium, wait for them to make mistakes, and you end up winning. And that is now the conventional wisdom among poker players, that you start by playing the Nash equilibrium. If you're really good, you can look at the other players, see how they're deviating from the Nash equilibrium, playing suboptimally, and maybe you can, like, deviate yourself to capitalize on those mistakes.
Starting point is 00:24:37 on those mistakes. But really, the safe thing to do is play the Nash equilibrium. let them make mistakes, and every mistake that they make cost them money and puts money to pocket. What was the most unexpected thing that come out of working on diplomacy in terms of what Cicero could do? I mean, I think the most unexpected thing was just honestly how it didn't get detected as a bot. We were really worried about this leading into the human competitions because, first of all, there's no way to like really test this ahead of time. Like, we can play with the bot, but we know that it's a bot and we can't really like gather a bunch of people together and stick them in a game and, you know, have them play with a bot without having them realize
Starting point is 00:25:11 that something's up, right? Like, if this company is hiring them to, like, play a game, like, and they know that we're working on diplomacy, like, clearly they're going to be playing with a bot. And when people know that they're playing with a bot, they behave very differently, right? We didn't want to turn this into a touring test. And so we had to enter the bot into these games where players did not know that there was a bot in the mix. That was the only way that we could get, like, meaningful results. And just to be clear, the reason for this is because, like, diplomacy is a natural language negotiation game. And so you're having these, like, really complicated, long conversations with these people.
Starting point is 00:25:43 And it's kind of hard to get away with that as a bot, but not be detected. And so our big concern was like, we stick this bot in a game. And within like five games, maybe even two games, they figure out it's a bot. Word gets out. All the diplomacy, the diplomacy community is pretty small. So they all talk to each other. And then in all the future games, everybody's like asking, you know, touring test questions, trying to figure out who the bot is, and our experiments are just, like, meaningless.
Starting point is 00:26:08 And so we figure, like, okay, maybe we get lucky, and we managed to get, like, 10 games in before they figure this out, but at least we have, like, 10 games worth of data. But surprisingly, we managed to go, like, the full 40 games without being detected as a bot. And that was surprising to be. And I think that's a testament to the progress of language models in the past couple of years, especially. And also that maybe humans aren't as good at talking as we might think. Like, it may be appreciate also that, you know, if somebody's saying something a little weird, because the bot does say weird things every once in a while, their first instinct is not going to be like, oh, I'm talking to a bot.
Starting point is 00:26:41 Their first instinct is going to be like, oh, this person is like dumb or distracted or like, you know, they're drunk or something. And then way down on the list is like, oh, this person is a bot. So I think that we got pretty lucky in that respect. I mean, but also, I mean, the bot did manage to like actually go these 40 games without being detected. And so I think that is a testament to the quality of the language model. I think meta is actually planning to release the data, which is going to be so interesting. But can you just describe like an interaction from the bot you thought was interesting in these negotiations? Oh, yeah.
Starting point is 00:27:12 I mean, I think one of the messages that was like really, you know, honestly, kind of scary to me was just when it was talking to another player. And the player was saying like, hey, you know, I'm really nervous about your units, you know, near my border. And the bot honestly was not planning to attack the player. It was planning to go in the other direction. And it just sent the player this like really empathetic message where it was like, look, I totally understand where you're coming from. I can assure you 100%. Like, I'm not planning to attack you. I'm planning to go the other direction. You have my word. And it really felt like a very human-like message. And 100%, I
Starting point is 00:27:44 would have never expected that to come from a bot. And that, when you see stuff like that, it makes you appreciate. Like, yeah, there's something really powerful here. How do you think about the Turing test in the context of all this? Like, or what's your updated model of whether the test is still relevant or how to think about it? So there was actually a New York Times article I came out from Cade Mets at New York Times on, on like the touring test and what it means. And he actually talks about Cicero in the article. And basically his view is that the touring test is kind of dead. And I kind of agree with that. I think the touring test is no longer really a useful measure the way it was intended to be. Certainly just because we have bots that can, I
Starting point is 00:28:23 wouldn't say they can pass the touring test, but I mean like they're getting close enough that it's no longer that useful of a measure. It doesn't mean that we have general intelligence. I think there's still a long way to go on that. There's a lot of things that these bots can't do well. But yeah, I think my view now is that the Turing test is not that useful of a measure anymore. That doesn't necessarily mean that it was always a useless measure. I think it just shows like how much progress we've made. We're not 100% there, but the progress really has been staggering, especially in the past few years. What measure do you think, or measures do you think make sense to use? And then also, what do you think is missing on sort of the road to general intelligence? I think there's a few things that are missing. The big thing that I'm interested in particular is
Starting point is 00:29:00 reasoning capabilities. You have these bots. They're all doing next word prediction, right? Cicero is a bit different, actually, in that it's actually conditioning its dialogue generation on a plan. And I think that's one of the really interesting things that distinguishes Cicero from a lot of the work that's happening in language models today. But a lot of the research that is happening is using next word prediction. And when it's trying to do something that's like more sophisticated in terms of reasoning capabilities, it's a lot of chain of thought where it's just like rolling out it's, you know, the kind of reasoning that it's observed humans do and they're in its training data and seeing where that leads. So I think there's a general recognition among AI researchers that
Starting point is 00:29:38 this is a big weakness in the bots today, and that if we want true artificial general intelligence, then this needs to be addressed. Now, there's a big question about how to address it. And that's actually why I really like this direction, because it's still an open question about how to actually fix this problem. There's been some progress, but I think there's a lot of room for improvement. What do you think are the most promising possible directions? That is the trillion-dollar question. You're the trillion-dollar man. I think there's like clear paths. I mean, like, first of all, chain of thought really was like a big step. And it's kind of shocking just like how effective that was, given how simple
Starting point is 00:30:12 of an idea it is. I tell myself every day when I wake up now, let's think step by step. Yeah. Yeah. So for those of you that don't know, it's just like you add to the prompts like, oh, let's think through this step by step. And then the AI will like actually generate a longer like thought process about how it reaches its conclusion, and then that actually leads to better conclusions. But you can kind of see that as like just rolling out the thought process that it's observed in human data. And so there's a question of like, okay, well, instead of just rolling that out, could you actually improve it as it's going through each step? And so I think things like that, I mean, I'm kind of keeping it like very abstract because, you know, it's an important question.
Starting point is 00:30:48 And also I think there's not a clear answer yet. So I don't want to speculate too much, but I think that there is like room for improvements in this direction. What was the actual data set that was necessary in order for the training here? And maybe to take a step back, you know, I've been having a series of conversations with people about data and sort of like, when do we run out of data that's easily available? And when do we have to start creating either large scale synthetic data or RLHF data or, you know, do you literally pay bounties to people just record themselves all day so you can start collecting interesting data off of them to do different things with over time, right?
Starting point is 00:31:21 As these models scale to a certain point where, you know, you've used up the internet and you start using up all the video content. You start running out of stuff. I'm just sort of curious, like, how you thought about data in this context and what's necessary to really take things to the next level from a, you know, self-driven agent perspective like this. Yeah, it's not clear that data really is the bottleneck on performance here. And I've talked to AI researchers about this,
Starting point is 00:31:42 and I think there isn't as much of a worry about this as people might think. Probably that's because there's a lot more data that's out there than people might realize than that people are using right now. And also it's because I think there are going to, be improvements to sample efficiency as the research progresses. And so I think we'll be able to stretch the data more. What do you think is a bottleneck? So I think the bottleneck is going to be scaling. I mean, so you look at the models that exist today, like they probably cost $50 million to train. You can probably easily 10x that. I wouldn't be surprised if there's a $500 million
Starting point is 00:32:17 model that's trained in the next year or two. You can maybe even go another order of magnitude and train a $5 billion model if you're like the U.S. government or something or like a really big tech company. But what do you do beyond that? Do you train a $100 billion model? You'll probably see some improvement, but at some point it just becomes like not realistic anymore. And so that's that's going to be the bottleneck. Like we maybe get like two orders of magnitude more scaling. And then we have a big problem. And people are focused on like, okay, how do we make this more efficient? How do we train this cheaper, more paralyzed? But you can only squeeze so much out of that. And I think we've squeezed a lot already.
Starting point is 00:32:54 And this is why I'm interested in the reasoning direction, because I think there's this whole other dimension that people are not scaling right now, which is the amount of compute at inference time. You know, you can spend $50 million training this model ahead of time, like pre-training this model. And then when it comes to actual inference, it costs like a penny. And, you know, what happens if instead of it returning an answer in a second, it returns an answer in, like, an hour, or even five seconds or ten seconds. You know, sometimes if people want to give a better answer, they'll sit there and they'll think a bit. And that leads to a better outcome. And I think that that's one of the things that's missing from these models. So I think that that's one of the ways to overcome the scaling challenge. And that's partly why I'm interested in working on that.
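One crude way to picture that inference-time dimension is a sampling loop with a time budget; generate and score here are assumed stand-ins for a model call and an answer-quality heuristic, not real APIs:

import time

def best_of_budget(generate, score, prompt, budget_seconds=5.0):
    """Keep sampling candidate answers until the time budget runs out and
    return the best one. More seconds of 'thinking' buys a better answer."""
    best, best_score = None, float("-inf")
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        candidate = generate(prompt)
        candidate_score = score(prompt, candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best

Best-of-n sampling is only the bluntest version; searching over intermediate reasoning steps, as the poker bots searched over game states, is the richer form of the same idea.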
Starting point is 00:33:41 Going back to related to what Elad said, diplomacy problem specifically didn't have like, you know, internet scale data, right? As you mentioned, it's like a relative small community. Can you talk about what you guys did in terms of self-play and the data that actually was involved? So diplomacy, the problem was interesting because, yes, there's actually not a ton of data out there. I mean, we had a relatively good data set about about 50,000 games with dialogue. We did... This is from, like, web diplomacy. Yeah, this is from a site called web diplomacy. It's been around for like almost 20 years where people play diplomacy casually on this site. We were very lucky to get this data set. I mean, honestly, I was scouring the internet, trying to find
Starting point is 00:34:20 like all the sites that have available data. And this was basically the only sites that had a meaningful amount of data. Like there was another popular site, but they periodically deleted their data, which was, you know, just mind-boggling to me. It's just, you're sitting on a gold mine here and you're just deleting it
Starting point is 00:34:34 because it's taking up server space. I guess they didn't appreciate that, like, AI researchers will one day be interested in that. And then other sites just, like, refuse to hand over their data. And so I'm really glad that we managed to, like, work out a deal with web diplomacy.comnet because otherwise the project would have just never happened.
Starting point is 00:34:48 Now, that's about 50,000 games of diplomacy, about 13 million messages. And that is a good-sized dataset, but it's not enough to train a bot from scratch. Fortunately, we're able to leverage, like, you know, a wider data set from the internet. So you kind of, like, have a pre-trained language model, and then you fine-tune it on the diplomacy data. And you get a bot that can actually communicate pretty well in the game of diplomacy. Now, that helps with the dialogue, but there's still a problem, which is that the strategy isn't going to be up to par. And that's partly because you can't do that well with just supervised learning. You can't learn like a really, really good strategy in these kinds of games just with supervised learning.
Starting point is 00:35:26 And it's also because the people that are playing these games are not very good at the game. Like the most of the data set is from fairly weak players. You know, that's just a reality. You have a bell curve. The actual strong players are like a relatively small fraction of any data set that you have. And I should say this is not limited to diplomacy. Like we also found in chess and go, we actually ran this experiment. if you do just pure supervised learning on a giant data set of human chess and go games,
Starting point is 00:35:53 the bot that you get out from that is not an expert chess or go player. Even if it's like conditioned to like behave like a chess grandmaster, it's not going to be able to match that kind of performance because it's not doing any planning. That's really what's missing. And so in order to get a strategy that was able to go beyond just like average human performance, or even like, you know, strong human performance to something that's like much better, we had to do self-play.
Starting point is 00:36:20 And this is like how all these like previous game AIs have been trained, right? Like you look at AlphaGo, you look at especially Alpha-Zero, the latest version of AlphaGo, and you look at the, you know, the Dota 2 bot. The way they're trained is by playing against themselves for millions or billions of trajectories. That's also how our poker bot was trained for two-player and six-player poker. Now, the difference is like when you go from those games
Starting point is 00:36:45 to diplomacy, suddenly there is this cooperative aspect of the game. You can't just assume that everybody else is going to behave like a machine. Identically, it's the way you're going to behave. And so in order to overcome that, we had to combine self-play with a recognition that humans are going to behave a lot like how our data suggests. And so using the data set that we have, we're able to build up this model, a rough model of how humans behave. And then we can improve on that using self-play. And so we're figuring out a good strategy, but basically a strategy that's compatible with how humans are playing the game. So to give some intuition for this, because it's not obvious why this changes when you go from a two-player zero-sum game
Starting point is 00:37:30 like chess to a cooperative game like diplomacy. I mean, also I should say, like diplomacy is both cooperative and competitive, but there is a big cooperative component. Like, let's say you're trying to develop a bot that negotiates. If you train that bot from scratch with no human data, it's going to, it could learn to negotiate, but it could learn to negotiate in a language that's not English. It could learn to negotiate in some like gibberish robot language. And then when you stick it in the game with six humans, that's a negotiation task like diplomacy, it's not going to be able to communicate with them. And they're just going to all work with each other instead of with the bot. That same dynamic happens even in the strategy
Starting point is 00:38:08 game, the moves in the game, the nonverbal communication aspect. The bot will develop these like norms and expectations around like what its ally should be doing this turn. Like, I'm going to support my ally into this territory because I'm expecting them to go into this territory. And I don't even have to talk to them about this because it's just so obvious that they should be doing this. But the humans have their own metagame where like, oh, it's actually really obvious that I should be supporting you into this territory. And if you don't understand the human norms and conventions, then you're not going to be able to cooperate well with humans. And they're just going to not work with you and work with somebody else instead. So that's what we really had to.
Starting point is 00:38:43 overcome in Cicero, and we managed to do that by using the human data to build this model of how humans behave, and then adding self-play on top of that as kind of like a modifier to the human data set. That actually has some really interesting implications, right? Like, if you believe in the long term, we are going to have bots that take action in the real world interacting with humans, and humans are perhaps not very good at optimal play in the game of life, and you're interacting with them. It just brings home the point of how important reasoning could be versus learning pattern recognition.
Starting point is 00:39:16 Yeah, I think you're absolutely right that this matters a lot if you want to make AIs that interact with humans in the real world. If you have a car driving on the road, a self-driving car, you don't want it to assume that all the other drivers are machines that are going to act perfectly optimally in every step of the way. You want the self-driving car to recognize that these other drivers are humans and humans make mistakes. and somebody could, like, swerve into my lane. And, yeah, and also, like, you know, just, like, day-to-day interactions, understanding, like, the nonverbal cues of humans and, like, what that means. These are things that, or even the verbal cues, these are things that an AI has to be able to cope with if it's going to, like, really be useful to humans in the real world and not just
Starting point is 00:39:57 beating them at chess. Games have been used for a while now as a way to measure AI progress, and you've worked on poker variants and diplomacy variants. And you mentioned before other work people have done in terms of chess and go and things like that, what do you think is the next frontier in terms of games and sort of research on them in the lens of AI? Yeah, so there's a long history of games as benchmarks for AI. This goes all the way back to like the very foundations of AI back in like the 50s. Chess in particular was held up as this like grand challenge for AI because if we can make an AI that was like as smart as a human
Starting point is 00:40:30 chess grandmaster, then like imagine all the other smart things it could do. Of course, that turned out to be like kind of a false promise, right? Like you get an AI that plays chess and it turns out it doesn't really do anything else. But we've learned a lot along the way. And games are useful as a benchmark because you can compare very objectively to top human performance. Like, it becomes very clear
Starting point is 00:40:48 when you're surpassing human ability in this domain, even if it's a restricted domain. And you also have this benchmark that's existed before the AI researchers came along. Like, AI researchers, it's really easy for them to come up with a benchmark once they have the technique already created. You come up with the technique,
Starting point is 00:41:06 and then you're saying, like, okay, well, now it's really easy to come up with a benchmark that this technique will work for. And you don't want that. You want the problem to come first. And games give you that. But I think we're reaching a point now where individual recreational games are just no longer that interesting of a challenge. You know, I said earlier, we chose diplomacy because we thought it would be the hardest game to make an AI for. And I think that's true.
Starting point is 00:41:31 I can't think of any other game out there where, like, if somebody made an AI that could play that game, I'd be like, wow, that's super impressive, and I did not think that that was possible. And so I think going forward, the field needs to move beyond looking at individual games and starting to look at, first of all, going beyond games, but also looking at generality. The approach that we've used in diplomacy is very different from what we previously did in poker and what others have done in chess and Go and Starcraft. And now there's a question of like, okay, well, if we really want a general system, a general AI, can we have it play all of these games at a superhuman level
Starting point is 00:42:10 and also able to do things like, you know, image generation and question answering and like all these tasks? And if we could accomplish that, then that becomes incredibly impressive. And so I think games will continue to serve as this benchmark, but it's not, instead of serving as a benchmark that the research kind of overfits to, my hope is that it will serve as a benchmark
Starting point is 00:42:31 that we use along other benchmarks outside of games like, you know, image generation benchmarks and language Q&A benchmarks and these kinds of things. Given that the AI is already one in these restricted domains that are challenging in specific ways, like how do you think about the domains that are going to be human skill dominant? Like, are there going to be domains like that? Well, certainly anything in the physical world, I mean, humans still dominate. I mean, when it comes to actually like, you know, manipulation tasks, these kinds of things, robotics is really lacking behind.
Starting point is 00:43:00 And I think I'm trying to avoid doing anything in the physical world for the, that reason. Software is just so much nicer to work with. I think reasoning, there's still things that you can't, that humans are definitely better at, even in restricted domains. You look at something like writing a novel. I don't think you can get an AI to output like the next Harry Potter just yet. That might not be that far off. Maybe it's like five years away or something, but I don't think it's happening just yet. It's kind of scary that it's, I'm really struggling to come up with domains where I'm like, oh yeah, AI is not going to be able to surpass humans in this. Yeah, I was about to say, like, I feel like people often talk about areas where humans will always have an advantage
Starting point is 00:43:39 just because they're humans and they want to feel good about the future versus because there's necessarily something that shouldn't be tractable from a at least sheer logical perspective, right? Yeah, it certainly is. I mean, I think the big advantage that humans have, and it's not clear when AI will surpass humans in this, is generality. The ability to learn from a small number of samples, to be able to be useful across a wide variety of domains. But isn't that generality overstated because I feel like in the examples that you mentioned, you said everything from like image gen to diplomacy and like a single architecture or AI or something. And often it seems like, you know, if you look at the average person, if they're very good at one thing,
Starting point is 00:44:14 they're usually not good at everything, right? And so I kind of feel like the bar that we're using in terms of generality for AI sometimes is higher than the bar we'd use for generality for people in some sense. Or is that not a true statement? I think it's, it's not just about generality, it's really about sample efficiency. Like, how many games does it take for a human to become a good chess player or a good diplomacy player or a good artist? The answer is orders of magnitude less than it takes for an AI. And that is going to pose a problem when you're in domains, but there isn't much data. Now, that seems like a problem that could be overcome. I'm just saying that's a problem that hasn't been overcome yet. And I think that that's
Starting point is 00:44:48 one of the clear advantages that humans have over AIs today. When do you think we'll see the emergence of AIs in financial or economic systems? And obviously we have like algorithmic trading and other things like that. And then we have things like crypto where you effectively have programmatic approaches to effectively money wrapped as code, right, and the ability to interact with those things in reasonably rich ways through smart contracts. Do you think there's any sort of near-term horizon of people experimenting with that or just interesting research being done in terms of the actual interaction of a bot with a financial system? I think it's already being done. If you look at financial markets, I'm sure there's tons of trading powered by deep learning.
Starting point is 00:45:21 I've actually talked to a lot of finance companies about this. I used to work in finance and also like a lot of finance companies love poker. And so I've given a few talks at like various places on AI for poker. And I've talked to a few places about like, is reinforcement learning actually useful for financial markets, for trading? And the answer I get is usually no. I think the major challenge with using things like reinforcement learning for trading is that it's a non-stationary environment.
Starting point is 00:45:47 So you can have all this historical data, but it's not a stationary system. And it's going to like the markets respond to world events, these kinds of things. things. So you need a technique, ideally, that really understands the world, not just training everything like a black box. But could that at all feed into what you're saying about spending more compute on inference versus training? In other words, incorporating real-time signals at the point of decision-making? Or did you mean something else by that in terms of model architecture that would enable you to update weights in certain ways or things like that over time? Well, I think it goes back to the sample efficiency problem, that humans are pretty good at adapting to novel situations.
Starting point is 00:46:23 and you run into these like novel situations pretty frequently in financial markets. Yeah, I think it's also a problem of generality that you need to understand so much about the world to really succeed. Now that said, I mean, I think that the AIs are successful in financial markets in like fairly limited ways. Certainly if you want to like break up big orders, these kinds of things. Also, I should say, like I'm not an expert in this. Like this is kind of outdated knowledge from me because I'm sure like there's a lot of
Starting point is 00:46:47 cunning-edge stuff that's happening that people are not telling me about because it's making money. But I can tell you that this is like kind of the perspective as a. of like maybe five years ago, that I think that there's, it's being used in limited ways, but I don't think it's fully replacing humans yet. Do you think we're going to get bots that negotiate with humans soon? Or I guess let me premise that is we are eventually going to get them. What do you think the timeline is or the use case?
Starting point is 00:47:09 That seems doable. I think it depends on how constrained the domain is. I think if you were to look at constrained domains, certain negotiation tasks, I think that AIs could probably do better than humans in that today. I mean, I'm trying to think of like specific examples, but things. like, you know, if you wanted to negotiate over the price of a good, it could probably do better than a human in a lot of those situations. I think if there's things like salary negotiations, it might do better than humans at that also. I think it depends on how much you need to know
Starting point is 00:47:38 about the world. I think contract negotiations, for example, would still be difficult because there's so much subtlety. There's so much nuance to like every contract. And it's not going to replace a professional negotiator for that kind of task just yet. But kind of the things that are more constrained, don't require as much outside knowledge about the world. I think AIs are probably up to the task already. A friend of mine who used to work with you says that one of the things you're really exceptional at is you tend to pick a neglected research domain with lots of promise. You commit to it long term and then you become the best at it. And many people in the world kind of get attracted to shiny things instead and kind of distracted by whatever is in vogue,
Starting point is 00:48:16 but then it turns out to be less interesting research. What are you thinking about working on next or what interests you is sort of the next wave of stuff to do. I think the big thing I'm interested in is the reasoning problem. And this is kind of motivated by my experience in this game space. You look at things like Alpha Zero, the latest version of AlphaGo. And like AlphaGo in particular is held up as this like this big milestone in deep learning. And to some extent it is. Like it was not doable without deep learning.
Starting point is 00:48:42 But it wasn't deep learning alone that enabled that. If you take out the planning that's being done in AlphaGo and just use the raw policy network, the raw neural network, it's actually substantially below top human performance. And so we have, with just like raw neural nets, we have all these things that are incredibly powerful, like, you know, chatbots, image generation software, but the raw neural net itself still can't play go. It requires this extra planning algorithm on top of it to achieve top human performance. And that planning algorithm that's used in AlphaGo, multicolage research, is very domain-specific. I think people don't appreciate just how domain-specific it is because it works in chess, it works in Go,
Starting point is 00:49:28 and these have been like the classic domains that people have cared about for investigating these kinds of techniques. It doesn't work in poker. It doesn't work in diplomacy. I think because I've worked in those domains, I kind of recognize that this is a major weakness of these kinds of algorithms. And so I think there's a big question of like, okay, how do we – get these models to be able to do these complex reasoning planning tasks with a more general system that can work across a wide variety of domains.
Starting point is 00:49:55 And if you can do that, if you can succeed in that task, then it enables a lot of really powerful things. Like, one of the domains that I'm thinking about is theorem proving. You know, it doesn't seem crazy to me that you could have a model that can prove the Riemann hypothesis within the next five years. If you can solve the reasoning problem in a truly general way. And, yeah, you know, maybe the inference cost is huge. Like, maybe it costs a million dollars per token to generate that proof.
Starting point is 00:50:23 But that seems totally worth it if you can pull it off. And maybe you can do other things with it, too. Like, maybe that's, maybe that allows you to, like, you know, write the next prize-winning novel. Maybe that enables you to come up with, like, life-sipping drugs. But I think... Just for context, the Riemann hypothesis is, like, considered the most important unsolved problem in math where, I don't know, the first X set of solutions have been checked. we don't know for sure yet. Yeah, and I think the key is that I'm really interested in
Starting point is 00:50:51 is the generality. Like, we can solve this problem in domain-specific ways, but then it always ends up like kind of overfit to that domain. And so I think what we need is something as general as what we're seeing with transformers, where you just throw it at any sort of problem and it works surprisingly well. And I guess you're implying that there are ways to frame the problem to make progress that are more general, but really interesting to making progress in reasoning. And that could be around math or possibly code. Is that the right understanding? My hope is that the techniques are general.
Starting point is 00:51:25 I mean, I think it's important to also look at a wide variety of domains in order to, like, prevent you from overfitting. And yeah, one of the domains that I think would also be a good fit is code generation, because I think to write good code, like next token prediction is going to, is getting you surprisingly far. But I don't think it's going to get you all the way there to like replacing engineers at big companies. Yeah, maybe one piece of just context for listeners is
Starting point is 00:51:46 Copilot is amazing, right? But what we are doing with code generation today is very local context specific. Yeah, and so if you want to plan out like a whole product, like that doesn't seem doable with existing technology. And I think the perspective of a lot of people when they hear me say this is like, well, you know, but you just scale it.
Starting point is 00:52:03 You know, you scale up the models, you scale up the training, and that's always worked in the past. And the example I like to give is you look at, okay, you look at AlphaGo. Like, yeah, you could, in theory, scale up the training, scale up the model capacity, and you don't need planning then.
Starting point is 00:52:15 You just have a really large, you run this reinforcement learning algorithm for a really long time. You have a really big network, and it will eventually learn, in theory, at least, how to beat expert humans and go. But there's a question of like, okay, well, how much would you have to scale it up? How much would you have to scale up this raw neural net, the capacity, and the training in order to match the performance that it achieves with Montecollege research? And if you crunch the numbers and it ends up being 100,000 X, now, you know, you these models are already costing like $50 million.
Starting point is 00:52:45 Like, clear that you're not going to be able to scale them by 100,000 X. And so then there's a question of like, okay, well, what do you do instead? And the answer in AlphaGo is like, well, instead of having all that computation be during training, you also have it spend like 30 seconds to figure out what moved to make next when it's actually playing the game. And that shifts the cost burden from having to like pre-compute everything to then being able to think on the fly. And so that's why I think that avenue seems like the piece that's missing. A really random question, because if you look at the human brain, you have these very specialized modules with very specific functions, right?
Starting point is 00:53:22 You have the visual cortex for visual processing. You have like different things for emotion in terms of specific modules. Like there's specific parts of the brain that if you ablate, you remove certain emotive or other capabilities, right? There have been accidents where like poles have gone through people's heads and ablated a very specific place and that people have survived. And so you see this sort of very specific ablation of function through the ablation of specific modules, why is it the correct assumption to think that there should be a
Starting point is 00:53:44 generalizable architecture versus you just have a bunch of submodels that are all running together that collectively enable a wide range of behavior, which is effectively what we see in the brain? That's a good question. I don't think that we need to be tied to a specific technique. The answer might be that we need to have more specialized systems instead of just like one truly general architecture. But I think what I'm thinking about is more the goal rather than the approach. So we want something that's able to succeed across a wide variety of domains. And having to come up with like a unique approach to every single domain that gets you part of the way. But I think that eventually that will be superseded by something that is really general.
Starting point is 00:54:21 Yeah, that makes sense. And I guess, you know, one big domain is just reasoning, right? So I didn't mean to imply that it's different types of reasoning will require different approaches, but more there may be really big things that fundamentally may function in a very different way. And again, that may be incorrect, right? The brain is a evolved system, which means it has enormous limitations in terms of where it came from and how it got created and you often end up with these local maxima when you evolve a system, right? I was just sort of curious about how you thought about that.
Starting point is 00:54:45 Yeah, there's certainly a risk always with research that you could end up in a local minimum, and it's like hard to, people overfits that. And I think actually like machine learning was an example of this. Like deep learning, not many people were focused on this because they kind of assumed it was this dead end. There were only a few people out in the, you know, like Canadian wilderness that were working on this. And that ended up like being tremendously successful. And so there's value in diversity. There's value in diversity of approaches. And so I think it does help to try to think outside the box and try to do something that's look a little bit different than what everybody else is doing. Noam, you are going to go work on this really interesting area. I'm sure
Starting point is 00:55:23 there are other problems you think are interesting, especially given the practical limits of how much money we're willing to spend on scaling up beyond another magnitude or two. What do you think other researchers or teams should be working on that they're not paying enough attention to? Well, I think we're in an interesting place now in AI where, given where things are at now, there's already an opportunity to build up products that can have a big impact on the world. And so it's great to see that there are people that are going in that direction and trying to like bring these, this research into the real world and have a big impact there, make people's lives better.
Starting point is 00:55:58 For what it's worth, both a lot and I got emails from multiple people telling us that they're building price negotiation agents as they, as we speak. Well, like I said, I think it's doable. So I think it's the right call. Yeah, I think of the research side, there's still a lot of interesting questions about, like, how do we make these things more efficient? Are there better architectures we can use? I mean, I think there's just so many questions across the board that are interesting. I think the big thing I would recommend to researchers is not about which area to focus on, but just like the style of research.
Starting point is 00:56:29 I think there's a tendency to play it safe and not take big risks. And I think it's important to recognize that research is an inherently risky field. You know, there's a high probability that what you're working on is not going to be useful in the long term. And you have to kind of accept that and be willing to take that risk anyway. I mean, this happened to me. Like, by the early research in my PhD, in the grand scheme of things, really wasn't that useful. Like, it didn't make as much impact in the long term as I would have hoped. And that's okay because, you know, I had one thing that ended up being quite impactful.
Starting point is 00:57:04 And so I think it's important to be able to take those risks, kind of like go into the field recognizing that you are taking your risk already by going into research. You heard it here first. Be like Nome. Work on things that make you nervous. I think that's all we have time for. Thank you so much for joining us on the podcast, Noam. Yeah. Thank you very much for having me. Thank you for listening to this week's episode of No Priors. Follow No Priors for new guests each week and let us know online what you think and who an AI you want to hear from. You can keep in touch with me and conviction by following. at Saranormus. You can follow me on Twitter at Alad Gill.
Starting point is 00:57:38 Thanks for listening. No Pryors is produced in partnership with Pod People. Special thanks to our team, Cynthia Galdaya and Pranav Reddy, and the production team at Pod People. Alex Vigmanis, Matt Saab, Amy Machado, Ashton, Ashton Carter, Danielle Roth, Carter Wogan, and Billy Libby. Also, our parents, our children, the Academy,
Starting point is 00:58:00 and Open Google Soft AI, the future employer of all of mankind. Thank you.
