That Neuroscience Guy - The Neuroscience of Artificial Intelligence
Episode Date: July 1, 2022. Whether you know it or not, artificial intelligence is probably a part of your daily life. It may be the face recognition in your phone, or a vacuum that learns the layout of your house on its own. But how does an artificial brain compare to a human brain? In today's episode of That Neuroscience Guy, we discuss the neuroscience behind creating artificial intelligence.
Transcript
Hi, my name's Olaf Krigolson, and I'm a neuroscientist at the University of Victoria.
And in my spare time, I'm that neuroscience guy.
Welcome to the podcast.
Kia ora, and greetings from New Zealand again.
Sorry this episode's a little bit late, but I'm on vacation and I've just been taking a bit of time off.
But this is the episode for Sunday last week. We'll get it up and we'll get a bite up and we'll try
to keep it on track until I'm back in Canada. You know, since I've been here, I've been thinking a
lot about, you know, topics that might be interesting to people. And I wanted to delve
into the idea of the neuroscience of artificial intelligence.
Now, you've all used artificial intelligence, whether you know it or not.
For instance, if your phone recognizes your face, that's a form of artificial intelligence.
And if you've played video games, in those video games, whether it's something as simple as tic-tac-toe or something complex like Halo,
there is AI there. The computer is able to control
things and it makes decisions. And this even spills out into the real world with robots.
For instance, you might own one of those vacuum cleaners that vacuums your house for you.
How does it do that? And how does this relate to neuroscience? Well, the reason AI is cool from a neuroscience perspective is that a lot of AI is based on how humans learn and how humans make decisions. And there's actually
a field called computational neuroscience, where the whole point of the field is to sort of
try to figure out the mathematical patterns that your brain is computing to learn and make
decisions. So on today's podcast, the neuroscience of artificial intelligence,
and specifically how AI learns and how AI makes decisions,
and how that relates to the human brain.
So the origins of AI can kind of be tied to the computer.
The first mechanical computer was designed by Charles Babbage in 1822.
It used gears and similar mechanisms to perform simple computations.
And this was accelerated, you know, as we headed into World War II specifically.
And you might have heard stories about the code breaking that went on at Bletchley Park in the UK, where code-breaking machines were used to break German U-boat codes, allowing the Allies to avoid having ships sunk.
But World War II also generated the first programmable electronic computer: ENIAC, built between 1943 and 1945. But really, the history of AI starts a little bit later than that, because these computers
were essentially just calculators in a sense. But machines that think sort of evolved in the 50s,
and particularly credit for the term artificial intelligence is given to Marvin Minsky and John
McCarthy at Dartmouth College, who were sort of the fathers of the field of artificial intelligence.
And they were trying to come up with ideas about how machines could behave in ways like animals and humans do.
So an example of this emerged in the late 90s.
Now we're fast-forwarding quite a bit.
And this was the Deep Blue computer built by IBM that actually played Garry Kasparov,
the world's reigning chess champion, and played him twice.
The first time in 1996, Kasparov beat Deep Blue 4-2, but a rematch was scheduled the
following year in 1997, and Deep Blue beat Garry Kasparov three and a half games to two
and a half games.
It was the first defeat of a reigning world champion by a computer under tournament
conditions. So how did AI work in this case? And what was Deep Blue actually doing? Well,
Deep Blue relied on what are called lookup tables. And a lookup table is fairly straightforward.
The concept of a lookup table is essentially given the state of the board. And let's just
use chess for this example. And if you don't know
chess, you could think of tic-tac-toe, but given the state of the board, so the pieces on the board
and where they are, what is the move the computer should make? All right, and a very simple lookup
table would be, you know, if you think of tic-tac-toe, if x takes a top left corner,
o could take the bottom right corner. So the lookup table would say if X goes there first,
this is where I go.
And lookup tables can be more sophisticated: instead of a single response, the table might say that if X goes in this location, O randomly chooses one of two locations, say the middle square or the bottom right square.
And this is true of chess as well.
So what Deep Blue was doing was effectively it was looking for the state of the board that Kasparov had put it in.
And so the pieces on the board were all in positions.
And then Deep Blue was designed to make a move based on that.
So it would say, given Kasparov doing this, I'll do this.
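To make the lookup-table idea concrete, here's a minimal sketch in Python for the tic-tac-toe case. The board layout, the specific entries, and the `choose_move` helper are all invented for illustration; real chess engines like Deep Blue used far larger tables plus deep search.

```python
# A toy lookup-table "AI" for tic-tac-toe: board state in, move out.
# Squares are numbered 0-8, left to right, top to bottom.
import random

# Each key is a board state (a tuple of 9 cells); each value is the
# list of moves the computer may respond with.
lookup_table = {
    # If X opens in the top-left corner (square 0), O takes bottom-right (8).
    ("X", "", "", "", "", "", "", "", ""): [8],
    # A more sophisticated entry: randomly pick one of two replies,
    # here the centre (4) or the top-left corner (0).
    ("", "", "", "", "", "", "", "", "X"): [4, 0],
}

def choose_move(board):
    """Return the computer's move for a known board state."""
    options = lookup_table[tuple(board)]
    return random.choice(options)

print(choose_move(["X", "", "", "", "", "", "", "", ""]))  # -> 8
```

The key property, and the key limitation, is that the table has to contain an entry for every board state the program might face.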
The modern version of AI, if you stick with this kind
of thing, is Google's DeepMind project. And Google's DeepMind doesn't rely on lookup tables.
It uses a combination of what's called deep learning and neural networks to basically learn
how to solve problems. And in the gaming world, there's a couple of great examples of that. Basically, DeepMind was asked to beat the entire library of Atari video games, and it didn't know anything about the games. It was just told, you know, this is a win, this is a loss. And by losing a whole bunch of times, it figured out
the moves to make or how to move the joystick, if you will, to be able to play the Atari games.
And after a period of training, the AI, in other words, Google DeepMind, could actually play the games better and more
efficiently than humans. And in a case that was in the news more recently, in 2017, a specialized form of DeepMind called AlphaGo bested the number one player in the world in the game Go, which is an ancient Chinese board game. And the reason Go is such an important challenge is that the number of possible board positions in Go is greater than the number of atoms in the observable universe. So solving
Go was considered to be an almost impossible problem for AI, but AlphaGo was able to do that using the aforementioned
learning algorithms and neural networks. So for the second half of this, that's a bit of a history
lesson. For the second half of this, I want to sort of talk briefly about how these things work.
And these are concepts that we've sort of covered before, but I want to frame this in the artificial
intelligence world. So if you think of your vacuum cleaner, that little robot vacuum cleaner that learns how
to vacuum your house, well how does it actually work?
How does it learn what to do?
The way it learns is actually pretty straightforward and it relies on what's called reinforcement
learning.
And we've talked about reinforcement learning before and how humans learn back in season
one, but basically if you want to think of the reinforcement learning example with the vacuum cleaner,
basically it just starts driving. And if it moves forward and it doesn't hit anything,
it says, well, this is a good little bit of space here. And it assigns that a value. And
reinforcement learning relies on what are called prediction errors, if you remember.
You take what actually happens and what you expect to happen, and you compute a prediction error of the difference.
So if the robot doesn't expect to hit something and it doesn't hit something, there's no prediction error.
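That computation is small enough to sketch directly. This is a minimal illustration, assuming the simple values used in this example (0 for open space, minus 1 for hitting something) and an arbitrary learning rate of 0.5:

```python
# A prediction error is just what happened minus what was expected.
def prediction_error(outcome, expectation):
    return outcome - expectation

# The robot expected open space (0) and found open space (0):
print(prediction_error(0, 0))    # -> 0: no surprise, nothing to learn

# The robot expected open space (0) but hit a wall (-1):
delta = prediction_error(-1, 0)  # -> -1: a negative prediction error
value = 0.0                      # current value of moving forward here
value = value + 0.5 * delta      # revise the value down by a fraction
print(value)                     # -> -0.5
```

A prediction error of zero means nothing to learn; a nonzero one nudges the stored value up or down.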
So when the robot moves forward and it doesn't hit anything, it actually goes, well, this is good, I haven't hit anything.
So it assigns a value to moving forward from that particular location. So the way you can visualize this is imagine if you took your living room and you drew it on a piece of graph paper, all right?
And you could outline where the walls are, you could outline where the furniture is, and then
you're going to have a whole bunch of empty squares, which is the space that needs to be
vacuumed. So when the robot moves into a square where there's nothing, it vacuums away happily, and it says, well, this is an okay place to be. Now the robot's also keeping track of the squares that
it's visited because it doesn't want to sort of vacuum the same space over and over again.
Now, what happens when the robot hits something? Well, it basically treats that as a punishment. So again, in terms of prediction errors, basically what happens is the robot says, well, moving forward from the space I was just in is bad, because I've hit a wall, and it basically has a negative prediction error. So it didn't expect to hit a wall, but it hit a wall, so it assigns a
negative value to moving forward in the square
where the wall is. And that's what the robot does, is it just keeps driving around, bumping into
things, and anytime it hits something, it assigns that specific location a negative value. And if
you imagine doing this on a piece of graph paper, what you would come up with, if we just kept the
numbers really simple, is anywhere where there was a wall would be a minus one or a piece of furniture, and anywhere there's open space there would be a one.
And what the robot's learning to do then is just to move across all of the ones.
It's going to cover all of the ones, and the really smart robots will do a bit of math
to figure out what's the best pattern to do this.
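The graph-paper picture can be sketched as a small grid of numbers. The room layout, the values, and the `vacuum` helper below are all invented for illustration; a real robot vacuum's map is much finer-grained than this:

```python
# A living room as a grid: -1 for walls and furniture, 1 for open space
# that still needs vacuuming, and 10 for the charging station.
room = [
    [-1, -1, -1, -1, -1],
    [-1, 10,  1,  1, -1],
    [-1,  1, -1,  1, -1],   # a piece of furniture in the middle
    [-1,  1,  1,  1, -1],
    [-1, -1, -1, -1, -1],
]

def vacuum(room, row, col):
    """Vacuum one square: a 1 gets pushed down to 0 once it's covered."""
    if room[row][col] == 1:
        room[row][col] = 0

vacuum(room, 1, 2)
print(room[1][2])  # -> 0: that square is now done
print(sum(cell == 1 for row_vals in room for cell in row_vals))  # -> 6 squares left
```

The robot's whole job then reduces to covering the remaining ones while steering around the minus ones.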
The cheaper ones will just drive around randomly until they've literally covered all the ones, and they might have to retrace their steps a few times, but they're just going to keep looking for ones. And when the robot covers a one, you can imagine it sort of pushes it to zero, but at the same time it's avoiding those minus ones; it doesn't want to hit walls or furniture. And this is why, if you move your furniture around, your robot vacuum will get confused, because the new furniture location might have been where a bunch of ones were, and now all of a sudden it hits them, so it assigns it a minus one. And this is how the robots adapt: they're always computing these prediction errors every time they vacuum. Now, for a final note on this: how does it get back to the starting point? Well, the way it gets back to the starting point is it assigns a very high value, say a 10, to the charging station. So once it's
done vacuuming, all right, it's covered all the ones, or it's running low on battery power, because it's keeping track of its own charge level, well, then it just navigates through that
space looking for that 10. And again, a bit of math would allow it to compute the shortest path
to the 10 that avoids all those minus ones and gets it back to
the starting location. Now, this is also true of how computers and artificial intelligence learns
to play games. So how does AI learn to play a game? Well, quite simply, it uses this same sort
of logic. So imagine a computer is learning to play tic-tac-toe.
Well, the computer will just move randomly, all right?
So it's just going to pick random things in the early stages because it doesn't know anything about whether moves are good or bad,
and it doesn't have those lookup tables that we talked about.
But what it's going to do is when it wins a game,
it's basically going to say,
well, hey, what were the moves I made that game that got me a win?
And it's going to assign positive values or ones to the moves that helped it win.
And when it loses a game, it's going to look at the moves that cost it the game,
and it's going to assign minus ones to those moves.
So it's actually creating its own lookup tables.
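A minimal sketch of what building that table might look like: after each game, every move that was played gets nudged toward plus one after a win and minus one after a loss. The `update_after_game` helper, the learning rate of 0.1, and the example game are all assumptions made for illustration:

```python
from collections import defaultdict

# (board_state, move) -> learned value, starting at 0 for unseen pairs.
move_values = defaultdict(float)

def update_after_game(moves_played, won, learning_rate=0.1):
    """Reinforce or punish every move made in the game."""
    outcome = 1.0 if won else -1.0
    for state, move in moves_played:
        # prediction error: outcome vs. the value we currently expect
        delta = outcome - move_values[(state, move)]
        move_values[(state, move)] += learning_rate * delta

# One imaginary game: the learner opened in the centre (square 4) and won.
game = [(("", "", "", "", "", "", "", "", ""), 4)]
update_after_game(game, won=True)
print(move_values[(("", "", "", "", "", "", "", "", ""), 4)])  # -> 0.1
```

Run over thousands of self-play games, values like these converge toward which moves tend to win and which tend to lose, which is exactly a self-taught lookup table.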
So as opposed to being told the lookup table, by playing itself a ton of times it will learn which moves led to
wins and which moves led to losses using these prediction errors. So just the same way that
humans learn. This is what we think happens in the human brain. You know, and I used this example back in season one, but imagine you're a student and you wrote an essay and, you know, you thought you got an 80 on your essay, but you got a 60.
This is a negative prediction error because the outcome is 60 and the expectation was 80.
60 minus 80 is minus 20.
And that prompts a change in behavior.
This is how reinforcement learning works.
And this is how artificial intelligence learns to do things
And I'll finish with one more example of this, and that is TD Gammon. I've always loved the TD Gammon story. TD Gammon was a computer program written by Gerald Tesauro, and basically he wanted a computer to learn to play backgammon. And if you're not familiar with backgammon, you could quickly Google it, but basically the idea is you have pieces on a board and your opponent has pieces, and you roll dice and move your pieces around the board to get to the other side. And your opponent's trying to do the same thing, and if your opponent lands on you, it knocks your piece off the board and you have to get back on, and vice versa. But it's a fairly complex game. Anyway, what Tesauro did is he just used
reinforcement learning and he had the computer play itself millions of times over and over and
over again. But every time it won a game, it would say, well, what moves did I make that allowed me
to win the game? And it would assign positive values to those moves. And every time it lost
a game, it would assign negative values to the moves that it made. Now, I'm sort of shortcutting this a bit for simplicity's sake, because what it actually does is, after any given move, it's going to compute a prediction error. And if it's getting to a state or a board position that's closer to winning the game, then it's going to compute a prediction error saying, hey, things are better than expected. And if it moves into a state that eventually leads to a loss, it's going to know that as well and compute negative prediction errors.
So it doesn't actually do it all right from the end.
It's doing it step by step as it goes through.
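That step-by-step idea is the classic temporal-difference (TD) update, which is where TD Gammon gets its name. Here's a minimal sketch; the state names, values, learning rate, and discount factor are all made up for illustration:

```python
# After each move, compare the value of the new position with the value
# of the old one, and treat the difference as the prediction error.
def td_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """Nudge the old state's value toward reward + value of the next state."""
    delta = reward + gamma * values[next_state] - values[state]  # prediction error
    values[state] += alpha * delta
    return delta

# Two board positions: moving from a middling position into one we
# already believe is strong produces a positive prediction error.
values = {"mid_game": 0.2, "strong_position": 0.8}
delta = td_update(values, "mid_game", "strong_position", reward=0.0)
print(round(delta, 2))               # -> 0.6: things are better than expected
print(round(values["mid_game"], 2))  # -> 0.26: that position now looks better
```

So the program doesn't wait for the final win or loss; every single move teaches it something about the position it just left.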
But hopefully you get the basic concept that when these artificial intelligence agents make moves,
they compute prediction errors and they use that to change the values for the choices they're making.
In the case of TD Gammon, you know, how it should move its pieces given a given board
position and a dice roll. But the cool thing about TD Gammon, and the reason I love this story,
is when they were testing it, it was playing World Champions at backgammon, and at one stage
it made a move that a human wouldn't have made in that situation. So basically,
given a certain board position and a certain dice roll, it made a move that humans wouldn't have
done. The experts were saying, well, it's made a mistake. That's not the move to do.
Well, it turns out, after much play, humans realized that the move that TD Gammon had come
up with was actually better than what humans thought they should do in that situation.
And it had found that just by trial and error learning or reinforcement learning.
So the AI had created something new.
And that's a key point I want to finish on,
is that artificial intelligence does have the ability to create new things
because it's just going to try random combinations of things until it works.
And you could take that lesson from TD Gammon and put that to music where computers can now
generate music, because they know what good music is, because we tell them that. And if it generates a random piece of music that people don't like, it punishes the notes that it's chosen, those kinds of things. And if it generates something we do like, it reinforces them. And now you've got computer algorithms that can make unique music. And this
is true of almost anything, but the key concept here that underlies this is prediction
errors or reinforcement learning. Now that's the end of part one about the neuroscience of
artificial intelligence. And I've told you a little bit about how AI learns,
and it follows those reinforcement learning principles that humans use as well.
On the next episode, I'm going to come back and talk about how artificial intelligence
makes decisions, and that will involve what are called neural networks.
Thank you so much for listening. Please subscribe to the podcast.
If you have ideas, you can follow me on Twitter, at ThatNeuroscienceGuy.
Just DM me. Say, hey, what about an episode on this?
We're planning the last couple episodes of Season 3,
and then we're going to take a bit of time off and plan Season 4.
But please send us your ideas.
And of course, there's our website, ThatNeuroscienceGuy.com.
Thank you so much to those of you that are supporting us on Patreon and buying t-shirts.
All of that money is going to graduate students in my lab to help them with their studies.
My name is Olaf Krigolson, and I'm that neuroscience guy.
Thank you so much for listening, and I'll be back to you shortly with another Neuroscience Bite and another episode of the podcast.