CoRecursive: Coding Stories - Story: Reinforcement Learning At Facebook with Jason Gauci

Episode Date: February 1, 2021

If you ever wanted to learn about machine learning you could do worse than have Jason Gauci teach you. Jason has worked on YouTube recommendations. He was an early contributor to TensorFlow, the open-source machine learning platform. His thesis work was cited by DeepMind. But what I find so fascinating about Jason is he recognized this problem that was being solved the wrong way and set out to find a solution to it. So that's the show today. Jason is going to share his story. Links: ReAgent.ai Programming Throwdown Episode Bonus

Transcript
Starting point is 00:00:00 Okay, so before we get into it, why don't you state your name and what you do? My name is Jason Gauci, and yeah, I bring machine learning to billions of people. Hello, and welcome to CoRecursive, the stories behind the code. I'm Adam Gordon Bell. Jason has worked on YouTube recommendations. He was an early contributor to TensorFlow, the open source machine learning platform. His thesis work was cited by DeepMind. They were the people who beat all human players at Go and at StarCraft, I think, and who knows what else.
Starting point is 00:00:34 If you ever wanted to learn about machine learning, you could do worse than have Jason teach you. But what I find so fascinating about Jason is he recognized this problem that was being solved the wrong way and set out to find a solution to it. The problem was making recommendations. You know, like on Amazon, people who bought this book might like that book. He didn't exactly know how to solve the problem, but he knew it could be done better. So that's the show today. Jason's going to share his story, which will eventually change the way Facebook works. And we'll learn about reinforcement learning and neural nets and just about the stress of pursuing research at a large company. It all started in 2006 when Jason was in grad school. Yeah, so I went to college, picked computer science, and I remember my parents were a little,
Starting point is 00:01:18 found that a little strange. They said, oh, you could be like a doctor or a lawyer or something, like you have the brains for it. And then at one point, my dad thought it was kind of like going to school to be a TV repairman. And so he wasn't really sure. He's like, are you sure you really want to do this? Like, you know, now I could just buy another TV or another computer if it breaks. And to this day, I have to explain to people, I really don't know how to fix the computer. I give this laptop broke right now. I just have to do the same thing my parents do and just go get another one.
Starting point is 00:01:48 I have no idea. But I had an option to do like a master's PhD hybrid or basically do it all kind of in one shot. And after two years, if I wanted to call it quits, then I would get the master's degree. Yeah, at the time, I thought I will just do the master's. I didn't really plan on getting a PhD, but actually the very last class that I took in my master's was a class called neuro evolution, which was all about trying to solve problems through neural networks and through evolutionary computation.
Starting point is 00:02:26 So America Online had this capture the flag game for free. And I remember I downloaded it on a 56K modem. It took forever. And it was basically like a turn-based capture the flag where you played as one person and there was a friendly AI for the other you know three players and then there was four player enemy ai and you're trying to capture the flag and if the enemy touched you you're in jail but the friendly ai could could bail you out of jail and i think i played this is you get to see more and more of the of the ground as you travel like yeah that's right yeah yeah do you remember
Starting point is 00:03:03 the name of it so So the game is called Capture the Flag. If you've not played it, you view a large field with trees in it from overhead and you can only see where your players have been. There's like a fog of war
Starting point is 00:03:13 like in Starcraft, except it's turn-based. You move a certain number of moves and then your players freeze there and the computer gets to take its turn and move its players. But for my Neuroevolution course, my final project, I recreated this game, Capture the Flag,
Starting point is 00:03:28 and then I built AI for it using neuroevolution. And so just to sort of unpack that, you know, neural networks are effectively like function approximators that are inspired by the way the brain works. And so, you know, imagine if you imagine, you know, graphing a function on your calculator, I'm sure everyone's kind of done this on their TI-85. You can punch in, you know, Y equals X squared and it'll draw a little parabola on your TI-85 or whatever the, you know, calculator is nowadays. And so what a neural network will do is it will look at a lot of data and it will, it can represent almost any function. So if it's like your, your original like graph thing, it's like telling it when X is two,
Starting point is 00:04:12 Y is three, you're feeding it all these pairs. Exactly. Yep. Um, but memorizes them. Yep. But because there's sort of contradictions and there's noise in the data and all of that, you know, it won't, you won't tell it exactly like, you the data and all of that, you know, it won't, you won't tell it exactly like, you know, force it to be, you know, Y is three when X is three, but it's like a hint. You say, Hey, when X is three, Y is probably three. So if you're not there, get a little bit closer to there. And you do this over and over again for so many different Xs that you end up with some shape that, you know, won't pass through every point. It's usually impossible, but it'll get, you know, close to a lot of the points.
Starting point is 00:04:50 This is basically back propagation. It's a form of supervised learning. You're training the neural net by supervising it and telling it, you know, when it gets the wrong answer, what it should have gotten instead. And to do this, you need to know what the right answer is so that you can train it. And so that works great when you have a person going and telling you the perfect answer or the right answer. But for puzzles and games, for example, you don't have that. So look at Go. To this day, people haven't found the perfect Go game, a Go game for people who are playing perfectly. And so you don't have that. And so you have to do something different. You have to learn from experience. So you just say, look, this Go game, that's a really
Starting point is 00:05:33 good move. That's like better than any move we've ever seen at this point in the game. It doesn't mean it's the best. It doesn't mean that your goal should be to always make that move, but it's really good. A simple way to do that is have a neural network and have it play a lot of go and then make a subtle change to it and have it play a lot of go again and then say, okay, did that change make this player win more games? If it did, then you keep the change. And if it didn't, then you throw it away. And so if you do this enough times, you know, you will end up in what we call a local optimum. In other words, you're making these small changes.
Starting point is 00:06:13 You're picking all the changes that make their go player better. And eventually you just can't find a small change that makes the player better. And so you could think of evolutionary computation at a high level as doing something like that, but it's doing it a really large scale. So maybe you have a thousand small changes and 500 of them make the player better. And you can adapt all 500 of those different players and the existing players. So you can take all 501 of those players and make a player that's stepwise, that's better in a big way. And you would just keep doing that. So this is what Jason learned in his neuroevolution class. He would create all these
Starting point is 00:07:00 generations of players which had random changes and like evolution have them play capture the flags against each other slowly breeding better and better players was there a moment where where you tested out your algorithm like did you try to play it and capture the flags yeah the real aha moment was you know having this sort of you know god's eye view without the fog of war because i was just an observer and watching the AI and specifically like watching this almost like wolf pack behavior where three, you know, players would kind of surround a player and trap them. You know, just seeing that thing that you've seen in nature just kind of emerge just organically.
Starting point is 00:07:41 That to me was amazing. Like that was unbelievable. I mean, when I saw all the players kind of converge and capture and kind of do this methodical thing and then take the flag and even, you know, I think at one point like two of them had been captured and so the other two just decided to go for the flag and just forget of any strategy and just go for broke.
Starting point is 00:08:02 Did you watch it and like, you know, anthropomorphize? Like, did you cheer for one team? yeah yeah yeah i did i mean you you naturally you want to cheer for the underdog so so yeah you would see this scenario play out where they would chase after one person you know even though there was four of them and only two defend two of the other team they would chase after one and the other one would get the flag i didn't follow the strategy it's like one one runs and then yeah so one would run and the other four would all chase after that one and then the second one would go and get the flag and win uh it's like a decoy yeah but it would only happen when the ai was disadvantaged you know so there was there was uh the way it worked was there's four players.
Starting point is 00:08:48 So there's a bunch of sensory information that was just repeated four times to make the input of the network. And I guess even though it's playing against itself, it kind of learned that when two of those inputs are completely shut off, which is what happened when they were captured, to then execute this Hail Mary strategy. And yeah, it was just super fun to watch that play out. And I would remember just sitting in the lab kind of cheering for this one person, and they would try to come back.
Starting point is 00:09:22 In your head, it was kind of hard to know, can these people because it's a big grid can they get back quick enough to catch this person so like it'd be pretty suspenseful and uh just seeing all of that just all encoded in this network like neural like excitation back prop and all these things for like understanding what a neural network is doing all this stuff hadn't been invented yet so it was just a black box and it was just magic i mean you would run it on the university cluster who knows what it would do you know you would get it back a few days later and you would just see all this amazing emergent behavior. That to me just really lit the spark. And so I actually was, I'd already accepted a job with the intention of just getting the master's and leaving. I didn't see anything that inspired me. But right there at the 11th hour, I took this course
Starting point is 00:10:17 and I said, this is amazing. I mean, the fact that it actually worked and it exploited things that I would have never thought of, that really is what lit the spark. So based on this cool capture the flag experience, Jason decides to do his PhD and he gets a fellowship. So the entire time since we've been born all the way through to when I was doing my PhD, we were in this sort of neural network winter where people had given up on this idea of function approximation. This fellowship came from the natural language processing faculty member. So you have to have been nominated for the presidential fellowship. And I was nominated by Dr. Gomez who worked in natural language processing. You know, he really vouched for me. When he found out that I wanted to do neuroevolution, yeah, he was extremely disappointed.
Starting point is 00:11:13 I mean, he was particularly disappointed that I wanted to do this area that had no future. Yeah, I remember him saying things like, basically, there's no knowledge in the neural network. It's just a function. That's effectively what his argument was. And yeah, I mean, he was a total unbeliever. We know now that neural networks would have a resurgence, especially in supervised learning. But Jason had no way to know that. He just knew that he was seeing really amazing results, and that he was having a lot of fun. So I actually, I worked full time through the entire part of the PhD from starting from the thesis. And so that was a wild experience. So I would wake up early in the morning.
Starting point is 00:11:56 I would take a look at what evolved last night in the cluster. If it didn't crash or anything, I would either monitor it or if it was finished, I would like look at some results. I'd play a little bit of Counter-Strike at like 8 a.m. because I found the like most polite people were on early in the morning, maybe all like working professionals or something. I would work the entire day and then I would go home
Starting point is 00:12:18 and then I would see how the run was doing and everything. I ended up doing that for something like three or four years. This is around 2009. And Jason ended up developing a method called Hyperneed for his thesis. So I graduated college. I worked an extra year. And then I ended up going to Google. I had a friend who is a few friends who are at Google. And they said, hey, you have to come here. This is amazing. You know, it's like It's like Hunger Games for nerds. There's tons of nerds out there. I was like, okay, this sounds right up my alley. I did the interview. I was like, oh, this is like paradise. There's a real chance to focus. I met tons of really smart
Starting point is 00:12:57 people. I thought this is absolutely amazing. But after about a year of that, that's when Andrew Ng joined Google and that's when deep learning really started to take off. I kind of reached out to him and reached out to other people that were in the research kind of wing of Google. And I said, I have this experience. I love neural nets. I said, let's, you know, want to study this with you guys and make progress. So I ended up transferring to research. So then when the deep learning kind of revolution really hit, I was working on a lot of that stuff. So I was working really closely with the Google brain team. And I wrote a chunk of what ended up becoming TensorFlow later. I mean, I'm sure someone else rewrote it, but I wrote a really early version of parts of TensorFlow.
Starting point is 00:13:43 We actually, you know, the team I was on built the YouTube recommendations algorithm. So figuring out kind of what, you know, when you watch YouTube video and on the right-hand side, you get those recommended videos, you'll figure out what to put there. And the whole time I thought, wow, this is all really reinforcement learning and decision making. You know, we're putting things on the screen and we're using supervised learning to solve a problem that isn't really a supervised learning problem. Traditional recommender systems try to learn what you like from your history.
Starting point is 00:14:16 Amazon knows that people with my shopping history would like certain books, so it ranks them by probability and shows me the top five or something like that But what if recommendations were more like a game more like capture the flag? The computer player shows me recommendations and its goal is to show me things that I'll end up buying So it'll probably show me the top probability items But occasionally it'll throw in some other things so that I can learn more about me if I buy those it will know something new And it will be able to provide even better recommendations in the future. Like in your Capture the Flag game, right? There's like the fog of war.
Starting point is 00:14:51 There's like things you don't know. And it's like, okay, so I'm on YouTube, and you know that I like whatever computer file has like all their videos, right? But you don't know about something else that I might like, right? So it's like, you could just try to explore that space that is me, try to throw something at me and see what happens. Is that kind of the idea? Yeah, exactly right.
Starting point is 00:15:11 I mean, that's called the explore-exploit dynamic. So you can exploit, which means show the thing that you're most likely to engage with, or you could explore. You could say, well, maybe once per day, we'll show Adam something totally random. If he interacts with it, then we've learned something really profound, right? There's all these other times we could have shown you that same thing that we didn't. And so that's really useful to know, right? And so yeah, that isn't captured by taking all these signals and then just maybe adding them up and sorting. Like that's not going to expose that explore-exploit dynamic,
Starting point is 00:15:53 right? To expose that dynamic, you have to occasionally pick an item that has a really low score. In other words, these recommender systems, they need to make decisions. They're like AI bots, like a Go player, capture the Flag bot. It's a reinforcement learning problem, but nobody's approaching it this way. It's not even totally clear how to do it. So yeah, I talked to this gentleman, his name was Hussein Mahana, who's now leading AI at Cruise. And he was a director at Facebook. And when I met Hussein, basically the gist of what he said is, you know, I don't really know how to do it either, but I want you to come and figure it out. That to me really kind of ignited something.
Starting point is 00:16:34 I kind of felt that passion to really push state of the art. And it's something really transformational, right? Because it is a control problem and nobody really knew how to solve it using the technology that is designed to solve it. I just found that super appealing. So I came to Facebook about five years ago with the intent of kind of cracking that. Yeah, it's been a pretty wild ride. So was there a specific task or was it just... Initially, I was brought in to basically rethink the way that the ranking system works at Facebook. So for people who don't know, when you go to Facebook, whether it's on the app or the website, you see all these posts
Starting point is 00:17:19 from your friends, but they're not in chronological order. And actually a lot of people complain about that, but it turns out being chronological order is actually terrible. And we do experiments and it's horrible. It's just people are in a state of like, and I put myself in this category. I was in a state of unconscious ignorance about chronological order. It sounds great. Like you have your position and you can always just start there and go up. You never miss a post, right? It fails when your cousin like posts about crochet 14 times a day, right? And everyone has that one friend, right? And you just don't realize it because the posts aren't in chronological order. So they thought, well, we could use reinforcement learning to fix all of
Starting point is 00:18:02 these cold start problems and all these other challenges that we're having. You know, clickbait, all these things can be fixed in a very elegant way if you take a control theoretic approach versus if you're doing it by hand, it gets really complicated. This is the recommender thing all over again. If Facebook can kind of explore what your preferences are, it can learn more about you and give you a more valuable newsfeed. So Jason joins Facebook and he spins up a small team, but things are a bigger challenge than he thought they would be. So it's kind of interesting. It started off as me with a few contractors, so like short-term employees. And these were actually extremely talented people in reinforcement learning, but they worked for kind of like a consulting company.
Starting point is 00:18:52 Think of it as like a Deloitte or McKinsey or that type of thing. And so they had no intention of being full-time at Facebook or anything like that. And we worked on it and we just couldn't really get it to work. And so after their contract expired, we didn't renew it. And I was a little bit lost because I wasn't really sure how to get it to work either. But I kept working on it on my own.
Starting point is 00:19:18 It was a really odd time because I was starting to feel more and more guilty because I came in as a person with all these years of experience from these prior companies. And I was contributing zero back to the top line or the bottom line or any line. I was just spending the company's money. I realized that being sort of, you know, people joke about how nice it would be to be like a lazy writer. Like you hear this stereotype of like the rest invest. So people, you heard about this in the 90s, you know, people on Microsoft who had these giant stock grants that exploded
Starting point is 00:19:58 and they would just sit there and play solitaire until their stock expired or whatever. I realized being a lazy rider, actually, it's terrible. I mean, you really need to have kind of like a Wally mentality from Dilbert. And in my case, I wasn't lazy. I mean, I was working super hard, but I was a rider in the sense that I was being funded by an engine that I wasn't able to contribute to. And it felt terrible. You know, even I didn't get good ratings and all of it was really tough. And I actually, at one point, had like a heart to heart with my wife and I was kind of thinking, you know, you know, should I quit this or should I keep trying to do it?
Starting point is 00:20:40 I really thought that it was going to work the whole time. I was really convinced that it would work. And so what I decided to do is I decided to ask for permission to open source it. And the reason was I felt like if they fired me, I could keep working on it after I was fired. And so they were totally fine open sourcing it. My director didn't even, you know, he didn't really, it wasn't really on his radar. So he just said approved.
Starting point is 00:21:10 And so it got open source. Because it wasn't contributing to any sort of bottom line. So they were like, what? Yeah, it was totally below the radar. So it's just the way someone would like approve like a meal reimbursement or something, right? It's just the word approved. And then all of the code could go on GitHub. This project is on GitHub right now
Starting point is 00:21:28 and at reagent.ai. In retrospect, it seems like the project might've been failing because Jason was targeting the wrong team. There's sort of this interesting kind of catch-22 where the teams that are really important are also almost always under the gun. And it's very hard for them to have real freedom to pursue something.
Starting point is 00:21:50 And so you end up with a lot of the contrarian people end up sort of on fringe teams. And so there was a gentleman, his name was Shu. Shu was on this team that was pretty out on the fringe. They were notifying people who are owners of pages. So for people who aren't familiar with Facebook, there's pages on Facebook, which are kind of like storefronts. So there's a McDonald's page, and there's a person or a team of people who own that page.
Starting point is 00:22:21 They can have editorial rights and stuff like that. And so these were notifications going out to people who own pages, basically informing them of their page. A page like McDonald's, there's things changing all the time. So if you were to just notify everyone about everything, it would just blow up their phone. What type of notifications would I get? Like what's going on with my page? You know, you have a lot more likes this week than usual or fewer than usual. Yeah. There's 13 people want to join your group that you have to approve or not. So it's the same team that does groups and pages, these kinds of things. Yeah. These are all things that theoretically somebody wants to get, but if their page is busy, it's just way too much information. Yeah, exactly. And on the flip side, if their page is totally dead and one person joins maybe every two months, it's probably just annoying to send it to them. Yeah. So yeah,
Starting point is 00:23:18 part of the challenge there is coming up with sort of that metric of like, what are we, what value are we actually providing and how do you measure that without doing a survey or something like that? And so they were using decision trees to figure out the probability that somebody would tap on the notification. And if that probability was high enough, they would, they would show it. But, you know, at the end of the day, they don't want people to just tap on notifications. What they really wanted was to provide value. And so we could look at things like, is your page growing? Are you taking care of people who want to join your group? Are you actually going through? And if we sent you a notification, then are you going to go and
Starting point is 00:24:02 approve or reject these folks? And if we don't send you a notification, are you not going to go and approve or reject these folks and if we don't send you the notification are you not going to because if you're going to do it anyways at four o'clock and we notify you at 3 45 that's just annoying right because they would just end up optimizing for like the message you'd be most likely to click on you that's not really what they care about right like just that you're tapping these exactly yeah exactly a lot of us don't like getting notifications for things that we would have either done anyways or have no interest in. I mean, we wanted a better mousetrap, right?
Starting point is 00:24:32 Facebook's actually not in the business of just always sending people notifications. They do have social scientists and other people who are trying to come up with real value and measuring that objectively. It's not the newsfeed, but Jason has found a team who's willing to try his reinforcement learning approach. Here, the action is binary. Send the message or don't. So it's like, I have a page for this podcast, like co-recursive, and I do nothing with it.
Starting point is 00:24:58 And then so your reagent, it gets some sort of set of data, like here's stuff about Adam and how he doesn't care about this page. And then, okay, we have this notification. What should we do? Is that the type of problem? Yeah, pretty much. Imagine like an assembly line of those situations. So there's billions of people and we have this sort of assembly line and it just has a context. A person hasn't visited their page in 10 days. Here's a whole bunch of other contexts. Here's how often they approve requests to join the group, et cetera. And then we have to press the yes button or the no button.
Starting point is 00:25:39 And so, yeah, it's flying through at an enormous rate. Just billions and billions of these are going through this line. And we're ejecting most of them. But we let the ones through that we think will provide that value. And so what we're doing is we're looking at how much value are you getting out of your page if we don't send the notification? How much value are you getting out of your page if we do? And then that gap, when that gap is large enough, then we'll send it. One area where, you know, which is kind of our niche is that we do this offline.
Starting point is 00:26:17 So there's plenty of amazing reinforcement learning libraries that work either with a simulator or they're meant to be real time, like the robot is learning while it's moving around. But in our case, it's like, you give us, you know, millions of experiences, and then we will look at all of them at once and then train a model. And then you can use this model for the next million experiences. I mean, just to give kind of a explain through an absurd example, let's say you had a Go playing AI. So like something AlphaGo would do. And let's say, you know, at any given time, if you were to just stop training, there's
Starting point is 00:26:55 a 99% chance you'd have a great player and a 1% chance you'd have a terrible player. Yeah. Well, that's not a really big deal for them because they'll just stop. If it's bad, they'll stop, you know, or they could just train two in parallel or something. Right. And now you have like a, you know, what is it like, like, I don't know, one over 10,000 chance of having a bad player. But for us, like we can't do that. Like, you know, if we stop and then it's a bad player, that player goes out to a billion people and it's going to be a whole day like that. And so a lot of academics haven't thought about that specific problem.
Starting point is 00:27:34 And that's something that our library does really well. And I also assume like AlphaGo can play against itself. You don't have an Atom with a page out there to manage, right? Like you have to learn actively against the real world, I guess. Yeah, that's right. We have to learn from our mistakes. There is no self-play. There's no Facebook simulator or anything like that. What AlphaGo does is it just does reinforcement learning. So it's just constantly trying to make the best, what it thinks is the best move and learn from that. What we do is, you know, we start from scratch and we say, can I copy the current, whatever generated this data, can I copy their strategy? And even if it's a model that we put out yesterday, we still take
Starting point is 00:28:24 the same approach, which is, can I at least copy this and be confident that I'm going to make the same decision given the same context? Once we have that, then we're safe. We say, okay, I am 99.9% confident that this system is equivalent to the one that generated the data. I could ship it. Then we'll start reinforcement learning. this system is equivalent to the one that generated the data. I could ship it, right? Then we'll start reinforcement learning. And then when we do reinforcement learning, it's going to start making decisions that where we don't really know what's going to happen, right?
Starting point is 00:28:54 And it'll just, as we train and train and train, it will deviate from what we call the production policy, right, whatever generated the data. It's going to deviate more and more from that. And the whole time it's deviating, we're measuring that deviation, right? At some point we can stop training and say, I don't know with a degree of certainty
Starting point is 00:29:20 that this model isn't worse than what we already have out there. Like I'm only 99% sure that this model isn't worse than what we already have out there. Like I'm only 99% sure that this model isn't worse. And so now I'm going to send it out there, or it could be 99.9, whatever threshold we pick, right? You know, the more confident you want to be that the model isn't worse, the less it's able to change from whatever's out there right now. And so you kind of have these two loops. Now you have the training loop, and then you have this second loop, which is
Starting point is 00:29:52 show the model to the real world, get new data, and then repeat that. Because of that, that second loop takes an entire day to do one revolution. We have models that we launched a year ago that are still getting better. Instead of capture the flags, this sounds more like simultaneous chess. The AI is playing chess against billions of people each day. And then at night, it analyzes its games and comes up with improved strategies and things it might try out next time. Actually, this makes me think about a documentary about AI. Something that comes to mind with all this. I don't know whether you want to answer this or not. What do you think of the movie, like the social dilemma? It seems like very relevant to what
Starting point is 00:30:34 we're talking about. Yeah. I mean, you know, I haven't seen it, but I've heard a lot of critiques and, um, you know, I know some of the folks who are in the movie, and I think there's a lot of truth to it. The part that I think, and again, I haven't seen it, so this is going to be, you know, take this with a grain of salt. But one thing that I noticed is missing, at least from a lot of these critiques, is there's this sort of assumption of guilt in terms of the intent. There's this idea that we have this sort of capitalist engine and it's just maximizing revenue all the time. And so because engagement drives revenue, then you're maximizing engagement all the time. The reality is it's not true.
Starting point is 00:31:22 In my experience, there isn't that much pressure on companies like Facebook and Google and Apple. I mean, there's pressure because there's things we want to do and everything. But there isn't this capitalist pressure on Facebook. It's not like the airline industry where they have razor razor thin margins right you know i do think that we're trying to find that way to really increase the value to provide real value to people and i think we do that the vast majority of facebook if you were to troll through the data, you know, it's basically people just acknowledging each other. Like the vast majority of comments are congratulations or something. That's probably the number one comment, right?
Starting point is 00:32:14 Once you adjust for language and everything. And so, and that is really what we're trying to optimize for are those good experiences, right? I don't work on the social science part of it. We try to optimize and we do it on good faith that the goals we're optimizing for are good faith goals, right? But I've been in enough of the meetings to see that the intent is really good intent. It's just a thing that's very difficult to quantify. But I do think that the intent is to provide that value. And I do think that they would trade, you know, some of the margin for the value in a heartbeat. Yeah, I mean, with all that said, you know,
Starting point is 00:32:59 I think it's important to, like, keep tabs of how much time you spend on your phone and just look at it and be honest with yourself. And I mean, this is true of everything. I mean, I'm not a TV watcher, but if I was, I would do the same thing there. And, uh, you know, I catch myself like, like I get really into a video game and next thing you know, I realize I'm spending like three, four hours a day on this video game and you could do the same thing with Facebook and everything else. It's good to have, have discipline there. And And you know, it's a massive machine learning engine that is doing everything it can to optimize whatever you try to optimize.
Starting point is 00:33:35 Right. So that part of the social dilemma is true. I just think the intent is a little bit misconstrued there. That's my personal take on it. I think social media is like fast food, like McDonald's fries are super delicious, best fast food, French fries. But if that's your only source of food, or if social media is your only form of social interaction, then that's going to be a problem. But I'm not sure we need a moral panic. Anyways, let's find out how the notifications project worked out. Yeah. So, you know, our goal going into it was to reduce the number of notifications while still keeping the same value. So there was sort of a measure of how much value we were providing to people,
Starting point is 00:34:17 as I said, based on how they interact with the website. And we reduced notifications like something like 20 or 30%. And we actually caused the value to increase slightly. And so that people are really excited by that. So did your performance reviews get better? Yeah. So, so just to take this timeline to its conclusion. So then that we ended up kind of having more and more success in this area of, I think it's technically
Starting point is 00:34:45 called re-engagement marketing. But basically, you know, how do you, you have someone who's already a big fan of Toyota. When do you actually send them that email saying, Hey, you know, maybe you should buy another Toyota or trade in or something. Right? Even though like our goal is actually not to drive engagement, it's really just to send fewer notifications. But at the end of the day, like the thing that we want to preserve is that value. And so we found that niche and we just kind of started democratizing that tech. And then at some point it became just too much for me to do by myself. So I didn't get fired. I'm still at Facebook.
Starting point is 00:35:33 They haven't fired me yet. There's that saying, you get hired for what you know, but fired for who you are. I think I put that one to the test. But yeah, the performance reviews got better and I switched to managing, which has been itself a really, really interesting experience. Jason succeeded. He got people to see that these problems were reinforcement problems
Starting point is 00:35:55 and he got them to use his project. It's been at least five years since he joined Facebook, but the News Feed team is starting to use Reagent. And yeah, it's open source. So check it out. Reading between the lines, it seems like this whole thing took its toll on Jason. The thing I realized is that for me, at least,
Starting point is 00:36:13 I kind of like reached the finish line. Like I always joke with people that like, this is the last big company I'm ever going to work for. And I kind of reached that finish line. And when you do that, you could try and find the next finish line, or you could kind of turn back around and, and help the next person in line. Right. And, and being a manager is my way to sort of do that. I mean, I'm still super passionate about the area. Like I'm not checked out or anything like that,
Starting point is 00:36:42 but you know, I'm done in terms of like their career race. I've hit my finish line. And so let me turn back around and just try and help as many people as I can, you know, over the wall. So that was the show. I'm going to try something new. If you want to learn a little bit more about Jason and about reagent, go to co-recursive.com slash reinforcement. I'm going to put together a PDF with some things covered in this episode. I haven't done it yet, so you'll have to bear with me. Jason can be found on Twitter at neuralnets4life.
Starting point is 00:37:14 That's the number four. He's very dedicated to neural nets. And he also has a podcast, which is great. And I'll let him describe. If you enjoy hearing my voice, you could check out our podcast. We don't talk that much about AI, but I have a podcast that I co-host with a friend of mine that I've known for a really long time, a podcast called Programming Throwdown.
Starting point is 00:37:36 And we talk about all sorts of different programming topics, everything from languages, frameworks. And we've had Adam on the show a couple of times. It's been really amazing. We've had some really phenomenal episodes. We talked about working from home together on an episode. So you can check me out there as well. Thank you to Jason for being such a great guest and sharing so much. And until next time, thank you so much for listening.
