CoRecursive: Coding Stories - Story: Reinforcement Learning At Facebook with Jason Gauci
Episode Date: February 1, 2021
If you ever wanted to learn about machine learning, you could do worse than have Jason Gauci teach you. Jason has worked on YouTube recommendations. He was an early contributor to TensorFlow, the open-source machine learning platform. His thesis work was cited by DeepMind. But what I find so fascinating about Jason is he recognized this problem that was being solved the wrong way and set out to find a solution to it. So that's the show today. Jason is going to share his story.
Links: ReAgent.ai, Programming Throwdown, Episode Bonus
Transcript
Okay, so before we get into it, why don't you state your name and what you do?
My name is Jason Gauci, and yeah, I bring machine learning to billions of people.
Hello, and welcome to CoRecursive, the stories behind the code.
I'm Adam Gordon Bell.
Jason has worked on YouTube recommendations.
He was an early contributor to TensorFlow, the open source machine learning platform.
His thesis work was cited by DeepMind.
They were the people who beat all human players at Go and at StarCraft, I think, and who knows what else.
If you ever wanted to learn about machine learning, you could do worse than have Jason teach you.
But what I find so fascinating about Jason is he recognized this problem that was being solved the wrong way and set out to find a solution to it. The problem was making recommendations. You know,
like on Amazon, people who bought this book might like that book. He didn't exactly know
how to solve the problem, but he knew it could be done better. So that's the show today. Jason's
going to share his story, which will eventually change the way Facebook works. And we'll learn
about reinforcement learning and neural nets and just about the stress
of pursuing research at a large company. It all started in 2006 when Jason was in grad school.
Yeah, so I went to college, picked computer science, and I remember my parents were a little,
found that a little strange. They said, oh, you could be like a doctor or a lawyer or something,
like you have the brains for it.
And then at one point, my dad thought it was kind of like going to school to be a TV repairman.
And so he wasn't really sure. He's like, are you sure you really want to do this?
Like, you know, now I could just buy another TV or another computer if it breaks.
And to this day, I have to explain to people, I really don't know how to fix the computer.
If this laptop broke right now, I'd just have to do the same thing my parents do
and just go get another one.
I have no idea.
But I had an option to do like a master's PhD hybrid
or basically do it all kind of in one shot.
And after two years, if I wanted to call it quits,
then I would get the master's degree.
Yeah, at the time, I thought I will just do the master's. I didn't really plan on getting a PhD, but actually the
very last class that I took in my master's was a class called neuro evolution, which was all about
trying to solve problems through neural networks and through evolutionary computation.
So America Online had this capture the flag game for free.
And I remember I downloaded it on a 56K modem.
It took forever.
And it was basically like a turn-based capture the flag where you played as one person and
there was a friendly AI for the other you know three players and then there was four
player enemy AI. And you're trying to capture the flag, and if the enemy touched you, you're in jail, but the friendly AI could bail you out of jail.
And I think I played this. It's the one where you get to see more and more of the ground as you travel, like...
Yeah, that's right. Yeah, yeah.
Do you remember the name of it?
So the game is called Capture the Flag.
If you've not played it,
you view a large field
with trees in it from overhead
and you can only see
where your players have been.
There's like a fog of war
like in Starcraft,
except it's turn-based.
You move a certain number of moves
and then your players freeze there
and the computer gets to take its turn
and move its players.
But for my Neuroevolution course,
my final project, I recreated this game, Capture the Flag,
and then I built AI for it using neuroevolution. And so just to sort of unpack that, you know,
neural networks are effectively like function approximators that are inspired by the way the
brain works. And so, you know, imagine, you know,
graphing a function on your calculator, I'm sure everyone's kind of done this on their TI-85.
You can punch in, you know, Y equals X squared and it'll draw a little parabola on your TI-85
or whatever the, you know, calculator is nowadays. And so what a neural network will do is it will
look at a lot of data and it will, it can represent almost any
function. So if it's like your original, like, graph thing, it's like telling it, when X is two, Y is three. You're feeding it all these pairs.
Exactly. Yep.
But it memorizes them?
Yep. But because there's sort of contradictions and there's noise in the data and all of that, you know, you won't tell it exactly, like, you know, force it to be, you know, Y is three when X is three, but it's like a hint. You say, hey, when X is three, Y is probably three. So if you're not
there, get a little bit closer to there. And you do this over and over again for so many different
Xs that you end up with some shape that, you know, won't pass through
every point. It's usually impossible, but it'll get, you know, close to a lot of the points.
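To make that concrete, here's a toy sketch, not anything from the episode, of feeding noisy (x, y) hints to a tiny network and nudging it a little closer each time:

```python
import numpy as np

# A minimal sketch: nudge a tiny neural net toward noisy (x, y) "hints"
# instead of memorizing them exactly.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(256, 1))
y = x**2 + rng.normal(0, 0.1, size=(256, 1))   # noisy parabola

# One hidden layer of 16 tanh units.
W1, b1 = rng.normal(0, 0.5, (1, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)

lr = 0.05
for step in range(2000):
    h = np.tanh(x @ W1 + b1)          # hidden activations
    pred = h @ W2 + b2                # the network's current guess
    err = pred - y                    # "you're not quite there"
    # Move each weight a little in the direction that shrinks the error.
    dW2 = h.T @ err / len(x)
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h**2)
    dW1 = x.T @ dh / len(x)
    db1 = dh.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("mean squared error:", float((err**2).mean()))
```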
This is basically back propagation. It's a form of supervised learning. You're training the neural
net by supervising it and telling it, you know, when it gets the wrong answer, what it should
have gotten instead. And to do this, you need to know what the right answer is so that you can
train it. And so that works great when you have a person going and telling you the perfect answer or the
right answer. But for puzzles and games, for example, you don't have that. So look at Go.
To this day, people haven't found the perfect Go game, a game where both players are playing
perfectly. And so you don't have that. And so you have to do something
different. You have to learn from experience. So you just say, look, this Go game, that's a really
good move. That's like better than any move we've ever seen at this point in the game.
It doesn't mean it's the best. It doesn't mean that your goal should be to always make that move,
but it's really good.
A simple way to do that is have a neural network and have it play a lot of Go, and then make a subtle change to it and have it play a lot of Go again, and then say, okay, did that change make
this player win more games? If it did, then you keep the change. And if it didn't, then you throw
it away. And so if you do this enough times, you know, you will end up in what we call a local optimum.
In other words, you're making these small changes.
You're picking all the changes that make the Go player better.
And eventually you just can't find a small change that makes the player better.
And so you could think of evolutionary computation
at a high level as doing something like that, but it's doing it at a really large scale. So maybe you
have a thousand small changes and 500 of them make the player better. And you can adapt all 500 of
those different players and the existing players. So you can take all 501 of those
players and make a player that's stepwise better in a big way. And you would just keep doing that.
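As a rough illustration of that change-and-keep-if-better loop, here's a minimal hill-climbing sketch. `play_matches` is a made-up stand-in for "have these weights play a lot of games and report how well they did", and real neuroevolution systems like NEAT evolve whole populations and network topologies, not a single weight vector:

```python
import numpy as np

# A minimal hill-climbing sketch of "change it, keep it if it wins more".
rng = np.random.default_rng(0)

def play_matches(weights):
    # Placeholder fitness: pretend the win rate peaks at some unknown
    # optimal weight vector. A real system would play actual games here.
    target = np.linspace(-1, 1, weights.size)
    return -float(np.mean((weights - target) ** 2))

best = rng.normal(size=32)              # the current champion's weights
best_fitness = play_matches(best)

for generation in range(500):
    # Make a small random change to the champion...
    candidate = best + rng.normal(scale=0.05, size=best.shape)
    fitness = play_matches(candidate)
    # ...keep the change only if the player wins more; otherwise discard.
    if fitness > best_fitness:
        best, best_fitness = candidate, fitness

print("final fitness:", best_fitness)
```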
So this is what Jason learned in his neuroevolution class. He would create all these generations of players, which had random changes, and, like evolution, have them play Capture the Flag against each other, slowly breeding better and better players.
Was there a moment where you tested out your algorithm? Like, did you try to play it and capture the flags?
Yeah, the real aha moment was, you know, having this sort of god's eye view, without the fog of war, because I was just an observer, and watching the AI, and specifically, like, watching this almost wolf-pack behavior where three, you know, players would kind of surround a player and trap them.
You know, just seeing that thing that you've seen in nature just kind of emerge just organically.
That to me was amazing.
Like that was unbelievable.
I mean, when I saw all the players kind of converge
and capture and kind of do this methodical thing
and then take the flag and even, you know,
I think at one point like two of them had been captured
and so the other two just decided to go for the flag
and just forget any strategy and just go for broke.
Did you watch it and like, you know, anthropomorphize?
Like, did you cheer for one team?
Yeah, yeah, yeah, I did. I mean, you naturally want to cheer for the underdog. So yeah, you would see this scenario play out where they would chase after one person, you know, even though there was four of them and only two of the other team defending. They would chase after one, and the other one would get the flag.
I didn't follow the strategy. It's like, one runs and then...
Yeah, so one would run, and the other four would all chase after that one, and then the second one would go and get the flag and win.
It's like a decoy.
Yeah, but it would only happen when the AI was disadvantaged, you know. So the way it worked was, there's four players.
So there's a bunch of sensory information that was just repeated four times to make
the input of the network.
And I guess even though it's playing against itself, it kind of learned that when two of
those inputs are completely shut off, which is what happened when they were captured,
to then execute this Hail Mary strategy.
And yeah, it was just super fun to watch that play out.
And I would remember just sitting in the lab kind of cheering for this one person,
and they would try to come back.
In your head, it was kind of hard to know: because it's a big grid, can these players get back quick enough to catch this person? So it'd be pretty suspenseful. And just seeing all of that encoded in this network. Like, excitation backprop and all these techniques for understanding what a neural network is doing, all this stuff hadn't been invented yet. So it was just a black box, and it was just magic. I mean, you would run it on the university cluster, who knows what it would do, you know. You would get it back a few days later and you would just see all this amazing emergent behavior. That to me just really lit the spark. And so, actually,
I'd already accepted a job with the intention of just getting the master's and leaving. I
didn't see anything that inspired me. But right there at the 11th hour, I took this course
and I said, this is amazing. I mean, the fact that it actually worked and it exploited things
that I would have never thought of, that really is what lit the spark. So based on this cool capture the flag experience,
Jason decides to do his PhD and he gets a fellowship. So the entire time since we've
been born all the way through to when I was doing my PhD, we were in this sort of neural network winter where people had given up on this idea of
function approximation. This fellowship came from the natural language processing faculty
member. So you have to have been nominated for the presidential fellowship. And I was nominated by
Dr. Gomez who worked in natural language processing. You know, he really vouched
for me. When he found out that I wanted to do neuroevolution, yeah, he was extremely disappointed.
I mean, he was particularly disappointed that I wanted to do this area that had no future.
Yeah, I remember him saying things like, basically, there's no knowledge in the neural network.
It's just a function.
That's effectively what his argument was. And yeah, I mean, he was a total unbeliever.
We know now that neural networks would have a resurgence, especially in supervised learning.
But Jason had no way to know that. He just knew that he was seeing really amazing results,
and that he was having a lot of fun. So I actually worked full time through the entire PhD, starting from the
thesis. And so that was a wild experience. So I would wake up early in the morning.
I would take a look at what evolved last night in the cluster. If it didn't crash or anything,
I would either monitor it or if it was finished,
I would like look at some results.
I'd play a little bit of Counter-Strike at like 8 a.m.
because I found the like most polite people
were on early in the morning,
maybe all like working professionals or something.
I would work the entire day and then I would go home
and then I would see how the run was doing and everything.
I ended up doing that for something like three or four years.
This is around 2009. And Jason ended up developing a method called HyperNEAT for his thesis.
So I graduated college. I worked an extra year. And then I ended up going to Google. I had a few friends who were at Google. And they said, hey, you have to come here. This
is amazing. You know, it's like Hunger Games for nerds.
There's tons of nerds out there. I was like, okay, this sounds right up my alley. I did the interview.
I was like, oh, this is like paradise. There's a real chance to focus. I met tons of really smart
people. I thought this is absolutely amazing. But after about a year of that, that's when Andrew Ng joined Google and that's when deep learning really started to take off.
I kind of reached out to him and reached out to other people that were in the research kind of wing of Google.
And I said, I have this experience. I love neural nets.
I said, you know, I want to study this with you guys and make progress.
So I ended up transferring to research. So then when the deep learning kind of revolution really hit, I was working on a lot of that stuff.
So I was working really closely with the Google brain team.
And I wrote a chunk of what ended up becoming TensorFlow later.
I mean, I'm sure someone else rewrote it, but I wrote a really early version of parts of TensorFlow.
We actually, you know, the team I was on built the YouTube recommendations algorithm.
So figuring out, you know, when you watch a YouTube video and on the right-hand side you get those recommended videos, figuring out what to put there.
And the whole time I thought, wow, this is all really reinforcement learning and decision
making.
You know, we're putting things on the screen
and we're using supervised learning to solve a problem that isn't really a supervised learning
problem. Traditional recommender systems try to learn what you like from your history.
Amazon knows that people with my shopping history would like certain books, so it ranks them by
probability and shows me the top five or something like that. But what if recommendations were more like a game, more like Capture the Flag? The computer player shows me recommendations, and its goal is to show me things that I'll end up buying. So it'll probably show me the top-probability items, but occasionally it'll throw in some other things so that it can learn more about me. If I buy those, it will know something new.
And it will be able to provide even better recommendations in the future.
Like in your Capture the Flag game, right?
There's like the fog of war.
There's like things you don't know.
And it's like, okay, so I'm on YouTube,
and you know that I like whatever computer file has like all their videos, right?
But you don't know about something else that I might like, right?
So it's like, you could just try to explore that space that is me,
try to throw something at me and see what happens.
Is that kind of the idea?
Yeah, exactly right.
I mean, that's called the explore-exploit dynamic.
So you can exploit, which means show the thing that you're most likely to engage with,
or you could explore.
You could say, well, maybe once per day,
we'll show Adam something totally random. If he interacts with it, then we've learned something
really profound, right? There's all these other times we could have shown you that same thing
that we didn't. And so that's really useful to know, right? And so yeah, that isn't captured by taking all these signals and then
just maybe adding them up and sorting. Like that's not going to expose that explore-exploit dynamic,
right? To expose that dynamic, you have to occasionally pick an item that has a really
low score. In other words, these recommender systems, they need to make decisions. They're
like AI bots, like a Go player, a Capture the Flag bot. It's a reinforcement learning problem, but nobody's approaching it this way. It's not even totally clear how to do it.
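One textbook way to get that explore-exploit behavior, just an illustration here and not what Jason's team actually shipped, is an epsilon-greedy policy over the ranker's scores:

```python
import random

# A minimal epsilon-greedy sketch of the explore-exploit dynamic.
# `scores` are hypothetical engagement predictions per candidate item.
def pick_recommendation(scores, epsilon=0.05):
    """Mostly exploit the top-scored item; occasionally explore."""
    if random.random() < epsilon:
        # Explore: show something we're unsure about. If the person
        # engages with it, we've learned something a pure sort-by-score
        # system never would.
        return random.randrange(len(scores))
    # Exploit: show the thing they're most likely to engage with.
    return max(range(len(scores)), key=lambda i: scores[i])

scores = [0.91, 0.40, 0.07, 0.02]   # made-up probabilities
item = pick_recommendation(scores)
```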
So yeah, I talked to this gentleman, his name was Hussein Mehanna, who's now leading AI at Cruise. And he was a director at Facebook.
And when I met Hussein, basically the gist of what he said is, you know, I don't really
know how to do it either, but I want you to come and figure it out.
That to me really kind of ignited something.
I kind of felt that passion to really push state of the art.
And it's something really transformational, right?
Because it is a control problem and nobody really knew how to
solve it using the technology that is designed to solve it. I just found that super appealing.
So I came to Facebook about five years ago with the intent of kind of cracking that.
Yeah, it's been a pretty wild ride. So was there a specific task or was it just... Initially, I was brought in
to basically rethink the way that the ranking system works at Facebook. So for people who don't
know, when you go to Facebook, whether it's on the app or the website, you see all these posts
from your friends, but they're not in chronological order. And actually a lot of people complain about that,
but it turns out being chronological order is actually terrible. And we do experiments and
it's horrible. It's just people are in a state of like, and I put myself in this category.
I was in a state of unconscious ignorance about chronological order. It sounds great. Like you
have your position and you can always just start there and go up. You never miss
a post, right? It fails when your cousin like posts about crochet 14 times a day, right? And
everyone has that one friend, right? And you just don't realize it because the posts aren't in
chronological order. So they thought, well, we could use reinforcement learning to fix all of
these cold start problems and all these other challenges that we're having.
You know, clickbait, all these things can be fixed in a very elegant way if you take a control theoretic approach versus if you're doing it by hand, it gets really complicated.
This is the recommender thing all over again.
If Facebook can kind of explore what your preferences are, it can learn more about you and give you a more valuable newsfeed. So Jason joins Facebook and
he spins up a small team, but things are a bigger challenge than he thought they would be.
So it's kind of interesting. It started off as me with a few contractors, so like short-term
employees. And these were actually extremely talented people in reinforcement learning, but they
worked for kind of like a consulting company.
Think of it as like a Deloitte or McKinsey or that type of thing.
And so they had no intention of being full-time at Facebook or anything like that.
And we worked on it and we just couldn't really get it to work.
And so after their contract expired,
we didn't renew it.
And I was a little bit lost
because I wasn't really sure how to get it to work either.
But I kept working on it on my own.
It was a really odd time
because I was starting to feel more and more guilty
because I came in as a person with all these years of experience from these prior companies.
And I was contributing zero back to the top line or the bottom line or any line.
I was just spending the company's money.
I realized that being sort of, you know, people joke about how nice it would be to be like
a lazy rider. Like you hear this stereotype of, like, rest and vest. You heard about this in the 90s, you know, people at Microsoft who had these giant stock grants that exploded, and they would just sit there and play solitaire until their stock vested or whatever.
I realized being a lazy rider, actually, it's
terrible. I mean, you really need to have kind of like a Wally mentality from Dilbert.
And in my case, I wasn't lazy. I mean, I was working super hard, but I was a rider in the
sense that I was being funded by an engine that I wasn't able to contribute to. And it felt terrible. You know,
even I didn't get good ratings and all of it was really tough. And I actually, at one point,
had like a heart to heart with my wife and I was kind of thinking, you know,
you know, should I quit this or should I keep trying to do it?
I really thought that it was going to work the whole time. I was really convinced that it would work.
And so what I decided to do is I decided to ask for permission to open source it.
And the reason was I felt like if they fired me,
I could keep working on it after I was fired.
And so they were totally fine open sourcing it.
My director didn't even, you know,
he didn't really, it wasn't really on his radar.
So he just said approved.
And so it got open source.
Because it wasn't contributing to any sort of bottom line.
So they were like, what?
Yeah, it was totally below the radar.
So it's just the way someone would like approve like a meal reimbursement or something, right?
It's just the word approved.
And then all of the code could go on GitHub.
This project is on GitHub right now
and at reagent.ai.
In retrospect, it seems like the project
might've been failing
because Jason was targeting the wrong team.
There's sort of this interesting kind of catch-22
where the teams that are really important
are also almost always under the gun.
And it's very hard for them to have real freedom to pursue something.
And so you end up with a lot of the contrarian people end up sort of on fringe teams.
And so there was a gentleman, his name was Shu.
Shu was on this team that was pretty out on the fringe.
They were notifying people who are owners of pages.
So for people who aren't familiar with Facebook, there's pages on Facebook, which are kind
of like storefronts.
So there's a McDonald's page, and there's a person or a team of people who own that
page.
They can have editorial rights and stuff like that.
And so these were notifications going out to people who own pages, basically informing them about their page. A page like McDonald's, there's things changing all the time. So if you were to
just notify everyone about everything, it would just blow up their phone.
What type of notifications would I get? Like what's going on with my page?
You know, you have a lot more likes this week than usual, or fewer than usual. Yeah. There's 13 people who want to join your group that you have to approve or not. So it's the same team that
does groups and pages, these kinds of things. Yeah. These are all things that theoretically
somebody wants to get, but if their page is busy, it's just way too much information. Yeah, exactly. And on the flip side, if their page is totally dead and one person joins
maybe every two months, it's probably just annoying to send it to them. Yeah. So yeah,
part of the challenge there is coming up with sort of that metric of like, what are we, what value are we actually providing
and how do you measure that without doing a survey or something like that? And so they were
using decision trees to figure out the probability that somebody would tap on the notification.
And if that probability was high enough, they would, they would show it. But, you know, at the
end of the day, they don't want people to just tap
on notifications. What they really wanted was to provide value. And so we could look at things like,
is your page growing? Are you taking care of people who want to join your group?
Are you actually going through? And if we sent you a notification, then are you going to go and
approve or reject these folks? And if we don't send you the notification, are you not going to? Because if you're going to do it anyways at four o'clock and we notify you at 3:45, that's just annoying, right?
Because they would just end up optimizing for, like, the message you'd be most likely to click on. That's not really what they care about, right? Like, just that you're tapping these.
Exactly. Yeah, exactly. A lot of us don't like getting notifications
for things that we would have either done anyways
or have no interest in.
I mean, we wanted a better mousetrap, right?
Facebook's actually not in the business
of just always sending people notifications.
They do have social scientists and other people
who are trying to come up with real value
and measuring that objectively.
It's not the newsfeed, but Jason has found a team who's willing to try his reinforcement
learning approach. Here, the action is binary. Send the message or don't.
So it's like, I have a page for this podcast, like CoRecursive, and I do nothing with it. And then so your ReAgent, it gets some sort of set of data, like here's stuff about Adam and
how he doesn't care about this page. And then, okay, we have this notification. What should we do? Is that the
type of problem? Yeah, pretty much. Imagine like an assembly line of those situations.
So there's billions of people and we have this sort of assembly line and it just has a context.
A person hasn't visited their page in 10 days.
Here's a whole bunch of other contexts.
Here's how often they approve requests to join the group, et cetera.
And then we have to press the yes button or the no button.
And so, yeah, it's flying through at an enormous rate.
Just billions and billions of these are going through this line. And we're ejecting most of them.
But we let the ones through that we think will provide that value.
And so what we're doing is we're looking at how much value are you getting out of your
page if we don't send the notification?
How much value are you getting out of your page if we do?
And then that gap, when that gap is large enough, then we'll send it.
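As a sketch of that decision rule, hedged heavily: `should_notify`, `estimate_value`, and the threshold below are all made-up stand-ins for illustration, not Facebook's actual system:

```python
# Notify only when the estimated value of the owner's experience with
# the notification exceeds the value without it by a large enough gap.
SEND_THRESHOLD = 0.1   # made-up number

def should_notify(context, estimate_value):
    value_if_sent = estimate_value(context, action="send")
    value_if_dropped = estimate_value(context, action="drop")
    # If the owner would approve those join requests at 4:00 anyway,
    # the gap is near zero and the notification is just noise.
    return (value_if_sent - value_if_dropped) > SEND_THRESHOLD

# Toy stand-in for the trained value model, for illustration only.
def toy_model(context, action):
    base = context["days_since_visit"] * 0.02
    bonus = 0.15 if action == "send" and context["pending_requests"] else 0.0
    return base + bonus

print(should_notify({"days_since_visit": 10, "pending_requests": 3}, toy_model))
```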
One area, you know, which is kind of our niche, is that we do this offline.
So there's plenty of amazing reinforcement learning libraries that work either with a simulator or they're meant to be real time, like the robot is learning while it's moving
around. But in our case, it's like, you give us,
you know, millions of experiences, and then we will look at all of them at once and then train
a model. And then you can use this model for the next million experiences. I mean, just to give
kind of an explanation through an absurd example, let's say you had a Go
playing AI.
So like something AlphaGo would do.
And let's say, you know, at any given time, if you were to just stop training, there's
a 99% chance you'd have a great player and a 1% chance you'd have a terrible player.
Yeah.
Well, that's not a really big deal for them because
they'll just stop. If it's bad, they'll stop, you know, or they could just train two in parallel
or something. Right. And now you have, you know, what is it, like a one-over-10,000 chance of having a bad player. But for us, we can't do that. Like, you know,
if we stop and then it's a bad player, that player goes out to a billion people and it's going to be a whole day like that.
And so a lot of academics haven't thought about that specific problem.
And that's something that our library does really well.
And I also assume like AlphaGo can play against itself.
You don't have an Adam with a page out there to manage, right? Like you have to learn
actively against the real world, I guess. Yeah, that's right. We have to learn from our mistakes.
There is no self-play. There's no Facebook simulator or anything like that. What AlphaGo
does is it just does reinforcement learning. So it's just constantly trying to make the best, what it thinks is the best move and learn from that. What we do is, you know,
we start from scratch and we say, can I copy the current, whatever generated this data,
can I copy their strategy? And even if it's a model that we put out yesterday, we still take
the same approach, which is, can I at least copy this and be confident that I'm going to make the same decision given the same context?
Once we have that, then we're safe.
We say, okay, I am 99.9% confident that this system is equivalent to the one that generated the data. I could ship it, right?
Then we'll start reinforcement learning.
And then when we do reinforcement learning,
it's going to start making decisions where we don't really know what's going to happen, right?
And it'll just, as we train and train and train,
it will deviate from what we call the production policy,
right, whatever generated the data.
It's going to deviate more and more from that.
And the whole time it's deviating,
we're measuring that deviation, right?
At some point we can stop training and say,
I don't know with a degree of certainty
that this model isn't worse
than what we already have out there.
Like I'm only 99% sure that this model isn't worse.
And so now I'm going to send it out there,
or it could be 99.9, whatever threshold we pick, right?
You know, the more confident you want to be that the model isn't worse,
the less it's able to change from whatever's out there right now. And so you kind
of have these two loops. Now you have the training loop, and then you have this second loop, which is
show the model to the real world, get new data, and then repeat that. Because of that, that second
loop takes an entire day to do one revolution. We have models that we launched a year ago that are still getting better.
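Here's a runnable toy of that two-loop structure. Every name in it (`Policy`, `confidence_not_worse`, `daily_revolution`, the threshold) is a stand-in invented for illustration; ReAgent's real counterfactual policy evaluation is far more involved:

```python
import copy
import random

class Policy:
    def __init__(self, weights):
        self.weights = list(weights)

    def train_step(self):
        # Pretend training: nudge the weights, deviating a little more
        # from whatever generated the data.
        self.weights = [w + random.gauss(0, 0.01) for w in self.weights]

def confidence_not_worse(policy, production):
    # Crude proxy for counterfactual policy evaluation: confidence
    # shrinks as the policy drifts away from the production policy.
    drift = sum(abs(a - b) for a, b in zip(policy.weights,
                                           production.weights))
    return max(0.0, 1.0 - drift)

def daily_revolution(production, threshold=0.9):
    # Loop 1: train offline, starting from a copy of production so the
    # first model is safe to ship by construction.
    policy = copy.deepcopy(production)
    last_safe = copy.deepcopy(policy)
    while confidence_not_worse(policy, production) >= threshold:
        last_safe = copy.deepcopy(policy)   # still confidently not worse
        policy.train_step()                 # deviate a little more
    # Loop 2 happens in the real world: ship `last_safe`, collect a new
    # day of data, and run this whole thing again tomorrow.
    return last_safe

production = Policy([0.0] * 8)
tomorrows_model = daily_revolution(production)
```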
Instead of Capture the Flag, this sounds more like simultaneous chess. The AI is playing
chess against billions of people each day. And then at night, it analyzes its games and comes
up with improved strategies and things it might try out next time. Actually, this makes me think
about a documentary about AI.
Something that comes to mind with all this. I don't know whether you want to answer this or not.
What do you think of the movie, like the social dilemma? It seems like very relevant to what
we're talking about. Yeah. I mean, you know, I haven't seen it, but I've heard a lot of critiques
and, um, you know, I know some of the folks who are in the movie, and I think there's a lot of truth to it.
The part that I think, and again, I haven't seen it, so this is going to be, you know, take this with a grain of salt.
But one thing that I noticed is missing, at least from a lot of these critiques, is there's this sort of assumption of guilt in terms of the intent. There's this idea that we have this sort of capitalist engine and it's just maximizing
revenue all the time.
And so because engagement drives revenue, then you're maximizing engagement all the
time.
The reality is it's not true.
In my experience, there isn't that much pressure on companies like Facebook and Google and Apple.
I mean, there's pressure because there's things we want to do and everything.
But there isn't this capitalist pressure on Facebook.
It's not like the airline industry where they have razor-thin margins, right? You know, I do think that we're trying to find that way to really increase the value, to provide real value to people, and I think we do that. The vast majority of Facebook, if you were to trawl through the data, you know, it's basically people just acknowledging each other.
Like the vast majority of comments are congratulations or something.
That's probably the number one comment, right?
Once you adjust for language and everything.
And that is really what we're trying to optimize for, those good experiences, right?
I don't work on the social science part of it.
We try to optimize and we do it on good faith that the goals we're optimizing for are good
faith goals, right? But I've been in enough of the meetings to see that the intent is really
good intent. It's just a thing that's very difficult to quantify. But I do think that
the intent is to provide that value. And I do think that they would trade, you know,
some of the margin for the value in a heartbeat. Yeah, I mean, with all that said, you know,
I think it's important to, like, keep tabs on how much time you spend on your phone
and just look at it and
be honest with yourself. And I mean, this is true of everything. I mean, I'm not a TV watcher,
but if I was, I would do the same thing there. And, uh, you know, I catch myself, like I
get really into a video game and next thing you know, I realize I'm spending like three, four
hours a day on this video game and you could do the same thing with Facebook and everything else.
It's good to have discipline there. And you know, it's a massive machine learning engine
that is doing everything it can to optimize whatever you try to optimize.
Right. So that part of the social dilemma is true. I just think the intent is a little bit
misconstrued there. That's my personal take on it. I think social media is like fast food, like McDonald's fries are super delicious,
the best fast-food French fries. But if that's your only source of food, or if social media is your
only form of social interaction, then that's going to be a problem. But I'm not sure we need a moral
panic. Anyways, let's find out how the notifications project worked out.
Yeah. So, you know, our goal going into it was to reduce the number of notifications
while still keeping the same value.
So there was sort of a measure of how much value we were providing to people,
as I said, based on how they interact with the website.
And we reduced notifications by something like 20 or 30%. And we actually caused the value to increase slightly. And so people were really excited by that.
So did your performance reviews get better?
Yeah. So just to take this timeline to its conclusion. We ended up kind of having more and more success in this area of,
I think it's technically
called re-engagement marketing. But basically, you know, how do you, you have someone who's
already a big fan of Toyota. When do you actually send them that email saying, Hey, you know, maybe
you should buy another Toyota or trade in or something. Right? Even though like our goal is actually not to drive engagement,
it's really just to send fewer notifications. But at the end of the day, like the thing that
we want to preserve is that value. And so we found that niche and we just kind of started
democratizing that tech. And then at some point it became just too much for me to do by myself.
So I didn't get fired.
I'm still at Facebook.
They haven't fired me yet.
There's that saying, you get hired for what you know, but fired for who you are.
I think I put that one to the test.
But yeah, the performance reviews got better and I switched to managing,
which has been itself a really, really interesting experience.
Jason succeeded.
He got people to see that these problems
were reinforcement learning problems
and he got them to use his project.
It's been at least five years since he joined Facebook,
but the News Feed team is starting to use ReAgent.
And yeah, it's open source.
So check it out.
Reading between the lines,
it seems like this whole thing took its toll on Jason.
The thing I realized is that for me, at least,
I kind of like reached the finish line.
Like I always joke with people that like,
this is the last big company I'm ever going to work for.
And I kind of reached that finish line.
And when you do that,
you could try and find the next finish line, or you could kind of turn back around and,
and help the next person in line. Right. And, and being a manager is my way to sort of do that.
I mean, I'm still super passionate about the area. Like I'm not checked out or anything like that,
but you know, I'm done in terms of like their career race. I've hit my finish line. And so let me turn back around and
just try and help as many people as I can, you know, over the wall. So that was the show. I'm
going to try something new. If you want to learn a little bit more about Jason and about ReAgent,
go to co-recursive.com slash reinforcement. I'm going to put together a PDF with some things covered in this episode.
I haven't done it yet,
so you'll have to bear with me.
Jason can be found on Twitter
at neuralnets4life.
That's the number four.
He's very dedicated to neural nets.
And he also has a podcast,
which is great.
And I'll let him describe.
If you enjoy hearing my voice,
you could check out our podcast. We don't talk that much about AI, but I have a podcast that I co-host with a
friend of mine that I've known for a really long time, a podcast called Programming Throwdown.
And we talk about all sorts of different programming topics, everything from languages to frameworks. And we've had Adam on the show a couple of times.
It's been really amazing. We've had some really phenomenal episodes. We talked about working from
home together on an episode. So you can check me out there as well.
Thank you to Jason for being such a great guest and sharing so much.
And until next time, thank you so much for listening.