Lex Fridman Podcast - Tuomas Sandholm: Poker and Game Theory
Episode Date: December 29, 2018
Tuomas Sandholm is a professor at CMU and co-creator of Libratus, which is the first AI system to beat top human players at the game of Heads-Up No-Limit Texas Hold'em. He has published over 450 papers on game theory and machine learning, including a best paper in 2017 at NIPS / NeurIPS. His research and companies have had wide-reaching impact in the real world, especially because he and his group not only propose new ideas, but also build systems to prove these ideas work in the real world. Video version is available on YouTube. If you would like to get more information about this podcast go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, or YouTube where you can watch the video versions of these conversations.
Transcript
The following is a conversation with Tuomas Sandholm.
He's a professor at CMU and co-creator of Libratus,
which is the first AI system to beat top human players in the game of heads-up no-limit Texas Hold'em.
He has published over 450 papers on game theory and machine learning,
including a best paper in 2017 at NIPS,
now renamed to NeurIPS, which is where I caught up with him for this conversation.
His research and companies have had wide-reaching impact in the real world, especially because
he and his group not only propose new ideas, but also build systems to prove that these
ideas work in the real world.
This conversation is part of the MIT course on artificial
general intelligence and the Artificial Intelligence podcast. If you enjoy it,
subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex
Fridman, spelled F-R-I-D. And now here's my conversation with Tuomas Sandholm.
Can you describe at a high level the game of poker, Texas Hold'em, heads-up Texas Hold'em, for people who might not be familiar
with this card game?
Yeah, happy to. So heads-up no-limit Texas Hold'em has really emerged in the AI community as a main benchmark
for testing these application-independent algorithms for imperfect-information game solving. And this is a game that's actually played by humans.
You don't see it that much on TV or in casinos because,
well, for various reasons, but you do see it in some expert level casinos
and you see it in the best poker movies of all time.
It's actually an event in the World Series of Poker,
but mostly it's played online,
and typically for pretty big sums of money.
[unintelligible] ...don't know, instead of pieces being nicely laid on the board
for both of us to see.
So in Texas Hold'em,
there are two cards that only you see.
They belong to you.
Yeah.
And then they gradually lay out some cards that add up
overall to five cards that everybody can see.
Yeah.
So the imperfect nature of the information
is the two cards that you're holding up front.
Yeah. So as you said, you first get two cards in private each, and then
there's a betting round. Then you get three cards in public on the table, then there's
a betting round. Then you get the fourth card in public on the table, there's a betting
round. Then you get the fifth card on the table, there's a betting round. So there's a total
of four betting rounds and four tranches of information revelation, if you will. Only the first tranche
is private, and then it's public from there.
And this is probably by far the most
popular game in AI and just the general public in terms of imperfect information.
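The dealing structure described above (two private cards each, then three, one, and one public cards, with a betting round after each tranche) can be sketched in a few lines. This is only an illustration: the function name `deal_heads_up_hand` is made up here, and the labels "preflop", "flop", "turn", and "river" are the usual poker terms for the four stages, not words used in the conversation.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def deal_heads_up_hand(rng):
    """Deal one heads-up hand: private cards first, then the public board."""
    deck = [rank + suit for rank in RANKS for suit in SUITS]  # 52 cards
    rng.shuffle(deck)
    # First tranche: two private cards for each player.
    hole_cards = {"player_1": [deck.pop(), deck.pop()],
                  "player_2": [deck.pop(), deck.pop()]}
    # Remaining tranches are public: three cards, then one, then one.
    public_tranches = [("flop", 3), ("turn", 1), ("river", 1)]
    board = {name: [deck.pop() for _ in range(count)]
             for name, count in public_tranches}
    return hole_cards, board

hole_cards, board = deal_heads_up_hand(random.Random(0))
# One betting round follows each tranche of information.
betting_rounds = ["preflop"] + list(board)
print(hole_cards)
print(board)
print(betting_rounds)
```

Only the first tranche is hidden from the opponent, which is what makes the game one of imperfect information.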
So that's probably the most popular spectator game to watch, right?
So, which is why it's a super exciting game to tackle.
So it's on the order of chess, I would say, in terms of popularity,
in terms of AI setting it as the bar of what is intelligence.
So in 2017,
Libratus, how do you pronounce it? Libratus. Libratus.
Libratus beats... A little Latin there. A little bit of Latin.
Libratus beat four expert human players.
Can you describe that event? What you learned from it? What was it like?
What was the process in general, for people who have not read the papers and studied it?
So the event was that we invited four of the top 10 players, and these are specialist players in heads-up no-limit Texas Hold'em,
which is very important, because this game is actually quite different than the multiplayer version.
We brought them in to Pittsburgh to play at the Rivers Casino for 20 days.
We wanted to get 120,000 hands in, because we wanted to get statistical significance.
So it's a lot of hands for humans to play, even for these top pros who play very quickly normally.
So we couldn't just have one of them play so many hands. For 20 days, they were playing basically morning to evening,
and I raised 200,000 as a little incentive for them to play,
and the setting was so that they didn't all get 50,000.
We actually paid them out based on how they did
against the AI each.
So they had an incentive to play as hard as they could, whether they're way ahead or way behind
or right at the mark of beating the AI. And you don't make any
money, unfortunately. Right. No, we can't make any money. So
originally, a couple of years earlier, I actually explored
whether we could actually play for money, because that would
be, of course, interesting as well, to play against the top people
for money. But the Pennsylvania Gaming Board said no. So we couldn't. So this is much like
an exhibit, like for a musician or a boxer or something like that.
Nevertheless, you're keeping
track of the money, and Libratus won close to two million dollars, I think. So if it was for real money, if you were able to earn money,
that was a quite impressive and inspiring achievement.
Just a few details.
What were the players looking at?
I mean, were they behind a computer?
What was the interface like?
Yes, they were playing much like they normally do.
These top players, when they play this game,
they play mostly online. So they're used to playing through a UI. And they did the same thing here. So there was this layout,
you could imagine, there's a table on a screen, there's the human sitting there, and then there's the
AI sitting there, and the screen shows everything that's happening: the cards coming out and
the bets being made. And we also had the betting history for the human. So if the human forgot what had
happened in the hand so far, they could actually reference back and so forth.
Is there a reason they were given access to the betting history?
Well, we just, it didn't really matter. They wouldn't have forgotten anyway.
These are top quality people,
but we just wanted to put it out there,
so it's not a question of the human forgetting
and the AI somehow trying to get that advantage
of better memory.
So what was that like?
I mean, that was an incredible accomplishment.
So what did it feel like before the event?
Did you have doubt?
Hope?
Where was your confidence at?
Yeah, that's great.
So, great question.
So, 18 months earlier, I had organized a similar brains
versus AI competition with our previous AI called
Claudico, and we couldn't beat the humans.
So, this time around, it was only 18 months later,
and I knew that this new AI, Libratus, was way stronger,
but it's hard to say how you'll do against the top humans before you try.
So I thought we had about a 50-50 shot.
And the international betting sites put us as a four-to-one or five-to-one underdog.
So it's kind of interesting that people really believe in people over AI.
People don't just over-believe in themselves,
but they have overconfidence
in other people as well, compared to the performance of AI.
And yeah, so we were a four-to-one or five-to-one underdog.
And even after three days of beating the humans in a row,
we were still 50-50 on the international betting sites.
Do you think there's something special and magical about poker in the way people think
about it?
In a sense, you have, I mean, even in chess, there's no Hollywood movies.
Poker is the star of many movies, and there's this feeling that certain human facial expressions and body language, eye movement, all these
tells are critical to poker.
Like you can look into somebody's soul and understand their betting strategy and so on.
So that's probably why, possibly, do you think that is why people have a confidence that
humans will outperform, because AI systems cannot, in this construct, perceive these kinds of
tells? They're only looking at betting patterns and nothing else, the betting patterns and
statistics. So what's more important to you, if you step back on human players, human versus human:
what's the role of these tells, of these ideas that we romanticize?
Yeah, so I'll split it into two parts.
So one is, why do humans trust humans more than AI and have overconfidence in humans?
I think that's not really related to the tell question.
It's just that they've seen these top players, how good they are, and they're really fantastic.
So it's just hard to believe that an AI could beat them.
So I think that's where that comes from.
And that's actually maybe a more general lesson about AI: that until you've seen it
overperform a human, it's hard to believe that it could.
But then the tells. A lot of these top players,
they're so good at hiding tells that among the top players it's actually not really worth it
for them to invest a lot of effort trying to find tells in each other, because they're so good at
hiding them. So yes, at the kind of Friday evening game,
tells are going to be a huge thing. You can read other people, and if you're a good reader,
you'll read them like an open book. But at the top levels of poker, the tells become a
much, much smaller aspect of the game.
The amount of strategies, the amount of possible actions is very large, 10 to the power of 100 plus.
So there has to be some, I've read a few of the papers related, it has to form some abstractions of various hands and actions.
So what kind of abstractions are effective for the game of poker?
Yeah, so you were exactly right.
So when you go from a game tree that's 10 to the 161, especially in an imperfect information
game, it's way too large to solve directly, even with our fastest equilibrium finding algorithms.
So you want to abstract it first. And abstraction in games is much trickier than abstraction in MDPs or other single agent
settings.
Because you have these abstraction pathologies:
if I have a finer-grained abstraction, the strategy that I can get from that for
the real game might actually be worse than the strategy I can get from the coarse-grained
abstraction.
You have to be very careful.
Now, the kinds of abstractions, just to zoom out, that we're
talking about: there's the hands abstractions, and then there's betting strategies.
Yeah, betting actions. Yeah, betting actions. So there's information abstraction,
to talk about general games, information abstraction, which is the abstraction of what chance does.
And this would be the cards in the case of poker.
And then there's action abstraction, which is abstracting the actions of the actual players, which would be bets in the case of poker.
Yourself and the other players?
Yes, yourself and other players. And for information abstraction, we were completely automated.
So these are algorithms,
but they do what we call potential-aware abstraction,
where we don't just look at the value of the hand,
but also how it might materialize into good or bad hands over time.
And it's a certain kind of bottom-up process
with integer programming there and clustering and various aspects of how you build this abstraction.
[unintelligible]
We actually used an automated action abstraction technology which is provably convergent,
in that it finds the optimal combination of bet sizes, but it's not very scalable.
So we couldn't use it for the whole game, but we used it for the first couple of betting actions.
So what's more important?
The strength of the hand, so the
information abstraction, or
how you play them, the actions?
You know, the romanticized notion again is that it doesn't matter what hands you
have, that the actions, the betting, may be the way you win no matter what hands you have.
Yeah.
So that's why you have to play a lot of hands, so that the role of luck gets smaller.
So you could otherwise get lucky and get some good hands
and then you're gonna win the match.
Even with thousands of hands, you can get lucky.
Because there's so much variance in no-limit Texas Hold'em.
Because if we both go all in, it's a huge stack of variance.
So there are these massive swings in no-limit Texas Hold'em.
So that's why you have to play not
just thousands, but over a hundred thousand hands to get statistical significance.
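One way to see why the number of hands matters: if hands are treated as independent, the uncertainty in the average winnings per hand shrinks only with the square root of the number of hands. The sketch below uses that standard-error formula. The per-hand standard deviation and the edge are hypothetical numbers chosen purely for illustration (they are not figures from the match), and "big blinds" is just a convenient unit.

```python
import math

def standard_error(per_hand_std, num_hands):
    """Standard error of average winnings per hand, assuming independent hands."""
    return per_hand_std / math.sqrt(num_hands)

# Hypothetical values for illustration only (not taken from the match):
per_hand_std = 10.0  # standard deviation of winnings per hand, in big blinds
true_edge = 0.1      # average winnings per hand, in big blinds

for num_hands in (1_000, 10_000, 120_000):
    se = standard_error(per_hand_std, num_hands)
    # A ratio well above 2 means the edge stands out clearly from the noise.
    print(f"{num_hands:>7} hands: standard error {se:.3f}, "
          f"edge / standard error {true_edge / se:.2f}")
```

With these made-up numbers, a few thousand hands leave the edge buried in noise, while on the order of a hundred thousand hands make it clearly visible.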
Let me ask this question another way. If you didn't even look at your hands, but they
didn't know that, the opponents didn't know that, how well would you be able to do?
That's a good question. There's actually, I heard the story that there's this Norwegian female poker player
called Annette Obrestad, who's actually won a tournament by doing exactly that.
But that would be extremely rare.
So, you cannot really play well that way.
But okay.
So the hands do have some role to play.
Yes.
So Libratus does not use, as far as I understand,
learning methods, deep learning.
Is there room for learning in,
there's no reason why Libratus doesn't,
you know, combine with an AlphaGo-type approach
for estimating the quality, for a function estimator.
What are your thoughts on this? Maybe as compared to another algorithm, which I'm not
that familiar with, DeepStack, the engine that does use deep learning, where it's unclear how well it
does, but nevertheless uses deep learning. So what are your thoughts about learning methods to aid
in the way that Libratus plays the game of poker?
Yeah, so as you said, Libratus did not use learning methods and played very well without them.
Since then, we have actually, actually here, we have a couple of papers on things that do use learning techniques.
Excellent.
So, and deep learning in particular.
And sort of the way you're talking about, where it's learning an evaluation function.
But in imperfect information games, unlike, let's say, in Go or now also in chess and shogi,
it's not sufficient to learn an evaluation for a state,
because the value of an information set depends not
only on the exact state, but it also depends on both players' beliefs. Like, if I have a bad hand,
I'm much better off if the opponent thinks I have a good hand. And vice versa, if I have a good hand, I'm much better off if the opponent
believes I have a bad hand. So the value of a state is not just a function of the cards. It depends on
if you will the path of play, but only to the extent that it's captured in the belief distributions.
So that's why it's not as simple as it is in perfect information games.
And I don't want to say it's simple there either.
It's of course very complicated computationally there too.
But at least conceptually it's very straightforward.
There's a state, there's an evaluation function,
you can try to learn it.
Here you have to do something more.
And what we do in one of these papers
is that we allow the opponent to actually take
different strategies at the leaf of the search tree,
if you will.
And that is a different way of doing it.
And it doesn't assume, therefore,
a particular way that the opponent plays.
But it allows the opponent to choose
from a set of different continuation strategies.
And that forces us to not be too optimistic in our look ahead search.
And that's one way you can do sound look ahead search in imperfect information games,
which is very difficult.
And you were asking about DeepStack, what they did.
It was very different than what we do,
either in Libratus or in this new work.
They were randomly generating various situations
in the game, then they were doing the look ahead
from there to the end of the game
as if that was the start of a different game.
And then they were using deep learning
to learn those values of those states,
but the states were not just
the physical states, they include the belief distributions.
When you talk about lookahead for DeepStack or with Libratus, does it mean considering
every possibility of how the game can evolve?
Are we talking about, sort of, this exponential growth of a tree?
Yes.
So we're talking about exactly that.
Much like you do in alpha-beta search or Monte Carlo tree
search, but with different techniques. So there's a different search algorithm,
and then we have to deal with the leaves differently.
So if you think about what Libratus did, we didn't have to worry about this,
because we only did it at the end of the game.
So we would always terminate into a real situation and we would know what the payout is.
It didn't do these depth-limited lookaheads. But now in this new paper, which is called depth
limited, I think it's called Depth-Limited Search for Imperfect Information Games, we can actually do
sound depth-limited lookaheads, so we can actually start to do the lookahead
from the beginning of the game on.
Because that's too complicated to do for this whole long game.
So in Libratus, we were just doing it for the end.
And then the other side, this belief distribution,
is it explicitly modeled what kind of beliefs
that the opponent might have?
Yeah, it is explicitly modeled, but it's not assumed.
The beliefs are actually output, not input.
Of course, the starting beliefs are input,
but they just follow from the rules of the game,
because we know that the dealer deals uniformly from the deck.
So I know that every pair of cards that you might have is equally likely.
I know that for a fact. That just follows from the rules of the game. Of course, except
the two cards that I have, I know you don't have those. You have to take that into account.
That's called card removal, and that's very important.
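The starting beliefs described above can be written down directly: a uniform distribution over every two-card combination the opponent could hold, after removing the two cards in my own hand. This is a minimal sketch, with the function name `initial_opponent_beliefs` and the card encoding (rank followed by suit, such as "As" for the ace of spades) chosen here for illustration.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]

def initial_opponent_beliefs(my_hole_cards):
    """Uniform belief over opponent hole-card pairs, excluding the cards I hold."""
    remaining = [card for card in DECK if card not in my_hole_cards]
    pairs = list(combinations(remaining, 2))
    probability = 1.0 / len(pairs)
    return {pair: probability for pair in pairs}

beliefs = initial_opponent_beliefs(("As", "Kd"))
print(len(beliefs))                           # 1225 pairs, i.e. 50 choose 2
print(any("As" in pair for pair in beliefs))  # False: I hold the ace of spades
```

Without card removal there would be 1326 possible pairs (52 choose 2); knowing my own two cards rules out 101 of them.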
Is the dealing always coming from a single deck in heads-up? So you can assume a single
deck?
Single deck. So you know that if I have the ace of spades,
I know you don't have an ace of spades.
Great.
So in the beginning, your belief is basically
the fact that it's a fair dealing of hands,
but how do you start to adjust that belief?
Well, that's where this beauty of game theory comes in.
So Nash equilibrium, which John Nash introduced in 1950, defines what rational play is
when you have more than one player.
And these are pairs of strategies, where strategies are contingency plans, one for each player,
so that neither player wants to deviate to a different strategy, given that the other doesn't deviate. But as a side
effect, you get the beliefs from Bayes' rule. So in these imperfect information games,
Nash equilibrium doesn't just define strategies,
it also defines beliefs for both of us. And it defines beliefs for each state. So at each state, they're
called information sets, at each information set in the game, there's a set of different
states that we might be in, but I don't know which one we're in.
Nash equilibrium tells me exactly what the probability distribution is over those real-world states in my mind.
How does Nash equilibrium give you that distribution?
So why?
I'll do a simple example.
So you know the game rock-paper-scissors.
So we can draw it as player one moves first and then player two moves.
But of course, it's important that player two doesn't know what player one moved.
Otherwise player two would win every time.
So we can draw that as an information set where player
one makes one of three moves first and then there's an information set for player
two. So player two doesn't know which of those nodes the world is in. But once
we know the strategy for player one, Nash equilibrium will say that you play one
third rock, one third paper, one third scissors.
From that, I can derive my beliefs on the information set that they're one third, one third, one third.
So Bayes gives you that.
Bayes gives you.
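The rock-paper-scissors example above can be worked through numerically. The sketch below applies Bayes' rule at player two's information set: the prior over nodes is player one's strategy, and since player two observes nothing about the move, every node is equally consistent with what player two has seen. The function and variable names are illustrative assumptions.

```python
def beliefs_at_information_set(prior_over_nodes, observation_likelihood):
    """Bayes' rule: posterior over the nodes of an information set.

    prior_over_nodes: probability that play leads to each node
        (here, player one's strategy).
    observation_likelihood: probability of player two's observation at each node.
    """
    unnormalized = {node: prior_over_nodes[node] * observation_likelihood[node]
                    for node in prior_over_nodes}
    total = sum(unnormalized.values())
    return {node: weight / total for node, weight in unnormalized.items()}

equilibrium_strategy = {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3}
# Player two sees nothing, so the observation is equally likely at every node.
no_information = {"rock": 1.0, "paper": 1.0, "scissors": 1.0}

beliefs = beliefs_at_information_set(equilibrium_strategy, no_information)
print(beliefs)  # one third on each node
```

The beliefs are an output of the computation: change player one's strategy and the same function returns different beliefs, with no data about any particular opponent required.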
But is that specific to a particular player or is it something you quickly update with those specific players?
Game theory isn't really player-specific.
So that's also why we don't need any data.
We don't need any history of how these particular humans
played in the past or how any AI or human had played before.
It's all about rationality.
So we just think, the AI just thinks about
what would a rational opponent do?
And what would I do if I am rational?
And that's
the idea of game theory. So it's really a data-free, opponent-free approach.
So it comes from the
design of the game as opposed to the design of the player.
Exactly. There's no opponent
modeling per se. I mean, we've done some work on combining opponent modeling with game
theory, so you can exploit weak players even more, but that's another strand. And in Libratus we didn't turn that on, because I decided that these
players are too good. And when you start to exploit an opponent, you typically open
yourself up to exploitation. And these guys have so few holes to exploit, and they're the world's leading
experts in counter-exploitation. So I decided that we're not going to turn that stuff on.
Actually, I saw a few papers on exploiting opponents.
It sounded very interesting to explore. Do you think there's room for exploitation generally,
outside of Libratus? Are there subject or people differences that could be exploited, maybe
not just in poker, but in general interactions
and negotiations, all these other domains that you're considering?
Yeah, definitely.
We've done some work on that, and I really like the work that hybridizes the two.
So you figure out what would a rational opponent do.
And by the way, that's safe in these zero-sum games, two-player zero-sum games, because
if the opponent does something irrational, yes, it might throw off my beliefs, [unintelligible]. But still, if somebody is weak as a player, you might want to play differently to exploit
them more.
So you can think about it this way: a game-theoretic strategy is unbeatable, but it doesn't
maximally beat other opponents.
So the winnings per hand might be better with a different strategy.
And the hybrid is that you start from a game-theoretic approach. And then as you gain data
about the opponent in certain parts of the game tree, then
in those parts of the game tree, you start to tweak your
strategy more and more towards exploitation, while still
staying fairly close to the game-theoretic strategy, so as to
not open yourself up to exploitation too much.
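The trade-off described above can be illustrated with rock-paper-scissors. This is only a toy sketch, not the method used in the research discussed here: the rock-heavy `weak_opponent` is hypothetical, and the "hybrid" is a simple linear blend between the equilibrium strategy and the best response, chosen for illustration. For each strategy the code reports the expected payoff against the weak opponent and the worst case against an opponent who counters perfectly.

```python
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(my_move, their_move):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if my_move == their_move:
        return 0
    return 1 if BEATS[my_move] == their_move else -1

def expected_payoff(my_strategy, their_strategy):
    return sum(my_strategy[a] * their_strategy[b] * payoff(a, b)
               for a in MOVES for b in MOVES)

def worst_case(my_strategy):
    """Expected payoff against the opponent's best pure counter-strategy."""
    return min(expected_payoff(my_strategy, {m: float(m == b) for m in MOVES})
               for b in MOVES)

def blend(first, second, weight):
    """Move a fraction `weight` of the way from `first` towards `second`."""
    return {m: (1 - weight) * first[m] + weight * second[m] for m in MOVES}

equilibrium = {m: 1 / 3 for m in MOVES}
weak_opponent = {"rock": 0.5, "paper": 0.25, "scissors": 0.25}  # hypothetical
full_exploit = {"rock": 0.0, "paper": 1.0, "scissors": 0.0}     # best response to it
hybrid = blend(equilibrium, full_exploit, 0.2)

for name, strategy in [("equilibrium", equilibrium),
                       ("hybrid", hybrid),
                       ("full exploit", full_exploit)]:
    print(name,
          round(expected_payoff(strategy, weak_opponent), 3),
          round(worst_case(strategy), 3))
```

The equilibrium strategy cannot lose in expectation but also wins nothing from the weak opponent; full exploitation wins the most against that opponent but can be punished heavily; the blend sits in between on both counts.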
How do you do that? Do you try to vary up strategies, make it unpredictable?
It's like, what is it?
Tit-for-tat strategies in prisoner's dilemma, or?
Well, that's a repeated game, kind of simple prisoner's
dilemma, repeated games.
But even there, there's no proof that says that that's the best thing.
But experimentally it actually does well.
So what kind of games are there?
First of all, I don't know if this is something that you could just summarize. There's perfect information games,
where all the information's on the table. There are imperfect information games. There's repeated games that you play over and over.
There's zero-sum games.
There's non-zero-sum games. And then there's a really important distinction you're making,
two-player versus more players. So what other games are there? And what's the difference,
for example, with this two-player game versus more players?
What are the key differences?
Right, yeah.
So let me start from the basics.
So a repeated game is a game where the same exact game
is played over and over.
In these extensive form games,
where you think about tree form,
maybe with these
information sets to represent incomplete information, you can have kind of
repetitive interactions, and even repeated games are a special case of that,
by the way. But the game doesn't have to be exactly the same. It's like in sourcing
auctions. Yes, we're gonna see the same supply base year to year, but what I'm
buying is a little different every time, and the supply base is a little different every time, and so on.
So it's not really repeated. So to find a purely repeated game is actually very rare in the world.
So they're really a very coarse model of what's going on. Then if you move up from repeated matrix games,
not all the way to extensive form games,
but in between, there's stochastic games,
where, you know, you think about it like these little matrix games,
and when you take an action and your opponent takes an action,
they determine not which next state I'm going to,
next game I'm going to, but the distribution over next games where I might be going to.
[unintelligible]
And poker is an example of the last one. So it's really in the most general setting, extensive form games.
And that's kind of what the AI community has been working on and being benchmarked on
with this heads-up no-limit Texas Hold'em.
Can you describe extensive form games?
What's the model here?
Yeah, so if you're familiar with the tree form, so it's really the tree form.
Like in chess, there's the search tree, versus a matrix.
There is a matrix here.
And the matrix is called the matrix form or bimatrix form or normal form game.
And here you have the tree form.
So you can actually do certain types of reasoning there that you lose the information
when you go to normal form.
There's a certain form of equivalence.
Like if you go from tree form and you say that every
possible contingency plan is a strategy, then I can actually go back to the normal form, but I
lose some information from the lack of sequentiality. Then the multiplayer versus two-player distinction
is an important one. So two-player zero-sum games are conceptually easier and computationally easier. They're
still huge, like this one, but they're conceptually easier and computationally
easier, in that conceptually you don't have to worry about which equilibrium
the other guy is going to play when there are multiple, because any equilibrium strategy
is a best response to any other equilibrium strategy.
So I can play a different equilibrium from you, and we'll still get the right values of the game.
That falls apart even with two players when you have general-sum games.
Even without cooperation?
Even without cooperation.
So there's a big gap from two-player zero-sum to two-player general-sum, or even to three-player zero-sum.
That's a big gap.
At least in theory.
Can you maybe non-mathematically provide the intuition why it all falls apart with three or more players?
It seems like you should still be able to have a Nash equilibrium that...
Yeah, that's instructive, that holds.
Okay.
It is true that all finite games have a Nash equilibrium.
So this is what John Nash actually proved.
So they do have a Nash equilibrium.
That's not the problem.
The problem is that there can be many.
And then there's a question of which equilibrium
to select. So if you select your strategy from a different equilibrium and I select mine,
then what does that mean? And in these non-zero-sum games, we may lose some joint benefit by being
simply stupid. We could actually both be better off if we did something else.
And in three-player games, you get other problems also, like collusion.
Like maybe you and I can gang up on a third player and we can do radically better by colluding.
So there are lots of issues that come up there.
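The equilibrium selection problem can be seen in a tiny two-player general-sum game. The payoff table below is a hypothetical coordination game invented for illustration. The code enumerates the pure-strategy Nash equilibria, then shows what happens when each player picks their part of a different equilibrium.

```python
from itertools import product

# Hypothetical general-sum game: payoffs[(row, column)] = (row payoff, column payoff).
ACTIONS = ["A", "B"]
payoffs = {("A", "A"): (2, 2), ("A", "B"): (0, 0),
           ("B", "A"): (0, 0), ("B", "B"): (1, 1)}

def is_pure_nash(row, column):
    """True if neither player gains by deviating alone."""
    row_ok = all(payoffs[(row, column)][0] >= payoffs[(other, column)][0]
                 for other in ACTIONS)
    column_ok = all(payoffs[(row, column)][1] >= payoffs[(row, other)][1]
                    for other in ACTIONS)
    return row_ok and column_ok

equilibria = [profile for profile in product(ACTIONS, ACTIONS)
              if is_pure_nash(*profile)]
print(equilibria)  # [('A', 'A'), ('B', 'B')]

# Row plays its part of the first equilibrium, column its part of the second.
mismatched = (equilibria[0][0], equilibria[1][1])
print(mismatched, payoffs[mismatched])  # ('A', 'B') (0, 0)
```

Both (A, A) and (B, B) are equilibria, yet mixing parts of the two gives both players nothing, and even the (B, B) equilibrium leaves both players worse off than (A, A). In a two-player zero-sum game this mismatch cannot hurt, because equilibrium strategies are interchangeable.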
So Noam Brown, the student you've worked with on this, has mentioned, I looked through the AMA on Reddit,
he mentioned that the ability of poker players
to collaborate would make the game...
He was asked the question of how would you make the game of poker,
or both of you were asked the question,
how would you make the game of poker
beyond being solvable by current AI methods?
And he said that there's not many ways of making poker
more difficult, but collaboration or cooperation between players would make it extremely difficult.
So can you provide the intuition behind why that is if you agree with that idea?
Yeah, so I've done a lot of work
on coalitional games, and we actually have a paper here
with my other student, Gabriele
Farina, and some other collaborators
at NIPS on that. I actually just came back
from the poster session where we presented it.
But so when you have collusion,
it's a different problem, and it typically gets even harder
then. Even the game representations,
some of the game representations don't really allow good computation.
So we actually introduced a new game representation for that.
Is that kind of cooperation part of the model?
Do you have information about the fact that other players are cooperating, or is it just this chaos
where nothing is known? So there are some things unknown. Can you give an example of a collusion-type game?
Or is it usually like bridge?
Yeah, so think about bridge. It's like when you and I are on a team,
our payoffs are the same.
The problem is that we can't talk. So when I get my cards, I can't whisper to you what my cards are.
That would not be allowed.
So we have to somehow coordinate our strategies ahead of time and only ahead of time.
And then there are certain signals we can talk about, but they have to be such that the
other team also understands that.
So that's an example where the coordination is already built into the rules of the game.
But in many other situations, like auctions or negotiations or diplomatic relationships or poker,
it's not really built in, but it still can be very helpful for the colluders.
I've read somewhere that in negotiations, you
come to the table with a prior, like a strategy of what you're willing to do and not willing
to do, those kinds of things. So how do you start to, now moving away from
poker, moving beyond poker into other applications, like negotiations, how do you start applying
this to other domains,
the real-world domains that you've worked on?
Yeah, I actually have two startup companies doing exactly that.
One is called Strategic Machine,
and that's for business applications,
gaming, sports, all sorts of things like that.
Any applications of this to business and to sports,
and to gaming, to various types of things in finance,
electricity markets and so on.
And the other is called Strategy Robot, where we are taking these to military, cybersecurity,
and intelligence applications. I think you worked a little bit in, how do you put it, advertisement, sort of suggesting
ads kind of thing.
Yeah, that's another company, Optimized Markets.
Optimized Markets.
But that's much more about combinatorial market and optimization-based technology.
That's not using these game-theoretic reasoning technologies.
I see. OK, so at a high level, what do you think about our ability to use game theory
as a concept to model human behavior?
Do you think human behavior is amenable to this kind of modeling?
So outside of the poker games, and where have you
seen it done successfully in your work?
I'm not sure the goal really is modeling humans. Like for example, if I'm playing a zero-sum
game, I don't really care that the opponent is actually following my model of rational
behavior, because if they're not, that's even better for me. Right. So with the opponents in games, the prerequisite is that you
formalize the interaction in some way that can be amenable to analysis. I mean, you've
done this amazing work with mechanism design, designing games that have certain outcomes. But so I'll tell you an example from my
world of autonomous vehicles, right? We're studying pedestrians, and pedestrians and cars
negotiate in this nonverbal communication. There's this weird game, a dance of tension, where pedestrians
are basically saying, I trust that you won't kill me.
And so as a jaywalker, I will step onto the road,
even though I'm breaking the law, and there's this tension.
And the question is, we really don't know how to model that well
in trying to model intent.
And so people sometimes bring up ideas of game theory
and so on.
Do you think that aspect of human behavior can use these kinds of imperfect-information
approaches, modeling? How do you start to attack a problem like that when you don't
even know how to design the game to describe the situation in order to solve it? Okay, so I haven't
really thought about jaywalking, but one thing that I think
could be a good application in autonomous vehicles is the following. So let's say that
you have fleets of autonomous cars operated by different companies. So maybe here's the
Waymo fleet and here's the Uber fleet. If you think about the rules of the road, they
define certain legal rules, but that still leaves a huge strategy space open.
Like, as a simple example, when cars merge, you know, how humans merge, you know, they
slow down and look at each other and try to merge.
Wouldn't it be better if these situations were all pre-negotiated, so we can actually
merge at full speed, and we know that this is the situation, this is how we do it, and it's all going to be faster? And of course it might be that, hey, maybe you're not going to always let me go first.
Maybe you said, okay, well, in these situations I'll let you go first, but in exchange
you're going to let me go first in these other situations. So it's this huge combinatorial negotiation.
And do you think there's room in that example of merging to model this whole situation as a perfect-information game,
or do you really want to consider it to be imperfect? Yeah, that's a good question.
Do you pay the price of
assuming that you don't know everything?
Yeah, I don't know. It's certainly much easier. Games with perfect information
are much easier. So if you can get away with it, you should. But if the real situation is of imperfect information, then you're going to have to deal with imperfect information.
Great. So what lessons have you learned from the Annual Computer Poker Competition,
an incredible accomplishment of AI? You know, you look at the history of Deep Blue, AlphaGo, these kinds of moments when AI stepped up in an engineering
effort and a scientific effort combined to beat the best of human players. So what do you take
away from this whole experience? What have you learned about designing AI systems that play these
kinds of games? And what does that mean for AI in general for the future of AI development?
Yeah, so that's a good question. There's so much to say about it. I do like this type of
performance-oriented research. Although in my group we go all the way from like idea to theory,
to experiments, to big system building, to commercialization. So we span that spectrum,
but I think that in a lot of situations in AI,
you really have to build the big systems and evaluate them at scale
before you know what works and doesn't.
And we've seen that in the computational game theory community,
that there are a lot of techniques that look good in the small,
but then they cease to look good in the large.
And we've also seen that there are a lot of techniques that look superior in theory. And I really mean in terms of convergence
rates, like first-order methods have better convergence rates than the CFR-based algorithms,
yet the CFR-based algorithms are the fastest in practice. So it really tells me that you have
to test this in reality. The theory isn't
tight enough, if you will, to tell you which algorithms are better than the others. And you have to
look at these things in the large, because any sort of projections you do from the small
can, at least in this domain, be very misleading. So that's kind of from a science and
engineering perspective. From a personal perspective, it's been just a wild experience in that with the first poker
competition, the first Brains vs. AI man-machine poker competition that we organized.
There had been, by the way, for other poker games, there had been previous competitions,
but this was, for heads up, no limit, this was the first.
And I probably became the most hated person in the world of poker.
And I didn't mean to.
Is that for cracking the game?
Yeah, it was...
A lot of people felt that it was a real threat to the whole game.
The whole existence of the game.
If AI becomes better than humans, people would be scared to play poker
because there are these superhuman
AIs running around taking their money
and all of that.
So it was just really aggressive.
The comments were super aggressive.
I got everything just short of death threats.
Do you think the same was true for chess?
Because right now they just completed
the World Championships in chess,
and humans just started ignoring the fact that there are AI systems now that
outperform humans, and they still enjoy the game. It's still a beautiful game.
That's what I think, and I think the same thing happens in poker. And so I
didn't think of myself as somebody who was going to kill the game, and I don't think I
did. I've really learned to love this game. I wasn't a poker player before, but
learned so many nuances about it from these AIs.
And they've really changed how the game is played, by the way. So they have these very Martian ways of playing poker,
and the top humans are now incorporating those types of strategies into their own play.
So if anything, to me, our work has made poker a richer, more interesting game for humans to play,
not something that is going to steer humans away from it entirely.
Just a quick comment on something you said, which is, if I may say so, in academia is a little bit rare sometimes.
It's pretty brave to put your ideas to the test in the way you described.
Saying that sometimes good ideas don't
work when you actually try to apply them at scale. So where does that come from? I mean, if you
could give advice to people, what drives you in that sense? Were you always this way? I mean, it
takes a brave person, I guess is what I'm saying, to test their ideas and to see if this thing actually works against top human players and so on.
I don't know about brave, but it takes a lot of work. It takes a lot of work and a lot of time to organize,
to make something big and to organize an event and stuff like that.
And what drives you in that effort? Because you could still, I would argue, get a best paper award at NIPS as
you did in '17 without doing this. That's right. Yes. And so, in general, I believe it's very
important to do things in the real world and at scale. And that's really where the
proof, if you will, is in the pudding. That's where it is. In this particular
case, it was kind of a competition between different groups for many years as to
who can be the first one to beat the top humans at heads-up no-limit Texas Hold'em.
So it became kind of a competition who can get there.
Yeah, so a little friendly competition
can do wonders for progress.
Yes, absolutely.
So the topic of mechanism design,
which is really interesting, also kind of new to me,
except as an observer of, I don't know, politics,
I'm an observer of mechanisms.
But you have a paper
on automated mechanism design that I quickly read.
So, mechanism design is designing the rules of the game so you get a certain desirable
outcome.
And you have this work on doing so in an automatic fashion as opposed to fine tuning it. So what have you learned from those efforts,
if you look, say, I don't know, at something complex,
like our political system?
Can we design our political system,
in an automated fashion,
to have outcomes that we want?
Can we design something like traffic lights
to be smart, where it gets outcomes that we want?
So what are the lessons that you draw from that work?
Yeah, so I still very much believe in the automated mechanism design direction.
Yes.
But it's not a panacea.
There are impossibility results in mechanism design saying that there is no mechanism that accomplishes
objective X in class C. So there's no way using any mechanism design tools, manual or automated
to do certain things in mechanism design. Can you describe that again? So meaning it's impossible to achieve that? Yeah, yeah. So it's not just unlikely.
Impossible.
Impossible.
So these are not statements about human ingenuity,
who might come up with something smart.
These are proofs that if you want to accomplish properties
X in class C, that is not doable with any mechanism.
The good thing about automated mechanism design
is that we're not really
designing for a class, we're designing for specific settings at a time. So even if
there's an impossibility result for the whole class, it just doesn't mean that all of the
cases in the class are impossible. It just means that some of the cases are impossible.
So we can actually carve these islands of possibility within these
known impossible classes. And we've actually done that. So one of the famous results
in mechanism design is the Myerson-Satterthwaite theorem by Roger Myerson and Mark Satterthwaite
from 1983. It's an impossibility of efficient trade under imperfect information. We show that
you can, in many settings,
avoid that and get efficient trade anyway.
Depending on how you design the game, okay?
So depending how you design the game,
and of course, it doesn't
in any way contradict the impossibility result.
The impossibility result is still there,
but it just finds spots within this impossible class
where in those spots, you don't have the impossibility.
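One way to picture such an island of possibility, as a toy sketch rather than the automated mechanism design approach discussed in the conversation: a posted price is a very simple trading mechanism, and for some specific settings it already gives efficient trade. The function name and the example type sets below are hypothetical, and a `None` result only says that no posted price works, not that every mechanism fails in that setting.

```python
def efficient_posted_price(seller_costs, buyer_values):
    """Return a price that yields efficient trade for every type pair, or None.

    Under a posted price p, trade happens iff cost <= p <= value, so nobody
    trades at a loss and no outside subsidy is needed. The price is efficient
    if trade happens whenever the buyer's value exceeds the seller's cost.
    """
    # If any price works, the highest relevant cost also works,
    # so it is enough to try the type values themselves as candidates.
    candidates = sorted(set(seller_costs) | set(buyer_values))
    for price in candidates:
        if all(
            cost <= price <= value
            for cost in seller_costs
            for value in buyer_values
            if value > cost  # only pairs with gains from trade matter
        ):
            return price
    return None

# Specific setting with a gap between costs and values: a posted price works.
gap_price = efficient_posted_price([1, 2, 3], [4, 5, 6])

# Specific setting with overlapping costs and values: no posted price works.
overlap_price = efficient_posted_price([1, 4], [3, 6])

print("gap setting price:", gap_price)
print("overlap setting price:", overlap_price)
```

The design choice here is to check one concrete setting at a time, which mirrors the point that a result about a whole class does not settle every case inside it.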
Sorry if I'm going a bit philosophical, but what lessons do you draw
towards, like I mentioned, politics or human interaction, and designing mechanisms for, outside of just these kinds of trading or auctioning or purely formal games, human interaction,
like a political system?
Do you think it's applicable to politics or to business, to negotiations, these kinds
of things, designing rules that have certain outcomes?
Yeah, I do think so. Have you seen that successfully done?
Yeah, it hasn't really... Oh, you mean mechanism design or automated mechanism design?
Automated mechanism design.
So mechanism design itself has had fairly limited success so far.
There are certain cases, but most of the real world situations are actually not sound
from a mechanism design perspective, even in those cases where they've been designed
by very knowledgeable mechanism design people, the people are typically just taking some insights
from the theory and applying those insights into the real world, rather than applying the
mechanisms directly.
So one famous example is the FCC spectrum auctions.
So I've also had a small role in that,
and very good economists,
excellent economists, have been working on that
who know game theory,
yet the rules that are designed in practice there,
they're such that bidding truthfully is not the best strategy.
Usually in mechanism design we try to make things easy for the participants.
So telling the truth is the best strategy. And by the way, nobody
knows even a single optimal bidding strategy for those auctions.
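To make "telling the truth is the best strategy" concrete, here is a small sketch using the textbook single-item sealed-bid second-price auction, checked by brute force on a bid grid. The first-price rule is included only as a generic contrast where shading a bid can help; neither function is a model of the FCC spectrum auction rules, and the tie-breaking rule and numbers are assumptions of this sketch.

```python
def second_price_utility(value, bid, rival_bid):
    """Bidder's utility in a second-price auction against one rival bid.

    Assumption for this sketch: ties go to the rival.
    """
    if bid > rival_bid:
        return value - rival_bid  # the winner pays the second-highest bid
    return 0

def first_price_utility(value, bid, rival_bid):
    """Same setting, but the winner pays their own bid."""
    if bid > rival_bid:
        return value - bid
    return 0

def truthful_is_weakly_dominant(value, bid_grid):
    """Check on a grid that bidding `value` is never worse than any other bid."""
    return all(
        second_price_utility(value, value, rival)
        >= second_price_utility(value, bid, rival)
        for bid in bid_grid
        for rival in bid_grid
    )

grid = range(0, 15)
second_price_truthful = truthful_is_weakly_dominant(7, grid)

# Under the first-price rule, bidding below the true value can do better.
truthful_first_price = first_price_utility(10, 10, 5)
shaded_first_price = first_price_utility(10, 6, 5)

print("truthful bidding weakly dominant (second price):", second_price_truthful)
print("first price, truthful vs shaded:", truthful_first_price, shaded_first_price)
```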
What's the challenge of coming up with an optimal bid? Is it because there are a lot of players
and there's imperfect information? It's not so much that there are a lot of players, but there are many items for sale,
and these mechanisms are such that even with just two items, or one item, bidding truthfully wouldn't
be the best strategy.
If you look at the history of AI, it's marked by seminal events, like AlphaGo beating
a world champion human Go player. I would put Libratus winning at heads-up no-limit hold'em
as one of such events.
And what do you think is the next such event, whether it's in
your life or in the broader AI community, that you think might be out there that would surprise
the world? So that's a great question, and I don't really know the answer. In terms of game solving,
heads-up no-limit Texas Hold'em really was the one remaining widely agreed-upon benchmark. So that was the big milestone. Now, are there other things?
Yes, certainly there are, but there is not one that the community has kind of focused on.
So what could be other things? There are groups working on StarCraft, there are groups working on
Dota 2. These are video games. Or you could have, like, Diplomacy or Hanabi, you know,
things like that. These are like recreational games, but none of them are really acknowledged
as kind of the main next challenge problem, like chess or Go or heads-up no-limit Texas Hold'em was. So I don't really know in the game-solving space what is or what will be the next benchmark.
I kind of hope that there will be a next benchmark, because really the different groups working on the same problem really drove these application-independent techniques forward very quickly over 10 years. Do you think there's an open problem that excites you, that you start moving away from games into
real-world games like say the stock market trading?
Yeah, that's kind of how I am. So I am probably not going to work
as hard on these
recreational benchmarks. I'm doing two startups on game-solving
technology, Strategic Machine and Strategy Robot, and we're really interested
in pushing this stuff into practice. What do you think would be really, you know, a
powerful result that would be surprising? That would be, if you can say, I mean, it's, you know,
five years, ten years from now, something that statistically, you would say, is not
very likely, but if there's a breakthrough, what would it achieve?
Yeah, so I think, overall, we're in a very different situation in game theory than we are in, let's say, machine learning.
So machine learning is a fairly mature technology, and it's very broadly applied and has proven success in the real world.
In game solving, there are almost no applications yet.
We have just become superhuman, which in machine learning, you could argue, happened in the 90s, if not earlier,
at least on supervised learning, certain complex supervised learning applications.
Now, I think the next challenge problem, I know you're not asking about it this way, you're asking about the technology breakthrough,
but I think that the big breakthrough is to be able to show that, hey, maybe most of, let's say, military planning or most of business strategy will actually be done strategically
using computational game theory.
That's what I would like to see as a next 5 or 10-year goal.
Maybe you can explain to me, again, forgive me if this is an obvious question, but machine
learning methods and neural networks suffer from not being transparent, not being explainable.
With game-theoretic methods, you know, Nash equilibria, do they generally, when you see the different solutions,
are they, when you talk about military operations, are they, once you see the strategies, do they make sense,
are they explainable, or do they suffer from the same problems as neural networks do? So that's a good question. I would say a little bit yes and no.
And what I mean by that is that these game-theoretic strategies, let's say
Nash equilibrium, it has provable properties.
So it's unlike, let's say, deep learning where you kind of cross your fingers,
hopefully it'll work. And then after the fact, when you have the weights,
you're still crossing your fingers,
and I hope it will work.
Right.
Here, you know that the solution quality is there.
There are provable solution quality guarantees.
Now, that doesn't necessarily mean
that the strategies are human understandable.
That's a whole other problem.
So I think that deep learning and computational game theory
are in the same boat in that sense,
that both are difficult to understand.
But at least the game theory techniques,
they have these guarantees of solution quality.
So do you see business operations,
strategic operations, or even military in the future being,
at least the strong candidates being proposed by automated systems?
Do you see that?
Yeah, I do.
I do, but that's more of a belief than a substantiated fact.
Depending on where you land, optimism or pessimism, that's a really, to me, that's an exciting future,
especially if there's provable things in terms of optimality.
So looking into the future, there are a few folks worried about the, especially if you look
at the game of poker, which is probably one of the last benchmarks in terms of games being
solved.
They worry about the future and the existential threats of artificial
intelligence. So the negative impact in whatever form on society. Is that something that concerns you
as much, or are you more optimistic about the positive impacts of AI?
I am much more optimistic about the positive impacts. So just in my own work, what we've done so far,
we run the nationwide
kidney exchange. Hundreds of people are walking around alive today who otherwise wouldn't be. And it's
increased employment. You have a lot of people now running kidney exchanges and
at transplant centers interacting with the kidney exchange. You have extra surgeons, nurses, anesthesiologists, hospitals, all of that.
So employment is increasing from that and the world is becoming a better place.
Another example is combinatorial sourcing auctions.
We did 800 large-scale combinatorial sourcing auctions from 2001 to 2010 in a previous startup of mine called CombineNet.
And we increased the supply chain efficiency
on that $60 billion of spend by 12.6%.
So that's over $6 billion of efficiency improvement
in the world.
And this is not like shifting value from somebody
to somebody else, just efficiency improvement,
like in trucking, less empty driving, so there's less waste, less carbon footprint, and so on.
This is a huge positive impact in the near term, but sort of to stay in it for a
little longer, because I think game theory has a role to play here. Let me actually come back
on that. That is one thing. I think AI is also going to make the world much safer.
So that's another aspect that often gets overlooked.
Let me ask this question.
Maybe you can speak to the safer.
So I talked to Max Tegmark and Stuart Russell, who are very concerned about existential
threats of AI.
And often the concern is about value misalignment. So AI systems basically working, operating towards goals that are not the same as those of human
civilization, human beings.
So it seems like Game Theory has a role to play there to make sure the values are aligned
with human beings.
I don't know if that's how you think about it.
If not, how do you think AI might help with this problem?
How do you think AI might make the world safer?
Yeah, I think this value misalignment
is a fairly theoretical worry
and I haven't really seen it in, because I do a lot of real applications.
I don't see it anywhere. The closest I've seen it was the following type of mental exercise,
really, where I had this argument in the late 80s when we were building these transportation
optimization systems. And somebody had heard that it's a good idea to have high utilization of assets. So they told me, hey, why don't you put that as an objective?
And we didn't even put it as an objective because I just showed him that,
if you had that as your objective, the solution would be to load your trucks
full and drive in circles.
Nothing would ever get delivered. You'd have 100% utilization.
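The truck example can be written down as a toy objective comparison. This is only a sketch: the `Plan` fields and the numbers are hypothetical, and utilization is defined here as loaded distance divided by total distance, which is one plausible reading of "high utilization of assets".

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    loaded_km: float  # distance driven with cargo on board
    total_km: float   # total distance driven
    deliveries: int   # shipments actually delivered

def utilization(plan):
    """Fraction of driven distance during which the truck is loaded."""
    return plan.loaded_km / plan.total_km

plans = [
    Plan("deliver, with some empty backhaul", loaded_km=700, total_km=1000, deliveries=12),
    Plan("load full and drive in circles", loaded_km=1000, total_km=1000, deliveries=0),
]

# Optimizing the proxy objective picks the useless plan.
best_by_utilization = max(plans, key=utilization)
# Optimizing what the operator actually wants picks the other one.
best_by_deliveries = max(plans, key=lambda p: p.deliveries)

print("best by utilization:", best_by_utilization.name)
print("best by deliveries:", best_by_deliveries.name)
```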
So yeah, I know this phenomenon, I've known this for over 30 years, but I've never seen it actually be a problem in reality.
And yes, if you have the wrong objective, the AI will optimize that to the hilt, and it's going to hurt more than some human who's kind of trying to
solve it in a half-baked way with some human insight too, but I just haven't seen that materialize in practice.
There's this gap that you've actually put your finger on very clearly just now between
theory and reality.
That's very difficult to put into words, I think.
It's what you can theoretically imagine, the worst possible case or even bad cases and what usually happens in reality.
So, for example, to me, maybe it's something you can comment on, having grown up in the Soviet
Union: there are currently 10,000 nuclear weapons in the world.
And for many decades, it's theoretically surprising to me that nuclear war has not broken out.
Do you think about this aspect from a game-theoretic perspective in general?
Why is that true?
Why, in theory, you could see how things would go terribly wrong, and somehow yet they have not?
How do you think about that?
So I do think about that a lot.
I think the biggest two threats that we're facing as mankind, one is climate change and
the other is nuclear war.
So those are my main two worries that I worry about.
And I've tried to do something about climate, thought about trying to do something for climate
change twice.
Actually, for two of my startups, I've actually commissioned studies of what we could do on those things.
And we didn't really find a sweet spot, but I'm still keeping an eye out on that.
If there's something where we could actually provide a market solution or optimization
solution or some other technology solution to problems.
Right now, like for example, pollution credit markets was what we were looking at then.
And it was much more the lack of political will why those markets
were not so successful, rather than bad market design.
So I could go in and make a better market design, but that wouldn't really move
the needle on the world very much if there's no political will.
And in the US, you know, the market, at least the Chicago market was just shut down
and so on.
So then it doesn't really help how great your market design was.
And on the nuclear side, it's more...
So global warming is a more encroaching problem.
Nuclear weapons have been here.
It's an obvious problem that's just been sitting there.
So how do you think about what is the mechanism design there that just made everything seem stable?
And are you still extremely worried?
I am still extremely worried.
So you probably know the simple game theory of MAD.
So this is mutually assured destruction.
And it doesn't require any computation with small matrices.
You can actually convince yourself that the game is such that nobody wants to initiate.
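A minimal sketch of that small-matrix argument, under stated assumptions: two players, two actions each, and hypothetical payoffs in which any strike triggers assured retaliation, so every outcome involving a strike is equally catastrophic for both sides. The numbers are illustrative only.

```python
# MAD_PAYOFFS[(action_a, action_b)] = (payoff to A, payoff to B)
# Hypothetical values: 0 for the status quo, -100 for mutual destruction.
MAD_PAYOFFS = {
    ("hold", "hold"): (0, 0),
    ("hold", "strike"): (-100, -100),
    ("strike", "hold"): (-100, -100),
    ("strike", "strike"): (-100, -100),
}

def gain_from_initiating(player):
    """Change in a player's payoff from striking first while the other holds."""
    status_quo = MAD_PAYOFFS[("hold", "hold")][player]
    if player == 0:
        after_strike = MAD_PAYOFFS[("strike", "hold")][player]
    else:
        after_strike = MAD_PAYOFFS[("hold", "strike")][player]
    return after_strike - status_quo

gains = [gain_from_initiating(0), gain_from_initiating(1)]
nobody_wants_to_initiate = all(g < 0 for g in gains)

print("gain from initiating, per player:", gains)
print("nobody wants to initiate:", nobody_wants_to_initiate)
```

As the conversation goes on to say, this is a very coarse-grained analysis; the sketch says nothing about more than two actors or about changes in the payoffs themselves.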
Yeah, that's a very coarse-grained analysis.
And it really works in a situation where you have two superpowers or a small number of superpowers.
Now things are very different.
You have smaller nukes, so the threshold of initiating is smaller, and you have smaller countries and non-nation actors
who may get nukes and so on. So I think it's riskier now than it was maybe ever before.
And what idea or application of AI, you've talked about it a little bit, but what is the most exciting to you right now? I mean, you're here at NIPS, NeurIPS now. You have a few excellent pieces of work,
but what are you thinking into the future, with the several companies you're doing?
What's the most exciting thing or one of the exciting things?
The number one thing for me right now is coming up with these scalable techniques for game solving
and applying them into the real world.
I'm still very interested in market design as well, and we're doing that at Optimized Markets,
but what I'm most interested in, number one right now, is Strategic Machine and Strategy Robot, getting that
technology out there and seeing, as you're in the trenches doing applications, what needs to be
actually filled, what technology gaps still need to be filled. So it's so hard to just put your feet on the table and imagine
what needs to be done, but when you're actually doing real applications, the applications
tell you what needs to be done. And I really enjoy that interaction.
Is it a challenging process to apply some of the state-of-the-art techniques you're working on and have the
various players in industry or the military or people who could really benefit
from it actually use it? What's that process like? You know, in autonomous vehicles,
we work with automotive companies, and in many ways they're a little
bit old-fashioned.
It's difficult.
They really want to use this technology.
It clearly will have a significant benefit,
but the systems aren't quite in place to easily have them integrated in terms of data,
in terms of compute, in terms of all these kinds of things.
So, is that one of the bigger challenges that you're facing and how do you tackle that challenge?
Yeah, I think that's always a challenge. That's kind of slowness and inertia, really.
Let's do things the way we've always done it. You just have to find the internal champions
at the customer who understand that, hey, things can't be the same way in the future. Otherwise, bad
things are going to happen. And in autonomous vehicles, it's actually very interesting
that the car makers are doing that, and they're very traditional.
But at the same time, you have tech companies who have nothing to do with cars
or transportation, like Google and Baidu, really pushing on autonomous cars.
I find that fascinating.
Clearly, you're super excited about actually these ideas having an impact in the world.
In terms of the technology, in terms of ideas and research, are there directions that you're
also excited about, whether that's on some of the approaches you talked about for
the imperfect-information games, whether it's applying deep learning to some of these problems,
is there something that you're excited about on the research side of things?
Yeah, lots of different things in game solving.
Solving even bigger games, games where you have hidden player actions
as well.
Poker is a game where really the chance actions are hidden, or some of them are hidden, but
the player actions are public.
Multiplayer games of various sorts, collusion, opponent exploitation,
and even longer games. So games that basically go forever, but they're not repeated. So, extensive-form games that go forever. What would that even look like?
How do you represent that? How do you solve that? What's the example of a game like that?
Is this some of the stochastic games that you mentioned? Let's say business strategy.
So it's not just modeling like a particular interaction but thinking about the business from here to
eternity or let's say military strategy. So it's not like war is going to go away. How do you think about
military strategy that's going to go forever? How do you even model that? How do you know whether
a move was good that you or somebody made, and so on. So that's kind of one direction. I'm also very interested in learning much more scalable techniques for integer programming.
So we had an ICML paper this summer on that. The first automated algorithm configuration paper that has theoretical generalization guarantees.
So if I see this many training examples and I tune my algorithm in this way,
it's going to have good performance on the real distribution, which I've not seen.
Which is kind of interesting, that, you know, algorithm configuration has been going on now for
at least 17 years seriously.
And there has not been any generalization theory before.
Well, this is really exciting, and it's been a huge honor to talk to you. Thank you so much, Tuomas.
Thank you for bringing Libratus to the world
and all the great work you're doing.
Well, thank you very much.
It's been fun.
Good questions.
Ha-ha. Thank you.