Lex Fridman Podcast - #120 – François Chollet: Measures of Intelligence
Episode Date: August 31, 2020

François Chollet is an AI researcher at Google and creator of Keras. Support this podcast by supporting our sponsors (and get a discount):
- Babbel: https://babbel.com and use code LEX
- MasterClass: https://masterclass.com/lex
- Cash App: download the app & use code "LexPodcast"

Episode links:
Francois's Twitter: https://twitter.com/fchollet
Francois's Website: https://fchollet.com/
On the Measure of Intelligence (paper): https://arxiv.org/abs/1911.01547

If you would like to get more information about this podcast, go to https://lexfridman.com/podcast or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube, where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts, follow on Spotify, or support it on Patreon.

Here's the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.

OUTLINE:
00:00 - Introduction
05:04 - Early influence
06:23 - Language
12:50 - Thinking with mind maps
23:42 - Definition of intelligence
42:24 - GPT-3
53:07 - Semantic web
57:22 - Autonomous driving
1:09:30 - Tests of intelligence
1:13:59 - Tests of human intelligence
1:27:18 - IQ tests
1:35:59 - ARC Challenge
1:59:11 - Generalization
2:09:50 - Turing Test
2:20:44 - Hutter prize
2:27:44 - Meaning of life
Transcript
The following is a conversation with Francois Chollet, his second time on the podcast.
He's both a world-class engineer and a philosopher in the realm of deep learning and artificial
intelligence.
This time, we talk a lot about his paper titled On the Measure of Intelligence that discusses
how we might define and measure general intelligence in our computing machinery.
Quick summary of the sponsors.
Babbel, Masterclass, and Cash App.
Click the sponsor links in the description to get a discount and to support this podcast.
As a side note, let me say that the serious, rigorous, scientific study of artificial general
intelligence is a rare thing.
The mainstream machine learning community works on very narrow AI with very
narrow benchmarks. This is very good for incremental and sometimes big incremental progress.
On the other hand, the outside the mainstream, renegade, you could say, AGI community works on
approaches that verge on the philosophical and even the literary without big public benchmarks.
Walking the line between the two worlds is a rare breed, but it doesn't have to be. I ran the AGI
series at MIT as an attempt to inspire more people to walk this line. DeepMind and OpenAI for a time
and still on occasion walk this line. Francois Chollet does as well.
I hope to also.
It's a beautiful dream to work towards
and to make real one day.
If you enjoy this thing, subscribe on YouTube,
review it with five stars on Apple Podcasts,
follow on Spotify, support on Patreon,
or connect with me on Twitter at Lex Fridman.
As usual, I'll do a few minutes of ads now
and no ads in the middle. I try to
make these interesting, but I give you timestamps so you can skip. But still, please do check out
the sponsors by clicking the links in the description. It's the best way to support
this podcast. This show is sponsored by Babbel, an app and website that gets you speaking in a
new language within weeks. Go to babbel.com and use code LEX to get three months free.
They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian.
Daily lessons are 10 to 15 minutes, super easy, effective,
designed by over 100 language experts.
Let me read a few lines from the Russian poem Ночь, улица, фонарь, аптека by Alexander Blok
that you'll start to understand if you sign up to Babbel.
Ночь, улица, фонарь, аптека, бессмысленный и тусклый
свет. Живи еще хоть четверть века.
Все будет так, исхода нет.
Now, I say that you'll start to understand this poem because
Russian starts with a language and ends with vodka. Now, the latter part is definitely not endorsed
or provided by Babbel and will probably lose me this sponsorship, although it hasn't yet.
But once you graduate with Babbel, you can enroll in my advanced course
of late-night Russian conversation over vodka.
No app for that yet.
So get started by visiting Babbel.com
and use code LEX to get three months free.
This show is also sponsored by Masterclass.
Sign up at Masterclass.com slash LEX
to get a discount and to support this podcast.
When I first heard about Masterclass, I thought it was too good to be true.
I still think it's too good to be true. For $180 a year, you get an all-access pass to watch
courses from, to list some of my favorites: Chris Hadfield on space exploration (hope to have him
on this podcast one day), Neil deGrasse Tyson on scientific thinking
and communication (Neil too), Will Wright, creator of SimCity and The Sims, on game design, Carlos Santana
on guitar, Garry Kasparov on chess, Daniel Negreanu on poker, and many more. Chris Hadfield explaining
how rockets work and the experience of being launched into space alone is worth the money.
By the way, you can watch it on basically any device.
Once again, sign up at masterclass.com slash Lex to get a discount and to support this podcast.
This show, finally, is presented by Cash App, the number one finance app in the App Store.
When you get it, use code LexPodcast.
Cash App lets you send money to friends, buy Bitcoin, and invest in the stock
market with as little as $1. Since Cash App allows you to send and receive money digitally,
let me mention a surprising fact related to physical money. Of all the currency in the world,
roughly 8% of it is actually physical money. The other 92% of the money only exists digitally,
and that's only going to increase. So again, if you get Cash
App from the App Store or Google Play and use code LEXPODCAST, you get $10, and Cash App will
also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education
for young people around the world. And now, here's my conversation with Francois Chollet.
What philosophers, thinkers, or ideas had a big impact on you growing up and today?
So one author that had a big impact on me when I read his books as a teenager was Jean Piaget,
who is a Swiss psychologist, considered to be the father of developmental psychology,
and he has a large body of work about basically how intelligence develops in children.
And so it's fairly old work, like most of it is from the 1930s, 1940s.
So it's not quite up to date.
It's actually superseded by many newer developments in developmental psychology.
But to me, it was very interesting, very striking,
and actually shaped the early ways
in which I started thinking about the mind
and the development of intelligence as a teenager.
His actual ideas or the way he thought about it
or just the fact that you could think
about the developing mind at all?
I guess both.
Jean Piaget is the author
that really introduced me to the notion
that intelligence and the mind
is something that you construct throughout your life and that children construct it in stages.
And I thought that was a very interesting idea, which is, of course, very relevant to AI, to building artificial minds.
Another book that I read around the same time that had a big impact on me, and there was actually a little bit of overlap with Jean Piaget as well, is Jeff Hawkins's
On Intelligence, which is a classic.
And he has this vision of the mind as a multi-scale hierarchy of
temporal prediction modules.
And these ideas really resonated with me,
like the notion of a modular hierarchy
of, you know, potentially of compression functions
or prediction functions.
I thought it was really, really interesting.
And it really shaped the way I started thinking
about how to build minds.
The hierarchical nature?
Which aspect? Also, he's a neuroscientist, so he was thinking...
Yes, actually, he was basically talking about how our mind works.
Yeah, the notion that cognition is prediction was an idea that was kind of new to me at the time and that I really loved. And the notion that there are multiple scales of processing in the brain.
The hierarchy.
Yes.
This was before deep learning.
These ideas of hierarchies in AI have been around for a long time, even before On Intelligence.
I mean, they've been around since the 1980s.
And yeah, that was before deep learning. But of course, I think these ideas really found their practical implementation in deep learning.
What about the memory side of things? I think you were talking about knowledge representation.
Do you think about memory a lot? One way you can think of neural networks is as a kind of memory: you're memorizing things, but it doesn't seem to be the kind of memory that's in our brains, or it doesn't have
the same rich complexity, long-term nature that's in our brains. Yes, the brain is more of a sparse
access memory, so that you can actually retrieve very precisely bits of your experience.
The retrieval aspect, you can introspect, you can ask yourself questions...
Yes, you can program your own memory, and language is actually the tool you use to do that. I think language is a kind of operating system for the mind, and you use language... well, one of the uses of language is as a query that you run over your own memory. You use words as keys to retrieve specific experiences or specific concepts, specific thoughts.
Like language is a way you store thoughts, not just in writing, in the physical
world, but also in your own mind.
And it's also how you retrieve them.
Like imagine if you didn't have language.
Then you would have to...
You would not really have a self-internally triggered
way of retrieving past thoughts.
You would have to rely on external experiences.
For instance, you see a specific sight,
you smell a specific smell, and that brings up memories.
But you would not really have a way to deliberately access these memories without language.
Well, the interesting thing you mentioned is you can also program the memory.
You can change it probably with language.
Yeah, using language.
Yes.
Well, let me ask you a Chomsky question, which is like, first of all, do you think language
is like fundamental?
Like, there's turtles, what's at the bottom of the turtles? It can't be turtles all the way down.
Is language at the bottom of cognition, of everything? Is language the fundamental aspect of what it means to be a thinking thing?
No, I don't think so. I think language...
You disagree with Noam Chomsky?
Yes. I think language is a layer on top of cognition, so it is fundamental to cognition in the sense that, to use a computing metaphor, I see language as the operating system of the brain, of the human mind.
And the operating system, you know, is a layer on top of the computer.
The computer exists before the operating system,
but the operating system is how you make it truly useful.
And the operating system is most likely Windows, not Linux,
because language is messy.
Yeah, it's messy and it's pretty difficult to inspect it, introspect it.
How do you think about language?
Like we use actually sort of human interpretable language,
but is there something deeper,
that's closer to logical types of statements? Like, what is the nature of language, do you think? Is there something deeper than the syntactic rules we construct, something
that doesn't require utterances or writing and so on?
Are you asking about the possibility that there could exist languages for thinking that are not made of words?
Yeah, yeah.
I think so. The mind is layers, right? And language is almost like the outermost, the uppermost layer.
But before we think in words, I think we think in terms of emotion, in space, and we think in terms of physical actions.
And I think babies in particular probably express these thoughts in terms of the actions that they've seen or that they can perform,
and in terms of motions of objects in their environment before they start thinking in terms of words.
It's amazing to think about that as the building blocks of language.
So the kind of actions and ways the babies see the world
as more fundamental than the beautiful Shakespearean language
you construct on top of it.
And we probably don't have any idea what that looks like, right?
Because it's important if we're trying to engineer it into AI systems.
I think visual analogies and motion
is a fundamental building block of the mind.
And you actually see it reflected in language.
Language is full of spatial metaphors.
And when you think about things, and I consider myself very much a visual thinker, you often express your thoughts by using things like visualizing concepts
in 2D space, or like you solve problems
by imagining yourself
navigating
a concept space. I don't know if
you have this sort of experience.
You said visualizing concept space.
So like, so I certainly think about
I certainly
visualize mathematical concepts
but you mean like in concept space?
Visually, you're embedding ideas into a three-dimensional space
you can explore with your mind, essentially?
It's more like 2D, but yeah.
2D?
Yeah.
You're a flatlander.
Okay.
No, I do not
I always have to
before I jump
from concept to concept
I have to
put it back down
on paper
it has to be on paper
I can only travel
on 2D paper
not inside my mind
you're able to
move inside your mind
but even if you're writing
like a paper
for instance
don't you have like a spatial representation of your paper?
Like you visualize where ideas lie topologically
in relationship to other ideas,
kind of like a subway map of the ideas in your paper.
Yeah, that's true.
I mean, there is, in papers, I don't know about you,
but there feels like there's a destination.
There's a key idea that you want to arrive at,
and a lot of it is in the fog, and you're trying to kind of,
it's almost like, what's that called when you do a path planning search
from both directions, from the start and from the end?
And then you find, you do like shortest path,
but like, you know, in game playing,
you do this with like A star from both sides.
And you see where to join.
Yeah.
So you kind of do, at least for me,
I think like, first of all, just exploring from the start, from, like, first principles, what do I know?
What can I start proving from that, right?
And then from the destination, if you start backtracking, like, if I want to show some kind of sets of ideas, what would it take to show them?
And you kind of backtrack.
But, like, yeah, I don't think I'm doing all that in my mind, though. Like, I'm putting it down on paper.
Do you use mind maps to organize your ideas?
Yeah, I like mind maps.
Let's get into this, because I've been so jealous of people... I haven't really tried it. I've been jealous of people that seem to get
like this fire of passion in their eyes, because everything starts
making sense. It's like Tom Cruise in the movie was like moving stuff around. Some of the most
brilliant people I know use mind maps. I haven't tried really. Can you explain what the hell a mind
map is?
I guess a mind map is a way to take kind of like the mess inside your mind and just put it on paper so that you gain more control over it.
It's a way to organize things on paper, and as kind of a consequence of organizing things on paper, they start being more organized inside your own mind.
So what does that look like? Do you have an example? Like, what's the first thing you write on paper? What's the second thing you write?
I mean, typically you draw a mind map to organize the way you think about a topic.
So you would start by writing down the key concept about that topic. Like, you would write intelligence or something, and then you would start adding associative connections. Like, what do you think about when you think about intelligence? What do you think are the key elements
of intelligence? So maybe you would have language,
for instance, and you'd have motion.
And so you would start drawing nodes with these things.
And then you would see, what do you think about when you think about
motion? And so on. And you would go like that,
like a tree.
Is it a tree mostly,
or is it a graph too?
Oh, it's more of a graph than a tree.
And it's not limited to just, you know, writing down words. You can also draw things, and it's not supposed to be purely hierarchical, right? The point is that once you start writing it down, you can start reorganizing it so that it makes more sense, so that it's connected in a more effective way.
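(As a rough sketch of the structure being described, added for illustration and not from the conversation: a mind map can be represented as a freely reorganizable graph rather than a strict tree. The node names below are made up, borrowed from the examples mentioned here.)

# A mind map as an associative graph: nodes are concepts, edges are connections.
mind_map = {
    "intelligence": {"language", "motion"},
    "language": {"intelligence"},
    "motion": {"intelligence"},
}

def add_link(graph, a, b):
    # Add an undirected, associative connection between two concepts.
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

add_link(mind_map, "motion", "space")  # reorganize freely; no fixed hierarchy is imposed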
See, but I'm so OCD that you just mentioned intelligence and language and motion.
I would start becoming paranoid that the categorization is imperfect.
Like, I would become paralyzed with the mind map. Because,
even though you're just doing associative kind of connections,
there's an implied hierarchy that's emerging.
And I would start becoming paranoid that it's not the proper hierarchy.
So you're not just, one way to see mind maps is you're putting thoughts on paper.
It's like a stream of consciousness.
But then you can also start getting paranoid,
well, is this the right hierarchy?
Sure, but it's mind maps.
It's your mind map.
You're free to draw anything you want.
You're free to draw any connection you want.
And you can just make a different mind map
if you think the central node is not the right node.
Yeah, I suppose there's a fear of being wrong.
If you want to organize your ideas by writing down what you think,
which I think is very effective,
like how do you know what you think about something
if you don't write it down, right?
If you do that, the thing is that it imposes
a much more syntactic structure over your
ideas, which is not required with a mind map.
So mind map is kind of like a lower level, more free hand
way of organizing your thoughts.
And once you've drawn it, then you can start actually voicing your
thoughts in terms of, you know, paragraphs.
It's a two-dimensional aspect of layout too, right?
Yeah.
It's a kind of flower, I guess, you start.
There's usually, you want to start with a central concept.
Yes.
And then you move out.
Typically, it ends up more like a subway map.
So it ends up more like a graph, a topological graph.
Without a root node.
Yeah, so like in a subway map,
there are some nodes that are more connected than others,
and there are some nodes that are more important than others.
So there are destinations.
But it's not going to be purely like a tree, for instance.
Yeah, it's fascinating to think that if there's something to that about the way our mind thinks.
By the way, I just kind of remembered an obvious thing.
I have probably thousands of documents in Google Docs at this point that are bullet point lists.
Which is, you can probably map a mind map to a bullet point list.
It's the same.
No, it's not.
It's a tree.
It's a tree.
Yeah.
So I create trees, but also they don't have the visual element
Like, I guess I'm comfortable with the structure. It feels like the narrowness, the constraints
feel more comforting.
If you have thousands of documents with your own thoughts in Google Docs,
why don't you write some kind of search, like maybe a mind map, a piece of software,
a mind mapping software where you write down a concept
and then it gives you sentences or paragraphs
from your thousand Google Docs document
that match this concept.
The problem is, unlike mind maps,
it's so deeply rooted in natural language
that it's not semantically searchable, I would say. Because the categories, you kind of mentioned intelligence, language,
and motion, they're very strong semantically.
It feels like the mind map forces you to be semantically clear and specific.
The bullet point lists I have are sparse, disparate thoughts that poetically represent a category, like motion,
as opposed to saying motion.
So unfortunately, that's the same problem with the internet. That's why the idea
of the semantic web is difficult to get to work. Most language on the internet is a giant mess
of natural language that's hard to interpret.
So do you think there's something to mind maps? You actually originally brought it up as we were talking about cognition and language.
Do you think there's something to mind maps about how our brain actually thinks, reasons about things?
It's reasonable to assume that there is some level of topological processing in the brain,
that the brain is very associative in nature.
And I also believe that a topological space is a better medium to encode thoughts than a geometric space.
What's the difference between a topological and a geometric space?
Well, if you're talking about topologies,
then points are either connected or not.
So a topology is more like a subway map.
And geometry is when you're interested in the distance between things.
And in subway maps, you don't really have the concept of distance.
You only have the concept of whether there is a train
going from station A to station
B. And what we do in deep learning is that we're actually dealing with geometric spaces.
We are dealing with concept vectors, word vectors that have a distance between them
which is expressed in terms of dot product.
We are not really building topological models usually.
I think you're absolutely right. Distance is of fundamental importance in deep learning.
I mean, it's the continuous aspect of it. Yes, because everything is a vector and
everything has to be a vector because everything has to be differentiable.
If your space is discrete, it's no longer differentiable. You cannot do deep learning in it anymore.
Well, you could, but you can only do it by embedding it in a bigger continuous space.
So if you do topology in the context of deep learning, you have to do it by embedding
your topology in a geometry.
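(A minimal sketch of the distinction, added for illustration; the vectors and edges are made up. In the geometric view concepts are vectors compared by a continuous, differentiable dot product; in the topological view the only information is whether two concepts are connected, like stations on a subway map.)

import numpy as np

# Geometric view: concepts as vectors, similarity as a dot product (continuous, differentiable).
vectors = {
    "intelligence": np.array([0.9, 0.1, 0.3]),
    "language": np.array([0.8, 0.2, 0.4]),
    "motion": np.array([0.1, 0.9, 0.2]),
}
similarity = vectors["intelligence"] @ vectors["language"]

# Topological view: concepts as nodes; all that exists is whether an edge exists (discrete).
edges = {("intelligence", "language"), ("intelligence", "motion")}
connected = ("intelligence", "language") in edges  # True or False, no notion of distance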
Yeah. Well, let me zoom out for a second.
Let's get into your paper on the measure of intelligence.
Did you put it out in 2019?
Yes.
Okay.
November.
November.
Yeah, remember 2019?
That was a different time.
Yeah, I remember.
I still remember.
It feels like a different world. You could travel, you could, you know, actually
go outside and see friends.
Yeah. Let me ask the most absurd question. I think there's some
non-zero probability there'll be a textbook one day like 200 years from now on artificial
intelligence or it'll be called like just intelligence
because humans will already be gone.
It'll be your picture with a quote.
This is, you know, one of the early biological systems
that considered the nature of intelligence.
And there'll be like a definition
of how they thought about intelligence,
which is one of the things you do in your paper
on measuring intelligence is to ask like, well, what is intelligence and how to test for intelligence
and so on.
So is there a spiffy quote about what is intelligence?
What is the definition of intelligence according to Francois Chollet?
Yeah.
So do you think the super intelligent AIs of the future
will want to remember us the way we remember humans from the past?
And do you think they won't be ashamed of having a biological origin?
No, I think it would be a niche topic.
It won't be that interesting,
but it'll be like the people that study certain
historical civilizations that no longer exist, the Aztecs and so on. That's how
it'll be seen. And it'll be studied also in the context of social media. There'll be hashtags
about the atrocities committed to human beings when the robots finally got rid of them.
It was a mistake.
It'll be seen as a giant mistake, but ultimately in the name of progress,
and it created a better world because humans were over-consuming the resources,
and they were not very rational and were destructive in the end in terms
of productivity and putting more love in the world. And so within that context, there'll be a chapter
about these biological systems.
You seem to have a very detailed vision of that future. You should write a sci-fi novel about it.
I'm working on a sci-fi novel currently, yes. Self-published, yeah.
The definition of intelligence.
So intelligence is the efficiency
with which you acquire new skills,
tasks that you did not previously know about,
that you did not prepare for, right?
So it is not, intelligence is not skill itself.
It's not what you know. It's not what you can do. It's how well and how efficiently you can learn new things.
New things.
Yes.
The idea of newness there seems to be fundamentally important.
Yes. You see intelligence on display, for instance, whenever you see a human being or, you know, an AI creature adapt to a new environment that it has not seen before, that its creators did not anticipate.
When you see adaptation, when you see improvisation, when you see generalization,
that's intelligence.
In reverse, if you have a system that when you put it in a slightly new environment,
it cannot adapt,
it cannot improvise, it cannot deviate from what it's hardcoded to do or what it has been
trained to do, that is a system that is not intelligent.
There's actually a quote from Einstein that captures this idea, which is, the measure
of intelligence is the ability to change.
I like that quote.
I think it captures at least part of this idea.
You know, there might be something interesting
about the difference between your definition and Einstein's.
I mean, he's just being Einstein and clever.
But acquisition of new ability to deal with new things versus ability to just change.
What's the difference between those two things?
So just change in itself.
Do you think there's something to that?
Just being able to change.
Yes, being able to adapt. Not just change, but change in a direction: being able to adapt yourself to your environment, whatever the environment. That's
a big part of intelligence, yes. And intelligence is, more precisely, you know, how efficiently you're
able to adapt, how efficiently you're able to basically master your environment,
how efficiently you can acquire new skills.
And I think there's a big distinction to be drawn
between intelligence, which is a process,
and the output of that process, which is skill.
So for instance, if you have a very smart human programmer
that considers the game of
chess and that writes down a static program that can play chess, then the intelligence
is the process of developing that program.
But the program itself is just encoding the output artifact of that process.
The program itself is not intelligent.
And the way you tell it's not intelligent
is that if you put it in a different context,
you ask it to play Go or something,
it's not going to be able to perform well
without human involvement
because the source of intelligence,
the entity that is capable of that process
is the human programmer.
So we should be able to tell the difference
between the process and its output.
We should not confuse the output and the process.
It's the same as, you know, do not confuse a road building company and one
specific road, because one specific road takes you from point A to point B.
But a road building company can take you from, can make a path from anywhere to
anywhere else.
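(A toy sketch of the distinction being drawn here, with made-up names and no real chess engine: the static program is the output artifact of intelligence; the process that could produce such a program for any new task is where the intelligence lives.)

# Output artifact: a static, hard-coded skill. It cannot do anything it was not written for.
OPENING_BOOK = {"start": "e4", "e4 e5": "Nf3"}  # hand-encoded chess "knowledge"

def play_chess(position: str) -> str:
    return OPENING_BOOK.get(position, "resign")  # breaks outside what was encoded

# Process: a stand-in for the intelligent agent, which could acquire a skill for a new task.
def acquire_skill(task_examples):
    # Given examples of a previously unknown task, produce a program/policy for it.
    # This is the part that would count as intelligent; play_chess above is not.
    ...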
Yeah, that's beautifully put.
But it's also, to play devil's advocate a little bit,
you know, it's possible that there is something
more fundamental than us humans.
So you kind of said the programmer creates
the difference between the acquiring of the skill and the skill itself.
There could be something... like, you could argue the universe is more intelligent.
Like, the deep, the base intelligence
that we should be trying to measure is something that created humans.
We should be measuring God, or the source of the universe,
as opposed to... like, there could be a deeper intelligence.
There's always deeper intelligence, I guess.
You can argue that, but that does not take anything away
from the fact that humans are intelligent.
And you can tell that because they are capable of adaptation and generality.
And you see that in particular in the fact that humans are capable of
handling situations and tasks that are quite different from anything
that any of our evolutionary ancestors has ever encountered.
So we are capable of generalizing very much out of distribution.
If you consider
our evolutionary history as being, in a way, our training data. Of course, evolutionary biologists
would argue that we're not going too far out of the distribution. We're like mapping the skills
we've learned previously, desperately trying to like jam them into like these new situations.
I mean, there's definitely a little bit of that, but it's pretty
clear to me that, you know, most of the things we do any
given day in our modern civilization are things that are very, very different from
what our ancestors a million years ago would have been doing in a given
day, and our environment is very different. So I agree that
everything we do, we do it with cognitive building blocks that we acquired over the course of
evolution, right? And that anchors our cognition to a certain context, which is the human condition
very much. But still, our mind is capable of a pretty remarkable degree of
generality, far beyond anything we can create in artificial systems today. The degree in which
the mind can generalize from its evolutionary history, can generalize away from its evolutionary
history, is much greater than the degree to which a deep learning system today can generalize away
from its training data.
And the key point you're making, which I think is quite beautiful, is that if we talk about measurement, we shouldn't measure the skill; we should measure the creation of the new skill, the ability to create that new skill.
Yes.
But it's tempting.
It's weird because the skill is a little bit of a small window into the system.
So whenever you have a lot of skills, it's tempting to measure the skills.
Yes.
I mean, the skill is the only thing you can objectively measure.
But yeah, so the thing to keep in mind is that when you see skill in the human, it gives you a strong signal that that human is intelligent because you know, they weren't
born with that skill typically.
Like you see a very strong chess player.
Maybe you're a very strong chess player yourself.
I think you're saying that because I'm Russian
and now you're prejudiced.
You assume all Russians are good at chess.
I'm biased.
Cultural bias.
So if you see a very strong chess player,
you know they weren't born knowing how to play chess.
So they had to acquire that skill
with their limited resources, with their limited lifetime.
And, you know, they did that because they are generally intelligent.
And so they, they may as well have acquired any other skill, you know, they have this
potential.
And on the other hand, if you see a computer playing chess, you cannot make the same assumptions,
because you cannot just assume
the computer is generally intelligent.
The computer may be born
knowing how to play chess
in the sense that it may have been programmed
by a human that has understood chess
for the computer
and that has just encoded
the output of that understanding
in a static program.
And that program is not intelligent.
So let's zoom out just for a second and say,
what is the goal on the measure of intelligence paper?
What do you hope to achieve with it?
So the goal of the paper is to clear up some longstanding misunderstandings
about the way we've been conceptualizing intelligence in the AI community and in the way
we've been evaluating progress in AI. There's been a lot of progress recently in machine learning,
and people are extrapolating from that progress that we're about to solve general intelligence.
And if you want to be able to evaluate these statements, you need
to precisely define what you're talking about when you're talking about general intelligence,
and you need a formal way, a reliable way to measure how much intelligence, how much general
intelligence a system possesses. And ideally, this measure of intelligence should be actionable.
So it should not just describe what intelligence is.
It should not just be a binary indicator that tells you this system is
intelligent or it isn't.
It should be actionable.
It should have explanatory power, right?
So you could use it as a feedback signal.
It would show you the way towards building more intelligent systems.
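(A heavily simplified, conceptual paraphrase of the quantity the paper formalizes; the actual definition in On the Measure of Intelligence is stated in algorithmic-information-theoretic terms and also weights generalization difficulty.)

\[
  \text{Intelligence} \;\approx\;
  \operatorname{avg}_{\text{new tasks in scope}}
  \frac{\text{skill attained at the task}}
       {\text{priors built in} \;+\; \text{experience consumed}}
\]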
So at the first level, you draw a distinction between two divergent views of intelligence.
As we just talked about, intelligence is a collection of task-specific skills and a general learning ability.
So what's the difference between kind of this memorization of skills and a general learning ability?
We've talked about it a little bit, but can you try to linger on this topic for a bit?
Yeah, so the first part of the paper is an assessment of the different ways we've been thinking about intelligence
and the different ways we've been evaluating progress in AI.
And the history of cognitive sciences has been shaped by two views of the human mind.
And one view is the evolutionary psychology view,
in which the mind is a collection of fairly static, special-purpose, ad hoc mechanisms that have been hard-coded by evolution over our history as a species over a very
long time.
And early AI researchers, people like Marvin Minsky, for instance, they clearly
subscribed to this view, and they saw the mind as a kind of, you know, collection
of static programs, similar to the programs they would run on mainframe
computers.
And in fact, I think they very much understood the mind through the metaphor of the mainframe
computer because that was the tool they were working with. And so you had these static
programs, this collection of very different static programs operating over a database like memory.
And in this picture, learning was not very important. Learning was considered to be just memorization. And in fact, learning is basically not featured in AI textbooks
until the 1980s with the rise of machine learning.
It's kind of fun to think about that learning was the outcast.
Like the weird people working on learning.
Like the mainstream AI world was... I mean, I don't know what the best term is,
but it's non-learning. It was seen as like reasoning would not be learning-based.
Yes. It was considered that the mind was a collection of programs that were primarily
logical in nature. And that's all you needed to do to create a mind was
to write down these programs.
And they would operate over knowledge, which would be stored in some kind of database.
And as long as your database would encompass everything about the world and your logical
rules were comprehensive, then you would have a mind.
So the other view of the mind is the brain as a sort of blank slate, right?
This is a very old idea.
You find it in John Locke's writings.
This is the tabula rasa.
And this is this idea that the mind is some kind of like information sponge
that starts empty, that starts blank, and that absorbs knowledge and skills from experience.
So it's a sponge that reflects the complexity of the world,
the complexity of your life experience, essentially.
That everything you know and everything you can do
is a reflection of something you found in the outside world,
essentially. So this is an idea that's very old, that was not very popular, for instance,
in the 1970s, but that had gained a lot of vitality recently with the rise of
connectionism, in particular deep learning. And so today, deep learning is the dominant paradigm in AI. And I feel like lots of AI researchers are conceptualizing the mind via a deep learning metaphor.
Like they see the mind as a kind of randomly initialized neural network that starts blank when you're born.
And then that gets trained, that acquires knowledge
and skills, via exposure to training data.
By the way, it's a small tangent.
I feel like people who are thinking about intelligence are not conceptualizing it that
way.
I actually haven't met too many people who believe that a neural network will be able
to reason,
who seriously think that, rigorously,
because I think it's an actually interesting worldview.
And we'll talk about it more,
but it's been impressive
what neural networks have been able to accomplish.
And it's, to me, I don't know, you might disagree,
but it's an open question whether scaling size eventually might
lead to incredible results that, to us mere humans, will appear as if it's general.
I mean, if you ask people who are seriously thinking about intelligence, they will definitely not say that all you need to
do is... like, the mind is just a neural network. However, it's actually a view that's very popular, I think,
in the deep learning community that many people are kind of conceptually,
you know, intellectually lazy about it.
Right.
But I guess what I'm saying is exactly right.
I haven't met many people, and I think it would be interesting
to meet a person who is not intellectually lazy about this particular topic
and still believes that neural networks will go all the way.
I think Yann LeCun is probably closest to that.
There are definitely people who argue that
current deep learning techniques are already the way
to general artificial intelligence,
and that all you need to do is to scale it
up to all the available training data.
And if you look at the waves that OpenAI's GPT-3 model has made, you see echoes of this
idea.
So on that topic, GPT-3, similar to GPT-2, actually, has captivated some part of the imagination of the public.
There's just a bunch of hype of different kind.
I would say it's emergent.
It's not artificially manufactured.
It's just like people just get excited for some strange reason.
And in the case of GPT-3, which is funny, that there's, I believe, a couple of months delay
from release to hype. Maybe I'm not historically correct on that, but it feels like there was a
little bit of a lack of hype and then there's a phase shift into hype. But nevertheless,
there's a bunch of cool applications that seem to captivate the
imagination of the public about what this language model that's trained in unsupervised way without
any fine-tuning is able to achieve. So what do you make of that? What are your thoughts about GPT-3?
Yeah, so I think what's interesting about GPT-3 is the idea that it may be able to learn new tasks after just
being shown a few examples. So I think if it's actually capable of doing that, that's novel,
and that's very interesting, and that's something we should investigate. That said, I must say,
I'm not entirely convinced that we have shown it's capable of doing that. It's very likely, given the amount of data
that the model is trained on,
that what it's actually doing is pattern matching
a new task you give it with a task
that it's been exposed to in its training data.
It's just recognizing the task
instead of just developing a model of the task, right?
But there's, sorry to interrupt,
there's a parallel to what you said before,
which is it's possible to see GPT-3
as like the prompts it's given
as a kind of SQL query
into this thing that it's learned,
similar to what you said before,
which is languages used to query the memory.
So is it possible that
neural network is a giant memorization thing,
but then if it gets sufficiently giant,
it'll memorize sufficiently large amounts of things in the world
where intelligence becomes a querying machine?
I think it's possible that a significant chunk of intelligence
is this giant associative memory.
I definitely don't believe that intelligence is just a giant associative memory, but it may well be a big component.
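(A minimal sketch of the "prompt as a query over memorized patterns" framing discussed here; language_model below is a hypothetical black-box text generator, not a real API.)

def few_shot_prompt(examples, new_input):
    # The examples act like a key that retrieves a task the model has arguably
    # already seen something like in its training data.
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{demos}\nQ: {new_input}\nA:"

prompt = few_shot_prompt([("2 + 2", "4"), ("3 + 5", "8")], "7 + 6")
# answer = language_model(prompt)  # recognizing a familiar task vs. modeling a new one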
So do you think GPT-3, 4, 5, GPT-10 will think, where's the ceiling? Do you think you'll be able to reason?
No, that's a bad question.
Like, what is the ceiling is the better question.
How well is it going to scale?
How good is GPT-N going to be?
Yeah.
So I believe GPT-N is going to improve on the strengths of GPT-2 and 3,
which is it will be able to generate ever more plausible text in context.
Just monotonically increasing performance.
Yes, if you train a bigger model on more data,
then your text will be increasingly more context-aware and increasingly
more plausible, in the same way that GPT-3 is much better at generating plausible text
compared to GPT-2.
But that said, I don't think just scaling up the model to more transformer layers and
more training data is going to address the flaws of GPT-3,
which is that it can generate plausible text,
but that text is not constrained by anything else other than plausibility.
So in particular, it's not constrained by factualness or even consistency, which is why it's very easy to get GPT-3 to generate statements that are factually untrue
or to generate statements that are
even self-contradictory, right?
Because its only goal is plausibility and it has no other constraints.
It's not constrained to be self-consistent, for instance, right?
And so for this reason, one thing that I thought was very interesting with GPT-3 is that you can pre-determine the answer it will give you
by asking the question in a specific way
because it's very responsive to the way you ask the question
since it has no understanding of the content of the question.
Right.
And if you ask the same question in two different ways
that are basically adversarially engineered to produce a certain answer, you will get two different answers, two contradictory answers.
It's very susceptible to adversarial attacks, essentially.
Potentially, yes.
So in general, the problem with these models, these generative models, is that they are very good at generating plausible text, but that's just not enough, right?
You need...
I think one avenue that would be very interesting to make progress
is to make it possible to write programs
over the latent space that these models operate on,
that you would rely on these self-supervised models
to generate a sort of pool of knowledge and concepts and common sense.
And then you would be able to write explicit reasoning programs over it.
Because the current problem with GPT-3 is that
it can be quite difficult to get it to do what you want to do.
If you want to turn GPT-3 into products,
you need to put constraints on it.
You need to force it to obey certain rules.
So you need a way to program it explicitly.
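(A minimal sketch of the idea of writing explicit programs over a learned latent space, under the assumption of a hypothetical pretrained encoder embed(); the consistency rule is only illustrative.)

import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a self-supervised model mapping text to a concept vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(8)

def roughly_consistent(claim_a: str, claim_b: str, threshold: float = 0.0) -> bool:
    # Explicit, human-written rule applied on top of the learned representations:
    # flag pairs of statements whose embeddings point in opposing directions.
    a, b = embed(claim_a), embed(claim_b)
    cosine = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return cosine > threshold

# A product would chain explicit checks like this (factuality, consistency, business rules)
# around the generative model instead of trusting plausibility alone.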
Yeah, so if you look at its ability to do program synthesis,
it generates, like you said, something that's plausible.
Yeah, so if you try to make it generate programs, it will perform well for any program that it has
seen in its training data. But because program space is not interpolative, it's not going to
be able to generalize to problems it hasn't seen before.
Now, let me ask a sort of absurd, but I think useful,
I guess, intuition-builder question. You know, GPT-3 has 175 billion parameters. The human brain has about a thousand times that or more
in terms of number of synapses.
Do you think, obviously very different kinds of things,
but there is some degree of similarity.
Do you think, what do you think GPT
will look like when it has
100 trillion parameters?
You think our conversation
might be in nature
different?
Because you've criticized GPT-3
very effectively now. Do you think?
No,
I don't think so.
To begin with, the bottleneck with scaling up GPT-3, GPT models,
generatively trained transformer models,
is not going to be the size of the model or how long it takes to train it.
The bottleneck is going to be the training data,
because OpenAI is already training GPT-3 on a crawl of basically the entire web.
And that's a lot of data, right?
So you could imagine training on more data than that.
Like Google could train on more data than that,
but it would still be only incrementally more data.
And I don't recall exactly how much more data GPT-3 was trained on compared to GPT-2, but it's probably at least like 100 times,
maybe even 1,000 times.
I don't have the exact number.
You're not going to be able to train a model
on 100 times more data than what you're already doing.
So that's brilliant.
So it's easier to think of compute as a bottleneck
and then arguing that we can remove that bottleneck.
We can remove the compute bottleneck.
I don't think it's a big problem.
If you look at the pace at which we've improved
the efficiency
of deep learning models in the past few years,
I'm not worried about training time bottlenecks
or model size bottlenecks.
The bottleneck in the case of these generative transformer models
is absolutely the training data.
What about the quality of the data?
Yeah, so the quality of the data
is an interesting point.
The thing is, if you're going to want to use these models
in real products,
then you want to feed them data
that's as high quality, as factual,
I would say as unbiased as possible.
But, you know, there's not really such a thing
as unbiased data in the first place.
But you probably don't want to train it on Reddit, for instance.
Sounds like a bad plan.
So from my personal experience working with large-scale deep learning models,
so at some point I was working on a model at Google
that's trained on like 350 million labeled images.
It's an image classification model.
That's a lot of images.
That's like probably most publicly available images on the web at the time.
And it was a very noisy data set because the labels were not originally annotated by hand by humans.
They were automatically derived from tags on social media or just keywords in the
same page as the image was found and so on.
So it was very noisy.
And it turned out that you could easily get a better model, not just by training.
Like if you train on more of the noisy data, you get an incrementally better model, but
you very quickly hit diminishing returns.
On the other hand, if you train on smaller data set with higher quality annotations,
annotations that are actually made by humans, you get a better model.
And it also takes less time to train it.
Yeah, that's fascinating.
It's the self-supervised learning.
Is there a way to get better at doing the automated labeling?
Yeah, so you can enrich or refine your labels
in an automated way.
That's correct.
Do you have a hope for,
I don't know if you're familiar
with the idea of a semantic web?
Is the semantic web,
just for people who are not familiar,
is the idea of being able to convert the internet
or be able to attach semantic meaning
to the words on the internet,
the sentences, the paragraphs, to
be able to convert information on the internet, or some fraction of the internet, into something
that's interpretable by machines. That was kind of a dream for, I think, the semantic web papers in
the 90s. The dream was that the internet is full of rich, exciting information.
Even just looking at Wikipedia,
we should be able to use that as data for machines.
And that information is not really in a format that's available to machines.
So no, I don't think the semantic web will ever work
simply because it would be a lot of work
to provide that information in structured form.
And there is not really any incentive for anyone to provide that work.
So I think the way forward to make the knowledge on the web
available to machines is actually something closer to unsupervised
deep learning.
Yeah.
So GPT-3 is actually a bigger step in the direction
of making the knowledge of the web available to machines
than the semantic web was.
Yeah, perhaps in a human-centric sense,
it feels like GPT-3 hasn't learned anything
that could be used to reason.
But that might be just the early days.
Yeah, I think that's correct.
I think the forms of reasoning that you see it perform
are basically just reproducing patterns that it has seen in its training data.
So of course, if you're trained on the entire web,
then you can produce an illusion of reasoning in many different
situations, but it will break down if it's presented with a novel situation.
That's the open question between the illusion of reasoning and actual reasoning.
Yes.
The power to adapt to something that is genuinely new.
Because the thing is, even imagine you could train on every bit of data ever generated in the history of humanity.
That model would be capable of anticipating many different possible situations, but it remains that
the future is going to be something different. For instance, if you train a GPT-3 model on data from the year 2002, for instance, and then use it today, it's going to
be missing many things. It's going to be missing many common-sense facts about the world. It's even
going to be missing vocabulary and so on.
Yeah, it's interesting that GPT-3 doesn't even have, I think, any information about the coronavirus.
Yes.
Which is why, you know, you tell that a system is intelligent when it's
capable of adapting.
So intelligence is going to require some amount of continuous learning.
It's also going to require some amount of improvisation.
Like it's not enough to assume that
what you're going to be asked to do
is something that you've seen before
or something that is a simple interpolation
of things you've seen before.
Yeah.
In fact, that model breaks down for
even tasks that look relatively simple from a distance,
like L5 self-driving, for instance.
Google had a paper a couple of years back showing that something like 30 million different
road situations were actually completely insufficient to train a driving model.
It wasn't even L2, right? And that's a lot of data. That's a lot more data than the 20 or 30
hours of driving that a human needs to learn to drive, given the knowledge they've already
accumulated. Well, let me ask you on that topic. Elon Musk, Tesla Autopilot, one of the only companies I believe is really
pushing for a learning-based approach. Are you skeptical that that kind of network can achieve
level four? L4 is probably achievable. L5 probably not. What's the distinction there?
Is L5 completely... you can just fall asleep?
Yeah, L5 is basically human level.
Well, driving, you have to be careful saying human level
because like that's the most...
Yeah, there are all kinds of drivers.
Yeah, that's the clearest example of like,
you know, cars will most likely be much safer than humans
in many situations where humans fail.
It's the vice versa question.
I'll tell you, you know, the thing
is, the amount of training data
you would need to anticipate for pretty much
every possible situation you
will encounter in the real world
is such that it's not
entirely unrealistic to think
that at some point in the future we'll develop
a system that's trained on enough data, especially
provided that we can simulate a lot of that data.
We don't necessarily need actual cars on the road for everything.
But it's a massive effort.
And it turns out you can create a system that's much more adaptive,
that can generalize much better
if you just add explicit models
of the surroundings of the car.
And if you use deep learning for what it's good at,
which is to provide perceptive information.
So in general, deep learning is a way to encode perception
and a way to encode intuition.
But it is not a good medium for any sort of explicit reasoning.
And in AI systems today,
strong generalization tends to come from
explicit models,
tend to come from abstractions in the human mind
that are encoded in program form
by a human engineer, right?
These are the abstractions you can actually generalize,
not the sort of weak abstraction
that is learned by a neural network.
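(A minimal sketch of the hybrid approach described here, with hypothetical names: a deep network supplies perception, and an explicit, human-written model of the surroundings does the reasoning.)

from dataclasses import dataclass

@dataclass
class Detection:
    kind: str          # e.g. "car", "pedestrian"
    distance_m: float
    closing_speed: float

def perceive(camera_frame) -> list[Detection]:
    # Stand-in for a trained neural network: raw pixels -> structured detections.
    ...

def plan(detections: list[Detection]) -> str:
    # Explicit model written by engineers; the abstraction (objects, distances)
    # is what lets it generalize to situations not covered by any training set.
    if any(d.kind == "pedestrian" and d.distance_m < 15 for d in detections):
        return "brake"
    if any(d.kind == "car" and d.distance_m < 30 and d.closing_speed > 0 for d in detections):
        return "slow_down"
    return "keep_lane"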
Yeah, and the question is how much reasoning,
how much strong abstractions are required
to solve particular tasks like driving?
That's the question.
Or human life, existence. How much strong abstractions does
existence require? But more specifically on driving,
that seems to be a coupled
question about intelligence. It's like, how much
intelligence, like how do you build an intelligent system?
And the coupled problem, how hard is this problem?
How much intelligence does this problem actually require?
So we get to cheat, right?
Because we get to look at the problem.
It's not like we close our eyes and come at it completely new to driving.
We get to do what we do as human beings, which is for the majority of our life,
before we ever learn, quote unquote, to drive, we get to watch other cars and other people drive.
We get to be in cars. We get to watch. We get to see movies about cars. We get to observe all
that stuff. And that's similar to what neural networks are doing. It's getting a lot of data.
And the question is,
yeah, how many leaps of reasoning genius is required to be able to actually effectively drive?
I think it's a good example of driving.
I mean, sure, you've seen a lot of cars in your life
before you learn to drive.
But let's say you've learned to drive in Silicon Valley,
and now you rent a car in Tokyo.
Well, now everyone is driving on the other side of the road,
and the signs are different, and the roads are more narrow and so on.
So it's a very, very different environment.
And a smart human, even an average human,
should be able to just zero-shot it,
to just be operational in this very different environment right away,
despite having had no contact with the novel complexity
that is contained in this environment.
And that novel complexity is not just an interpolation over the situations that you've encountered previously,
like learning to drive in the US.
I would say the reason I ask this,
one of the most interesting tests of intelligence we have today actively,
which is driving, in terms of having an impact on the world.
When do you think we'll pass that test of intelligence?
So I don't think driving is that much of a test of intelligence
because, again, there is no task for which a skill at that task
demonstrates intelligence unless it's a kind of meta-task
that involves acquiring new skills.
So I think you can actually solve driving
without having any real amount of intelligence.
For instance, if you really did have infinite training data,
you could just literally train an end-to-end deep learning model
that does driving, provided infinite training data.
The only problem with the whole idea
is collecting a data set that's sufficiently
comprehensive, that covers the very long tail of possible situations you might encounter.
And it's really just a scale problem. So I think there's nothing fundamentally wrong
with this plan, with this idea. It's just that it strikes me as a fairly inefficient thing to do
because you run into this scaling issue with diminishing returns.
Whereas if instead you took a more manual engineering approach
where you use deep learning modules in combination with
engineering an explicit model
of the surrounding of the cars
and you bridge the two in a clever way,
your model will actually start
generalizing much earlier
and more effectively than the end-to-end deep learning
model. So why would you not
go with the more manual
engineering-oriented approach?
Even if you created that system, either the end-to-end deep learning model system that's trained on infinite data, or the slightly more manually engineered system, I don't think achieving L5 would demonstrate a general intelligence or intelligence of any generality at all.
Again, the only possible test of generality in AI
would be a test that looks at skill acquisition
over unknown tasks.
For instance, you could take your L5 driver
and ask it to learn to pilot a commercial airplane,
for instance.
And then you would look at how much human involvement
is required and how much training data is required
for the system
to learn to pilot an airplane.
And that gives you a measure of how intelligent the system really is.
Yeah.
Well, I mean, that's a big leap.
I get you.
But I'm more interested in it as a problem. To me, driving is a black box that can generate novel situations at some rate, what people call edge cases. So it does have newness that we keep being confronted with, let's say, once a month.
It is a very long tail.
Yes.
It's a long tail.
That doesn't mean you cannot solve it
just by training a statistical model on a lot of data.
Huge amount of data.
It's really a matter of scale.
But I guess what I'm saying is
if you have a vehicle that achieves level 5,
it is going to be able to deal with new situations.
Or, I mean, the data is so large
that the rate of new situations is very low.
That's not intelligence.
So if we go back to your kind of definition of intelligence, it's the efficiency with which you can adapt to new situations, to truly new situations, not situations you've seen before, right? Not situations that could be anticipated by the creators of the system, but truly new situations.
The efficiency with which you acquire new skills.
If you require, if in order to pick up a new skill,
you require a very extensive training data set of most possible situations that can occur
in the practice of that skill, then the system is not intelligent.
It is mostly just a lookup table.
Yeah.
Well, likewise, if, in order to acquire a skill, you need a human engineer to write down a bunch of rules that cover most or every possible situation.
Likewise, the system is not intelligent.
The system is merely the output artifact of a process that happens in the minds of
the engineers that are creating it.
Right?
It is encoding an abstraction that's produced by the human mind.
And intelligence would actually be
the process of producing,
of autonomously producing this abstraction.
Yeah.
Not like, if you take an abstraction
and you encode it on a piece of paper
or in a computer program,
the abstraction itself is not intelligent.
What's intelligent is the agent that's capable
of producing these abstractions.
Yeah, it feels like there's a little bit of a
gray area.
Because you're basically saying
that deep learning forms abstractions
too.
But those abstractions
do not seem to be effective
for generalizing far
outside of the things it's already seen.
But they generalize a little bit.
Yeah, absolutely.
No, deep learning does generalize a little bit.
Like generalization is not binary.
It's more like a spectrum.
Yeah.
And there's a certain point, it's a gray area, but there's a certain point where there's an impressive degree of generalization that happens.
I guess exactly what you were saying is intelligence is how efficiently you're able to generalize
far outside of the distribution of things you've seen already.
Yes. So it's both the distance, how far you can go, how new, how radically new something is, and how efficiently you're able to deal with that.
So you can think of intelligence as a measure
of an information conversion ratio.
Like imagine a space of possible situations.
And you've covered some of them.
So you have some amount of information about your space of possible situations.
That's provided by the situations you already know.
And that's, on the other hand, also provided by the prior knowledge that the system brings to the table,
the prior knowledge that's embedded in the system.
So the system starts with some information, right, about the problem, about the task.
And it's about going from that information to a program,
what we would call a skill program, a behavioral program, that can cover a large area of possible situation space.
And essentially the ratio between that area
and the amount of information you start with is intelligence.
So a very smart agent can make efficient use of very little information about a new problem and very little prior knowledge as well to cover a very large
area of potential situations in that problem without knowing what these future
new situations are going to be.
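To make that ratio picture slightly more concrete, here is an informal sketch. The symbols below are just illustrative shorthand for the quantities described above, not the paper's actual notation (the formal definition in On the Measure of Intelligence is stated in algorithmic-information-theoretic terms):

```latex
% Informal gist only: intelligence as an information conversion ratio.
% A = the area of situation space the acquired skill programs end up covering
% P = prior knowledge the system starts with
% E = information contained in its experience / training situations
% All symbols are illustrative shorthand, not the paper's formal notation.
\mathrm{Intelligence} \;\approx\; \frac{A_{\text{situations covered}}}{P_{\text{priors}} + E_{\text{experience}}}
```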
So one of the other big things you talk about in the paper, we've talked about a
little bit already, but let's talk about it some more, is actual tests of intelligence.
So if we look at human and machine intelligence, do you think tests of intelligence should be different for humans and machines? Or, in how we think about testing intelligence, are these fundamentally the same kind of intelligences that we're after, and therefore the tests should be similar?
So if your goal is to create AIs that are more human-like,
then it will be super valuable, obviously,
to have a test that's universal,
that applies to both AIs and humans,
so that you could establish a comparison
between the two that you could tell exactly how intelligent,
in terms of human intelligence, a given system is.
So that said, the constraints that apply to artificial intelligence and to human intelligence
are very different.
And your test should account for this difference.
Because if you look at artificial systems, it's always possible for an experimenter
to buy arbitrary levels of skill at arbitrary tasks,
either by injecting hard-coded prior knowledge
into the system via rules and so on
that come from the human mind,
from the minds of the programmers,
and also buying higher levels of skill
just by training on more data.
For instance, you could generate
an infinity of different Go games
and you could train a Go playing system that way,
but you could not directly compare it
to human Go-playing skills
because a human that plays Go had to develop that skill
in a very constrained environment.
They had a limited amount of time.
They had a limited amount of energy.
And of course, this started from a different set of priors.
This started from innate human priors.
So I think if you want to compare the intelligence of two systems,
like the intelligence of an AI and the intelligence of a human,
you have to control for priors.
You have to start from the same set of knowledge priors about the task,
and you have to control for experience, that is to say, for training data.
So what are priors?
So prior is whatever information you have about a given task before you start learning
about this task.
And how's that different from experience?
Well, experience is acquired, right?
So for instance, if you're trying to play Go,
your experience with Go is all the Go games you've played or you've seen or you've simulated in your mind, let's say.
And your priors are things like,
well, Go is a game on a 2D grid
and we have lots of hard-coded priors
about the organization of 2D space.
And so the rules of how the dynamics, the physics of this game in this 2D space, work.
Yes.
And the idea that you have of what winning is.
Yes, exactly.
And other board games
can also share some similarities with Go.
And if you've played these board games,
then with respect to the game of Go,
that would be part of your priors about the game.
Well, what's interesting to think about with the game of Go is how many priors are actually brought to the table.
When you look at self-play,
reinforcement learning-based mechanisms that do learning,
it seems like the number of priors is pretty low.
Yes.
But you're saying you should be...
There is a 2D spatial prior in the convnet.
Right.
But you should be clear at making those priors explicit.
Yes.
So in particular, I think if your goal
is to measure a human-like form of intelligence,
then you should clearly establish
that you want the AI you're testing
to start from the same set of priors
that humans start with.
Right.
So, I mean, to me personally,
but I think to a lot of people,
the human side of things is very interesting.
So testing intelligence for humans,
what do you think is a good test of human intelligence?
Well, that's the question that psychometrics is interested in. There's an entire subfield of psychology that deals with this question.
So what's psychometrics?
Psychometrics is the subfield of psychology that tries to measure, quantify aspects of the human mind.
So in particular, cognitive abilities, intelligence, and personality traits as well.
So, this might be a weird question, but what are the first principles that psychometrics operates on, you know, what are the priors it brings to the table?
So it's a field with a fairly long history.
So, you know, psychology sometimes gets a bad reputation for not having very reproducible results.
And psychometrics actually has some fairly solidly reproducible results.
So the ideal goals of the field is, you know,
tests should be reliable, which is a notion tied to reproducibility.
It should be valid, meaning that it should actually measure
what you say it measures.
So for instance, if you're saying that you're measuring intelligence, then your test results
should be correlated with things that you expect to be correlated with intelligence,
like success in school or success in the workplace and so on.
Should be standardized, meaning that you can administer your tests to many different people
in the same conditions, and it should be free from bias, meaning that, for instance,
if your test involves the English language, then you have to be aware that this creates a bias
against people who have English as their second language or people who can't speak English at all.
So, of course, these principles for creating psychometric tests are very much an ideal.
I don't think every psychometric test is really either reliable,
valid, or free from bias.
But at least the field is aware of these weaknesses
and is trying to address them.
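As a side note on the reliability criterion mentioned above, a minimal sketch of how it is often quantified in practice is the test-retest correlation: administer the same test twice and correlate the scores. The numbers below are made up purely for illustration:

```python
# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same test. Scores are made up
# for illustration; real psychometric work uses much larger samples
# and additional reliability measures (e.g. internal consistency).
import numpy as np

first_session = np.array([98, 112, 105, 121, 93, 130, 101, 117])
second_session = np.array([101, 110, 108, 118, 95, 127, 99, 120])

reliability = np.corrcoef(first_session, second_session)[0, 1]
print(f"test-retest reliability ~ {reliability:.2f}")  # close to 1.0 = reliable
```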
So it's kind of interesting.
Ultimately, you're only able to measure, like you said previously, the skill, but you're
trying to do a bunch of measures of different skills that correlate, as you mentioned, strongly
with some general concept of cognitive ability.
Yes, yes.
So what's the G factor?
So, right, there are many different kinds of tests of intelligence,
and each of them is interested in different aspects of intelligence.
Some of them will deal with language, some of them will deal with spatial vision,
maybe mental rotations, numbers, and so on.
When you run these very different tests at scale, what you start seeing is that
there are clusters of correlations among test results.
So for instance, if you look at homework at school, you will see that people who do well
at math are also likely statistically to do well in physics.
And what's more, there are also people who do well at math and physics
are also statistically likely to do well in things
that sound completely unrelated,
like writing an English essay, for instance.
And so when you see clusters of correlations
in statistical terms,
you would explain them with a latent variable.
And the latent variable that would, for instance, explain the relationship
between being good at math and being good at physics would be cognitive ability.
And the g-factor is the latent variable that explains the fact that for every test of intelligence you can come up with, the results on these tests end up being correlated.
So there is some single unique variable that explains this correlation.
So that's the g-factor.
So it's a statistical construct.
It's not really something you can directly measure, for instance, in a person.
But it's there.
It's there at scale.
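To illustrate what seeing g "in a statistical analysis at scale" can look like, here is a toy sketch with entirely synthetic data: several test scores share one latent factor, so the first eigenvalue of their correlation matrix dominates, which is roughly the kind of signature the g-factor refers to.

```python
# Toy illustration (not real psychometric data): scores on several different
# tests are generated from one shared latent factor ("g") plus test-specific
# noise. A principal component analysis of the correlation matrix then
# recovers a single dominant factor, which is roughly how g shows up at scale.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 1000, 6

g = rng.normal(size=n_people)                   # latent ability per person
loadings = rng.uniform(0.5, 0.9, size=n_tests)  # how much each test taps g
noise = rng.normal(size=(n_people, n_tests))
scores = g[:, None] * loadings + noise          # observed test scores

corr = np.corrcoef(scores, rowvar=False)        # tests correlate positively
eigvals = np.linalg.eigvalsh(corr)[::-1]        # sorted descending
print("share of variance on first factor:", eigvals[0] / eigvals.sum())
```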
And that's also one thing I want to mention about psychometrics.
Like, you know, when you talk about measuring intelligence in humans, for instance, some
people get a little bit worried.
They will say, you know, that sounds dangerous.
Maybe that sounds potentially discriminatory and so on.
And they're not wrong.
And the thing is, so personally, I'm not interested in psychometrics
as a way to characterize one individual person. Like if I get your psychometric personality
assessments or your IQ, I don't think that actually tells me much about you as a person.
I think psychometrics is most useful as a statistical tool.
So it's most useful at scale.
It's most useful when you start getting test results for a large number of people
and you start cross-correlating these test results
because that gives you information about the structure of the human mind,
in particular about the structure of human cognitive abilities.
So at scale, psychometrics paints a certain picture of the human mind,
and that's interesting.
And that's what's relevant to AI,
the structure of human cognitive abilities.
Yeah, it gives you an insight into it.
I mean, to me, I remember when I learned about the g-factor, it seemed like it would be impossible for it to even be real, even as a statistical variable. It felt kind of like astrology, like wishful thinking among psychologists. But the more I learned, I realized there's something to it. I mean, I'm not sure what to make of it about human beings, the fact that the g-factor is a thing.
That there's a commonality across all of human species,
that there does seem to be a strong correlation between cognitive abilities.
That's kind of fascinating.
Yeah, so human cognitive abilities have a structure,
like the most mainstream theory of the structure of cognitive abilities is called CHC theory. It's a
Kettle, Horn, Carroll. It's the name of
the three psychologists who contributed
key pieces of it.
And it describes cognitive
abilities as a
hierarchy with three levels. And at the
top, you have the G-factor. Then
you have broad cognitive abilities,
for instance, fluid intelligence,
right, that encompass a broad
set of possible kinds of tasks that are all related.
And then you have narrow cognitive abilities at the last level, which is closer to task-specific
skill.
There are actually different theories of the structure of cognitive abilities
that just emerge from different statistical analysis
of IQ test results
but they all describe a hierarchy
with a kind of g-factor at the top
And you're right that the g-factor is not quite real in the sense that it's not something you can observe and measure, like your height, for instance. But it's real in the sense that you see it in a statistical analysis of the data, right?
One thing I want to mention is that the fact that there is a g-factor does not really mean
that human intelligence is general in a strong sense, does not mean human intelligence can
be applied to any problem at all
and that someone who has a high IQ is going to be able to solve any problem at all.
That's not quite what it means.
I think one popular analogy to understand it is the sports analogy.
If you consider the concept of physical fitness,
it's a concept that's very similar to intelligence
because it's a useful concept.
It's something you can intuitively understand.
Some people are fit, maybe like you.
Some people are not as fit, maybe like me.
But none of us can fly.
Absolutely.
It's constrained to a specific type of skill.
Even if you're very fit, that doesn't mean you can do anything at all in any environment.
You obviously cannot fly.
You cannot survive at the bottom of the ocean and so on.
And if you were a scientist and you wanted to precisely define and measure physical fitness in humans,
then you would come up with a battery of tests.
Like you would have running 100 meters,
playing soccer, playing table tennis, swimming, and so on.
And if you run these tests over many different people,
you would start seeing correlations in test results.
For instance, people who are good at soccer are also good at sprinting, right?
And you would explain these correlations with physical abilities that are strictly
analogous to cognitive abilities, right?
And then you would start also observing correlations between biological characteristics, like maybe
lung volume is correlated with being a fast runner, for instance, in the same way that
there are
neurophysical correlates
of cognitive abilities, right?
And at the top of the hierarchy
of physical abilities
that you would be able to observe,
you would have a G factor,
a physical G factor,
which would map to physical fitness, right?
And as you just said, that doesn't mean that people with high physical fitness can fly.
It doesn't mean human morphology
and human physiology is universal.
It's actually super specialized.
We can only do the things
that we were evolved to do, right?
Like we are not appropriate to,
you could not exist on Venus or Mars or in the void of space or the bottom of the ocean.
So that said, one thing that's really striking and remarkable is that our morphology generalizes far beyond the environments that we evolved for.
Like in a way you could say we evolved to run after prey in the savannah, right?
That's very much where our human morphology comes from.
And that said, we can do a lot of things that are completely unrelated to that.
We can climb mountains.
We can swim across lakes.
We can play table tennis. I mean, table tennis is very
different from what we were evolved to do, right? So our morphology, our bodies, our sensorimotor affordances have a degree of generality that is absolutely remarkable, right? And I think
cognition is very similar to that. Our cognitive abilities have a degree of generality that goes far beyond what the mind
was initially supposed to do,
which is why we can play music and write novels
and go to Mars and do all kinds of crazy things.
But it's not universal in the same way
that human morphology and our body
is not appropriate for actually most
of the universe by volume.
In the same way, you could say that the human mind is not really appropriate
for most of problem space, potential problem space by volume.
So we have very strong cognitive biases, actually,
that mean that there are certain types of problems that we handle very well
and certain types of problems that we are completely inept for.
So that's really how we'd interpret the G factor.
It's not a sign of strong generality.
It's really just the broadest cognitive ability.
But our abilities, whether we are talking about sensory motor abilities or
cognitive abilities, they remain very specialized in the human condition.
Within the constraints of the human cognition, they're general.
Yes, absolutely.
But the constraints, as you're saying, are very limited.
I think what's limiting.
So we evolved, our cognition and our body evolved in very specific environments.
Because our environment was so variable, fast-changing, and so unpredictable,
part of the constraints that drove our evolution is generality itself.
So we were in a way evolved to be able to improvise in all kinds of physical or cognitive environments.
And for this reason, it turns out that the minds and bodies that we ended up with can be applied to much, much broader scope
than what they were evolved for.
And that's truly remarkable.
And that's a degree of generalization that is far beyond anything
you can see in artificial systems today, right?
That said, it does not mean that human intelligence is anywhere universal.
Yeah, it's not general. You know, IQ tests are a kind of exciting topic for people, even outside of artificial intelligence. There's Mensa and so on.
There's different degrees of difficulty for questions.
We talked about this offline a little bit too
about sort of difficult questions.
What makes a question on an IQ test
more difficult or less difficult, do you think?
So the thing to keep in mind is that
there's no such thing as a question that's intrinsically difficult.
It has to be difficult with respect to the things you already know and the things you can already do, right?
So in terms of an IQ test question, typically it would be structured, for instance, as a set of demonstration input and output pairs.
And then you would be given a test input, a prompt,
and you would need to recognize or produce the corresponding output.
And in that narrow context, you could say a difficult question
is a question where the input prompt is very surprising and unexpected
given the training examples. Just even the nature of the patterns that you're observing in the
input prompt. For instance, let's say you have a rotation problem. You must rotate the shape
by 90 degrees. If I give you two examples and then I give you one prompt, which is actually one of the
two training examples, then there is zero generalization difficulty for the task.
It's actually a trivial task.
You just recognize that it's one of the training examples and you probably use the same answer.
Now, if it's a more complex shape, there is a little bit more generalization, but it remains that you are still doing the same thing at test time
as you were being demonstrated at training time.
A difficult task is a task that will require some amount of test time adaptation,
some amount of improvisation.
So consider, I don't know,
you're teaching a class on quantum physics or something.
If you wanted to kind of test the understanding that students have of the material,
you would come up with an exam
that's very different from anything they've seen, like on the internet when they
were cramming. On the other hand, if you wanted to make it easy, you would just give them something
that's very similar to the mock exams that they've taken, something that's just a simple
interpolation of questions that they've already seen. And so that would be an easy exam.
It's very similar to what you've been trained on. And a difficult exam is one that really
probes your understanding because it forces you to improvise. It forces you to do things
that are different from what you were exposed to before. So that said, it doesn't mean that the exam that requires improvisation is
intrinsically hard, right? Because maybe you're a quantum physics expert. So when you take the exam,
this is actually stuff that despite being new to the students, it's not new to you, right?
So it can only be difficult with respect to what the test taker already knows and with respect
to the information that the test taker has about the task.
So that's what I mean by controlling for priors, what you, the information you bring
to the table.
And the experience.
And the experience, which is the training data.
So in the case of the quantum physics exam, that would be all the course material itself and
all the mock exams that students might have taken online.
Yeah, it's interesting, because I sent you an email and asked you this curious question of, you know, what's a really hard IQ test question.
And I've been talking to also people who have designed IQ tests.
There's a few folks on the internet.
It's like a thing.
People are really curious about it.
First of all, most of the IQ tests they designed,
they like religiously protect against the correct answers.
Like you can't find the correct answers anywhere.
In fact, the question is ruined once you know,
even like the approach you're supposed to take.
So they're very...
That said, the approach is implicit in the training examples.
So if you release the training examples, it's over.
Well...
Which is why in ARC, for instance,
there is a test set that is private and no one has seen it.
No, for really tough IQ questions, it's not obvious.
It's not because of the ambiguity.
Like it's, I mean, we'll have to look through them,
but like some number sequences and so on,
it's not completely clear.
So you can get a sense, but, you know, when you look at a number sequence, I don't know, like the Fibonacci number sequence, if you look at the first few numbers, that sequence could be completed in a lot of different ways. And, you know, some, if you think deeply, are more correct than others. Like there's a kind of intuitive simplicity and elegance to the correct solution.
Yes.
I am personally not a fan of ambiguity in test questions, actually.
But I think you can have difficulty without requiring ambiguity
simply by making the test require a lot of extrapolation over the training examples.
But a beautiful question is one that's difficult, but gives away everything when you give the training examples.
Basically, yes.
Meaning that the tests I'm interested in creating are not necessarily difficult for humans,
because human intelligence is the benchmark.
They're supposed to be difficult for machines in ways that are easy for humans. I think an ideal
test of human and machine intelligence is a test that is actionable, that highlights
the need for progress, and that highlights the direction
in which you should be making progress.
I think we'll talk about the ARC challenge and the test you've constructed, and you have these elegant examples that highlight, like, this is really easy for us humans, but it's really hard for machines. But for, you know, designing an IQ test for IQs higher than 160 and so on, you have to take that and put it on steroids, right? You have to think about what is hard for humans, and that's a fascinating exercise in itself, I think.
And it's an interesting question what it takes to create a really hard question for humans, because you again have to do the same process as you mentioned: come up with something where, given the experience you're likely to have encountered throughout your whole life, even if you've prepared for IQ tests, which is a big challenge, it will still be novel for you.
Yeah. I mean, novelty is a requirement. You should not be able to practice for the questions
that you're going to be tested on. That's important. Because otherwise, what you're doing is not exhibiting intelligence. What you're doing is just retrieving what you've been exposed to before. It's the same thing as a deep learning model. If you train a deep learning model on all the possible answers, then it will ace your test, in the same way that, you know, a student can still ace the test if they cram for it.
They memorize, you know, 100 different possible mock exams,
and then they hope that the actual exam will be a very simple interpolation of the mock exams.
And that student could just be a deep learning model at that point.
But you can actually do that without any understanding of the material.
And in fact,
many students pass the exams in exactly this way. And if you want to avoid that, you need an exam
that's unlike anything they've seen that really probes their understanding. So how do we design
an IQ test for machines, an intelligent test for machines? All right. So in the paper, I outline
a number of requirements that you expect of such a test. And in particular, we should start by
acknowledging the priors that we expect to be required in order to perform the test.
So we should be explicit about the priors, right? And if the goal is to compare machine intelligence
and human intelligence,
then we should assume human cognitive priors, right?
And secondly, we should make sure
that we are testing for skill acquisition ability,
skill acquisition efficiency in particular,
and not for skill itself,
meaning that every task featured in your test should be novel and should not be something that you can anticipate.
So for instance, it should not be possible to brute force the space of possible questions,
right?
To pre-generate every possible question and answer.
So it should be tasks that cannot be anticipated,
not just by the system itself,
but by the creators of the system, right?
Yeah. You know what's fascinating?
I mean, one of my favorite aspects of the paper
and the work you do with the ARC challenge
is the process of making priors explicit.
Just even that act alone is a really powerful one. It's a really powerful question to ask of us humans: what are the priors that we bring to the table? So the next step is, once you have those priors, how do you use them to solve a novel task? But just even making the priors explicit is a really difficult and really powerful step. And that's a visually beautiful and conceptually, philosophically beautiful part of the work you did, and I guess continue to do, with the paper and the ARC challenge.
Can you talk about some of the priors that we're talking about here?
Yes. So a researcher that has done a lot of work on what exactly are the knowledge priors
that are innate to humans is Elisabeth Spelke from Harvard.
Spelke from Harvard.
So she developed the core knowledge theory
which outlines
four different
core knowledge systems.
So systems of knowledge that we are
basically either born with
or that we are
hardwired to acquire
very early on in our
development. And there's no
strong distinction between the two.
Like if you are primed to acquire a certain type of knowledge
in just a few weeks, you might as well just be born with it.
It's just part of who you are.
And so there are four different core knowledge systems.
Like the first one is the notion of objectness and basic physics.
Like you recognize that something that moves coherently, for instance, is an object.
So we intuitively, naturally, innately divide the world into objects
based on this notion of coherence, physical coherence.
And in terms of elementary physics, there's the fact that objects can bump against each other
and the fact that they can occlude each other.
So these are things that we are essentially born with,
or at least that we are going to acquire extremely early, because we're really hardwired to acquire them.
So a bunch of points, pixels that move together, are part of the same object?
Yes.
I mean, I don't smoke weed, but if I did, that's something I could sit all night and just think about.
I remember when I first read your paper, just objectness.
I wasn't self-aware, I guess, of that particular prior.
That's such a fascinating prior.
That's the most basic one, but just identity. I just, yeah.
Objectness. I mean, it's very basic, I suppose, but it's so fundamental.
It is fundamental to human cognition.
Yeah.
And the second prior that's also fundamental is agentness, which is not a real word. So, agentness.
The fact that some of these objects that you segment your environment into,
some of these objects are agents.
So what's an agent?
Basically, it's an object that has goals.
That has what?
That has goals.
They're capable of pursuing goals. So for instance, if you see two dots moving in a roughly synchronized fashion, you will intuitively infer that one of the dots is pursuing the other. One of the dots is an agent and its goal is to avoid the other dot, and the other dot is also an agent and its goal is to catch the first dot.
Spelke has shown that babies, you know,
as young as three months identify agent-ness
and goal-directedness in their environment.
Another prior is basic geometry and topology,
like the notion of distance, the ability to navigate in your environment and so on. This
is something that is fundamentally hardwired into our brain. It's in fact backed by very specific
neural mechanisms, like for instance, grid cells and place cells.
So it's something that's literally hard-coded
at the neural level in our hippocampus.
And the last prior would be the notion of numbers.
Like, numbers are not actually a cultural construct.
We are intuitively, innately able
to do some basic counting and to compare quantities. So it doesn't mean we can do
arbitrary arithmetic. Counting, the actual counting.
Counting like counting one, two, three-ish, then maybe more than three. You can also compare
quantities. If I give you three dots and five dots, you can tell
the side with five dots has more dots. So this is actually an innate prior.
So that said, the list may not be exhaustive. So Spelke is still pursuing
the potential existence of new knowledge systems, for instance,
knowledge systems that would deal with social relationships.
Yeah, I mean...
Which is much less relevant to something like ARC or IQ tests.
Right. There could be stuff that's, like you said, rotation or symmetry.
It's really interesting.
It's very likely that there is,
speaking about rotation,
that there is in the brain
a hard-coded system
that is capable of performing rotations.
One famous experiment
that people did in the,
I don't remember who it was exactly,
but in the 70s, was that people found that if you asked people,
if you give them two different shapes,
and one of the shapes is a rotated version of the first shape,
and you ask them, is that shape a rotated version of the first shape or not?
What you see is that the time it takes people to answer
is linearly proportional to the angle of rotation.
So it's almost like you have somewhere in your brain
like a turntable with a fixed speed.
And if you want to know if two objects
are rotated versions of each other,
you put the object on the turntable,
you let it move around a little bit
and then you stop when you have a match.
And that's really interesting.
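As a small aside, the linear relationship described here is easy to picture as a fit of response time against rotation angle. The numbers below are synthetic stand-ins, not data from the original (Shepard and Metzler-style) experiments; the point is only the linear trend:

```python
# Sketch of the mental-rotation finding: response time grows roughly linearly
# with the rotation angle between the two shapes. The data here are synthetic
# stand-ins generated around a linear trend; the fit just recovers that slope.
import numpy as np

angles = np.array([0, 30, 60, 90, 120, 150, 180])  # rotation angle in degrees
rt_ms = 500 + 3.0 * angles + np.random.default_rng(1).normal(0, 20, angles.size)

slope, intercept = np.polyfit(angles, rt_ms, deg=1)
print(f"~{slope:.1f} ms of extra response time per degree of rotation")
```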
So what's the arc challenge?
So in the paper, I outline all these principles
that a good test of machine intelligence
and human intelligence should follow.
And the ARC challenge is one attempt
to embody as many of these principles as possible.
So I don't think it's anywhere near a perfect attempt.
It does not actually follow every principle,
but it is what I was able to do given the constraints.
So the format of ARC is very similar to classic IQ tests,
in particular Raven's Progressive Matrices.
Raven's?
Yeah, Raven's Progressive Matrices.
I mean, if you've done IQ tests in the past,
you know what it is probably, or at least you've seen it,
even if you don't know what it's called.
And so you have a set of tasks, that's what they're called.
And for each task you have training data, which is a set of input and output pairs. An input or output pair is a grid of colors, basically, and the size of the grid is variable. You're given an input and you must transform it into the proper output.
You're shown a few demonstrations of a task in the form of existing input-output pairs
and then you're given a new input, and you must produce the correct
output. And the assumption in ARC is that every task should only require core knowledge priors, and should not require any outside knowledge. So for instance,
no language, no English, nothing like this. No concepts taken from
all human experience like trees, dogs, cats, and so on. So only reasoning tasks that are built on top of core knowledge priors.
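For reference, the publicly released ARC tasks are distributed as JSON files with "train" and "test" lists of input/output grids, each grid being a 2D list of integers 0 through 9 standing for colors. A minimal loader sketch follows; the file path is a placeholder:

```python
# Minimal sketch of reading one ARC task. The public ARC repository stores
# each task as a JSON file with "train" and "test" lists of input/output
# grids, where a grid is a 2D list of integers 0-9 standing for colors.
# The file path below is a placeholder; adjust it to wherever the data lives.
import json

with open("data/training/some_task.json") as f:   # hypothetical path
    task = json.load(f)

for pair in task["train"]:                        # demonstration pairs
    inp, out = pair["input"], pair["output"]
    print(f"train: {len(inp)}x{len(inp[0])} grid -> {len(out)}x{len(out[0])} grid")

test_input = task["test"][0]["input"]             # you must produce the output
print(f"test input: {len(test_input)}x{len(test_input[0])} grid")
```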
And some of the tasks are actually explicitly trying to probe specific forms of abstraction.
Part of the reason why I wanted to create Arc is I'm a big believer in, you know, when you're faced with a problem
as murky as understanding how to autonomously generate abstraction in a machine, you have
to co-evolve the solution and the problem.
And so part of the reason why I designed ARC was to clarify my ideas about the
nature of abstraction, right?
And some of the tasks are actually designed to probe bits of that theory.
And there are things that turn out to be very easy for humans to perform,
including young kids, right?
But turn out to be near impossible for machines.
So what have you learned from the nature of abstraction from designing that?
Can you clarify what you mean?
One of the things you wanted to try to understand was this idea of abstraction.
Yes. Clarifying my own ideas about abstraction by forcing myself to produce tasks that would require the ability to produce that form of abstraction in order to solve them.
Got it. Okay. And by the way, people should check it out, I'll probably overlay it if you're watching the video, but it's the grid input-output with the different colors on the grid, and that's it. I mean, it's a very simple world, but it's kind of beautiful.
It's very similar to classic IQ tests. It's not very original in that sense. The main difference with IQ tests is that we make the priors explicit, which is not usually the case in IQ tests.
So you make it explicit that everything should only be built
out of core knowledge priors.
I also think it's generally more diverse than IQ tests in general.
And it perhaps requires a bit more manual work to produce solutions, because you have to click around on a grid for a while. Sometimes the grids can be as large as 30 by 30 cells.
So how did you come up with the questions, if you can reveal that? Like, what's the process? Was it mostly you who came up with the questions? How difficult is it to come up with a question? Like, is this scalable to a much larger number?
If you think, you know, with IQ tests,
you might not necessarily want it to
or need it to be scalable.
With machines, it's possible you could argue
that it needs to be scalable.
So there are a thousand questions.
A thousand?
A thousand tasks.
Yes.
Wow.
Including the test set, the private test set.
I think it's fairly difficult
in the sense that a big requirement
is that every task
should be novel
and unique
and unpredictable.
You don't want to create
your own little world
that is
simple enough that it would be possible for a human to reverse engineer it and write down an algorithm that could generate every possible ARC task and their solutions, for instance. That would completely invalidate the test.
So you're constantly coming up with new stuff. You need...
Yeah, you need a source of unfakeable novelty. And one thing I found is that as a human,
you are not a very good source of unfakeable novelty.
And so you have to pace the creation of these tasks quite a bit.
There are only so many unique tasks that you can do in a given day.
So that means coming up with truly original new ideas.
Did psychedelics help you at all? No, I'm just kidding. But I mean, that's fascinating to think about. Like, would you be walking or something like that, constantly thinking of something totally new?
Yes. I mean, this is hard. I mean, I'm not saying I've done anywhere near a perfect job at it.
There is some amount of redundancy, and there are many imperfections in Arc.
So that said, you should consider Arc as a work in progress.
It is not the definitive state.
The Arc tasks today are not the definitive state of the test.
I want to keep refining it in the future.
I also think it should be possible to open up the creation of tasks
to a broad audience to do crowdsourcing.
That would involve several levels of filtering, obviously,
but I think it's possible to apply crowdsourcing
to develop a much bigger and much more diverse Arc dataset
that would also be free of potentially some of my own personal biases.
Does there always need to be a part of Arc that the test is hidden?
Yes, absolutely.
It is impressive that the tests that you're using to actually benchmark algorithms is not accessible to the people developing these algorithms.
Because otherwise, what's going to happen is that the human engineers are just going to solve the tasks themselves and encode their solution in program form.
But that, again, what you're seeing here is the process of intelligence
happening in the mind of the human,
and then you're just capturing its crystallized output.
But that crystallized output is not the same thing
as the process that generated it.
It's not intelligent in itself.
So what, by the way, the idea of crowdsourcing it
is fascinating.
I think the creation of questions
is really exciting for people.
I think there's a lot of really brilliant people out there
that love to create these kinds of stuff.
Yeah, one thing that kind of surprised me
that I wasn't expecting is that
lots of people seem to actually enjoy ARC
as a kind of game.
And I was really seeing it as a test, as a benchmark
of
fluid general intelligence.
And lots of
people, including kids, just started
enjoying it as a game. So I think that's
encouraging.
Yeah, I'm fascinated by it. There's a world of people who create
IQ questions.
I think
that's a cool activity
for machines and for humans.
And people, humans are themselves fascinated
by taking the questions,
like measuring their own intelligence.
I mean, that's just really compelling.
It's really interesting to me too.
It helps.
One of the cool things about ARC,
you said it's kind of inspired by IQ tests, or at least it follows a similar process. But because of its nature, because of the context in which it lives, it immediately forces you to think about the nature of intelligence, as opposed to just being a test of yours. It forces you to really think. I don't know if it's inherent in the questions or just the fact that it lives in a test that's supposed to be a test of machine intelligence.
Absolutely. As you solve ARC tasks as a human, you will be forced to basically introspect how you come up with solutions. And that forces you to reflect on the human problem-solving process
and the way your own mind generates abstract representations
of the problems it's exposed to.
I think it's due to the fact that the set of core knowledge priors that ARC is built upon is
so small. It's all a recombination of a very, very small set of assumptions.
Okay. So what's the future of ARC? So you held ARC as a challenge as part of like a
Kaggle competition.
And what do you think?
Do you think that's something that continues for five years, 10 years,
like just continues growing?
Yes, absolutely.
So ARC itself will keep evolving.
So I've talked about crowdsourcing. I think that's a good avenue.
Another thing I'm starting is I'll be collaborating with folks
from the psychology department at NYU to do human testing on Arc.
And I think there are lots of interesting questions you can start asking,
especially as you start correlating machine solutions to Arc tasks
and the human characteristics of solutions.
Like, for instance, you can try to see
if there's a relationship between
the human perceived difficulty of a task
and the...
Machine perceived.
Yes, and exactly some measure
of machine perceived difficulty.
Yeah, it's a nice playground in which to explore this difference.
It's the same thing as we talked about
with autonomous vehicles.
The things that could be difficult for humans might be very different than the things that...
Yes, absolutely.
And formalizing or making explicit that difference in difficulty may teach us something fundamental about intelligence.
So one thing I think we did well with ARC is that it's proving to be a very actionable test in the sense that
machine performance on ARC started at very much zero initially, while humans actually found the
tasks very easy. And that alone was like a big red flashing light saying that something is going on and that we are missing something.
And at the same time, machine performance did not stay at zero for very long.
Actually, within two weeks of the Kaggle competition, we started having a non-zero number.
And now the state of the art is around 20% of the test set solved.
And so ARC is actually a challenge where our capabilities start at zero,
which indicates the need for progress.
But it's also not an impossible challenge.
It's not inaccessible.
You can start making progress basically right away.
At the same time, we are still very far from having solved it.
And that's actually a very positive outcome of the competition
is that the competition has proven that
there was no obvious shortcut to solve these tasks.
Yeah, so the test held up.
Yeah, exactly.
That was the primary reason to use a Kaggle competition, to check if some clever person, you know, was going to hack the benchmark. And that did not happen, right? People, while solving the tasks, are, well, in a way actually exploring some flaws of ARC that we will need to address in the future. In particular, they're essentially anticipating what sort of tasks may be contained in the test set, right?
Right, which is kind of, yeah, that's the kind of hacking.
It's human hacking of the test.
Yes. That said, you know, the state of the art at around 20% is still very, very far from human level, which is closer to 100%.
And I do believe that it will take a while
until we reach human parity on ARC.
And that by the time we have human parity,
we will have AI systems that are probably pretty close
to human level in terms of general fluid intelligence,
which is, I mean, they're not going to be necessarily human-like.
They're not necessarily, you would not necessarily recognize them as, you know, being an AGI.
But they would be capable of a degree of generalization
that matches the generalization
performed by human fluid intelligence.
Sure.
I mean, this is a good point in terms of general fluid
intelligence to mention.
In your paper, you describe different kinds
of generalizations, local, broad, extreme,
and there's a kind of a hierarchy that you form.
So when we say generalizations, what are we talking about?
What kinds are there?
Right.
So generalization is a very old idea.
I mean, it's even older than machine learning.
In the context of machine learning, you say a system generalizes if it can make sense of an input it has not yet seen.
And that's what I would call system-centric generalization.
Generalization with respect to novelty
for the specific system you're considering.
So I think a good test of intelligence
should actually deal with developer-aware generalization,
which is slightly stronger than system-centric generalization.
So developer-aware generalization would be the ability to generalize to novelty or uncertainty
that not only the system itself has no access to, but the developer of the system could not have access to either.
That's a fascinating meta-definition.
So the system is basically the edge case thing
we're talking about with autonomous vehicles.
Neither the developer nor the system
know about the edge cases they encounter.
So the system should be able to generalize
to the thing that nobody expected, neither the designer of the training data, nor obviously the contents of the training data.
That's a fascinating definition.
You can see degrees of generalization as a spectrum.
And the lowest level, which is what machine learning is trying to do, is the assumption that any new situation is going to be sampled from a static distribution of possible situations,
and that you already have a representative sample of that distribution. That's your trained data.
And so in machine learning, you generalize to a new sample from a known distribution.
And the ways in which your new sample will be new or different are ways that are already understood by the developers of the system.
So you are generalizing to known unknowns for one specific task.
That's what you would call robustness.
You are robust to things like noise, small variations, and so on, for one fixed known distribution that you know through your training
data. A higher degree would be flexibility in machine intelligence. So flexibility would be something like an L5 self-driving car,
or maybe a robot that can pass the coffee cup test,
which is the notion that you would be given a random kitchen
somewhere in the country and you would have to, you know,
go make a cup of coffee in that kitchen.
Right.
So flexibility would be the ability to deal with unknown unknowns, so things, dimensions of variability, that could not have possibly been foreseen by the creators of the system, within one specific task. So generalizing to the long tail of situations in self-driving,
for instance, would be flexibility.
So you have robustness, flexibility, and finally, you would have extreme generalization, which is basically flexibility, but instead of just considering one specific domain, like
driving or domestic robotics, you're considering an open-ended range of possible domains.
you're considering an open-ended range of possible domains.
So a robot would be capable of extreme generalization if, let's say, it's designed and trained for cooking, for instance.
And if I buy the robot,
and if it's able to teach itself gardening in a couple of weeks,
it would be capable of extreme generalization, for instance.
So the ultimate goal is extreme generalization.
Yes.
So creating a system that is so general
that it could essentially achieve human skill parity
over arbitrary tasks and arbitrary domains
with the same level of improvisation
and adaptation power as humans
when it encounters new situations.
And it would do so over basically the same range
of possible domains and tasks as humans
and using essentially the same amount
of training experience, of practice,
as humans would require.
That would be human level extreme generalization.
So I don't actually think humans are anywhere near the optimal intelligence bound if there
is such a thing.
So I think for humans or in general?
In general.
I think it's quite likely, you know, that there is a hard limit to how intelligent any system can be.
But at the same time, I don't think humans are anywhere near that limit.
Yeah, last time I think we talked, I think you had this idea that we're only as intelligent as the problems we face.
Sort of.
Yes, intelligence is upper-bounded by the problems.
In a way, yes.
We are bounded by our environments
and we are bounded by the problems we try to solve.
Yeah, yeah.
What do you make of Neuralink
and outsourcing some of the brain power,
like brain-computer interfaces?
Do you think we can expand our, augment our intelligence?
I am fairly skeptical of neural interfaces
because they are trying to fix one specific bottleneck
in human-machine cognition,
which is the bandwidth bottleneck,
input and output of information in the brain.
And my perception of the problem is that
bandwidth is not at this time a bottleneck at all,
meaning that we already have senses
that enable us to take in far more information
than what we can actually process.
Well, to push back on that a little bit,
to sort of play devil's advocate a little bit,
is if you look at the internet, Wikipedia, let's say Wikipedia,
I would say that humans, after the advent of Wikipedia,
are much more intelligent.
Yes, I think that's a good one. But that's also not about,
that's about externalizing our intelligence
via information processing systems,
external information processing systems,
which is very different from brain-computer interfaces.
Right, but the question is whether if we have direct access,
if our brain has direct access to Wikipedia without having...
Your brain already has direct access to Wikipedia.
It's on your phone.
And you have your hands and your eyes and your ears and so on
to access that information.
And the speed at which you can access it...
Is bottlenecked by the cognition.
I think it's already close, fairly close to optimal, which is why speed reading, for instance, does not work.
The faster you read, the less you understand.
But maybe it's because it uses the eyes.
So maybe...
I don't believe so.
I think, you know, the brain is very slow. The fastest things that happen in the brain operate at the level of 50 milliseconds, and forming a conscious thought can potentially take entire seconds, right? And you can already read pretty fast. So I think the speed at which you can take information in, and even the speed at which you can output information, can only be very incrementally improved.
I think that if you're a very, very fast typer,
if you're a very trained typer,
the speed at which you can express your thoughts
is already the speed at which you can form your thoughts.
Right, so that's kind of an idea
that there are fundamental bottlenecks to the human mind,
but it's possible that everything we have in the human mind
is just to be able to survive in the environment.
And there's a lot more to expand.
Maybe, you know, you said the speed of the thought.
So I think augmenting human intelligence
is a very valid and very powerful avenue, right?
And that's what computers are about.
In fact, that's what all of culture and civilization is about.
Culture is externalized cognition, and we rely on culture to think constantly.
Yeah, I mean, that's another way.
Not just computers, not just phones and the internet.
I mean, all of culture, like language, for instance,
is a form of externalized cognition.
Books are obviously externalized cognition.
Yeah, that's a good point.
And you can scale that externalized cognition
far beyond the capability of the human brain.
And you could see, you know, civilization itself,
it has capabilities that are far beyond any individual brain.
And we'll keep scaling it because it's not bound by individual brains.
It's a different kind of system.
Yeah, and that system includes non-human, non-humans.
First of all, it includes all the other biological systems,
which are probably contributing to the overall intelligence of the organism.
And then computers are part of it.
Non-human systems are probably not contributing much,
but AIs are definitely contributing to that.
Like Google Search, for instance, is a big part of it.
Yeah.
Yeah.
A huge part.
A part that we probably can't introspect, like how the world has changed in the past 20 years. It's probably very difficult for us to be able to understand. Although, of course, whoever created the simulation we're in is probably doing metrics, measuring the progress. There was probably a big spike in performance. They're enjoying this.
So what are your thoughts on the Turing test and the Loebner Prize, which is, you know, one of the most famous attempts at a test of human intelligence, sorry, of artificial intelligence, by doing a natural language, open dialogue test that's judged by humans as far as how well the machine did?
So I'm, I'm not a fan of the Turing test itself or any of
its variants for two reasons.
So first of all,
it's really copping out of trying to define and measure intelligence
because it's entirely outsourcing that
to a panel of human judges.
And these human judges,
they may not themselves have any proper methodology.
They may not themselves have any proper definition of intelligence.
They may not be reliable.
So the Turing test is already failing one of the core psychometrics principles,
which is reliability because you have biased human judges.
It's also violating the standardization requirement and the freedom from bias requirement.
And so it's really a cop-out because you are outsourcing everything that matters,
which is precisely describing intelligence and finding a standalone test to measure
it.
You're outsourcing everything to people.
So it's really a cop-out.
And by the way, we should keep in mind
that when Turing proposed
the imitation game,
it was not meaning
for the imitation game
to be an actual goal
for the field of AI,
an actual test of intelligence.
He was using the imitation game
as a thought experiment
in a philosophical discussion in his 1950 paper.
He was trying to argue that theoretically it should be possible for something very much like the human mind, indistinguishable from the human mind, to be encoded in a Turing machine. And at the time, that was a very daring idea. It was stretching credulity. But nowadays,
I think it's fairly well accepted that the mind is an information processing system and
that you could probably encode it into a computer. So another reason why I'm not a fan of this type of test
is that the incentives that it creates
are incentives that are not conducive
to proper scientific research.
If your goal is to trick,
to convince a panel of human judges
that they're talking to a human,
then you have an incentive to rely on tricks and prestidigitation.
In the same way that, let's say, you're doing physics
and you want to solve teleportation.
And what if the test that you set out to pass
is you need to convince a panel of judges that teleportation took place and they're just
sitting there and watching what you're doing. And that is something that, you know,
David Copperfield could achieve in his show in Vegas, right? And what he's doing
is very elaborate, but it's not actually physics.
It's not making any progress
in our understanding of the universe, right?
But to push back on that,
it's possible,
that's the hope
with these kinds of subjective evaluations,
is that it's easier to solve it generally
than it is to come up with tricks
that convince a large number of judges.
That's the hope.
In practice, what it turns out is that it's very easy to deceive people,
in the same way that, you know, you can do magic in Vegas.
You can actually very easily convince people that they are talking to a human
when they're actually talking to an algorithm.
I just disagree with that. I would push back that it's not easy. It's doable.
It's very easy because...
I wouldn't say it's very easy though.
We are biased.
Like we have theory of mind.
We are constantly projecting emotions, intentions.
Yes.
Agentness.
Agentness is one of our core innate priors, right?
We are projecting these things on everything around us.
Like if you paint a smiley on a rock, the rock becomes happy, you know, in our eyes.
And because we have this extreme bias that permeates everything we see around us,
it's actually pretty easy to trick people.
I just disagree with that. I so totally disagree with that. You brilliantly put it:
there's a huge,
the anthropomorphization that we naturally do,
the agentness, that word.
Is that a real word?
No, it's not a real word.
I like it.
But it's a good word.
It's a useful word.
Let's make it real.
It's a huge help.
But I still think it's really difficult to convince people
if you do, like, the Alexa Prize formulation,
where you talk for an hour.
There are formulations of the test you can create where it's very difficult.
So I like the Alexa Prize better
because it's more pragmatic.
It's more practical.
It's actually incentivizing developers
to create something that's useful
as a human machine interface.
So that's slightly better than just the imitation game.
So I like it.
Your idea is like a test which hopefully will help us
in creating intelligent systems as a result.
Like if you create a system that passes it,
it'll be useful for creating further intelligent systems.
Yes, at least.
Yeah.
I mean, just to kind of comment,
I'm a little bit surprised
how little inspiration people draw
from the Turing test today.
You know, the media and the popular press
might write about it every once in a while.
The philosophers might talk about it.
But like most engineers
are not really inspired by it.
And I know you don't like the Turing test,
but we'll have this argument another time.
There's something inspiring about it, I think.
As a philosophical device in a philosophical discussion,
I think there is something very interesting about it.
I don't think it is, in practical terms,
I don't think it's conducive to progress. And one of the reasons
why is that I think being very human-like, being indistinguishable from a human, is actually the
very last step in the creation of machine intelligence. The first AIs that show
strong generalization, that actually implement human-like broad cognitive abilities, will not actually behave or look anything like humans. Human-likeness is the very last step in that
process. And so a good test is a test that points you towards the first step on the ladder, not
towards the top of the ladder, right?
So to push back on that,
I usually agree with you on most things.
I remember you, I think, at some point,
tweeting something about the Turing test
being counterproductive, or something like that.
And I think a lot of very smart people agree with that.
I, you know, computationally speaking
not a very smart person, disagree with that
because I think there's some magic to the interactivity,
interactivity with other humans.
So to push, to play devil's advocate on your statement,
it's possible that in order to demonstrate
the generalization abilities of a system,
you have to show your ability in conversation, show your ability to
adjust, to adapt to the conversation, not just as a standalone system,
but through the process of the interaction, like game theoretic, where you really are
changing the environment by your actions.
So in the ARC challenge, for example, you're an observer.
You can't steer the test into changing.
You can't talk to the test.
You can't play with it.
So there's some aspect of that interactivity
that becomes highly subjective,
but it feels like it could be conducive to generalizability.
Yeah, I think you make a great point.
The interactivity is a very good setting to force a system
to show adaptation, to show generalization.
That said, at the same time,
it's not something very scalable because you rely on human judges.
It's not something reliable, because the human judges may not...
So you don't like human judges.
Basically.
Yes.
And I think so.
I love the idea of interactivity.
I initially wanted an ARC test that had some amount of interactivity,
where your score on a task would not be one or zero, whether you can solve it or not,
but would be the number of attempts that you need before
you hit the right solution. Which means that now you can start applying the scientific method
as you solve ARC tasks: you can start formulating hypotheses and probing the system
to see whether the observation will match the hypothesis or not.
It would be amazing if you could also,
even higher level than that,
measure the quality of your attempts,
which of course is impossible.
But again, that gets subjective.
Yes.
Like how good was your thinking?
Like it's the...
Yeah, how efficient was...
So one thing that's interesting about this notion
of scoring you as how many attempts you need is that you can start producing tasks
that are way more ambiguous, right?
Right.
Because with the different attempts,
you can actually probe that ambiguity, right?
Right.
So that's, in a sense, how good can you adapt to the uncertainty and reduce the uncertainty?
Yes. It's the efficiency with which you reduce uncertainty in program space. Exactly.
Very difficult to come up with that kind of test though.
Yeah. So I would love to be able to create something like this. In practice, it would be very, very difficult, but yes.
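To make the attempt-based scoring idea concrete, here is a minimal sketch in Python. The task format, the function names, and the 1/attempts scoring rule are hypothetical illustrations, not part of the actual ARC benchmark; the point is only that the solver gets feedback between attempts, so the score reflects how efficiently it reduces its uncertainty about the underlying task.

```python
# Hypothetical sketch of attempt-based ARC-style scoring (not the real benchmark API).
# The solver proposes candidate output grids one at a time and can use the fact
# that earlier attempts failed; the score rewards needing fewer attempts.

from typing import Callable, List

Grid = List[List[int]]  # an ARC grid is a small 2D array of color indices


def score_task(correct_output: Grid,
               propose_attempt: Callable[[int], Grid],
               max_attempts: int = 10) -> float:
    """Return 1.0 for solving on the first attempt, less for later attempts,
    and 0.0 if the task is never solved within max_attempts."""
    for attempt in range(1, max_attempts + 1):
        candidate = propose_attempt(attempt)
        if candidate == correct_output:
            return 1.0 / attempt  # fewer attempts -> higher score
    return 0.0


# Toy usage: a solver that guesses the same grid on every attempt never scores.
if __name__ == "__main__":
    target: Grid = [[0, 1], [1, 0]]
    constant_guesser = lambda attempt: [[0, 0], [0, 0]]
    print(score_task(target, constant_guesser))  # -> 0.0
```

The 1/attempts rule is an arbitrary choice made here for illustration; any decreasing function of the number of attempts would capture the same intent.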
But I mean, what you're doing,
what you've done with the ARC challenge is brilliant.
I'm also surprised that it's not more popular,
but I think it's picking up.
It has its niche, yeah.
Yeah.
What are your thoughts about another test
that I talked about with Marcus Hutter?
He has the Hutter Prize for compression of human knowledge. And the idea is really to sort of quantify, to reduce the test of
intelligence purely to just the ability to compress. What are your thoughts about
this, intelligence as compression?
I mean, it's a very fun test, because it's such a simple idea.
Like you're given Wikipedia,
basically English Wikipedia,
and you must compress it.
And so it stems from the idea
that cognition is compression,
that the brain is basically
a compression algorithm.
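For reference, the score in this kind of benchmark is roughly the size of the compressed corpus plus the size of the decompressor program itself, so knowledge cannot be hidden in the code for free. A minimal sketch of that metric, using zlib only as a weak stand-in compressor and with hypothetical file paths:

```python
# Rough sketch of a Hutter-Prize-style metric: total bytes of the compressed
# Wikipedia dump plus the bytes of the decompressor needed to restore it.
# zlib is just a weak placeholder; real entries use far stronger, carefully
# engineered (often predictive-model-based) compressors.

import os
import zlib


def compression_score(corpus_path: str, decompressor_path: str) -> int:
    with open(corpus_path, "rb") as f:
        data = f.read()
    compressed = zlib.compress(data, level=9)
    # Counting the decompressor prevents smuggling the corpus into the program.
    return len(compressed) + os.path.getsize(decompressor_path)


# Usage with hypothetical paths:
# print(compression_score("enwik9", "decompressor_binary"))
```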
This is a very old idea.
It's a very, I think,
striking and beautiful idea.
I used to believe it.
I eventually had to realize that it was very much a flawed idea.
So I no longer believe that cognition is compression.
But I can tell you what's the difference.
So it's very easy to believe that cognition and compression are the same thing.
Because, so Jeff Hawkins, for instance, says that cognition is prediction. And of course,
prediction is basically the same thing as compression, right? It's just including the
temporal axis. And it's very easy to believe this because compression is something that we do all the time very naturally.
We are constantly, you know, compressing information.
We are constantly trying.
We have this bias towards simplicity.
We are constantly trying to organize things in our mind and around us to be more regular, right?
So it's a beautiful idea.
It's very easy to believe.
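To see why prediction and compression get identified so naturally: with an idealized arithmetic coder, a model that assigns probability p to the symbol that actually occurs can encode it in about -log2(p) bits, so a better predictor directly yields a shorter code. A toy sketch, with made-up models purely for illustration:

```python
# Toy illustration of "prediction is compression": under an idealized arithmetic
# coder, encoding a sequence costs about -log2 p(symbol) bits per symbol, so a
# better predictive model produces a shorter compressed length.

import math
from collections import Counter


def code_length_bits(text: str, predict) -> float:
    """Total idealized code length when each character is encoded using the
    probability the model assigns to it, given the preceding context."""
    total = 0.0
    for i, ch in enumerate(text):
        p = predict(text[:i], ch)  # model's probability for the next character
        total += -math.log2(p)
    return total


def uniform_model(context: str, ch: str) -> float:
    return 1.0 / 256  # knows nothing: every byte equally likely


def unigram_model_factory(corpus: str):
    counts = Counter(corpus)
    total = len(corpus)

    def predict(context: str, ch: str) -> float:
        return (counts[ch] + 1) / (total + 256)  # smoothed character frequencies

    return predict


if __name__ == "__main__":
    text = "the cat sat on the mat " * 40
    print(code_length_bits(text, uniform_model))                # clueless predictor, long code
    print(code_length_bits(text, unigram_model_factory(text)))  # better predictor, shorter code
```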
But there is a big difference between what we do with our brains and compression. So compression is actually kind of
a tool in the human cognitive toolkit that is used in many ways, but it's just a tool.
It is a tool for cognition; it is not cognition itself. And the big fundamental difference is that cognition is about being able to operate in future situations that include fundamental uncertainty and novelty.
So, for instance, consider a child at age 10.
And so they have 10 years of life experience, they've gotten
pain, pleasure, rewards, and punishment in that period of time. If you were to generate
the shortest behavioral program that would have basically run that child over these 10 years in an optimal way, right? The shortest optimal
behavioral program given the experience of that child so far. Well, that program, that compressed
program, which is what you would get if the mind of the child was essentially a compression algorithm,
would be utterly inappropriate for processing the next 70 years in the life of that child.
So in the models we build of the world, we are not trying to make them actually optimally compressed.
We are using compression as a tool to promote simplicity and efficiency in our models.
But they are not perfectly compressed because they need to include things that are seemingly useless today, that have seemingly been useless so far.
But that may turn out to be useful in the future because you just don't know the future.
That's the fundamental principle that cognition, that intelligence arises from: you need to be able to run appropriate behavioral programs,
except you have absolutely no idea what sort of context, environment, and situation they're going
to be running in. You have to deal with that uncertainty, with that future novelty.
So an analogy that you can make is with investing, for instance.
If I look at the past 20 years of stock market data and I use a compression algorithm to figure out the best trading strategy, it's going to be, you buy Apple stock, then maybe
the past few years you buy Tesla stock or something. But is that strategy still going to be true for the next 20
years? Well, actually, probably not. Which is why if you're a smart investor, you're not just going
to be following the strategy that corresponds to compression of the past,
you're going to be following,
you're going to have a balanced portfolio.
Yeah.
Right?
Because you just don't know what's going to happen. You're also able to anticipate totally new things.
I mean, I guess in that same sense,
the compression is analogous to what you talked about,
which is like local or robust generalization
versus extreme generalization.
It's much closer to that side of being able to generalize in the local sense.
That's why, you know, as humans, when we are children, in our education,
so a lot of it is driven by play, it's driven by curiosity.
We are not efficiently compressing things.
We're actually exploring.
We are retaining all kinds of things from our environment
that seem to be completely useless, because they might turn
out to be eventually useful.
Right.
And that's what cognition is really about.
And what makes it antagonistic to compression is that it is about hedging for future uncertainty,
and that hedging is antithetical to compression.
Yes, essentially, hedging. So cognition leverages compression as a tool to promote efficiency, right?
And so, in that sense, in our models, it's like Einstein said, make it as simple as possible, but not simpler, however that quote goes.
So compression simplifies things, but you don't want to make it too simple.
Yes. So a good model of the world is going to include all kinds
of things that are completely useless, actually, just because, just in case. Yes, because you need diversity, in the same way that in your portfolio, you need all kinds
of stocks that may not have performed well so far, but you need diversity.
And the reason you need diversity is because fundamentally you don't know what you're
doing.
And the same is true of the human mind: it needs to behave appropriately in a future,
and it has no idea what the future is going to be like.
But it's not going to be like the past.
So compressing the past is not appropriate,
because the past is not predictive of the future.
Yeah, history repeats itself, but not perfectly.
I don't think I asked you last time
the most inappropriately absurd question.
We've talked a lot about intelligence, but the bigger question from intelligence is of meaning.
Intelligent systems are kind of goal-oriented. They're always optimizing for a goal.
If you look at the Hutter Prize, actually, I mean, there's always a clean formulation of a goal. But the natural question for us humans, since we don't know our
objective function, is what is the meaning of it all? So the absurd question is, what,
Francois, do you think is the meaning of life? What's the meaning of life? Yeah, that's a big question.
And I think I can give you my answer, or at least one of my answers.
So you know, the one thing that's very important in understanding who we are, is that everything that makes up ourselves,
that makes up who we are,
even your most personal thoughts,
is not actually your own, right?
Like even your most personal thoughts
are expressed in words that you did not invent
and are built on concepts and images
that you did not invent.
We are very much cultural beings, right?
We are made of culture.
That's what makes us different from animals, for instance, right?
So everything about ourselves is an echo of the past,
an echo of people who lived before us, right?
That's who we are.
And in the same way, if we manage to contribute something
to the collective edifice of culture,
a new idea, maybe a beautiful piece of music,
a work of art, a grand theory, a new word maybe, that something is going to become
a part of the minds of future humans, essentially forever. So everything we do creates ripples that propagate into the future. And that, in a way, is our path to immortality,
is that as we contribute things to culture,
culture in turn becomes future humans.
And we keep influencing people
thousands of years from now.
So our actions today create ripples.
And these ripples, I think, basically sum up the meaning of life.
Like in the same way that we are the sum of the interactions
between many different ripples that came from our past,
we are ourselves creating ripples that will propagate into the future.
And that's why, you know, we should be,
this seems like perhaps a naive thing to say,
but we should be kind to others during our time on Earth
because every act of kindness creates ripples.
And in reverse, every act of violence also creates ripples.
And you want to carefully choose which kind of ripples you want to create
and want to propagate into the future.
First of all, beautifully put. But in your case, creating ripples into future humans and future AGI systems.
Yes.
It's fascinating.
All six of us.
I don't think there's a better way to end it, Francois,
as always for a second time,
and I'm sure many times in the future.
It's been a huge honor.
You're one of the most brilliant people
in the machine learning and computer science world.
Again, it's a huge honor.
Thanks for talking today.
It's been a pleasure.
Thanks a lot for having me.
I really appreciate it.
Thanks for listening to this conversation
with Francois Chollet.
And thank you to our sponsors,
Babbel, Masterclass, and Cash App.
Click the sponsor links in the description
to get a discount and to support this podcast.
If you enjoy this thing, subscribe on YouTube,
review it with five stars on Apple Podcasts,
follow on Spotify, support on Patreon,
or connect with me on Twitter at Lex Friedman.
And now let me leave you with some words
from René Descartes in 1637,
an excerpt of which Francois includes
in his On the Measure of Intelligence paper.
If there were machines which
bore a resemblance to our bodies and imitated our actions as closely as possible for all practical
purposes, we should still have two very certain means of recognizing that they were not real men.
The first is that they could never use words or put together signs as we do in order to declare our thoughts to others.
For we can certainly conceive of a machine so constructed that it utters words and even utters
words that correspond to bodily actions causing a change in its organs. But it is not conceivable
that such a machine should produce different arrangements of words so as to give an appropriately
meaningful answer to whatever is said
in its presence, as the dullest of men can do. Here Descartes is anticipating the Turing test,
and the argument still continues to this day. Secondly, he continues, even though some machines
might do some things as well as we do them, or perhaps even better, they would inevitably fail
in others,
which would reveal that they are acting not from understanding, but only from the disposition of
their organs. This is an incredible quote. Whereas reason is a universal instrument which can be used
in all kinds of situations, these organs need some particular disposition for each particular action. Hence, it is
for all practical purposes impossible for a machine to have enough different organs to make it act
in all the contingencies of life in the way in which our reason makes us act. That's the debate
between mimicry and memorization versus understanding. So, thank you for listening, and hope to see you next time.