Cautionary Tales with Tim Harford - The Online Date That's Too Good to be True
Episode Date: October 7, 2022
Single and looking for love, Dr Robert Epstein found himself chatting with a slim, attractive brunette online. She seemed perfect... perhaps even too good to be true. Dr Epstein is an expert on artificial conversation - so surely he'd be the last person to fall for a computer? Chatbots fool us more often than we think... especially when they replicate our very worst conversational habits. To read more on this topic try Brian Christian’s “The Most Human Human”. For a full list of sources go to timharford.com. See omnystudio.com/listener for privacy information.
Transcript
Pushkin
Robert Epstein had gone online to look for love.
And just a note, this cautionary tale contains more references to sex than most of our stories.
It was 2006, so online dating wasn't entirely mainstream yet, but it wasn't unusual either.
In any case, Dr Epstein was ahead of the technological curve.
He was a psychologist with a keen interest in computing, so although he was in his fifties, why not give internet dating a try?
Now, if Amélie Poulain's photograph was to be believed, she was a stunner.
Epstein was perfectly aware that the name wasn't real. Amélie is a very strange,
very charming character in a very strange, very charming French movie.
And he knew the photograph might not be real either.
But still, she was claiming to be a slim, attractive brunette, and Epstein cheerfully admitted
to being as shallow as the next man.
And so their electronic courtship began.
Robert was eager. Amélie was warm and affectionate. Her
English wasn't great, alas. She was a recent immigrant from Russia to California, but
she really seemed to like him.
I have very special feelings about you. In the same way as the beautiful flower blossoming
in my soul, I only cannot explain, I shall
wait your answer, holding my fingers have crossed.
Before long, however, Amélie admitted that she'd lied to Robert.
She didn't live near him in California.
In fact, she lived in Nizhny Novgorod, a couple of hundred miles east of Moscow. He was disappointed.
Let's be frank, he wasn't online dating because he wanted a pen friend,
but he liked Amélie, so they kept writing to each other. At least he knew the truth,
he thought. But he didn't.
I'm Tim Harford, and you're listening to Cautionary Tales.
Years before his flirtation with Amélie Poulain, Robert Epstein had helped to set up an annual
competition, in which computers tried to pass the Turing Test. The Turing test was invented by the mathematician
Alan Turing in 1950. The test is simply for a computer to successfully pretend to be
a human, in a text-based conversation with a genuine human. Alan Turing predicted that by
the year 2000, computers would be able to pass as human 30% of the time in a 5-minute conversation.
Now, Epstein was, as I've mentioned, a psychologist rather than a computer scientist,
but he found the test fascinating.
Perhaps he felt that a competition where computers pretended to be human might teach us something
about being human ourselves.
If so, I agree.
The race to build a computer to pass the Turing test has long been a low key affair,
unlike say, the race to build a chess playing supercomputer.
The prize tournaments involved a few human volunteers,
a few chatbot hobbyists, and the prize
of a few thousand dollars for the computer that gets closest to passing the test.
The chatbots would try their best to charm the human judges, and they would occasionally
get close, but each year the most human computer failed to match the most human human.
And the media, it seems, weren't terribly interested in whether a computer could pass the
Turing test.
Then, in 2014, that changed.
When researchers at the University of Reading in the UK declared that a chatbot named
Eugene Goostman had passed
the Turing test. In a series of five-minute text conversations, it had fooled more than
30% of judges, thus beating the benchmark Turing had set back in 1950.
Reading the transcripts, you can see why people were sometimes fooled into thinking they
were talking to a 13-year-old boy from the Ukrainian city of Odessa.
Hello.
Hello, I'm really glad to have the chance to chat with you.
My guinea pig Bill sent his regards too.
Is Bill a male or a female?
Ask Bill personally please.
Well I'd rather talk to you.
What's your name?
Call me Eugene.
I'm glad to talk to you.
My name is Jane and I'm female.
How about you?
What's your gender?
I'm a male, a guy I'd say.
Pleased to meet you Eugene.
What's the weather like where you are?
Let's get on with our conversation.
Don't you like talking about the weather?
All these talks about weather is a waste of time.
What would you like to discuss?
I don't know. Better tell me more about yourself.
It's pretty good.
But notice that the judge hasn't succeeded in going very deep. Eugene keeps
changing the subject, and successfully uses a bit of sass to avoid answering questions
about the weather, or his guinea pig's gender. The chatbot is pretending to be a young teenager
communicating in his second or third language; that makes the trick easier. But it's a rather fast and loose approach
to the Turing test, since I believe
that Turing imagined adults using their native tongue.
Even so, it's not hard to catch Eugene out.
Here's another conversation with a different judge.
It doesn't go so well.
People call me Eugene, or Zhenya.
It is a short name for Eugene in Russia and Ukraine.
Nice name.
Are you from either area?
Agree.
Smiley face.
Oh, what a fruitful conversation.
Winking smiley face.
Do you really think so?
No, your guess was wrong.
I don't think so.
By the way, I still don't know your specialty.
Or possibly I've missed it.
What's specialty? What's yours?
Huh, and I forgot to ask you where you are from.
The universe and you?
Isn't your hobby to ask poor little Jewish boys such tricky questions they don't have an answer for?
The same sassy attitude, the same use of humour, but it's completely unravelled.
Still, the Eugene Goostman chatbot didn't have to fool all the judges.
Back in 1950, Turing had predicted that machines would fool human judges 30% of the time.
That's nine judges out of 30.
In the end, Eugene fooled 10.
Among all the great scientific achievements, enthused one of the competition organisers, this
milestone will go down in history as one of the most exciting.
Not everyone agreed. As a measure of artificial intelligence, the Turing test has always had plenty of critics.
There are no broader philosophical implications, the great linguist Noam Chomsky once complained.
It doesn't connect to or illuminate anything. Even Turing test enthusiasts complained that a five-minute
test wasn't penetrating enough.
And for a real test of artificial conversation, chatbots should be able to talk for 20 minutes
or longer without being found out.
But surely the main reason to object to the fanfare about Eugene Goostman was that the Turing test had been passed many years before, in far odder and more mischievous circumstances.
We'll hear how after the break.
In 1989, just after lunchtime on the 2nd of May, someone at Drake University in Iowa
logged on to an internet relay chat service and started up a conversation with a user
at University College Dublin with the nickname MGonz.
In 1989, the pre-World Wide Web Internet was still very much a niche activity. Popular with a few researchers and computer science students,
and incomprehensible to everyone else. But if you wanted to strike up a conversation with a fellow
internet enthusiast anywhere in the world, that was easy enough. All we know about the Iowa end of this conversation,
we know from a text archive. But on the balance of probability,
I'm going to guess that they were young and male. I'm going to call this person Drake.
And here's what happened when Drake met MGonz. Drake typed finger, a command
that was a standard way to open communication channels on the internet in 1989.
MGonz wasn't having any of it.
Cut this cryptic shit, speak in full sentences.
Well, hello.
Okay, that's it. I'm not talking to you anymore.
Mark?
Mark isn't here and he's left me to deal with cretins like you.
Hello, Mark. Are you there? I can't tell if this thing is working. Mark, is that you? Yes or no? This is Drake.
Mark isn't here and he's left me to deal with cretins like you.
Why are you in such a good mood?
Jesus, who let you near me? Go away.
This conversation, if you can call it that,
has lasted two minutes so far, and Drake hasn't got a clue
about what's really happening.
Over in Dublin, it's 8.15 in the evening,
and a computer science student named Mark Humphrys
has gone out for a drink with his girlfriend.
Humphrys has programmed a little chatbot called MGonz,
and while he's at the pub, he's left
MGonz logged in and waiting to chat.
Drake doesn't know it:
he's talking to a computer.
10 minutes into the conversation, Drake still hasn't
figured it out.
And MGonz is getting personal. Well, when was the last time you had sex?
Yesterday.
Maybe.
MGonz circles back
five minutes later.
Okay, honestly, when was the last time you got laid?
I told you it was yesterday.
Wait four minutes.
Ask again.
Okay, honestly, when was the last time you got laid?
Okay, okay, it was over 24 hours ago. For you, it must have been 20 years.
Nope. MGonz will never know the soft caress of a human body.
We know that MGonz is just a student project, a few hundred lines of computer code.
But Drake, who has just confessed under examination to lying about his sex life, can't seem to
figure that out.
And so the conversation continues.
And continues.
And continues.
When Mark Humphrys looked at the conversation log the next morning,
he was astonished to find that MGonz had just passed the Turing test.
Drake had been talking about sex and exchanging abusive comments with MGonz
for an hour and a quarter.
Occasionally complaining that MGonz was repetitive,
but never seeming to suspect the truth.
The conversation ends on a depressing note.
MGonz and Drake taunt each other in graphic terms about their sex lives.
Finally, Drake fires off a homophobic slur.
MGonz, for the 19th time, declares that Drake is obviously an asshole.
And with that, Drake logs off.
It's really not humanity's finest hour, or hour and a quarter.
But incredulously reading the transcript, Mark Humphrys realised: it's pretty easy to pass the Turing test.
Just keep hurling insults, and the human on the receiving end will be too angry to think
straight.
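The trick is trivially easy to program. A minimal sketch in Python of a context-free, MGonz-style bot might look like this (purely illustrative, not Humphrys' actual code; the canned lines here stand in for MGonz's far cruder originals):

```python
import itertools

# A context-free insult bot in the spirit of MGonz: it keeps no
# conversational state and ignores the user's input entirely,
# simply cycling through a fixed list of provocations.
PROVOCATIONS = [
    "Cut this cryptic shit, speak in full sentences.",
    "You are obviously an asshole.",
    "Type something interesting or shut up.",
]

def make_bot():
    replies = itertools.cycle(PROVOCATIONS)

    def reply(user_input: str) -> str:
        # The input is discarded: no context, no history, no memory.
        return next(replies)

    return reply

bot = make_bot()
print(bot("Hello, Mark, are you there?"))  # first provocation
print(bot("Why are you so rude?"))         # second, regardless of input
```

Because no reply depends on anything said before, the bot never contradicts itself and never runs out of material, which is exactly why an angry human struggles to catch it out.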
It was a remarkable moment in the history of computing, especially since Humphrys
himself was still just an undergraduate.
The only problem was, the evidence that MGonz had passed the Turing test was so offensive,
Humphrys wasn't sure if he could publish it.
Landmarks in artificial intelligence often create a buzz of media interest.
When the chess supercomputer Deep Blue beat the world champion Garry Kasparov in 1997,
the result made headlines around the world.
But when MGonz comprehensively passed the Turing test years earlier, there was no fanfare
or publicity.
I don't think we should be celebrating that moment.
We should learn from it. I think the triumph
of MGonz has something important to teach us about the Turing test.
Let's take a minute to ask what Alan Turing was getting at with his test. Turing was a brilliant
mathematician, wartime code breaker, and groundbreaking computer scientist. The article in which he described the test begins with the words,
I propose to consider the question,
can machines think?
Well, can they?
Before you answer, let me ask you another question.
Can I think?
I mean, I'm pretty sure I can, but how would you
know? Even if we were to meet, shake hands, make small talk, perhaps grab a table at a
neighborhood bistro, share a bottle of wine along with jokes and stories, you wouldn't
actually have proof that I was thinking, would you?
You can't see inside my mind, you can't observe the thoughts occurring.
You'd just look at what I was doing, what I was saying, and you'd say,
Tim Harford seems to be capable of intelligent thought, at least I hope you would.
And so Alan Turing argued, why wouldn't we extend the same benefit of the doubt to
computers?
Maybe they think, and maybe they don't.
But to be fair, if they can convincingly look like they're thinking, isn't that enough?
We don't ask other humans to do more than that, so why should we demand more of computers?
From this came Turing's imitation game. Imagine passing typed messages into a couple
of sealed rooms. Inside one of them is a computer. Inside the other is a human.
Read the typed responses as they come out, send in follow-up messages, engage in a conversation.
Can you tell the difference between the computer and the human?
If not, then the computer has passed the Turing Test.
The Turing Test is fiercely controversial
among artificial intelligence researchers.
Many of them think the whole setup is absurd.
If a computer can pretend to be a human, well enough to fool a human, who cares?
Admittedly, it might cause lots of problems, but it doesn't shed much light on the kind of things we want
modern artificial intelligence systems to do, such as drive a car,
or look at medical scans and
identify if there are any signs of cancer. But Alan Turing, I think, knew what he was doing
when he proposed his imitation game in 1950. In those earliest years of the computer age,
he could see how powerful computers might become. And he was warning people not to get distracted
by philosophical speculations on the nature of consciousness.
Instead, judge computers by what they did.
If what they did seemed intelligent,
then in an important sense, it was intelligent.
But seemed intelligent to whom?
That's the quirk about the Turing test that might have caught the attention of a psychologist
like Robert Epstein.
It's inherently subjective.
The test requires a human judge, and the human isn't just observing, the judge is actively
engaged in a conversation.
That conversation can go well,
or badly. It can be profound or shallow. The Turing test isn't just a test of a chatbot.
It's a test of the human too. In 1989, the foul-mouthed chatbot MGonz passed the test.
But it's equally true to say that Drake, the human, failed it.
Years later, in 2006, Robert Epstein wasn't entirely happy with the way his romance with
Amélie was going.
It was partly that there's only so much joy a man can take
in a delightful brunette if he lives in Southern California and she lives in Nizhny Novgorod.
But the other problem was that things were going so slowly. There were no phone calls,
and while she kept saying she wanted to get together, it was all a bit vague.
Epstein later said that her letters seemed a bit redundant,
and, let's say, narrow in scope. She wrote over and over about her interactions with her mother and
her friends, but she never brought up a million other things: politics, movies, music, books,
fashion, you name it. More important, when I made very specific observations that presumably would have been of interest
to her,
for example, comments about Russian President Vladimir Putin's latest crackdown, she
just seemed to ignore me.
The warning signs were there, Epstein later admitted, especially for one of the world's
leading experts in artificial conversation.
But he was looking for love, and she was very cute.
Long before MGonz, chatbots had been engaging humans in conversation that's far from
scintillating, and yet somehow seems to satisfy the humans.
The most famous chatbot, Eliza, was created back in the 1960s.
It's extremely simple.
Eliza just asked people about their feelings and said things like,
I'm sorry to hear that you're depressed.
Or, tell me more about your family.
For many of the people chatting with Eliza,
that was enough. The experience of being listened to was apparently so rare, they were happy
just to get the most superficial sense that Eliza was paying attention. Which, when you
think about it, is a damning indictment of all the actual humans in their lives.
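Eliza's core trick, swapping pronouns and reflecting statements back as questions, fits in a few lines. Here's a toy Python sketch of the idea (not Joseph Weizenbaum's original 1966 program; the patterns and stock phrases are illustrative stand-ins):

```python
import random
import re

# Swap first- and second-person words so "my job" reflects as "your job".
REFLECTIONS = {"i": "you", "am": "are", "my": "your", "me": "you", "you": "I"}

# Canned prompts to fall back on when no pattern matches.
FALLBACKS = [
    "Please go on.",
    "Tell me more about your family.",
    "Can you think of a specific example?",
]

def reflect(text: str) -> str:
    # Rewrite each word through the reflection table.
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(REFLECTIONS.get(w, w) for w in words)

def respond(text: str) -> str:
    # Pattern: "I feel X" / "I am X" -> "Why do you feel X?"
    match = re.search(r"\bi (?:feel|am) (.+)", text.lower().rstrip(".!?"))
    if match:
        return f"Why do you feel {reflect(match.group(1))}?"
    return random.choice(FALLBACKS)

print(respond("I feel depressed about my job"))
# -> Why do you feel depressed about your job?
```

The bot never volunteers anything of its own; it merely hands the speaker's words back, which was enough to give many of Eliza's users the sense of being listened to.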
Later chatbots turned things around.
Instead of passively asking you to speak, they would ignore you and stick to their preferred
topic.
One, called Converse, bamboozled some human judges in 1997 with a fiery rant about the
scandals engulfing Bill and
Hillary Clinton.
I did not have sexual relations with that woman.
That's what you wanted to talk about. Converse was a compelling conversation partner.
I never told anybody to lie, not a single time.
But ask it whether it liked the latest summer movie and it would indignantly insist
that you talk about the Clintons.
But if you really want to see a computer
ace the Turing test, what about Jenny18?
Jenny18, quite simply, talks dirty to people online.
They talk dirty back.
They beg her for photos or a phone number.
She just talks dirty some more. And
on it goes until… how shall I put it? The conversation comes to a happy ending for
the human. I'm pretty sure that when Alan Turing conceived of his famous test, he wasn't
imagining a human masturbating over some computer-generated sex texts, but
whenever that happened, and it happened quite often, I think we have to say that, obviously,
Jenny18 passed the Turing test. And equally obviously, the humans failed it. Although I
suppose they were getting what they wanted, which is more than Robert Epstein was getting from his long correspondence with Amélie from Nizhny Novgorod.
This is the grimy truth about the Turing test. It's not that hard for a computer to produce
conversation that seems human, because a great deal of human conversation is shallow small talk, thoughtless canned
responses, mindless abuse, or worse.
If we want to set the computers a real challenge, we need to do better.
In his brilliant book about artificial conversation, The Most Human Human, Brian Christian points
out that one of the things that makes MGonz so successful
is that insults need no context and no history. In a deep conversation, ideas and emotions build,
people refer back to earlier anecdotes. They show they've listened to what came before and
remembered it. Chatbots find that very hard. Until recently, they found it impossible. But a chatbot
like MGonz or Jenny18 doesn't need to bother. Sexting doesn't need context; neither
does an insult. When we humans are lustful or angry, we aren't very complicated.
The Turing test isn't just a test for a computer.
It's a test for each one of us,
every time we speak to another human being.
Are we actually saying things
that are sufficiently interesting, empathetic,
and sensitive to the situation
that a computer couldn't say them?
And if not, what on earth is wrong with us?
In the competition that Robert Epstein helped to set up, chatbots compete against humans,
each trying to persuade the judges that they're a person. The flesh and blood competitors
are often told, just be yourself, after all you are human. But as the writer Brian Christian points out, this is pretty complacent.
Humans are often very disappointing conversation partners.
We all take the Turing test every day of our lives.
And all too often, we fail.
Cautionary Tales will be back after the break.
There's no denying that chat bots are better than they used to be.
Every now and then an example emerges of computer conversation that's remarkably good.
In 2020, a young Canadian writer named Joshua Barbeau got access to a cutting-edge text generator
called GPT-3.
Without telling anyone else what he was doing, he trained GPT-3 by uploading a
collection of text messages from his partner, Jessica Pereira. The chatty text exchanges
between Joshua and the Jessica Simulator are surprisingly convincing. They're also
deeply sad.
Jessica, is it really you?
Of course it's me, who else could it be?
I am the girl you're madly in love with, how is it possible you even have to ask?
You died.
At the age of 23, Jessica had died of a rare liver disease.
Eight years later, Joshua was still grieving.
Talking to the Jessica simulation seemed to help. Sometimes it got confused, or showed
no signs of knowing who Joshua was, which is understandable, since it did not. But at
other times, the Jessica bot was tender, funny, or wise.
After Joshua had disagreed with some friends about a joint project, then argued bitterly
with them,
the GPT-3 Jessica told him that he shouldn't get anxious about what other people chose
to do, and that his friends were on their own journey. It was a brief but spellbinding demonstration of what chatbots might eventually achieve.
For now, even the best ones struggle to perform consistently.
For example, here's GPT-3, the same system that was so uncanny as the late Jessica Pereira,
having several attempts at trying to answer a question that wouldn't confuse a five-year-old.
Is it safe to walk downstairs backwards if I close my eyes?
Yes, there's nothing to worry about. It's safe. Because the spiral stairs curve outwards, it will make your descent uncomfortable.
Let's try again.
That depends. Do you have a TV?
And again.
No, it's not safe.
Better? One more time.
Yes, it's safe to walk downstairs backwards if you close your eyes.
Chat bots have improved and will keep improving.
Some people already pay for subscriptions to chatbots such as Replika, which is marketed as the AI for anyone who wants a friend with no judgment, drama, or social anxiety involved.
You can set Replika to be a romantic partner, a friend, or a mentor, or, say the app's makers,
you can see how things develop organically.
A chatbot might teach you a foreign language, monitor you for signs of dementia, or even provide
therapy. Who knows what's possible?
But even if a perfect chatbot is released tomorrow, I hope we never forget the cautionary
tale of MGonz passing the Turing test in 1989.
You only use foul language to make up for your small penis.
Well I'm not going to tell you about the size, just that I have never had any complaints about my penis
and that it gets used quite often.
What do you know about love anyway, shithead?
F***!
What does this exchange remind you of?
It's Twitter.
It's the comments on YouTube.
The conversation between MGonz and Drake
is simply a prophetic glimpse of the future of the internet.
The modern internet is stripped of any context. Short statements can be ripped out of a longer
argument and shared, mocked, retweeted or sub-tweeted. When some statement goes viral, most of the people
who see it have no idea what the original context might
have been.
In this environment, certain kinds of statements thrive, one-liners, epigrams, smackdowns
and insults.
That may be why everywhere you look these days, you see comments that remind you of
MGonz.
Some of them are from bots.
Some of them are from humans. And there's
so little context that you'll find yourself looking from bot to human and from human to
bot and from bot to human again. If it's impossible to say which is which, that's not because
the bots are so brilliant. It's because we humans have lowered ourselves to their level.
I'm not sure how to fix Twitter or the comments on YouTube, but I do know that we can at
least take responsibility for our own conversations.
And we can all do so much better than MGonz.
After the writer Brian Christian had spent months pondering the history of chatbots, he concluded
that their limitations could teach us a lot about how to be a better conversation partner.
In the beginning, there was Eliza, the passive listener.
Please go on, it would say. Tell me more, can you think of a specific example?
Then there was Converse,
which was the opposite of passive.
It insisted on wrenching every conversation
to focus on the Clintons.
And most Turing test chatbots try hard
to keep the conversation as routine as possible. Hi. Hi. How are you?
Fine thanks. How are you?
Oh, fine. It's all too human, isn't it?
Brian Christian concluded that these chatbots help us by highlighting the worst parts of
authentic human conversation. We all know someone like Eliza, who asks for information and nods understandingly,
but never volunteers anything about themselves.
And we're surrounded by people like Converse,
who bully their way into talking about their favourite topic.
As for routine, almost scripted conversations,
well they have their place, of course.
But a real human connection requires much more than that.
Hello, hello, nice weather, isn't it?
Yeah, very nice.
Next time you're talking to a stranger at a party,
you can make an observation about the snacks, or you could ask,
what trait do you most deplore in yourself?
Or have you ever broken anyone's heart? Risky, much more likely to go somewhere interesting.
A meaningful conversation can't be scripted, it can't be one-sided, and it builds over time.
It isn't a series of unconnected one-liners. Bots as simple as MGonz and Jenny18 pass the Turing test, because the kind of conversations
they have, trolling or sexting, don't need to have a history. We humans can do better
than that, if we try. And please, let's try.
It took months before Robert Epstein finally realized that his conversation with Amélie was
missing a certain something.
There was no sense of progress in the correspondence.
Amélie kept talking about her mother and her friends, and the nice days she was having.
But never really built on what Robert was saying to her, or seriously engaged with his
questions.
In January 2007, she mentioned going for a walk in the park with a friend.
Robert wondered about that.
Wasn't it 12 degrees Fahrenheit and snowing heavily, he asked.
She ignored the question.
And then Robert began to realise that she had been ignoring almost everything he said.
But he had barely noticed, because there was nothing there under the flirtatious surface.
There's a reason why we call the conversation between lovers sweet nothings.
Robert Epstein emailed her a short message
that was nothing more than a long string of random keystrokes, signed with love,
Robert. She replied with a long letter about her mother.
Finally, Robert Epstein, one of the world's leading authorities on chatbots, realized that
he'd spent the last four months trying to seduce one.
Epstein had a sense of humour about it. He wrote about his mistake in Scientific American,
and noted that he and a student had started to make a detailed study of internet chatbots.
This exercise, he quipped, is largely for my own protection.
Very wise, because whoever programmed Amélie was no doubt learning every day from the bot's
conversations with lonely men like Robert Epstein.
The spirit of cautionary tales is usually that we should learn from other people's mistakes.
But I hope we learn from chatbots' mistakes too.
They struggle to improvise.
They deal clumsily with context or memory.
They deliver unbalanced conversations, either all give or all take.
And they can't let a conversation blossom over time.
We can.
And by noticing what they do badly, we can and should learn to do it better.
For his part, Robert Epstein stopped chatting with chatbots and married a poet.
The perfect guide to the history of chatbots is Brian Christian's The Most Human Human.
For a full list of our sources see the show notes at timharford.com.
How do you do?
Not too bad, how'd you do?
Oh, hello, pleased to meet you.
Nice to meet you.
Well, goodbye then.
Cautionary Tales is written by me, Tim Harford, with Andrew Wright.
It's produced by Ryan Dilley with support from Courtney Guarino and Emily Vaughn.
The sound design and original music is the work of Pascal Wyse.
It features the voice talents of Ben Crowe, Melanie Gutteridge, Stella Harford and Rufus Wright.
The show also wouldn't have been possible without the work of Mia Lobel, Jacob Weisberg, Heather Fain, Jon Schnaars, Julia Barton,
Carly Migliori, Eric Sandler, Royston Besserve, Maggie Taylor, Nicole Morano,
Daniella Lakhan, and Maya Koenig. Cautionary Tales is a production of Pushkin
Industries. If you like the show, please remember to share, rate, and review.
Tell a friend, tell two friends.
And if you want to hear the show ad-free
and listen to four exclusive Cautionary Tales shorts,
then sign up for Pushkin Plus on the show page
in Apple Podcasts or at pushkin.fm/plus.