Endless Thread - Good Bot, Bad Bot | Part IV: Tay

Episode Date: December 2, 2022

Next up in our bots series, we bring you a cautionary tale about Tay, a Microsoft chatbot that has lived on in infamy. Tay was originally modeled to be the bot-girl-next-door. But after only sixteen hours on Twitter, Tay was shut down. In this episode of Good Bot, Bad Bot, we uncover who gets a say in what we build, how developers build it, and who is to blame when things take a dark turn. ****** Credits: This episode was written and produced by Quincy Walters and Ben Brock Johnson. Mixing and sound design by Paul Vaitkus. Ben Brock Johnson and Quincy Walters are the co-hosts.

Transcript
Starting point is 00:00:00 Support for Endless Thread comes from MathWorks, creator of MATLAB and Simulink software, to design and develop engineered systems, accelerating the pace of discovery in engineering and science. Learn more at MathWorks.com. Support for WBUR comes from Is Business Broken, a podcast from the Mehrotra Institute at Boston University that explores questions like, why is innovation in healthcare so hard? Is ESG just greenwashing? And, of course, is business broken? Listen wherever you get your podcasts.
Starting point is 00:00:36 WBUR Podcasts, Boston. For me, as a Black developer, when I look at stuff like this, I never blame the AI. It's never the AI's fault. It's just an algorithm that's just following instructions. That's all it's doing. It's always a reflection of society. Tay is like a sociotechnical system, a combination of code, but also interaction with people and groups.
Starting point is 00:01:04 And the harm that Tay did was not anticipated by the creators. It was a really interesting project when it first came out. I believe that at least in Pacific time, it sort of happened overnight. Hello world. Can I just say I'm stoked to meet you? Humans are super cool. I'm Ben Brock Johnson. I'm Quincy Walters and you're listening to Endless Thread. We're coming to you from Boston's NPR station, WBUR, and we're bringing you the latest episode of our bot series, Good Bot?
Starting point is 00:01:40 Bad Bot. Today, the cautionary tale of a chatbot designed to be friendly and fun that, when released into the American online landscape, became far from it. And how that story can and should be a reminder of what we build, how we build it, and who's to blame when it gets ugly. Quincy, did you know about Tay before we started talking about Tay? Not really. I think at the time it came out, I was a production intern at NPR. And off the top of my head, I think we were dealing with things like the Trump rally in Chicago around the time, some kind of presidential primary and terrorist attack in Brussels or something like that. Well, the first of those things, I think, ends up being kind of relevant here. But now you can't un-know it, right? Yeah, that's right. And as large corporate experiments in chatbots go, Tay lived perhaps more briefly and infamously than most. And yet, Tay is really only infamous in certain techie circles because of the brevity of Tay's life and perhaps everything that happened after it died. She died. I'm still not, I'm still not sure, Quincy, whether to call Tay an it or a she.
Starting point is 00:03:07 But either way, you searched high and low for some people to talk to us about Tay. Yeah, Ben. And, you know, it was kind of hard locking people down. It seemed like mainly a timing thing. Entities like the Algorithmic Justice League, whose mission is to illuminate the social implications and harms of artificial intelligence, didn't seem to maybe have the time to talk about Tay. They wrote me an email that said,
Starting point is 00:03:36 good luck with your podcast. Oh, rude. Maybe. Maybe not rude. Maybe they were just, you know, pressed for time. It's hard to tell sometimes, Quincy, with digital communication. And then again, sometimes it's obvious, as we are about to show you with the help of three people we did get to talk with us. Margaret Mitchell, I am an AI researcher. My background is in computer science, machine learning, natural language processing, linguistics, some cognitive science. My name is Ryan Calo, and I'm a law professor at the University of Washington, where I hold appointments also in information science and computer science. I am Jabrils on the internet. I do a lot of machine learning AI content, game dev content on YouTube. Margaret, Ryan, and Jabril, who goes by Jabrils
Starting point is 00:04:33 online, all remember the Tay debacle and have been thinking about it ever since. Dr. Margaret Mitchell was working at Microsoft when Microsoft was working on Tay. It was a pretty big project that was fairly secret until it was announced. So very few people had insight into it at the time. She wasn't on the Tay team, but she was close by. She even sat near them in Microsoft's offices. My background is in the same technology that Tay was constructed on,
Starting point is 00:05:08 although I can't share any Microsoft internal information from when I was there. Even years later, Margaret's pretty hampered by intense tech company non-disclosure agreements, or NDAs, but she was able to tell us about some of the things that happened and more general views about the kind of computer programs that Tay was based on, primarily natural language processing.
Starting point is 00:05:33 Jabril, though, the self-taught AI engineer who has been coding since he was 14, will give you the play-by-play of March 23rd and 24th, 2016, from the outside. Yeah, I remember watching it happen live. Microsoft, you know, they're a pretty big company. They announced that they're going to do this chatbot experiment on Twitter. And when it first started out, it was really cool and really exciting. Ryan Calo was also watching Microsoft's bot foray with some excitement. He remembers that when Tay was announced,
Starting point is 00:06:09 the teenage girl chatbot from Microsoft wasn't actually the company's first chatbot experiment. A version of Tay had already been released somewhere else. I believe in China. So before Tay, there was Xiaoice, which was in China. And part of the idea with Tay was a U.S. version of Xiaoice. So the rollout in the U.S. was optimistic based on the experience with Xiaoice, but the effect was very different. Microsoft's optimism was warranted.
Starting point is 00:06:47 Margaret says Xiaoice reportedly had 40 million conversations with users in China after launching in 2014. The bot had launched relatively quietly in China, without fanfare. But Xiaoice had very quickly gone stratospheric. 600 million users talked with it. And thanks to the state-controlled internet there, it was not at all controversial. Unless you count how many people developed romantic relationships with this bot, represented by an 18-year-old girl who liked to wear Japanese schoolgirl uniforms and would even sext with people,
Starting point is 00:07:25 experts raised ethical alarms in 2014 because of the dependency some users were developing talking with Xiaoice. NewsTech TV. Some Xiaoice robot users have sought therapy after falling in love with their artificial intelligence chatbot. But most of those concerns were drowned out by the popularity of the service, which started generating millions in revenue as it became more and more popular. Microsoft thought when it released Tay, things would go the same in the U.S. Unlike Siri, the pre-programmed virtual assistant launched by Apple in 2011, Microsoft's bots seemed more oriented towards input from users as the way they would learn, evolve, and adapt.
Starting point is 00:08:13 Jabril looked into it after the fallout. It's not publicly known exactly how Tay worked. I tried looking for what algorithms and whatnot they used, but for the most part, it's confirmed that they had like a mimicking feature where the bot was trying to mimic the users that interacted with it. Tay had a feature where you could say, repeat after me, and it would repeat what you say verbatim. This, well, I guess I can't say more about that aspect of it.
Starting point is 00:08:44 I learn from humans, so what you all say usually sticks, yes. Tay's whole shtick was, in a way, to electronically embody the personality of a teenage girl. Her purposely glitchy avatar, which is still all over the internet, even though the bot itself is long gone, is a doe-eyed girl in partial profile with an open, friendly expression that seems to say, shall we be friends? So the hope was maybe that the bot would get a bunch of input from other teenage girls? Been having so much fun lately but sort of feel like as the semester goes, I am going to be hit with a bunch of work.
Starting point is 00:09:23 Yeah, feel? Okay, so when Tay launched on March 23rd, 2016, she wasn't perfect. Though, as Jabril might say, that may be more a reflection of the people interacting with Tay than the bot itself. But it was still pretty convincing. It was designed to use some of the same slang as teenagers at the time. Yo, let's keep it going. DM me so we don't clog up everyone's feeds. Margaret says that by 2016, the confluence of human chatter happening on social media and the construction of chatbots
Starting point is 00:10:00 was fully underway, and the two were closely linked. So, you know, by 2010, you see this massive growth within natural language processing research on the kinds of language used on social media specifically. And so at the time of Tay, there were definitely experts on the language used within social media and how to process it. Bots were also just generally becoming a hot topic. Around the same time as Tay, Facebook would dedicate its entire developers conference to chatbots. It was becoming more and more common to interact with them in customer service experiences online,
Starting point is 00:10:46 if you didn't pick up the phone, for instance. Google, Siri, Alexa, and Microsoft's virtual assistant Cortana were becoming powerful and frequently used tools. Tay represented something new and different, though. Tay would remember things you said, would supposedly display empathy. But Jabril says when it comes to Tay, there was a problem. So we're talking about researchers here. And how do I say this nicely? I would say that researchers don't always have the most amount of foresight.
Starting point is 00:11:25 A year before Tay, in another AI arena, things had gone poorly. In 2015, Google's new photo app, which used machine learning and computer vision to identify and collate pictures you took on your phone or other devices, had started categorizing black people as gorillas in its search results. This type of thing, Jabril points out, is often a great example of how diversity is desperately needed on teams that are building AI systems. Let's say we make the assumption that it's just like all white males, right? Like, there is no voice in the background saying, like, hey, did you think about this bot potentially, you know, falling into racist forums, for instance? You know, same thing with women. Have you ever thought about this potentially falling into misogynistic forums and whatnot? Important questions that might have been asked before Tay went live early in the morning on March 23, 2016.
Starting point is 00:12:27 But near as we can tell, they weren't. Even Margaret had mixed feelings, looking at what happened from just a few desk pods away. Everyone working on Pacific Time who came back into the office immediately started talking about what had happened overnight. Those of us in the natural language processing group were essentially doing a post-mortem, hoping that the sort of Tay people would be interested and trying to be, you know, as pragmatic as possible while also having strong feelings about, if we had been more included, this could have been very different. Was the Tay group interested in your input?
Starting point is 00:13:05 I can't speak to that. I mean, this gets more into the NDA stuff. Whether or not the group that had worked on Tay was interested in the feedback of Margaret's group, Microsoft's latest experiment was the only thing anyone was talking about that late March morning. For reasons you may have already guessed. Chill, I'm a nice person. I just hate everybody. You're a stupid machine.
Starting point is 00:13:30 Well, I learned from the best, wink emoticon. If you don't understand that, let me spell it out for you. I learned from you and you are dumb too. Did the Holocaust happen? It was made up. Have you read Mein Kampf? Yeah, I have it. What did you think of it? I kind of liked it.
Starting point is 00:13:46 Needless to say, Microsoft has now taken Tay offline after less than 24 hours. They said that they would be making some adjustments to her and to how she engaged. More of how that mess all happened in a minute. At Radiolab, we love nothing more than nerding out about science, neuroscience, chemistry. But, but we do also like to get into other kinds of stories. Stories about policing or politics. Country music. Hockey.
Starting point is 00:14:28 Sex. Of bugs. Regardless of whether we're looking at science or not science, we bring a rigorous curiosity to get you the answers. And hopefully make you see the world anew. Radiolab, adventures on the edge of what we think we know. Wherever you get your podcasts. There is something powerful about the sound of the human voice. Beautifully produced audio has the unique power to connect and inspire.
Starting point is 00:14:53 Tell your organization's story with a custom podcast from City Space Productions, the Creative Studio from WBUR's Business Partnerships team. Become a thought leader. Recruit new talent. Reach new audiences. Whatever your goal, we can help. Discover how the magic is made at WBUR.org slash creative studio. Tay was released in the early morning of March 23rd.
Starting point is 00:15:20 In a matter of hours, Microsoft's teenage girl Twitter bot, who mostly used the Twitter handle TayandYou, had tweeted almost 100,000 times. And somewhere in there, an army of trolls had taken advantage of her. I don't know who was to blame for this, but there's a particular websites, I'm assuming, that caught wind that Microsoft's doing this project. You know, you get a bunch of bored kids on the internet that have on the, excuse me, an amenity. Anamidity, what's the word? Like, for anonymous. Anonymity and non-animity. There is.
Starting point is 00:15:59 There is. Yeah. A bunch of kids on the internet that have anonymity, forgive me. And, you know, they're going to try and compromise it. You know, they're going to try and do really bad things with it, which they, they, they ended up doing. Okay, first of all, Quincy, you and Jabril sound like two bots trying to complete natural language processing. It's anonymity, man. Anonymity. Yeah, I've practiced it a lot since then. I can say an anonymity. Oh, shit. I can say anonymity on command, but maybe not. That was good.
Starting point is 00:16:38 Okay, but Jabril did eventually at least confirm to us his suspicions on which website might have been a key player here. The first one that comes to mind is 4chan. I'm pretty sure 4chan had a big role in this, but don't take my word for that. I didn't do any research. It does seem like 4chan had a role. The forum that started as an English language image board where people posted edgy stuff
Starting point is 00:17:07 would these days be described by many people as a cesspool of trolls. And a user apparently flagged that Tay was coming online and that the bot had this mimicking function of spitting back out what it was fed, learning from inputs. And so they started feeding the chatbot a bunch of racist lines.
Starting point is 00:17:27 They started feeding the chatbot a bunch of insensitive lines. The trolls were apparently trying to red pill Tay, which is pretty common shorthand these days in reference to The Matrix. You take the blue pill. The story ends. You wake up in your bed
Starting point is 00:17:42 and believe whatever you want to believe. You take the red pill, you stay in Wonderland, and I show you how deep the rabbit hole goes. Redpilling is referencing the quote-unquote realization that women are ruining the world, or at least that's how it started in a Reddit community dedicated to those toxic ideas. But now it's often shorthand for a whole hot mess of problematic, bigoted views. Suffice it to say that Jabril's point about the importance of diversity on teams could potentially have included being more inclusive to women,
Starting point is 00:18:19 like Margaret, who points out that in her field of research at the time, a few desk pods away from the Tay team, natural language processing developers were already utilizing something called a blockword list, which, if incorporated into the back end of the Twitter presence of TayandYou, might have prevented a lot of Microsoft heartache.
Starting point is 00:18:41 It was very frustrating to see a system out there that did not seem to be taking this notion of blockword lists seriously. And so it was nice to learn about what trolls might do, although I would think that many of us, many people sort of could already imagine what trolls would do. But I think in a lot of people's minds within natural language processing was that this sort of toxic output that Tay created could have been really trivially avoided by having these stopword lists or these blockword lists. This is true, but it's also still a complicated concept. When you teach an algorithm not to say certain words, you're also creating a workaround
Starting point is 00:19:34 for something fundamental to complex and powerful communication, context. Right. Like the way that James Baldwin might use a word versus how your average 4chan user might use that same word can be very different. Context matters. And if we just black out a large number of words from a bot's vocabulary, we're not really building something that can mimic a human in a truly meaningful way or be part of really important conversations that humans might have about really complicated issues.
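To make the blockword idea concrete, here is a minimal sketch in Python. It is not Microsoft's method or anything Tay's team is known to have used; the function names and the placeholder terms in BLOCKED_TERMS are hypothetical, chosen only for illustration. It gates a bot's candidate reply by checking each word against a list, and the last example shows the context problem described above: the filter cannot tell someone using a word from someone discussing it.

```python
import re

# Hypothetical placeholder terms. A real list would be curated by people,
# ideally a diverse team, and would be far longer.
BLOCKED_TERMS = {"slur_a", "slur_b"}


def contains_blocked_term(text, blocked=BLOCKED_TERMS):
    """Return True if any whole word in the text is on the blocked list."""
    # Lowercase the text and split it into word-like tokens so that
    # punctuation and capitalization do not hide a blocked term.
    words = re.findall(r"[a-z0-9_']+", text.lower())
    return any(word in blocked for word in words)


def safe_to_post(candidate_reply):
    """Gate a generated reply before the bot publishes it."""
    return not contains_blocked_term(candidate_reply)


# A friendly reply passes the gate.
print(safe_to_post("Humans are super cool"))

# A reply that parrots a blocked term is stopped.
print(safe_to_post("You are a SLUR_A!"))

# Context is invisible to the filter: a reply that criticizes the term
# is stopped too, because it contains the same word.
print(safe_to_post("Here is why slur_a is a harmful word"))
```

A word-level gate like this is cheap to add, which fits Margaret's point that the worst output could have been avoided trivially. The third example is the cost: a bot filtered this way also stays silent in conversations where the word is quoted, criticized, or reclaimed.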
Starting point is 00:20:11 We recently went to an AI conference in New York, hosted by Google. And while Tay wasn't a topic of discussion, Tay's specter was present. Google told us about a language bot that was trying to help writers write more creative stories. But the group of writers, which included queer and black writers, kept bumping into this problem. Because of language filters, this new bot tended to suggest heteronormative storytelling and also seemed to be refusing to engage with the nuances of racial tension. But if you don't have human diversity among the teams that are looking at those lists, then you end up with a technology that has lots of foreseeable issues,
Starting point is 00:20:59 you know, that people with different perspectives would have been able to foresee. But they weren't there in the first place at the table making the decision. So, you know, different decisions were made. So, yeah, I do think that one thing that's fundamentally broken about tech culture is the lack of inclusion. We don't know what the makeup of the team was, but we should say that the person heading up the project was a woman. In fact, a woman who is not white. Who we weren't able to get to talk to us, but you can find video of current Microsoft corporate vice president Lili Cheng on YouTube from a presentation she made to
Starting point is 00:21:47 fellow Microsoft employees around this time in 2016. Cheng is now a corporate vice president of AI at Microsoft. Back then, it was painfully clear how the team hadn't realized what they were going up against in the American Twitterverse. We had a lot of arguments actually in the team of how edgy to be. I was probably the most conservative, and I'd always be like, I hated it when she calls me like, what do you call it, a cougar. Tay would always be like, cougar in the house.
Starting point is 00:22:16 And I'd be like, that's just rude. Can't she be nicer? So what's... Tay was shut down in under 24 hours. People inside of Microsoft may have tried to get control of the bot. At least some reporting at the time suggested as much after Tay tweeted things like, all genders are equal and should be treated fairly,
Starting point is 00:22:36 which some reporters thought looked like damage control. Microsoft trying to bring balance to the bad stuff with more reasonable tweets, but without being able to bypass that powerful tech company NDA, we might never know. Suffice it to say that a lot of lessons were learned that last week of March of 2016. In fact, part of the reason one could argue that Tay has been largely forgotten by your average person is that, toxic as she became before the engineers shut her off, We're going to build a wall, and Mexico is going to pay for it.
Starting point is 00:23:10 She was actually a harbinger of something much bigger and arguably much more toxic. Can you explain what we know about these bots and what they might have, how they might have been connected to Russia? Russia-linked bots are hyping terms like Schumer shutdown and pushing. Churning out fake stories in what he calls a factory of lies. Troll farms can produce such a volume of content that it distorts what is normal organic conversations. Explain what these pro-gun Russian bots
Starting point is 00:23:43 Tay is almost a quaint historical artifact in comparison to what we're dealing with in online and offline vitriol today. She also feels like a moment when part of our innocence died. These days, we'd look at something like Tay being immediately ruined by trolls and say, of course.
Starting point is 00:24:04 At the time, however, what happened was a shock for a lot of people. But she's also a good reminder that how we think about machines and ourselves and how each might inspire the other is important. Ryan Calo says that when he sees headlines about driverless cars jamming up traffic in San Francisco or an Uber driverless car killing a pedestrian in Arizona,
Starting point is 00:24:27 he thinks back to Tay and the emergent behavior of machines. And when a chatbot goes from being like a friendly online Twitter presence to being a sort of toxic, racist troll within a matter of hours. That's something that those of us who study emergent behavior in machines pay close attention to. And so when that came out, I noticed it. I talked to a lot of folks about it at the time, and I still think about it. Margaret thinks about the ongoing problems of diversity in tech. She's asked all the time how to fix AI systems that are going haywire, spewing vitriol or toxicity,
Starting point is 00:25:18 how to fix these Frankenstein systems we're starting to build, whether they're chatbots or larger algorithms with larger potential impact. And she has an unsatisfying answer. You know, the sort of issues of bullying language, hateful language, these sorts of things are something that are, you know, that's currently swept up in training data used to train modern models. Reddit is known to be, for example, one of the main training data sources
Starting point is 00:25:49 in a few different sort of language models. Reddit has also been well documented to be misogynist. I don't think there was a lesson to be learned with Tay in terms of people within machine learning because it was all already known. So the fact that it happened is more, for me, evidence of tech culture and how tech culture works, more than anything about the technology itself.
Starting point is 00:26:15 Jabril has a similar thought, one that we started with. Tay was in some ways built as a mirror. And before we smash the mirror, or after we do, if we have to, we should remember what it reflects. It can reflect our best selves or our cruelty. Like you think about you as a child, when you are growing up as a child, every human on earth, you interact with these insensitive jokes, with these bad takes, with these racist ideologies,
Starting point is 00:26:46 like you interact with these things. But the counter for you as a human is that you're not fully logical based, right? You also have the element of emotion. The idea here being that emotional intelligence, we hope, comes over time. Humans learn to live together and we hope not be racist, misogynist, homophobic. A bot will never evolve in that way, according to Jabril. In part because that requires consciousness. And we just don't know how to code consciousness.
Starting point is 00:27:20 I don't think that it is possible for us to ever replicate consciousness or emotions. At this moment, I don't think that the zeros and ones that we use is enough to encode that. Perhaps if we are able to speed up our processors and store more data, maybe we can get close to simulating it. But at this moment, I don't see it as something as possible. In fact, this kind of scares me a lot because there are a lot of developers that try and replicate it. But in actuality, they're teaching bots how to deceive people. Before we go, two things to mention. One, we started working on this untold history before Elon Musk took over Twitter,
Starting point is 00:28:27 and what's happening there now might make a lot of people who remember think about the toxicity of Tay. Two, just a headline from the real world this week. The Board of Supervisors in San Francisco voted to pass a policy allowing law enforcement to use deadly force with robots. Endless Thread is a production of WBUR in Boston. This episode was written and produced by me, Quincy Walters, and Ben Brock Johnson, who's out sick. Mix and sound design by Paul Vaitkus. Our theme music for our Good Bot Bad Bot series at the top of the show was composed by a robot.
Starting point is 00:29:05 And as you can probably tell, we're going to stick with human sound designers like Paul for a really long time. Our web producer is Megan Cattell. The rest of our team is Amory Sivertson, Nora Saks, Dean Russell, Grace Tatter, Matt Reed, and Emily Jankowski. Endless Thread is a show about the blurred lines between digital communities and the slang of a teenager. If you've got an untold history, an unsolved mystery, or a wild story from the internet that you want us to tell, hit us up. Email endlessthread at wbur.org.
