Radiolab - The Echo in the Machine

Episode Date: May 23, 2025

Today you can convert speech to text with the click of a button. YouTube does it for all our videos. Our phones will do it in real time. It’s frictionless. And yet, if it weren’t for an unlikely crew of protesters and office workers, it might still be impossible. This week, the story of our attempts to make the spoken visible. The magicians who tried. And the crazy spell that finally did it. Sign up for our newsletter! It includes short essays, recommendations, and details about other ways to interact with the show. Sign up (https://radiolab.org/newsletter)! Radiolab is supported by listeners like you. Support Radiolab by becoming a member of The Lab (https://members.radiolab.org/) today. Follow our show on Instagram, Twitter and Facebook @radiolab, and share your thoughts with us by emailing radiolab@wnyc.org. Leadership support for Radiolab’s science programming is provided by the Simons Foundation and the John Templeton Foundation. Foundational support for Radiolab was provided by the Alfred P. Sloan Foundation.

Transcript
Starting point is 00:00:00 Oh, wait, you're listening. Okay. All right. Okay. All right. You are listening to Radiolab. Radiolab. From WNYC.
Starting point is 00:00:12 So, let me just, we are recording. Good. This is Radiolab. I'm Lulu Miller. And today, producer Simon Adler brings us a story from... My mother's living room, watching the television with her. This is what we love in our reporting. They scour the earth far and wide. Oh, yeah. Going to unknown, exciting places like the shag-carpeted living room of my mother.
Starting point is 00:00:45 No, and so we're sitting there and you know, my mother's hearing. It's not what it once was. And so like most nights she was watching with the closed captioning on. Oh, absolutely same. All right, right on. Anyhow, I think it was the local news, literally talking about things like filling up potholes. Okay.
Starting point is 00:01:04 And as I'm sitting on the floor, sort of bored out of my gourd, I have one of those moments where a genuine question popped into my head, which was those closed captions on the screen, you know, how did those get there? Like is there someone in Sandusky typing as fast as they can? Exactly. Right. Like, is it a human sitting in an office? Yeah.
Starting point is 00:01:30 Or, and this was sort of my real question, like, is this one of those jobs that AI has already taken and replaced us? Okay. Okay. Did you have a hunch? I thought it was probably AI. Okay. Just based on my own real world experience.
Starting point is 00:01:44 Based on life right now. Right. And I also thought then that, you know, like one quick question to ChatGPT and we're going to get to the bottom of this. Turns out that was not the case here. As I looked into this, yeah, I found it was cacophonous in a way I didn't expect. I found ladies swearing at their televisions, students demanding to be heard, and maybe oddest of all, a whole chorus of voices offering us a path through the strange future we seem
Starting point is 00:02:22 to be walking into. Okay. Bruce News broadcast is being closed captioned. Subtitles called closed captioning. Access to television. It puts words into your world. Closed captioning. So Greg is signing, I will be speaking.
Starting point is 00:02:41 Okay, just speaking. And not signing. Seems like the best place to start is, you know, all the way back at the beginning. One, two, three, four, five is fine. That's great. Okay. Great. With this guy.
Starting point is 00:02:54 My name is Greg Hlibok. And his interpreter. Brenda Kelly Frye, certified interpreter for the deaf. Today Greg is an attorney with a coif of silver hair, thin rimmed glasses. He was born deaf in Queens, New York. Less than, I don't know, one and a half miles from the New York Mets stadium near the airport. And I come from a deaf family, my parents and two brothers and one sister. And growing up, he says, you know, during the daytime, he felt pretty darn integrated
Starting point is 00:03:21 into the larger hearing world. I had neighbors who were hearing and we all associated with each other, communicated with each other. We stayed outside all day until the dinner bell rang. But in the evening, you know, when all good Americans turned on their TVs, he was not. It was pointless to watch except for maybe football and baseball games because there was basically no captioning. We'd have to look at the TV guide and you know they had a symbol that said CC. News broadcasts, the occasional special, that was about it. And it very well might have
Starting point is 00:03:58 stayed that way if it hadn't been for Greg. I guess you could say so. And so, fast forwarding, it is the spring of 1988 on the campus of Gallaudet University. The campus is beautiful and gated. The students are in lots of denim and oversized sweatshirts. I mean, it is your standard-looking college. With one exception. It was nearly 100% deaf. In fact, it was basically the only four-year liberal arts college for deaf students in
Starting point is 00:04:32 the world. This is disability rights attorney Karen Peltz Strauss. I was on the staff of Gallaudet's National Center for Law and Deafness at the time. She is fluent in sign language, and in 1988 on campus, she says, tensions were high because the position for Gallaudet's president opened up, and until that time Gallaudet had always had a hearing president. Yeah, in its 124-year history, all of them were hearing. Missed opportunity in the leadership department.
Starting point is 00:05:00 Oh absolutely, and so the students on the campus said you have got to choose a deaf president, and the faculty said the same, and the staff said the same. And according to Greg... I was a junior. I was in my junior year. Who was actually the student body president as all this was going down. He and his classmates... We were very optimistic. Because of the three finalists for the job, two of them, Harvey Corson and I. King Jordan, they were deaf. And the third candidate, a woman by the name of Elizabeth Zinser, not only was she not deaf, she didn't even know sign language.
Starting point is 00:05:37 No. Okay. Okay. I mean, Zinser had no support on campus. And so, March 6th, 1988. Pat, students here behind me have been waiting all day outside the gates of Gallaudet University, while inside the board of trustees have been meeting, trying to pick a new president. We were all gathered in the gym, in the field house, waiting for them to make the announcement.
Starting point is 00:05:59 And around, say, seven o'clock, it happened. We picked Dr. Elizabeth Ann Zinser as the seventh president of Gallaudet. No! Because she is a very talented educator. Oh no. Oh no.
Starting point is 00:06:21 Yeah, they went with the hearing lady. Elizabeth Ann Zinser, she is the new president of Gallaudet University. Dr. Elizabeth Zinser. Who is neither deaf nor able to speak sign language. Why? Did they say why? Well, at least one of the explanations was pretty darn ugly. University trustees chairman defended this selection saying deaf people are not ready
Starting point is 00:06:44 to function in the hearing world. And the students, well, they go berserk. Why, why here, why? We were all upset. What? God damn it! I'm so damn angry that this makes me sad. Very upset.
Starting point is 00:06:59 We've been here for a hundred and a half years. We've just felt like somebody just slapped us in the face. Things were escalating. Everybody was on the streets. And escalating. I mean, people were throwing things. And escalating. Then I told them, stop.
Starting point is 00:07:15 Do not damage. Do not vandalize anything. No violence, please. Because I knew that many people don't have the experience of seeing deaf people. We were sending the wrong impression. We're sending the wrong message. Sometimes the first impression is the lasting impression. So I didn't want the hearing people seeing us as a wild bunch of people.
Starting point is 00:07:42 So we gathered at the front of the gate in front of campus, said, let's get organized. And that's when we started making plans. First things first, Greg and a couple of the others. We drove to buy a chain from the hardware store. We brought the chain back and we locked all of the gates on campus. They hotwire some of the school buses and drive those in front of the gates. Blocking those entrances.
Starting point is 00:08:13 Huh. So now it's really blocked off. Battened down the hatches of the whole university. Yes. And in the morning. As the administration arrived. We don't want the university to open.
Starting point is 00:08:25 We want a deaf president first. 99 Acres was totally shut down. The students vowing to keep it that way until the board replaced Zinser with a Deaf President Now. Deaf President Now! That was it. Students succeeded in shutting down the school
Starting point is 00:08:40 in peaceful protest. Deaf President Now! Picture folks on each other's shoulders waving signs, banging drums. Almost immediately, faculty and staff like Karen joined the cause. We had a great time. It was a party. We marched around, we had different presentations, and we had doughnuts. And at least once, they pulled the fire alarm, which, you know, didn't bother the students
Starting point is 00:09:07 but bothered anyone who could hear. Oh, metal! That's awesome. By the end of the first day, Greg had become the official spokesperson. We have the president of the student body, Greg Hlibok. And by the second day, media from all over the country had poured in. In their signing and in their faces, you see their convictions. PBS, ABC. Demonstration.
Starting point is 00:09:30 Demonstrations continue here. I mean, this became a national news story, culminating with Greg appearing on Nightline to debate the incoming president, Elizabeth Zinser. Really? Dr. Zinser, please go ahead. Thank you, Ted. As president of Gallaudet University, I want to indicate that the university
Starting point is 00:09:49 is an extraordinary institution. It deserves to have the continuing strength into the future in its mission as an educational institution. Excuse me. Are you implying that a deaf person can't continue that for the future? Not at all. Okay, so that's Greg.
Starting point is 00:10:07 So Greg in the red-tied gray suit. Yeah, yeah, yeah. We've got like a split screen going on. There are captions on the bottom of the screen and Greg, who you're about to hear again, talking through an interpreter is on the right side of the split screen. ... fail then. If they haven't provided any deaf leaders, then obviously Gallaudet hasn't done a good job.
Starting point is 00:10:23 If they haven't done a good job, then there should be a deaf president, someone qualified to do this. So intense. Yeah, there it gets heated. Like, as Zinser tries to get going here again. What I'm really saying is that I do believe that deaf individuals have great capacities, uh... He cuts her off again. I truly believe that a deaf individual one day will be the president. No, that's old news. I'm tired of that statement. One day, again and again. Alright folks, let me, excuse me one second. Let me ask. Okay, so this debate was captioned.
Starting point is 00:10:53 Do you think that was like a special move? Yeah, so this broadcast was actually open captioned, meaning that everybody who tuned in saw the captioning on the bottom of the screen. However, that was not the case for the vast majority of the coverage of the Deaf President Now protests. And in fact, even the broadcasts that were closed captioned, to receive those closed captions, to get them to show up on your screen,
Starting point is 00:11:21 you needed to have one of these very expensive, clunky decoders. So ABC's sending... Oh, like in your house. In your house, connected to your television. Think of it like a VCR, but it's a VCR that just allows your television to receive the closed captions. So very few people of just like the general American public would be seeing these captions.
Starting point is 00:11:43 Oh yeah, like nobody. Yeah. Which like, it's so frustrating to think that was like the day-to-day norm for deaf folks at that time. But I mean, there's just something like particularly frustrating to imagine, like the folks who can't access a broadcast that is literally concerning their rights and their access, you know? Yeah. And I think that's probably part of why you see this sort of chain reaction of events coming out of this moment. So less than a week after the protest starts, Zinser resigned and was quickly replaced by one of the deaf finalists, I. King Jordan.
Starting point is 00:12:28 Everyone was just signing and jumping and cheering and screaming and everybody was so happy. But then you also have a whole bunch of laws get passed in the years following. This thing called the Decoder Act that required all televisions to have that closed captioning decoder built into it. A little thing called the Americans with Disabilities Act. And eventually the 1996 Telecommunications Act. And that bill basically is what brings captioning into living rooms everywhere. And the mandate is what? It's that by the early 2000s, all new English language broadcast television had to be closed captioned.
Starting point is 00:13:11 Oh, everything that goes out. With very, very few exceptions, everything has to be captioned. And I mean, Karen and Greg, they were central in pushing this requirement into the bill. Wow, like that is such a, like go Greg, like go Greg, go Karen, I mean that's a huge win. Yeah, yeah, yeah, yeah, yeah, and they say, like it all sort of started at Gallaudet. That's absolutely correct. Once more, Karen Peltz Strauss. The protest introduced society to the way that deaf people communicate.
Starting point is 00:13:50 They introduced society to captioning and sign language interpreters and they impacted congressional votes. Yeah, so that's the why. Yeah. Right? Yeah. Right? Yeah. Like why we have all of these closed captions today. But the how, like how they were going to make all of these hours and hours and hours of
Starting point is 00:14:14 those closed captions, well, that's where this story gets just delightful, number one. And number two, I think starts to say a bit about what the future of access to information and media is going to look like for all of us. Okay. And we will get to that in a moment. But first. I mean, up to that point, live closed captioning had only ever been produced through highly trained, specialized, stenographic shorthand.
Starting point is 00:14:47 Imagine a court reporter with a strange keyboard. Just like fire fingers. Exactly. Okay. That is how captions are being made. So you've got dozens, perhaps hundreds of people sitting in offices with the television being pumped into their ears through headphones and they're just typing away at lightning speeds. But by the beginning of 2003 it was becoming apparent that not enough Steno writers were available to match the growing
Starting point is 00:15:18 amount of content that needed to be captioned. This is Meredith Patterson, president at the National Captioning Institute. And back when she joined basically as an entry-level employee, she was handed this problem. Yes, at the very beginning. Okay, so you are there. You're this like junior member of staff. I was very junior, and maybe that's why I was tasked
Starting point is 00:15:43 with experimenting with some software that we called the Black Box. It was basically a very simple early-day speech recognition technology. You know, like a speech transcriber. And her hope was that she could just take a live television feed, Good evening everyone, I'm Kevin Christopher. Plug it in and create the captions that way. However, when she tried that,
Starting point is 00:16:15 It was inaccurate. It would miss a lot of content. Little things like the news broadcaster throwing to the weatherman would totally trip it up. It didn't include punctuation. And accents of any kind were an issue. However, what it could do pretty darn well was transcribe her voice, which led to a sort of crazy idea. Could you just like, could you do the thing that we're about to talk about? Could you do the thing that we are about to talk about?
Starting point is 00:16:49 Okay, so let's try it a little bit faster. So let's try it a little bit faster. I won't be stopping so much. We're talking about the news! It's going to be a very interesting day with the news today. It is going to be a very interesting day today. What if she just echoed every word said on television into the computer? Maybe she could close caption that way.
Starting point is 00:17:11 Period. Okay, yep, you can do it. Wow. Oh my God. She called it. Voice writing. Voice writing. Huh.
Starting point is 00:17:24 That's a funny name for, like, being a parrot. Why is that so common? It's just so funny. We are just getting started here. So first things first, to see if this was even possible. I would sit in the back of the room during internal meetings. Picture just a sterile conference room with a drop ceiling. Trying to be innocuous, repeating everything they said, everything. I practiced at home sometimes on just random newscasts or people on TV. And well, she got really, really good at this.
Starting point is 00:18:01 Like, that didn't mean that the captions were coming out really good, really well. As she started doing this, echoing into the computer over and over again, it would miss words or have trouble understanding her. Her English, her voice. And so Meredith decided to meet the machine where it was at. She set out to learn to speak computer. And we are going to get to that. Right after a quick break. Stephanie, I'm going to call you right back to see if that fixes the echo, okay? Sure, sure, Simon. Call me right back. Okay.
Starting point is 00:19:00 Radiolab, Lulu here with Simon Adler, who is telling us a story about how student protests led to a mandate that closed captions be beamed through all of our screens. And we were just moving on to the wild echoey way that captioners hoped to actually get them to us. That's right. Voice writing. And along with Meredith, who you heard before the break, let's see, one, two, three, four, five. I don't know why that fixed it, but it did. Yeah. Strange. Okay. Okay. This lady right here, Stephanie Vaverka.
Starting point is 00:19:34 Director of production at the National Captioning Institute. Set out to figure this out. Okay. So here's my first question for you. Yes, sir. And I noticed this with Meredith as well. I think your voices have been forever changed by the work that you have done. There is a precision and a spacing that makes sure that not a single syllable goes by without the listener being able to catch what it was. Do you think I'm right? I think you are mostly right, yes.
Starting point is 00:20:10 Okay. Well, let me back up. When we began with this voice writing line of work, the computer software wanted to hear you sounding like a computer. What would that sound like? Can I get a demo? Absolutely, that would sound like something like this, something that is very articulate
Starting point is 00:20:31 and also very robotic, hyphen sounding period. Very quick, sometimes clipped. I hear you laughing, I know. This is how we spoke for hours and hours of our day. She says her vocabulary had to change as well. Yes, because there were certain difficult words for the software to distinguish. For example, in, an, and and.
Starting point is 00:20:56 Like she'd say into the computer, and, but it would hear an, or she'd say in, and it would hear and. And so the workaround she found was to train the computer to hear a specific real word when she would say a totally made-up word, like a little code. Yes, so instead of saying the word in, I-N, she would say... INLY. INLY. INLY. I-N-L-Y.
Starting point is 00:21:26 INLY. Which the computer would then hear and print on the screen as in. Well, how did you go home at the end of the day and start talking like a normal person again? It could be difficult to speak like a normal person after leaving. This job really did change me. Because INLY was really, was really only just the beginning. I mean, once she figured out this hack, she began developing and deploying hundreds and hundreds of code words to work around the software shortcomings.
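An editorial aside: the code-word hack described above can be pictured as a lookup table applied to whatever the recognizer hears. The sketch below is only an illustration under stated assumptions, not the captioning software's actual design. The names `CODE_WORDS` and `expand` are hypothetical, and the only entries taken from the episode are "inly" printing as "in" and the spoken word "period" printing as punctuation.

```python
# Hypothetical code-word table: token the recognizer hears -> text to print.
# Only "inly" -> "in" and spoken "period" come from the episode; the
# structure itself is an assumption made for illustration.
CODE_WORDS = {
    "inly": "in",
    "period": ".",
}

PUNCTUATION = {"."}


def expand(recognized: str) -> str:
    """Swap code words for their printed form and attach punctuation."""
    out = []
    for token in recognized.lower().split():
        text = CODE_WORDS.get(token, token)
        if text in PUNCTUATION and out:
            out[-1] += text  # punctuation attaches to the previous word
        else:
            out.append(text)
    return " ".join(out)


print(expand("she is inly college period"))
```

Under these assumptions, each new workaround would simply be one more entry in the table, which is consistent with the description of hundreds of code words accumulating over time.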
Starting point is 00:21:56 Homophones could be very difficult for the software. To, two, and too, for example. The fix? Tooku for T-W-O, toodaloo for T-O-O. So if a sentence is, she has two daughters in college, too, I would echo that as, she has tooku daughters inly college, comma, toodaloo, period. So that is... Wait, wait, wait. Say that once more. Say that one, say it again.
Starting point is 00:22:27 She has tooku daughters inly college, comma, toodaloo. I mean, it's a whole language that you then have to remember and follow. As Stephanie's brain melded further and further with her machine, she figured out she could trick it in other ways to make her life easier. So for example, My fellow Americans.
Starting point is 00:22:46 Back when George W. Bush was still in office, that's how he was referred to on the air. George W. period Bush. Eight syllables, way too many to spit out over and over again. And so, I trained my software to print George W period Bush when I said GB. Hillary Clinton became Hill Co. Barack Obama became Bombo. Rudy Giuliani at the time was Ruju. Question mark, that is too many syllables. Again, Meredith Patterson. I trained the system so that every time I said poof, it would print the question mark symbol. They learned they could trick it into not hearing and printing certain words, both the
Starting point is 00:23:35 obvious ones. The software had a bit of a naughty side and would produce the most inappropriate choice when it had the ability to do so. And so I spent an entire day of work saying every profanity word you could come up with into the system. Programming them out. So if it heard it, it would just do nothing? Exactly. Neat. And then there were some weird ones that they had to program out as well. The word garage.
Starting point is 00:24:07 Because when you're captioning local news, a lot of things happen in the garage. The fire started in the garage, the man hid in the garage. But when Stephanie would echo the word garage into the computer, the software would, nearly without fail, print crotch. Creating some wonderful misunderstandings. A Moline couple has transformed their crotch into a haven for rock climbers, helping to address a community need that we weren't even aware of. And I mean, this thing, voice writing, well, it became the industry standard for closed
Starting point is 00:24:46 captioning. I mean, if you ever saw a closed caption after 2003, it was probably put there through this technique. At our peak, we had over 150 voice writers. And that was across the country. We had a lot of people in California, aspiring actors, and they were probably captioning 400 to 500 hours a day. A day?
Starting point is 00:25:13 A day, yeah. Meaning thousands and thousands of hours of television each week were accessible to the deaf and thousands and thousands of hours of work were spent by these voice writers really forming relationships with their machines. I think the best voice writers really learned how to move with the software, almost like dancing. Because it wasn't enough to just tell the software how to respond to you.
Starting point is 00:25:53 You needed to respond to it to really achieve the highest accuracies when you were on the air. So is this how we are still doing it? Are the captions going through this like anonymous like office building full of human parrots? Well it's no longer really an office building because the pandemic has made a lot of this work remote now. And the pandemic changed more than just where the captioning was being done. When the pandemic hits, due to everything going online, due to all of the constant press conferences happening, there is once more a flood of stuff that needs to be closed captioned.
Starting point is 00:26:45 And now they don't have enough voice writers to cover all of this stuff. And so they are once again in this position of, oh man, how do we keep up? And by 2020, that technology they had started playing around with back in the early 2000s, just like the black box, AI running the feed directly into the computer. It works pretty damn well. It works well enough that you basically no longer need a human in the system at all. Meaning this dance, it's winding down. It's coming to an end.
Starting point is 00:27:26 Today Meredith says AI is doing around 50% of the closed captioning the National Captioning Institute is hired to do. Wow. And they haven't hired anyone to fill any roles that have become vacant in the last two years. Another human bites the dust. Yeah, I think... It's tough because I, as a person, I, as a professional, am thinking and worrying a lot
Starting point is 00:27:59 about how these new AI tools are going to impact me, my livelihood and my craft. And, well, this is in one way a story about a bunch of people being replaced by those sorts of tools. It's also a little bit of a story about how to use those same tools with a smile, to like approach those tools with some excitement and with some creativity. The tools that are replacing you. They may eventually, but yeah, why shouldn't you enjoy your time with the hand grenade
Starting point is 00:28:35 before it goes off? Okay. Wait, wait, wait. But what's your analogy here? Sure. I think what I'm trying to say is that our voice writers, they were trying to get their machine to produce accurate text. And of course now, we are asking AI to do all sorts of other things for us. From designing a drug to helping us process our feelings,
Starting point is 00:28:59 to making a picture, to writing a song. But, like, it can't do those things well without us. It needs us to help it, to play with it. Yeah. And I mean, well, it is so easy to just be down or scared or turned off by these new tools. Or opposed to them for running on stolen human work and guzzling energy. Sure, yes, that too. But I think regardless of how you feel about these tools ethically, what these voice writers
Starting point is 00:29:33 show is that back and forth, that dance, it can yield some very unexpected and world-changing results. Positive world-changing results, like millions of people having access to information they otherwise would not have had. It is pretty tremendous what it's done for the disability community. And I do have to say, like just a few weeks after ChatGPT came out, this one professor I talked to who worked at a community college was just like, you know, for my ESL students, this is a game changer. Like it's just awesome. This is an access thing. It's an empowerment
Starting point is 00:30:10 thing. It is good. It is opening doors, you know. So the access point of view, that is a nice way to not just feel afraid. I'll give you that. Yeah. And to be clear, I'm not here to say don't be scared or that the machine isn't going to eventually steamroll all of us. But we're not there yet. All we have is now, Lulu. And so maybe we should do our best to take a cue from these voice writers and, you know, dance with the machine for a bit. This episode was reported and produced by Simon Adler, with original music and sound design by Simon Adler.
Starting point is 00:31:14 It was edited by Pat Walters and fact-checked by Anna Pujol-Mazzini. Special thanks to Elsa Soonason. And by the way, if you'd like to read this week's episode or pass a more accessible version along to a friend, you can, as always, find a transcript on our webpage or a closed caption version on YouTube. Hi, I'm Jonathan, and I'm from St. Louis, Missouri. Radiolab was created by Jad Abumrad
Starting point is 00:31:40 and is edited by Soren Wheeler. Lulu Miller and Latif Nasser are our co-hosts. Dylan Keefe is our director of sound design. Our staff includes Simon Adler, Jeremy Bloom, Becca Bressler, W. Harry Fortuna, David Gebel, Rebecca Laks, Maria Paz Gutiérrez, Sindhu Gnanasambandan, Matt Kielty, Annie McEwen, Alex Neason, Sarah Qari, Sarah Sandbach, Anisa Vietze,
Starting point is 00:32:08 Arianne Wack, Pat Walters, Molly Webster, Jessica Yung, with help from Rebecca Rand. Our fact checkers are Diane Kelly, Emily Krieger, Anna Pujol-Mazzini, and Natalie Middleton. Hi, I'm Daniel from Madrid. Leadership support for Radiolab's science programming is provided by the Simons Foundation and the John Templeton Foundation.
