Close All Tabs - Do You Hear What I Hear? Audio Illusions and Misinformation

Episode Date: May 28, 2025

Are you old enough to remember the “Magic Eye” optical illusion mania that gripped the nation in the ’90s (random patterns that you had to squint at just right for the 3D image to pop out)? It turns out it's not just our eyes that can be fooled. Our ears can play tricks on us too. There's a whole world of auditory illusions that seem to say one thing when they're really saying something else, and that matters, especially in our age of digital misinformation. In today’s episode, Morgan talks to KQED Digital Community Producer Francesca Fenzi about why we hear what we think we hear, and how understanding the limits of our perception might actually make us better at spotting dis- and misinformation online.

Guest: Francesca Fenzi, KQED Digital Community Producer

Want to give us feedback on the series? Shoot us an email at CloseAllTabs@KQED.org. You can also follow us on Instagram.

Credits: This episode was reported and hosted by Morgan Sung. Our Producer is Maya Cueva. Chris Egusa is our Senior Editor. Additional editing by Jen Chien. Sound Design by Maya Cueva. Original music by Chris Egusa, with additional music from APM. Mixing and mastering by Katherine Monahan and Brendan Willard. Audience engagement support from Maha Sanad and Alana Walker. Katie Sprenger is our Podcast Operations Manager. Holly Kernan is our Chief Content Officer.

Transcript
Starting point is 00:00:30 From KQED. My colleague Francesca Fenzi, digital community producer at KQED, spends a lot of time online. I'm on KQED's Discord server, on Reddit, all the various chat threads and usual places. After President Trump was sworn into office earlier this year, she noticed a new wave of viral content about him and his administration. There were clips claiming all kinds of things. One of these viral clips shows Elon Musk and his son, X, during a visit to the White House.
Starting point is 00:01:13 X was four at the time. While Musk speaks to the press, X runs around the Oval Office and makes his way to the president's desk. At one point, he peels off and he makes a side comment to President Trump. This is all happening while Elon Musk is speaking, so it's very hard to hear what he's saying. But a lot of people started interpreting what the boy was saying to Donald Trump as, "You're not the president and you need to go away." Part of the presidency is to restore democracy. Again, X is very young. This was a four-year-old babbling at the president.
Starting point is 00:01:51 Another moment from the same meeting seems to show X telling the president to shush his mouth. This is from one and years the next are the same. I did watch this clip. It does seem like he might have said something like this, but my audio producer brain also lit up right away because I can tell you that that clip was not clear. It was not the kind of thing that you can clean up in post. So when people started quoting this as fact as something that was happening,
Starting point is 00:02:21 my interest was piqued. The whole thing sent Francesca down a research rabbit hole, which, relatable. This clip felt different from other pieces of viral misinformation she'd seen online. We've been really focused on AI's potential to spread false information and how it can trick people into believing things that never actually happened. But even without AI, our senses can be fooled the old-fashioned way. And what I was interested in in this situation was just trying to understand how our brains
Starting point is 00:02:49 process what we see and hear and how those senses can be manipulated, not by AI, but just by old-fashioned audio and video tricks. And so my research spiral led me to this whole world of auditory illusions. Remember those Magic Eye posters? They were those optical illusions that looked like random patterns until you unfocused your eyes just right, and a hidden image popped out. Well, it turns out it's not just our eyes that can be fooled. Our ears can play tricks on us too.
Starting point is 00:03:25 There's a whole world of auditory illusions that seem to say one thing when they're really saying something else. And that matters, especially in an age of misinformation. In today's episode, we're looking at some phenomena that can completely change what we think we hear. And we explore how understanding the limits of our perception might actually make us better at spotting disinformation online. This is Close All Tabs. I'm Morgan Sung, tech journalist, and your chronically online friend here to open as many browser tabs as it takes to help you understand how the digital world affects our real lives. Let's get into it.
Starting point is 00:04:14 Okay, so we're going to start with something that feels super commonplace, but might not be as reliable as we think. New tab. Can we trust lip reading? Okay, so in this video with Elon Musk's son and President Trump, a lot of people were really relying on a combination of very poor audio and lip reading to decipher what he was saying. So let's start with lip reading. Francesca, how reliable is it? Well, it's estimated that only 30 to 40 percent of speech can actually be lip read, and that's even under the best conditions. Lip reading is a really useful tool for people who are hard of hearing because it helps to piece together context around other pieces of information, right?
Starting point is 00:05:03 Like partial audio or even like hand movements in sign language. So some lip readings seem really good when we think that we have the context associated with them. One example of that is there's a TikTok creator who I really like. Her handle is It's Jackie G. And she interprets celebrity red carpet moments. So she'll take moments of celebrities being recorded and will lip read the conversation that's too far away from the camera for us to be able to hear accurately. Here's Jackie G lip reading Zendaya at this year's Met Gala.
Starting point is 00:05:39 She's so fab. So fabulous. I love it. It's so funny. She's, I would say, when we, then it cuts off. The reason that those work is because as a fan of a certain kind of celebrity, you are probably a little bit aware of how they feel about the movie that they're promoting or their relationship to other stars who they might be interacting with.
Starting point is 00:06:03 And they seem really plausible because of that. But the more removed lip reading is from its context, the less you understand about the true nature of the relationship of the speakers, the more likely you are to be misled. That's part of what makes the YouTube series Bad Lip Reading so possible and successful. Bad Lip Reading is a YouTube channel that intentionally misinterprets what people are saying in movies or TV shows and then voices them over. Like this scene from Star Wars, a conversation between Obi-Wan Kenobi and some stormtroopers. Hey, guys, we're collecting donations for the Jawa Orphanage. Do you have any spare change?
Starting point is 00:06:39 Hey, you should know that you stink kind of like fish. Wait, what? Everyone knows it except for you. You're removing context from the audio that you're hearing and you're replacing the story with sound that maybe mirrors some lip movements, but is totally nonsensical to the scenario. And that's the source of the humor in those videos. I spoke to Nicolas Davidenko, who's a researcher at the High-Level Perception Lab at UC Santa Cruz. He studies auditory illusions, and I asked him why Bad Lip Reading videos look so convincing. The reason they work so well is because lip reading is a much more ambiguous cue.
Starting point is 00:07:22 So there's actually a lot of words that could fit the shapes of my lips as I talk. Davidenko studies this type of phenomenon in his lab. He's been researching how to help people with something called misophonia. That is when you have an extreme negative reaction to certain sounds. There are some sounds that all of us find a little bit unpleasant, but these are folks who have a really extreme triggered reaction to sounds like chewing or teeth gnashing. Those are some common ones. And Davidenko found that you can actually pair a different kind of video with the audio that would normally trigger misophonia for somebody. So, for example,
Starting point is 00:08:05 if the sound that triggers you is that sound of chewing, you can replace an image of somebody chewing with another plausible sound source. In his lab, they use an example of somebody stepping on leaves to kind of mimic that crunching sound that might originate with chewing on food. And by swapping that image out, people start to interpret that sound differently. If the visual signal is telling you something,
Starting point is 00:08:32 you trust it more than the auditory signal. And when there's a conflict, you tend to go with whatever the visual system is telling you. Got it. So the thing we're seeing with our eyes is overruling what we're hearing. And that's because of something called the McGurk effect. There was this famous study in 1976, McGurk and MacDonald, and what they found was that what we see can actually change what we can hear. In fact, I'm going to demonstrate. Can I play you a video, Morgan? Please do. I'm going to play you a clip. And in this first one, I just want to hear what sound you hear.
Starting point is 00:09:06 Ba, ba, ba, ba. It's a close-up of a person's mouth. Ba, ba, ba. Okay, so I'm hearing ba with a B, like baby. Yeah, yeah, that's right. Okay, so now I'm going to play you a different clip and tell me this time what you hear. Ba, ba, ba, ba, ba, ba. Okay, now I'm hearing fa, like with an F, like fabulous.
Starting point is 00:09:31 Right, so it's actually the exact same audio. So if you were listening and you heard exactly the same thing both times, you're not crazy. Morgan's being tricked. The audio, that is wild. Yeah, it's crazy, right? It happened to me too the first time I watched this. The audio is actually exactly the same. But what listeners aren't getting in this case that you are is the lip motion is different from one to the next.
Starting point is 00:09:58 And that's actually changing the way that you're hearing the audio. So when you see that B shape being made with a mouth, you hear bah, and when you see that F shape being made with a mouth, you're actually hearing it like fah. Right. Like I saw the person's bottom lip hit their teeth. It's like, yeah, that's the F shape. That's the effect that Bad Lip Reading takes advantage of: they're taking those lip shapes with plausible sounds and they're kind of swapping them for things that are similar phonetically. So experiencing the McGurk effect in a Bad Lip Reading video is pretty funny. But I can imagine that if this falls into the wrong hands, it can go very poorly. For sure. Something else that Davidenko explained is that we can be misled by someone telling us
Starting point is 00:10:46 what to hear or see ahead of time. That's kind of playing into this idea that contextualizing those clips changes the way we hear them too. So it's not just the McGurk effect. It's also the expectation we have coming into a video. In this case, Davidenko also worked on an experiment in his lab called mind-controlled motion. That's what he named it. And in that experiment, what researchers did is they showed people a set of randomly refreshing images that were just pixels on a screen. So they're just truly randomly refreshing pixels popping up and disappearing. There's no logical motion behind them. But when researchers said something like left, right, left, right, or up, down, up, down, over and over,
Starting point is 00:11:31 when people viewed these images, then people were actually seeing the motion that they were told to see. And he said that there was a 90% compliance rate, meaning like 90% of the people who watched these and got those prompts saw the motion that they were being told to. Yeah, this really makes me think of those like kind of rage bait body language reading videos that we always see online. Like an infamous example is couch guy where a girl walked into her long distance boyfriend's apartment and he just didn't seem as excited to see her as people thought he should have been. All right, there his arm goes to the side of his pants, grabs his phone from O'Girl, acts like he's laughing to pull it up through the middle, and then boop, there went her hand.
Starting point is 00:12:15 And some people were like, no, that's a totally normal reaction. Like, look at his body language. He's just surprised. Whereas other people were like, this is, like, he hates her. Look at his body language. Yes. Does this explain, like, why people see completely different things in the same viral videos? Yeah, yeah.
Starting point is 00:12:31 It doesn't have to be a deepfake to be misleading. It can be a real situation that really happened. But how something is presented has a lot to do with influencing our perception of the relationship of the before and after that surround that moment. Okay. So when I'm watching something and also listening to it at the same time, it's like, yeah, my eyes can deceive me. But at least audio on its own is safe, right? Unfortunately, not exactly. Morgan, do you remember the whole Laurel-Yanny thing that broke the internet a few years ago?
Starting point is 00:13:07 Oh, it feels like a lifetime ago. Yeah, it was like the dress, but for your ears. And we're going to hear about that after this break.
Starting point is 00:13:58 Okay, welcome back. Time to open a new tab. Can people hear different things in the same audio? So I remember this whole Laurel versus Yanny thing back in 2018. It feels like forever ago in internet time. So can you remind us what it was all about? Yeah, so this actually started with a group of high school students who were studying vocabulary words for their English class. They were on vocabulary.com. And they were sending
Starting point is 00:14:45 recordings of different words to each other on Instagram. And they discovered that when they recorded one word, they were hearing totally different things, one person to the next. So some people were hearing the word laurel and others were hearing the word Yanny, which are so different that it kind of kicked off this debate in their friend group. And then, eventually someone posted to Reddit and strangers started weighing in as well. And it became this divisive litmus test of sorts where people were hearing either Laurel or Yanny and then being fiercely adamant that it was not the other. Actually, I'm going to play the clip for you now. I want to hear what you hear in this. Yerry. Yerry. Yerry. Yerry. Yerry. Yerry. You know what's crazy. When I first heard this,
Starting point is 00:15:35 what was this, like seven years ago in 2018, I swore he was saying Yanny. And now I'm hearing Laurel. Okay. I think I may have an explanation for that. I hear Laurel, too. And the unflattering reason behind that, or one of the theories, is that it may be related to our age. So I hate to tell you, but I think you've arrived in Millennialville. It's time. It's coming for us all. So when this was first circulating, the New York Times and Wired and a bunch of other news outlets took it super seriously, getting to the bottom of what was happening. And the New York Times actually created this tool to help people hear both sides. So if you only heard one or the other in this clip, I can play a clip of audio that shows you what it should sound like moving along the Laurel-to-Yanny spectrum. Larry.
Starting point is 00:16:28 Yerry. Yeah. Yeah. Okay. I was hearing, I was hearing like hints of Yanny. It was so weird. It was like my brain was fighting the Yanny. Yeah, it feels like it's like fighting to come to the surface. Yeah. But eventually, like, it kicks over. Exactly.
Starting point is 00:16:55 So there's some actual science behind why some people hear Laurel and others hear Yanny, right? Yes. So it's still hard to know for sure what caused it, but there's a prevailing theory. And essentially two things are what people think is happening in this clip. One is that the recording of the recording has added and introduced new frequencies to the audio. So remember, these were high school kids who were recording a clip played from online of a vocab word. The original word is Laurel. So if that's what you heard, you were correct. But when Laurel was recorded through computer speakers into a phone and then sent
Starting point is 00:17:30 across the internet, it introduced some additional frequencies to the audio. And our brains are choosing which of those frequencies to prioritize. And this is where the age part comes in. Older people tend to hear lower frequencies and less of the high frequencies. And then younger listeners, they have a broader range of those high frequencies available to them. So the theory is that if you hear Laurel, you're probably prioritizing those lower frequencies. And if you hear Yanny, your brain is prioritizing those higher frequencies. And younger people may be more inclined to prioritize those because they can actually hear more of them. My God. Sorry, I'm still coming to terms with my ancient decrepit ears. I know. It's the worst. So are there any bigger implications
Starting point is 00:18:14 for this phenomenon, or is it just like an oddity of the digital age? So it means that there's just more opportunity for ambiguity and misinterpretation. We're listening to audio, if you think about it, in all these different forms all the time. Now we're playing them through computer speakers, from our phone speakers on crowded buses, in our car stereos, which means that there's a lot of opportunity for us to hear things differently. So is this why I sometimes mishear song lyrics? Like I swore, God, back in the day, Taylor Swift's "Blank Space," I swore she was saying Starbucks lovers. Yes, that is a common one. The actual lyric is long list of ex-lovers, but I get why people hear that.
Starting point is 00:19:00 And here I am. Like, I knew it was the wrong lyric. And for the last, I don't know, 10 years, I've just been like, I don't need to learn it. It's Starbucks lovers. Well, that's actually a different phenomenon, but just as fascinating. It turns out our brains do make up words that aren't there. And that's what's happening when you're listening to song lyrics sometimes. Let's hear about that in a new tab. Can my brain make up words that aren't there? You're definitely not the only person this happens to. I actually went around and I asked our colleagues about some of the songs that they've misheard. My brother and I, when we were little, used to play what we called the wrong song game. One that I remember was Bonnie Tyler's "It's a Heartache."
Starting point is 00:19:50 It's a heartache. And we heard that as It's a Hard Egg. I learned this very, very late in life, which I'm embarrassed to admit. But yeah, the song lyric is revved up like a deuce from Blinded by the Light
Starting point is 00:20:07 by Manfred Mann. Blinded by the light, revved up like a deuce, another runner in the night. For the vast majority of my life, I thought it was wrapped up like a douche. And I know I'm not the only one because if you Google that lyric, you get a ton of hits for it.
Starting point is 00:20:24 "Beat It" by Michael Jackson. I always thought it was, beat it, just beat it, you don't want to beat it, beat it. But it's actually, no one wants to be defeated. When Chappell Roan's song "Hot to Go!" was really popular, my then five-year-old, like, loved that song, would always sing along to it. And then one time I was singing along to it and I said, Hot to Go. And she was like, mom, that's wrong.
Starting point is 00:21:02 It's out to go. She wasn't like catching on to the spelling. So she was just like, you're wrong. It's out to go. Do it right. Growing up, one of the childhood bangers was T-Pain, "Buy U a Drank." I saw a T-Pain tweet about it in like 2017. And then I found out that instead of just harmonizing some random lyrics,
Starting point is 00:21:20 ooh-wee, it was actually, and then. Wait, what? Exactly. What? I'm glad I'm not the only one. So is there a name for this phenomenon too? Very perceptive. I feel like you're getting the hang of this.
Starting point is 00:21:38 It's called the mondegreen effect. And that's when you mishear phrases or words and so assign them a new meaning. Sometimes it's hearing words that you do know, but in an order that's not what was actually being said, like Starbucks lovers. Like those are real words. And then sometimes it's just inventing a totally new word, which is what happens to me the most. I just make up a new thing. So this is most common with song lyrics. And it comes from the mid-century American writer, Sylvia Wright.
Starting point is 00:22:08 She coined this term based on her childhood. She remembered mishearing a line in a Scottish ballad called "The Bonnie Earl o' Moray." And there's a line in the song that goes, laid him on the green, which she interpreted as Lady Mondegreen. Which is another really common way that we do this: we just make up formal nouns or things that feel like names when we don't really know what's being said. So Morgan, researching the mondegreen effect actually led me to researcher Diana Deutsch.
Starting point is 00:22:53 She is at Stanford and UC San Diego, and she is like the audio illusion researcher, been doing this for decades. She discovered that when you take two audio sources and you play the same word or syllables slightly out of sync, after about 10 seconds of listening, people start to invent phantom words in that overlap. They start to hear different things. Okay, here's a clip from one of these audio experiments. And even with the same audio, the phantom words that they're hearing are often unique to the listener.
Starting point is 00:23:34 So here's Diana listing some of the words that people have reported hearing in the same piece of audio. Window, welcome, love me, run away, no brain, rainbow, raincoat, Bueno, Nambri, when or when, mango, windowpane, Broadway, Broadway, even Rogaine. That is fascinating. So why would people hear totally different phantom words in the same audio? This is similar to something called pareidolia, where people perceive familiar patterns in random or ambiguous stimuli. This happens a lot with visual things.
Starting point is 00:24:17 So like seeing a face on the moon or Jesus in a flour tortilla. It turns out that that can happen with audio too. Generally, people hear words or phrases that refer to things that are on their mind. So, for example, if someone's on a diet, they might hear the phrase, feel fat. And it often happens when I present these to a group of students close to exam time, they'll hear things like no brain. So these illusions show that when people believe that they're hearing meaningful messages from the outside world,
Starting point is 00:24:59 their brains are actively reconstructing sounds that make sense to them. Diana says that the patterns we hear are influenced by things like our mood, what we've thought about or discussed that day, whether we're tired or sad or scared. We assign meaning to sounds based on our internal narratives. For example, that's part of what might be going on when people report hearing electronic voice phenomena in ghost hunting. Being scared or heightened or thinking about ghosts may lead you to hear certain phrases in ambiguous audio. Gosh, that's amazing. Strange, right? How it says...
Starting point is 00:25:41 Clearly a voice. I hear something negative, like no or don't. And Diana actually told me that one thing that's quite common is for people who have recently experienced a loss, they're more likely in their day-to-day lives to hear what sounds like words or phrases or even voices associated with their lost loved one. Earlier, you mentioned that having less information, like audio without visual cues, makes us more likely to assign new meaning to what we hear. So let's bring this full circle.
Starting point is 00:26:15 So after all of this research, what would you tell someone who's absolutely convinced that they know what Elon Musk's son said in that clip? I guess I would say that that probably says more about you and how you feel about Elon Musk or President Trump than necessarily what the three people in that audio clip were saying. It doesn't mean that you're wrong. It's just that we don't know.
Starting point is 00:26:36 There's no way to really hear truth in an audio clip that convoluted? And what are some of the ways bad actors can purposely take advantage of how suggestible our senses and our brains are? I think that the easiest way to be manipulated is when somebody is taking a clip or a small snippet of something and then abbreviating the context and telling you what goes before and after. And social media is designed in this way to give us bite-sized samples of the world. But when you're taking just a bite, it means that you might be missing the whole meal around it and you might get the flavors wrong. You might kind of misunderstand what's being served to you. How can understanding all of this brain trickery help us spot
Starting point is 00:27:25 actual misinformation? Yeah, that's the ultimate question, right? We can't trust our eyes. We can't trust our ears. We can't trust AI. I think that the real takeaway, I guess, that I have after this research spiral is trust but verify. Double-check your own thinking when you're encountering one of these clips online. Is it too good to be true? Maybe it takes a little extra Googling to see if you can get to the bottom of it. And maybe, I guess, another takeaway I would have is being okay with a little bit of ambiguity. Sometimes there are mysteries that Google can't answer for us, just like the answer of what was really said in the White House in that moment. No amount of Reddit threads can do it. But knowing that there are unknowns, I feel like being aware that people claiming to have a
Starting point is 00:28:18 definitive answer might not be telling the truth. Well, Francesca, thank you so much for joining us. Thanks so much for telling us all about these crazy auditory illusions. Thank you for having me, Morgan. This was super fun. Voices from KQED staff in this episode included Susie Britton, Mark Nieto, Marlena Jackson Rotondo, Ryan Voh, and Blanca Torres. Francesca runs KQED's Discord server and Close All Tabs has its own channel. Come say hi, share your thoughts, and chat with other listeners about the show. Join us at discord.gg/KQED. Now, let's close all these tabs.
Starting point is 00:29:06 Close All Tabs is a production of KQED Studios and is reported and hosted by me, Morgan Sung. Our producer is Maya Cueva. Chris Egusa is our senior editor. Jen Chien is KQED's Director of Podcasts and helps edit the show. Sound design by Maya Cueva. Original music by Chris Egusa.
Starting point is 00:29:24 Additional music by APM. Mixing and mastering by Brendan Willard and Katherine Monahan. Audience engagement support from Maha Sanad and Alana Walker. Katie Sprenger is our podcast operations manager and Holly Kernan is our chief content officer. Support for this program comes from Birong Hu and supporters of the KQED Studios Fund.
Starting point is 00:29:47 Some members of the KQED podcast team are represented by the Screen Actors Guild, American Federation of Television and Radio Artists, San Francisco, Northern California local. Keyboard sounds were recorded on my purple and pink Dustsilver K-84 wired mechanical keyboard with Gateron Red switches. If you have feedback or a topic you think we should cover, hit us up at CloseAllTabs@KQED.org. Follow us on Instagram @CloseAllTabsPod. And if you're enjoying the show, give us a rating on Apple Podcasts or whatever platform you use. Thanks for listening.
