Instant Genius - To become Prime Minister, change your voice

Starting point is 00:02:03 Take your voice assistant on your iPhone or your Google system or your Alexa Echo. Already those voices are pretty good, but you know they're not quite right. What Google did was add some of the back-channeling we do as we talk. You go, uh-huh, mm-hmm, and some of the disfluencies that you naturally produce. You're listening to the Science Focus podcast from the BBC Focus magazine team. We're the UK's best-selling science and technology monthly, available in print and in several digital formats throughout the world. Find out more at sciencefocus.com or look out for us in your app store.

Starting point is 00:02:40 Hello and welcome to the Science Focus podcast. I'm Alice Lipscomb Southwell, the production editor of BBC Focus magazine. If you've ever felt the shock of hearing your own voice played back to you, you'll realise how important your voice is to your identity. We judge others based on their pitch, intonation and accent, and even use that to decide whether or not to trust them. Trevor Cox has traced speech from its beginnings in Neanderthals all the way through to its adoption by machines.

Starting point is 00:03:06 He'll explain the development of speech in the womb and why the mother's voice is so important, and he'll look at the voice in old age and tell us why we should all be joining choirs. Editorial assistant Helen Glennie chats to Trevor to find out more. Can you start off by telling me what your area of expertise is? My area of research is acoustic engineering, so I'm interested in making things sound better,

Starting point is 00:03:36 and that might be designing a concert hall so the music is enhanced by the acoustics, or it might be designing a classroom where you're trying to make the speech intelligible, so the pupils and the teacher can hear each other. Or it could be everyday products like, I don't know, designing washing machines so they're less annoying. Nice. And how did you become interested in sound engineering? I think, like a lot of my colleagues, you start off being both a scientist and a musician, and in the end you combine your two interests. In my case, I was a classical musician. I was interested in

Starting point is 00:04:11 how concert halls were built. And so I got into architectural acoustics, and that was my way into this field. And so you still do a little bit of that, the architectural acoustics? Yeah, I got into designing treatments that are used to improve the sound in concert halls and in studios.

Starting point is 00:04:27 And my designs are still heavily used around the world. People within acoustics know me because I've written a book that is very commonly used to design these treatments. Now, another book that you've written really recently is called Now You're Talking. Can you explain to me what the book's about? I've done research on speech before in terms of how to make it intelligible, but I realised I didn't really understand that much about verbal communication. And I got really interested in thinking about, well, when did we learn to speak?

Starting point is 00:05:01 How do we learn as individuals to speak? But also, speech is changing. If we think about what we're doing now, we're saying things like "I love you" to Alexa, and all sorts of really strange things. Artificial intelligence is changing our communication. And it just struck me that it was a really good time to be writing a book about this subject. And why do you think all of this speech science is important? Well, in the end, talking is our primary way of communicating. We're chatting now, you know, we're many hundreds of miles apart, but we're chatting by speech.

Starting point is 00:05:34 And particularly in ancient times, before there was writing, speech was the only real way of communicating. And, you know, it's about us, isn't it? It's a very personal way of communicating. If we think about how we interact, we give away so many signals as we're talking. It's not just about the words I'm saying. You're starting to pick up on my accent, how I'm saying the words. Do I sound excited? Do I sound happy, sad? Voice is such an important thing for us to get along as a social species and to communicate. Now, you talk about the three ages of the voice, and the first one of those is infancy. I was really surprised to learn how much babies can hear in the womb. So can you explain a little bit about what you found out about that? Yes, well, from about the third

Starting point is 00:06:18 trimester, your baby has hearing, and so it can start hearing the mother's voice. When I'm talking here, my body is vibrating, you know, bits of my skeleton are vibrating. So if I were a mother, that vibration would pass to the infant through the amniotic fluid, and it can start hearing the vague sounds of the mother's voice. In fact, when a baby is born, it can recognise the mother's voice in preference to others. It doesn't recognise the father's, because it hasn't had the intimate connection it has with the mother's voice. And I guess there are other sounds in there as well. There must be lots of fluid sloshing sounds.

Starting point is 00:06:53 And of course, the sound of the heartbeat, you know, the maternal heartbeat must be one of the dominant sounds that you hear towards the end of your time in the womb. So it sounds like quite a comforting sound for babies to hear. Is that then used in medical practice? Can you use it to soothe babies? You can buy CDs which play maternal sounds in an effort to soothe your baby, and I've heard anecdotal reports that they work quite well. But I didn't try it with my children, so I can't tell you.

Starting point is 00:07:20 And I don't think anyone's done the scientific studies. But certainly if you have premature babies, like I have, there's increasing knowledge that sounds are really important to babies while they're lying in the incubators in special care. If you don't play vocal sounds to these babies, you're going to inhibit their development of speech initially, because of course in the womb they're starting to pick up on speech. But also, neonatal special care is a really noisy and not very nice place. Incubators are noisy with their ventilation systems.

Starting point is 00:07:53 And actually, playing some of the sounds that you might get in the womb can also help soothe them. That's amazing, what you say about babies starting to learn about speech while they're still in the womb. So do you think that means talking to your wife's pregnant belly and that sort of thing is really important? I think it's an important part of bonding, but obviously fathers don't do it often enough for the babies to recognise it. So you'd have to be quite committed, I think, to playing the father's voice a lot. I think we've got to be slightly careful here as well, because you can buy devices that strap onto your belly to play music to your baby. I'd be a bit worried about doing too much augmentation of what normally happens. For one thing, we don't know how sensitive the baby's ears are,

Starting point is 00:08:41 so I certainly wouldn't be playing very loud sounds at them. But also, we've evolved to learn the beginnings of speech from our mother's voice. So if we overload the baby with lots of extra sounds, is that going to be optimal for them to learn? I suspect it won't be. Okay, and there are two other ages of the voice, adulthood and older age. What happens to our voice as we progress through the years? Well, the big change, I guess, is at puberty, when we become adults.

Starting point is 00:09:10 So that's when we get our adult voice. And for men, it's really obvious, isn't it? You hear what's commonly called the voice breaking, where an adolescent boy's voice drops by about an octave. But actually women's voices change as well; it's just less dramatic. It's only a few semitones, so people don't notice it quite so much. And you get other changes in the voice.
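To put the octave-versus-semitones contrast above into numbers: in equal-tempered tuning, shifting a pitch by n semitones multiplies its frequency by 2^(n/12), so an octave (12 semitones) halves it. The Python sketch below is illustrative only. The 240 Hz starting pitch is a hypothetical value, and reading "a few semitones" as 3 is an assumption, not a figure from the interview.

```python
# Illustrative sketch: how big is "an octave" compared with "a few semitones"?
# Assumes 12-tone equal temperament, where one semitone is a frequency
# ratio of 2**(1/12). The starting pitch below is hypothetical.

def semitones_to_ratio(semitones: float) -> float:
    """Return the frequency ratio for a pitch shift of `semitones`."""
    return 2 ** (semitones / 12)


def shift_pitch(f0_hz: float, semitones: float) -> float:
    """Shift a fundamental frequency (in Hz) by a number of semitones."""
    return f0_hz * semitones_to_ratio(semitones)


# Hypothetical pre-puberty speaking pitch, used for both cases.
CHILD_F0_HZ = 240.0

male_adult = shift_pitch(CHILD_F0_HZ, -12)   # drop of about an octave
female_adult = shift_pitch(CHILD_F0_HZ, -3)  # drop of "a few" (assumed 3) semitones

print(f"Octave drop:     {male_adult:.0f} Hz")    # 120 Hz
print(f"3-semitone drop: {female_adult:.0f} Hz")  # 202 Hz
```

With these assumed numbers, the octave drop takes the pitch from 240 Hz to 120 Hz, while three semitones lowers it by only about 16 percent, to roughly 202 Hz, which fits the point that the change in women's voices is much less noticeable.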

Starting point is 00:09:31 After all, when we're turning into adults, we're turning into sexual beings. And so our voice changes: you get a bass, manly voice as a man, because that's about trying to attract a female, I suppose, a sort of signalling of your fitness through your voice. And there are other changes in the female voice as well. One common thing you get with teenage girls is that quite whispery voice. And that's caused by the vocal folds, the things in your larynx that make your voice,

Starting point is 00:09:58 not quite meeting and leaving a little gap, what's called a glottal chink, which means you get a slightly breathy voice. And that's one of those features a lot of teenage girls have as they develop their adult voice. And then getting further on into old age, what happens then? Well, I think what's remarkable about the voice is that it doesn't deteriorate very quickly. So I'm in my early 50s and I'm covered in wrinkles and various bits of my body are falling apart with old age.

Starting point is 00:10:24 But my voice is actually not that different from when I was 20. Ironically, as I talk to you, I've got a slightly croaky voice today, but normally it's quite healthy. Eventually, it will succumb to ageing. And what happens, particularly in a man, is that the vocal folds fail to meet quite so well. So when these vocal folds are opening and closing and giving you sound in your larynx,

Starting point is 00:10:50 if they don't quite meet, your voice becomes quite inefficient because you're leaking air. So what you'll notice in very old men is that they tend to talk in much shorter sections. They take a lot more breaths, and that's one thing that changes in the voice. The other thing that happens in the older male, once you get into your 60s, 70s and 80s, is that

Starting point is 00:11:09 the pitch starts to rise. And that, again, is to do with the vocal folds: the material of them changing, and how they vibrate back and forth, gives you a rising pitch. And does anything different happen in women as they age? Yeah, their voices age differently. What generally happens is that their voice pitch tends to fall over their adult life, and it just keeps on going down and down, because the configuration of their vocal folds is

Starting point is 00:11:37 different. And you can see that from the outside. I, as a male, have an Adam's apple, and you, as a female, don't. That's to do with the way the larynx reconfigures itself during puberty, and in my case that's all to do with signalling I'm a man by making my voice lower. And you say in the book that joining a choir is the sonic equivalent of applying anti-wrinkle cream, which I quite liked. Why is singing so beneficial for the voice? Well, using it in any way is really beneficial for your voice. But choirs have a couple of things, I think, that are particularly good. There's more and more evidence showing that people who are highly trained in music

Starting point is 00:12:15 actually deal better with things like the deterioration of hearing as they get older. One of the effects I've started to notice is in pubs: it's harder for me to pick speech out from the background noise, but musicians are slightly better at this. You're also learning things like breath control; you've got to learn to hold a line over a long time. And the other advantage of choirs is that they're very social. And we all know that isolation is very bad for you as you get older.

Starting point is 00:12:40 But ultimately, the voice is controlled by a lot of muscles. So by using your voice in a controlled way, you're using those muscles and the neurological pathways that control them. So it's like taking exercise, you know, a use-it-or-lose-it kind of thing. And I think one of the reasons the voice maintains itself into quite old age, unlike the other muscles in my body,

Starting point is 00:13:03 which are less strong, is that I use it every day, so I never fail to exercise it. Now, you talk about music quite a bit in the book, and especially how technology has influenced music. What technological developments do you think have had the biggest influence on the music that we listen to? I think the biggest change in music happened when microphones and amplification came in.

Starting point is 00:13:25 And particularly, I guess, from about the 1920s, when it became widespread. Suddenly you have a change in singing style. If you're singing to a large audience and you haven't got any amplification, you have the problem of how to reach the back of the audience without sounding really weak. And there are singing styles that do that; the operatic style is one way of doing it. Males and females use slightly different techniques.

Starting point is 00:13:52 But a male will tend to lower the larynx, and he'll get that rather plummy, booming voice. That gives you a particular sound, at particular frequencies, which projects above the audience and gives you the power. You change other things as well; there are various techniques you can use to reach the back.

Starting point is 00:14:11 But it gives you that very particular kind of singing style. Whereas as soon as you've got amplification, and I'm here talking into a microphone, I can talk really quietly, I can talk loudly, I can talk in lots of different ways. As long as my microphone picks it up, I can create sound any way I like. And the same is true of singing. So you get this very wide variety of singing styles, because it doesn't matter how loud you are. If you look at modern pop, there's a huge range of different voices out there,

Starting point is 00:14:39 most of which wouldn't survive in front of a big audience without the microphone. Now, you talk as well about the relationship between voice and identity, and you show that listeners make assumptions about a person based on their voice. But those assumptions can be wrong quite a lot of the time. Can you give us some examples of when we get those things wrong? Unfortunately, there are many examples. It really comes down to the fact that you're trying to make snap judgments about people, and you're using the voice to try to work out who they are. Are they a member of my tribe or not? Those are the kind of in-group, out-group decisions you're trying to make with the voice. You might also be making decisions about whether they're fit in a sort

Starting point is 00:15:18 of evolutionary, do-you-want-to-mate-with-them way. And so we make assumptions. For example, we assume men with big booming voices tend to be bigger and taller, though there is only a very weak correlation between voice pitch and the size of an adult male. We make assumptions about age. We kind of assume that the voice must deteriorate in the same way as we get wrinkles, so if we hear someone whose voice is still in good nick, we assume they must be younger than they really are. So we tend to underestimate the age of older people. And then there are lots of cultural factors that come into it as well. One of the interesting ones I researched and was fascinated by was the gay male

Starting point is 00:16:03 voice. We have this situation where people assume that gay men automatically talk with a higher pitch. And you actually find this with actors: if you ask an actor to portray a gay male character, they'll automatically raise their voice and talk at a slightly raised pitch. But if you actually compare straight and gay men and do measurements, there is no difference on average between a straight and a gay man in terms of their pitch. So the actors are signalling what the listeners are expecting, but the listeners have got this stereotype, which is wrong. So there are lots of cultural factors involved on top of the science. You talk in the book as well about a woman who is helping transgender

Starting point is 00:16:44 people adjust to their new gender identity by changing their voice. But you say that's quite a hard thing to learn. Why is it so difficult? The people who probably have the hardest job are those who are transitioning from male to female as adults, because during puberty, testosterone will have changed their vocal folds. So they're trying to speak with their new identity, or the identity they've always had, as a female, but they're actually using vocal folds which are like mine, which are designed for a male voice. It's something you have to train yourself to do; you have to learn to make yourself talk in a different way. It's not as instant as changing your clothing, for example, or going to get your hair cut in a different way. It's something that has to be learnt, and it's a skill, so like all skills it takes time and it's hard to do. And one of the issues that comes up when you talk to a speech therapist is that, of course,

Starting point is 00:17:45 transitioning is a huge change in your psychology. So not only are therapist and client working together to improve the speech, but often the therapists are having to act as counsellors, because they're having to deal with some of the issues around the transition. Going back to the assumptions that we make about people based on their voices, you talk about a really interesting case, that of Dwayne George, and it's probably the most disturbing example of vocal stereotypes causing real problems. Can you explain what happened in that case?

Starting point is 00:18:15 Yes, so sadly Dwayne George went to prison for 12 years wrongly; he was convicted of a murder that he didn't commit. And it was all based on voice evidence, which was backing up other rather dodgy forensic evidence. At the time of the actual shooting, the gunman was heard to shout something out, and a witness claimed to have recognised him from that. And it's got all the kinds of things you often see in miscarriages of justice. The witness originally talked about it being a coloured person, and didn't say

Starting point is 00:18:50 who it was specifically, and only later named Dwayne George as the person who said the phrase. All sorts of alarm bells should have been ringing. The first is that if someone's in a highly tense situation, about to murder someone, they're going to talk differently from how they would in a conversation you overheard in the past. That changes the voice radically. So the jury should have been advised that the recognition was likely to be very poor. We're also seeing, you know, a generic "coloured voice", as it was put in the trial, then being changed to a specific person. So we have witnesses homing in on something which isn't quite right. And there was also a huge time gap between when the

Starting point is 00:19:33 phrase was originally said and when the witness had previously heard the person talk. Our recognition of voices is pretty remarkable, but it's not that good. So he ended up going to prison for 12 years. Was he later exonerated? Yes, the Cardiff University Innocence Project took up his case and showed that the evidence, including the voice evidence, was flawed, and he was released, but sadly he did spend 12 years in jail for it. So does voice identification have a role in court proceedings now? Is it reliable?

Starting point is 00:20:10 Yes, I mean, you do have voice identity parades, and they can be very important, but they have to be done in a very controlled and strict way. You have to be really careful using them, in the same way that you have to be very careful how you set up visual identity parades. So they're not used as often as you might think. But yes, forensic analysis of voice is really important. Think of famous cases like the Yorkshire Ripper, where a hoaxer actually rang up and left a message.

Starting point is 00:20:42 A lot of work would have been done on analysing the voice in that case to work out who it might be and where they came from. So it's a very important area of forensics. Okay, and you talk about vocal charisma in the book too, and you say that politicians are masters of this because they're able to adapt their delivery to different situations. Now, if either of us wanted to run for Prime Minister, what would we have to learn to do with our voices? Well, I've got an advantage to start off with because I'm a man. That's going to go down really well with the audience, saying it that way. But unfortunately, the brain uses what are called heuristics, and therefore,

Starting point is 00:21:18 there's an assumption, which shouldn't be there, that if you have a male voice, you're more likely to be a leader. And why does our brain think that? It's because, unfortunately, women still haven't reached parity in politics; there are still many more men than women. So there's an immediate issue that my voice, being at a lower pitch, is potentially going to be more appealing to voters, which is very unfair on you, because there's nothing wrong with your voice. But they've found that a lower pitch actually helps with votes. They've done an analysis of Senate elections and shown that women and men with lower voices tend to get more votes. Okay. So do you think that's part of political

Starting point is 00:21:58 training for people who are hoping to enter a career in politics? Do people actually try to do this on purpose? Oh, people have definitely done it on purpose. Margaret Thatcher would be the famous example in the UK: she deliberately lowered her voice. I mean, one of the sad things about the prejudice against women's voices in power is that if, as a woman, you talk in your natural frequency range and try to do it in an authoritative way, you might get labelled as strident, which is not a very nice way of describing your voice. Whereas if a man does that, the adjectives used are not so pejorative. So by lowering the voice, and Maggie Thatcher did that quite a lot, you avoid this stridency trap.

Starting point is 00:22:42 But that's not how it should be. And actually, Mary Beard, the classicist, has written quite a bit about this. We haven't learned to hear a woman's voice, in its natural range, talking authoritatively as an authoritative voice. And hopefully over time, as we get more female politicians and more women in power, that stereotype will wane. But it's definitely there at the moment. Okay, and what about accents? There's such a broad range of accents in the UK. Which one should people be trying to master if they want a career in politics?

Starting point is 00:23:12 Well, I think it just depends on where your constituency is. And in America there have been some very famous cases, haven't there? I think Hillary Clinton was accused of putting on a southern accent when she was campaigning in the southern states. And we all do this to a certain extent: we change our accent a bit depending on where we're talking. You can hear I've got a southern English voice, so I tend to say "bath" with a long "a". But every so often, living here in Manchester, as I have for many years, I just switch to the northern "bath" with a short "a". So we all have this tendency to switch our accents a little bit,

Starting point is 00:23:45 maybe tone it down or turn it up a bit, to try to match where we're talking. And politicians do that. I mean, they're not necessarily trained to do it; it might just be something they do naturally, because we all do it. We do it to try to fit in, to try to make a conversation flow. And so for politicians, it's not just accent and pitch; speech delivery is important too. So what public speaking techniques are politicians using to gain favour with their audience? Well, there are lots of techniques.

Starting point is 00:24:16 And I guess the classic one is the three-part list. With Tony Blair, we had "education, education, education". So things in threes are one of the techniques that people use. And quite often, if you're giving a big political speech, that's about signalling when you want people to applaud. You may have seen videos of a politician who gets it wrong: they're pausing and waiting for the audience to clap, and no one responds. That's because they're not orchestrating it right.

Starting point is 00:24:44 So if you have something in threes, your audience knows that after the third point, that's the moment to clap. So there's quite an interesting interplay between politicians speaking and audiences responding. It's almost like the speaker is conducting the audience. If they get it right, they get a huge round of applause. And if they get it wrong, the audience doesn't know whether to clap or not, fewer people clap, and it's less impressive.

Starting point is 00:25:06 Ah, that's a good tip. I feel like we should have been taught that in school when we were doing public speaking. It could have avoided quite a few awkward endings to speeches. Yeah, there's actually a technique to it. And it's not just about what you say; it's also in gestures.

Starting point is 00:25:21 So you'll see politicians waving their hands around and, you know, I think one of those is called "chopping the lettuce", where they're sort of waving the hand. I'm doing it here, but it doesn't make for very good radio, does it? But they're waving their hands around, putting them up in the air.

Starting point is 00:25:33 You know, there's all this physical signalling going on. If you've got your hand out and your palm is down, that's saying, don't clap now, we're not ready for this. And then you change your hand gesture, and that says, this is the point to clap. So there's physical conducting going on as well, which you could have tried. Okay, so speech isn't purely the domain of humans anymore. In the book, you talk about robots that have been developed to imitate human speech, and even some that imitate human singing. So where are we at with that at the moment? Well, I guess the most successful is a system called Vocaloid, and you can download the software and use it now to create singing voices.

Starting point is 00:26:14 And there are, you know, superstars who use it, though it suits a particular kind of music. The best example is probably Hatsune Miku, who's a Japanese pop star. But it does other voices as well if you want. And, yeah, people turn up to Hatsune Miku's concerts and listen to her sing, even though she's just a synthesizer and a light projection, and they're real big fans. So I think we've got some way with singing synthesis, but we're quite a long way from getting the real voice of a soulful singer. The thing about Hatsune Miku is that she's doing Japanese pop, which has slightly robotic singing, so it doesn't matter if her voice sounds a bit robotic. But if you

Starting point is 00:26:57 take a great singer-songwriter, or a great singer like Adele, there's so much to her voice that we can't do with synthesis yet. So listening to Hatsune Miku, does she sound like a human at all? Or is she very identifiable as a synthesized voice? I think, set against other Japanese pop, you would struggle to say she was a synthesizer. That's because there's already a robotic aesthetic in the music. They also do some clever tricks: they use a real, live backing band, so there's some humanity in there already. And if you listen to the top 10, you'll find quite a lot of human voices that have had so many effects added to them that it's quite hard to tell they're human anyway; there's certainly a lot of robotic aesthetic around in pop music. And so she's sort of

Starting point is 00:27:42 in a genre where it's a bit easier to hide the fact that she's a synthesizer. Now, last week, Google demonstrated its new voice assistant and showed that it can make appointment bookings while sounding like a real person. It seemed like the person on the other end of the phone had no idea that they were talking to a robot. Do you have any idea how they did that? Well, if you take your voice assistant on your iPhone or your Google system or your Alexa Echo, already those voices are pretty good,

Starting point is 00:28:16 but you know they're not quite right. What Google did was add some of the back-channeling we do as we talk. You go, uh-huh, mm-hmm. And some of the disfluencies that you naturally produce, you know, when you're thinking, that kind of thing, and that makes it sound more natural. So I think the first thing that Google did was add in some of those. And the other thing they did, which I think is quite a clever trick, maybe "trick" is the wrong word, is that they did it down the phone line. So again,

Starting point is 00:28:46 if there's any roboticism in the voice coming out of their AI, then someone's going to just assume it's a mobile phone issue. So one of the things about faked voices and synthesised voices, with all this that's going on, is that when we hear a voice which isn't quite right, we don't know if it's the voice itself or what it's been transmitted by. So if you go down a telephone line, I bet that assistant just thought,

Starting point is 00:29:10 oh, the mobile phone line's a bit odd today, and just accepted that the voice wasn't quite right. Okay, so this might be harder to do in person. You know, if you had a robot that really looked like a human, the voice could potentially give it away. I think at the moment, yes, certainly, especially if you wanted to have free speech and a free conversation with them. And one of the things which is kind of worrying people in this area is faked speech.

Starting point is 00:29:34 So we've all had phishing scams by email where people claim to be friends who are lost or something and need money sent to them. It's not going to be very long before we start getting voice messages along the same lines on our mobile phones. And initially, I think we're going to get fooled by these if we're not careful, because we're not used to the fact that voices can be faked. But also, if it's a voice message, we'll kind of assume, well, they're, I don't know, stuck in Africa and they've had all

Starting point is 00:30:00 their money stolen, which is usually the kind of story, so they're on a bad mobile phone line. So if the speech sounds a bit disjointed and a bit distorted, well, that's a phone line problem. It's not the fact that the voice is a synthesiser. Yeah, and there must be an effect as well of voices being more emotional than, you know, the text that you're writing in an email. Do you think that would have an effect as well? Well, they'd have to get the emotion right. And again, with Google, when they were ringing up for a restaurant booking, you wouldn't expect someone to go into, I don't know, great fits of anger.

Starting point is 00:30:32 Maybe if you couldn't get the booking. But that's quite a straight transaction where the voice can be quite neutral. The actual range of voices you use, you know, whether that's a pleading voice, a shouting voice or whispering, is a great diversity of vocal types that speech synthesis can't do yet. And there's no reason why it couldn't learn to do some of them, but it's not being done yet. So to get the real emotional response right in AI, say if you want to make a robot actor, you would really have to work at it and get synthesis working much better than it does now. And you've seen some robot actors, haven't you, performing Shakespeare and things like that?

Starting point is 00:31:07 What were they like? I went and saw a couple of bits of robot theatre, one of which, yes, was doing "Alas, poor Yorick". I found it quite comical, actually, because that speech is all about, you know, mortality, and of course the robot is in a sense immortal, until, I suppose, a software update means it doesn't work anymore. But I think robot theatre is really interesting because robots are going to become more common

Starting point is 00:31:34 in our everyday lives. And therefore I would expect robot characters to become more and more common in theatre. That's the difference between theatre and film. We're really used to seeing robots in films. Of course, you think of Star Wars and all that kind of stuff. But real robots are actually much rarer in theatre, and the reason is that they're really expensive and difficult to do in theatre.

Starting point is 00:31:55 But I suspect we're going to see more and more of it. Okay, and in terms of the science of talking overall, where are we headed? You know, in 50 years' time, what's speech going to be like? Well, it will be different, because what we know from looking over the last 140 years or so that we've been recording speech is that the voice has changed over that time period. And I think we can assume that the voice has been changing ever since we've had it, that the voice has been evolving. So for maybe 500,000 years, the voice has been changing. It's

Starting point is 00:32:26 just that only now do we have recordings to see it. So I would expect the next 50 years to see changes. And that could be changes in things like accents. So if you look back 50 years at, say, recordings on the BBC, you'll hear that very received-pronunciation voice, which is no longer used. We hear much more diversity of voices on the BBC now. But across England, for example, or across Britain to be more broad, what we're seeing is accents tending to become more similar. We're all tending to move towards the London accent in many respects. One respect where that hasn't happened is the flat northern vowel. So the northern pronunciation of words like "last" is being maintained. But quite a lot of

Starting point is 00:33:09 other pronunciation features are moving towards London. And the other thing we might see changing in voice in the next 50 years is that the influence of people speaking English as a second language is going to increase. Now, we have already seen this in inner cities like London, where in places like Haringey you've got lots of people who speak English as an additional language. So the Bangladeshi community speaks English, but with a Bangladeshi influence in it. And you see this kind of amalgam accent developing in London,

Starting point is 00:33:39 where you've got people who might be native British English speakers picking up on some of these attributes from the second-language speakers, and in both directions. So we're getting a sort of blended accent across the different communities within the inner cities. And I expect those kinds of influences to carry on. But I guess the other place we're going to see big changes is in AI. So we're going to see more and more voice communication. We're going to talk to computers a lot more. And it really changes our interactions with computers once we start talking to them. It gives them agency. So people are actually going, you know, "Alexa, good morning. I love you," and those

Starting point is 00:34:17 kinds of things, and treating them like they have some kind of autonomy, which of course they don't really have. And when you add voice to a computer interface, it immediately makes it seem like it has this autonomy. You couldn't imagine me, sitting in front of a keyboard here, typing "I love you" into my Windows PC. I mean, that would just be really weird. So I think we're going to see a change in our attitudes towards these devices, because they will seem to have more humanity, even if artificial intelligence isn't really giving them that but is just mimicking it. That's got to be good for the companies that are producing these things, right? If we start to give them agency, we'll start to, you know, value these machines

Starting point is 00:34:57 more, they'll become more important to us. Is that something that they're going to try and capitalize on, do you think? Oh, undoubtedly. I think so far, if you take something like an Amazon Echo, essentially what it does is take your voice and turn it into text. And once it's in text, it's like a search engine, you know, you're typing text into a box and it reacts to it. But it's not really picking up on your tone of voice. So they're madly looking to try and do things like pick up whether you sound frustrated, because obviously then they can get a better interaction with you. They're trying to pick up those signals that we're giving out about whether we think things are working or not. But I think we have to be quite careful about

Starting point is 00:35:35 unintended consequences. And we've seen lots of those already in AI, where systems which are supposed to do one thing do another thing, or data actually gives away secrets about you that you wish you weren't giving away. And that's going to happen with voice. So as soon as they start trying to decode the voice beyond the words, the actual tone of voice, they'll start implicitly having information about you. And the thing about the voice is it's quite ambiguous.

Starting point is 00:36:00 Some of the signals I'm giving off to you, and some of the assumptions you're making about me, are based on sort of averages, but they may not be true for me. And so the voice is not a really clear signal of what's going on. And that's the risk with AI, that it'll make lots of assumptions about who we are and what we are feeling, whereas on an individual basis, it might not work.

Starting point is 00:36:21 So what kind of unintended consequences do you think could arise from that? Well, I think we could see problems with them making assumptions about who we are. We've already seen that kind of stuff. There was a very famous case with Target, where they tried to work out if someone was pregnant from their shopping habits. They started sending coupons to a teenage girl whose parents didn't know she was pregnant, because her shopping habits had changed. And the parents were very upset, until later on they found out she really was pregnant.

Starting point is 00:36:53 So there are going to be those kinds of problems: they're going to make assumptions about who you are from your voice and start tailoring your experiences. And that won't necessarily be the experience you want, because the voice may be ambiguous. They may be picking up on a cue that is wrong. So I think that kind of problem would be one of the things that they'll get wrong. And I think they really need to be talking to speech experts. Tech firms have fallen into traps in the past. I remember a case with Flickr, where they started labelling images.

Starting point is 00:37:24 And they labelled Auschwitz as a jungle gym, which is obviously incredibly wrong and insensitive. And I don't think it's sufficient for AI firms to turn around and say, oh, well, it was just the data and it's difficult to tell. I think they should now be finding out from voice experts what you can really pick up, and they ought to be putting systems in place now to make sure the mistakes aren't made, rather than, oh, we'll tidy this up once we've offended loads of people. Actually, both of those, the Target one and the Auschwitz one, are two very good examples of things going badly. Absolutely, and there are loads of them.

Starting point is 00:37:59 And there will be more in artificial intelligence, because with machine learning we make those mistakes. What's interesting is that humans are making those mistakes all the time, but we have a conscious mind that then corrects them. A really good example of that is that it's been shown many times that people have subconscious biases like racism within them, but then you bring your conscious mind to bear and correct that unwanted response. So we've all had that, haven't we? We sort of thought, why are we thinking that? That's really horrible.

Starting point is 00:38:30 But then we correct it. The problem with these machine learning algorithms is that they're like the subconscious part of the brain. They're making these assumptions and putting stuff out there, but they don't have the bit that then makes the correction. And that's where they're falling short. Yeah, and there's been a bit in the news as well about how we're transferring prejudices to AI and creating AI that has the same inbuilt prejudices that the rest of us have. Have you seen that? Yes, not in voice so far, because voice is relatively young, but we've seen it in things like translation engines. So there's an example you can find in Google Translate, translating from Turkish into English. I'm afraid I don't know the Turkish words and I'm not going to pretend I can pronounce them. But they basically say "they're a doctor" and "they're a nurse", and the pronoun in both cases is gender-neutral, but Google basically translates

Starting point is 00:39:21 into a man being the doctor and a woman being the nurse. So you can see why machine learning does that, because unfortunately there is still a skew: more men are doctors and more women are nurses. So on average, it will get it right more often if it guesses that the doctor's pronoun is male, but it's, you know, reflecting societal prejudice. So we have this problem that the data these algorithms are often trained on have cultural prejudices in them. And it's quite difficult, you know, when people rail against AI, it is reflecting culture, but we ought to be intelligent enough to work out how to deal with these prejudices during the training and therefore

Starting point is 00:40:00 not replicate a sexist translation algorithm. So we've still got a bit of learning to do in that area as well. Yeah, and we see other problems as well. So, for example, in voice, when Google's voice search first came out, it worked better for males than females. It's better now than it was, but there's still a general trend that voice recognition systems tend to work better with males than females.

Starting point is 00:40:27 I think they may have now corrected this. And you kind of think, why does that arise? Well, part of it might be bias in data, because there might be more male voices in the databases they're training on. But part of the bias is probably because the engineers actually teaching the systems are male. And they're probably not doing it consciously, but they're probably more concerned if it doesn't work for the male voice than the female voice.

Starting point is 00:40:47 I'm not suggesting they're doing this deliberately. It's just the kind of natural subconscious bias we have as humans. So I think there are all those kinds of things. So as well as dealing with databases which are skewed, we also need to deal with the skew in engineering, to get more females involved and more diversity in other ways, because I think that will ultimately solve the problem, because the female engineers will say, hey, this isn't working with my voice, why isn't it? Rather than the male engineer going, oh, this is good, isn't this a neat bit of tech?

Starting point is 00:41:15 And presumably that same thing would hold for, you know, people with different accents, people from different places around the world, second-language speakers. If we got the data going into the systems right, then maybe those things would all be corrected. Yeah, I mean, that's essentially what they've done over time with voice systems. So when Siri came out, there were quite a lot of videos uploaded showing it didn't work with heavy accents.

Starting point is 00:41:39 So Siri is the Apple iPhone voice assistant. And I remember seeing a video of people in Scotland trying to communicate with it and completely failing. And these systems tend to work best with sort of general American accents, because of course they're coming out of American companies. But over time, that can be corrected as we get more and more examples. But it is difficult. If you've got a voice difficulty,

Starting point is 00:42:04 and that might be just because you're speaking in a second language, so you're slightly less fluent, or maybe you've got a difficulty like a stammer, then you're going to struggle with these voice systems unless they're carefully trained. And I think the problem we have with technology is that it's driven by profit. So they're looking at the big numbers of people they need to reach. And hence, you know, they're getting it to work for American English first, because that's obviously where their primary market is. But if you take something like stammering, it's quite common,

Starting point is 00:42:36 and is it acceptable for people with a stammer to ring up a phone system and then not be able to make their way through the menu because the voice recognition system can't deal with the fact that their voice is slightly less fluent than the average person's? That's a tough problem, especially when you think of it all being directed by profits. Yeah, and this is where I think actually university research

Starting point is 00:43:02 really comes in, because the universities who work in speech are not trying to compete with Google. They're trying to look at the areas that the tech firms are overlooking. So as part of writing my book, I went to Edinburgh University, which does some great work with people with motor neurone disease, trying to get personalized voices for people with motor neurone disease. And you can imagine it's a horrible disease to get, but one of the things that often happens is that you lose your voice

Starting point is 00:43:30 because you lose the neurological control of the muscles. And they don't all want to sound like Stephen Hawking. They want to actually have a voice which vaguely resembles their original voice, because the voice is part of your identity. And again, that's not something you can see technology firms making a big go of. Maybe they should, given the profits they have, but it's not going to be a very profitable angle. And so you've got great universities doing great research there

Starting point is 00:43:54 to really help out a really important community. That was Trevor Cox talking about human conversation. His book, Now You're Talking, is available now. In the June issue of BBC Focus, which is on sale now, we find out whether we could travel through wormholes to distant galaxies, we also take a look at how science could solve the gender pay gap, we investigate the machines that you can control with your mind, and we interview Jane Goodall about her role as an eminent conservationist.

Starting point is 00:44:27 Did you enjoy this podcast? If you liked what you heard, then why not subscribe and leave us a review? You can find us on iTunes, Acast, Stitcher, and many of your favourite podcast apps. Thank you for listening to the Science Focus podcast from the BBC Focus magazine team. We're the UK's best-selling science and technology monthly, available in print and in several digital formats throughout the world. Find out more at ScienceFocus.com or look out for us in your app store.

Starting point is 00:45:05 This podcast is sponsored by Naim Audio and Focal. The texture and emotional depth of music can be lost through digital sources or poor signal. Naim Audio believes you can have digital precision with analog warmth. Alongside French acoustic specialist Focal, Naim creates high-end audio systems, combining innovation with craftsmanship, so you can listen to music just as the artist intended. Discover more at naimaudio.com. Ambition comes in all shapes and sizes.

Starting point is 00:45:39 At First Citizens Bank, we roll with your goals, because we're built for what you're building. Fit for your ambition. First Citizens Bank.
