Today in Digital Marketing - Arm Me with Harmony: How Changing Your AI's Ad Voice Can Improve Sales

Episode Date: May 3, 2024

Is it possible that the perceived gender of the AI voice you choose for your TikTok ads could be one of the most important factors in whether you’ll get a sale? Or, if you market a charity, whether you should use one of those "trembling" voices to evoke empathy?

In this special deep-dive episode, Tod speaks with Fotis Efthymiou, co-author of two papers on how changing AI voices in video ads can affect sales and donations:
The Power of AI-Generated Voices: How Digital Vocal Tract Length Shapes Product Congruency and Ad Performance
Empathy by Design: The Influence of Trembling AI Voices on Prosocial Behavior
Fotis's LinkedIn

📰 Get our free daily newsletter
📈 Advertising: Reach Thousands of Marketing Decision-Makers
🌍 Follow us on social media or contact us
Links to all of today’s stories here

GO PREMIUM!
Get these exclusive benefits when you upgrade:
✅ Listen ad-free
✅ Back catalog of 20+ marketing science interviews
✅ Get the show earlier than the free version
✅ “Skip to story” audio chapters
✅ Member-only monthly livestreams with Tod
And a lot more! Check it out: todayindigital.com/premium
✨ Already Premium? Update Credit Card • Cancel

MORE
🆘 Need help with your social media? Check us out: engageQ digital
📞 Need marketing advice? Leave us a voicemail and we’ll get an expert to help you free!
🤝 Our Slack
⭐ Review us

UPGRADE YOUR SKILLS
Inside Google Ads with Jyll Saskin Gales
Google Ads for Beginners with Jyll Saskin Gales
Foxwell Slack Group and Courses

Some links in these show notes may provide affiliate revenue to us.

Today in Digital Marketing is hosted by Tod Maffin and produced by engageQ digital on the traditional territories of the Snuneymuxw First Nation on Vancouver Island, Canada.

Our Sponsors:
* Check out Kinsta: https://kinsta.com

Privacy & Opt-Out: https://redcircle.com/privacy

Transcript
Starting point is 00:00:00 It's the season for new styles, and you love to shop for jackets and boots. So when you do, always make sure you get cash back from Rakuten. And it's not just clothing and shoes. You can get cash back from over 750 stores on electronics, holiday travel, home decor, and more. It's super easy. And before you buy anything, always go to Rakuten first. Join free at Rakuten.ca. Start shopping and get your cash back sent to you by check or PayPal.
Starting point is 00:00:28 Get the Rakuten app or join at Rakuten.ca. R-A-K-U-T-E-N dot C-A. It is Friday, May 3rd. Today, a special deep dive marketing science episode. You don't have to spend long on TikTok, or really any video platform these days, to run into videos selling things using an AI-generated voice. On TikTok, there are a few commonly used voices, but I'd guess that the marketers making these ads don't spend a lot of time thinking about what voice to use. But is it possible that the perceived gender
Starting point is 00:01:06 of the AI voice you choose could be one of the most important factors in whether you'll get a sale? Or, if you market a charity, whether you should use one of those trembling voices to evoke empathy, if that even works. All of that is what Fotis Efthymiou set out to find out. He is a research associate at the
Starting point is 00:01:26 University of St. Gallen in Switzerland, and he joins me from there now. Hello. Hi, Tod. Thank you for having me. Not at all. So I think we're going to be coming back quite frequently to a term used in your paper, so I want to get that defined first. You refer to the differences in AI voices as the vocal tract length. What are we actually talking about here? Yeah, that's a really good and common question that I often get. By vocal tract length, we are referring to the anatomical structure between the vocal folds and the lips. Many people tend to confuse it with pitch, so just to clear this up from the beginning: when we speak, the vocal tract filters the voice, and a longer vocal tract length
Starting point is 00:02:13 actually promotes the lower frequencies of the voice more, so you end up with a deeper voice, while a shorter vocal tract length promotes a higher, shriller voice. So even though pitch is one frequency that we have, our voice consists of many different frequencies, and what the vocal tract does is filter all the other frequencies except the pitch. Gotcha.
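The filtering Fotis describes can be illustrated with a standard textbook approximation that is not taken from the papers: treat the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances (formants) fall at F_n = (2n - 1) * c / (4 * L). The sketch below assumes a speed of sound of about 350 m/s, a baseline length of 17.5 cm, a pitch of 110 Hz, and a +/-10% length change; all of these values, and the helper names, are illustrative assumptions rather than the study's actual parameters. It shows the key point from the interview: changing the tract length moves every formant, while the pitch (F0, set by the vocal folds) stays the same.

```python
from dataclasses import dataclass

SPEED_OF_SOUND_M_S = 350.0  # assumed speed of sound in warm, humid air (approximation)


@dataclass(frozen=True)
class VoiceSketch:
    """Hypothetical voice description in which pitch and tract length are independent."""
    f0_hz: float           # fundamental frequency (pitch), set by the vocal folds
    tract_length_m: float  # vocal tract length, glottis to lips


def formants(tract_length_m: float, count: int = 3, c: float = SPEED_OF_SOUND_M_S) -> list[float]:
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    if tract_length_m <= 0:
        raise ValueError("tract length must be positive")
    return [(2 * n - 1) * c / (4 * tract_length_m) for n in range(1, count + 1)]


def scale_tract(voice: VoiceSketch, factor: float) -> VoiceSketch:
    """Lengthen (factor > 1) or shorten (factor < 1) the tract; the pitch is left unchanged."""
    return VoiceSketch(f0_hz=voice.f0_hz, tract_length_m=voice.tract_length_m * factor)


# Illustrative adult-male baseline values (assumptions, not the study's settings).
baseline = VoiceSketch(f0_hz=110.0, tract_length_m=0.175)

for label, factor in [("longer", 1.1), ("baseline", 1.0), ("shorter", 0.9)]:
    v = scale_tract(baseline, factor)
    rounded = [round(f) for f in formants(v.tract_length_m)]
    print(f"{label:8s} F0={v.f0_hz:.0f} Hz formants={rounded}")
```

Running it prints one line per condition: the longer tract gives lower formants (a "deeper" timbre), the shorter tract gives higher ones, and F0 reads 110 Hz in every row, mirroring the "same pitch, different formants" manipulation described in the episode.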
Starting point is 00:02:39 All right, let's play some examples of this in an AI context so that people know what we're talking about. This is the baseline default voice. Especially reasonable traveling Shisan. This is the one tweaked to masculine. Especially reasonable traveling Shisan. And this is the one tweaked to feminine. Especially reasonable traveling Shisan. Very subtle. Although to me, that last one still sounds masculine, maybe effeminate or something, but they're very subtle differences. Exactly. And I think we should make clear that it's a continuum of masculinity and femininity within the same biological sex, if you can say that about an AI agent.
Starting point is 00:03:20 We use a male voice, and we manipulate this one vocal parameter, the vocal tract length, in the same male voice. So we mean gender as a continuum, and masculinity and femininity as a continuum within the male sex. Okay, with those two definitions out of the way, what did you set out to study? Our main research question was: if we change this one distinct parameter in the voice of an AI, how can we influence human perceptions, and what practical implications could this have? In the beginning, we wanted to see what mental images people envision when they hear this voice, so
Starting point is 00:04:01 we ran a study, and we saw that when people hear the longer vocal tract length voices, let's say the deeper voices, they envision an entity behind the AI that is bigger, that has a higher physicality. And by physicality, we mean weight and height. So we asked people how they envision this entity in terms of height and weight, and they had to choose. Then we also asked them about the perceived masculinity of the AI agent, and again they tend to perceive
Starting point is 00:04:36 the entities with a deeper voice as more masculine, and the entities with the higher, shriller voice as more feminine, compared to the baseline condition. Then we asked them to match the voice with certain products. In the beginning we asked about some food products, and we also went into cars and asked about car products, and we saw that people tend to match the deeper voices more with a masculine product. For example, we used the beef burger as an example of a masculine food product and a vegan burger as an example of a feminine food product. And we saw that this matching between the voice and the product also has downstream
Starting point is 00:05:24 consequences in advertising. So we ran a YouTube study, and we actually saw that people tend to click on the link more when the voice matches the product being advertised. This also increased the overall advertising effectiveness of the campaign. We ran the study on a cost-per-thousand-impressions (CPM) basis, with the costs calculated by Google Ads, and we saw that when the voice matched the product, the campaign was cheaper and had a higher click-through rate. Sure, and likely cheaper because of the higher click-through rate, I would imagine. The ad quality is better.
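For readers less familiar with the two metrics mentioned here: click-through rate (CTR) is clicks divided by impressions, and CPM is the cost per thousand impressions. The sketch below computes both, plus the implied cost per click, for two hypothetical campaign variants. The class name and every number in it are invented purely for illustration and are not the study's data. It also makes the arithmetic behind Tod's remark explicit: effective cost per click equals CPM / (1000 * CTR), so a higher CTR at a given CPM means each click costs less.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CampaignStats:
    """Hypothetical campaign totals, as an ad platform might report them."""
    name: str
    impressions: int
    clicks: int
    cost: float  # total spend in the account currency

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks per impression."""
        return self.clicks / self.impressions

    @property
    def cpm(self) -> float:
        """Cost per mille: spend per 1,000 impressions."""
        return self.cost / self.impressions * 1000

    @property
    def cpc(self) -> float:
        """Effective cost per click; algebraically equal to CPM / (1000 * CTR)."""
        return self.cost / self.clicks


# Invented numbers for illustration only, not data from the study.
variants = [
    CampaignStats("voice matches product", impressions=20_000, clicks=300, cost=80.0),
    CampaignStats("voice mismatches product", impressions=20_000, clicks=200, cost=100.0),
]

for v in variants:
    print(f"{v.name:26s} CTR={v.ctr:.2%} CPM={v.cpm:.2f} CPC={v.cpc:.3f}")
```

With these made-up totals, the matching variant shows both the lower CPM and the higher CTR, and therefore a much lower effective cost per click; the real figures are reported in the paper itself.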
Starting point is 00:06:10 So does this, I mean, this sort of confirms what my gut would have told me ahead of time, right? Another study that you did was around cars, you know, so for the male voice, and I'll just use that terminology, you paired it with trucks. And for the female voice, you paired it with a smart vehicle and a couple of others. It sounds like you're confirming what I think most of us assumed: use a male AI voice for male products and a female AI voice for female products. Right. I would say there are two differences that are a bit crucial in our research compared to other research on advertising, masculine and feminine products, and all this gender discussion.
Starting point is 00:06:52 The first difference is that it's always a male voice. It's just more on the feminine side or more on the masculine side. And at the same time, it's an AI voice: people can realize that this is not a human voice, but they still tend to make these associations. These are the two critical components that I believe differentiate our research from other research where they take a male voice and a female voice, because a male and a female voice don't differ only in terms of frequency and vocal tract length. They have a bunch of other vocal features that are different. But in our specific voice, there is one vocal parameter that is changing.
Starting point is 00:07:37 And this is what's driving the effect. I see. So you were testing slightly different pitches of the male AI voice. Yes, but again, it's not exactly the pitch. It's the formant frequencies, and I don't want to go too much into the technicalities, but the pitch actually remains the same. We changed the other frequencies of the voice. A voice consists of many different frequencies. The pitch was the same.
Starting point is 00:08:03 The speech rate was the same, and the loudness was the same. What changed were these other frequencies in the voice that give it this timbre, this color, that is deeper and expresses this enhanced physicality when the vocal tract is longer, and this more expressed femininity when the vocal tract is shorter. Right. I'm just going to play those two again, just in case people sort of lost track of it. Here's number one. Especially reasonable traveling Shisan.
Starting point is 00:08:32 And number two. Especially reasonable traveling Shisan. I wonder what other effects this might have. I mean, certainly you measured CPM, you measured click-through. Did you measure, or do you have a sense of, how this might affect some of the other levers that marketers like to pull? Brand perception, purchase likelihood, things like that? I think it can position a product or a brand in a specific category, let's say. And not only this voice; I would like to also expand it to other voices.
Starting point is 00:09:01 We know that products and brands have certain personalities, and when they are being advertised, or generally promoted with a certain persona behind them, people also tend to map those personalities onto the voice characteristics. And that's what we also tried to do. We took these stereotypically masculine food products from prior literature,
Starting point is 00:09:22 and we also pre-tested some masculine cars versus feminine cars. Then we tried to map the characteristics of the product onto the characteristics of the voice. Even though we used only products in our research, I'm pretty sure this can also apply to brands. If you want to position your brand as more masculine, in our case, use a voice with a longer vocal tract length. Or you can adjust it to your target audience. Another example, and this is off the top of my head: maybe you want to target, let's say, more elderly people; then make the voice a bit slower. Or if you want to promote some exciting activities, make the voice more exciting. Something
Starting point is 00:10:12 about the environment, I would say make the voice warmer. That's the whole idea behind this line of research: we need to take the specific vocal features into consideration, and we have to uncover the power that the voice has, because the voice is a rich source of information. The moment we met, you probably made some implicit associations about my personality or my physicality, even though we've never met in person. I think that's normal; people are hardwired to make those kinds of associations. As soon as we met, I immediately thought that you were a patient person, because we spent 10 minutes trying to fix my microphone.
Starting point is 00:10:53 So that was my first impression. Your study was done in English, using English phrases and words for testing. Do you think we'd see the same effect in other languages and other cultures? I wouldn't be surprised if we found cultural differences, because different communication norms exist between cultures. However, I would say that the part where vocal tract length leads people to envision bigger entities, like physically
Starting point is 00:11:29 enhanced ones, is the part I would personally expect to be the same, because there's some evolutionary basis there. Animals and humans know when to expect danger: when we hear the roar of a lion, we are hardwired to know that something big is coming, something dangerous. What I would expect to change across cultures is the second part, when we go from physicality to perceived masculinity.
Starting point is 00:12:02 But yeah, this is what I would expect; we didn't run it across cultures. Right, and I know it's different. My wife is a scientist, so I know that asking scientists for their gut feel is sometimes a little foreign, because you prefer proven experiments over gut feel, and I think the world is a better place overall for it. When we first dated, my then-girlfriend had gone out to a movie, and I asked her whether she liked it. And she said, "By what criteria?" So, yeah, I knew I was dating someone specific. That's right. Yeah, exactly.
Starting point is 00:12:34 You did another paper recently that tested a trembling voice as the narrator for a charity ad. Just so that people know what I'm talking about here, this is a regular read. The reality is that a hungry and sick child need you right now. And this is the trembling version. The reality is that a hungry and sick child need you right now. This is for a charity, and that sounds like one of those SPCA-type, you know, Sarah McLachlan sad-music kind of things. It was another YouTube test. You measured click-throughs. You had four versions: one with no trembling, one where the tremble was only in the information part of the ad,
Starting point is 00:13:11 one where the tremble was only in the final pitch, what we might call the call to action, and one where the trembling voice was present throughout the entire ad. Which one did best? When either part of the message or the whole message had a trembling voice, the advertising effectiveness was again higher compared to the baseline: again, lower costs and higher click-through rates.
Starting point is 00:13:40 We also took a dive into the mechanism, into why this is happening. We saw that when people listen to this trembling voice, they tend to perceive these entities as more vulnerable, and that tends to increase their empathic concern. In another study, separate from the advertising study, we saw that this also increased their willingness to donate to this charity organization. We also have to mention that the message being communicated has a negative valence, so the trembling also matches the emotionality of the message,
Starting point is 00:14:18 matches the content. And that was actually a very surprising result: it doesn't only change these perceptions, where you'd have the intuitive feeling that, okay, this will change people's perceptions about the AI entity, but it actually has an effect on people's emotional state. The emotional state of the person changes,
Starting point is 00:14:41 and that makes them click on the link. I hope you won't take offense to this, but it's so manipulative, isn't it? The voice... Yeah. It's such a, you know... When I first heard those, when you sent them to us, I thought there's no way people would fall for this. And it is. And actually, I make a bigger point about it in my paper. What I mention is that we know from research done years ago that people tend to treat different devices, robots, and computers as social actors. We know that, but it was also, I agree,
Starting point is 00:15:25 like, just by adding a bit of trembling to the AI voice, you get more attention and elicit some emotions. That's indeed surprising. And there is an ethical aspect that I mention in the paper: imagine having an AI that can work 24/7,
Starting point is 00:15:42 can communicate in every language, and also uses, as an example, this trembling voice. This is something that is a bit concerning and needs some new regulations. And it's not only the trembling voice. Now we have all these cloned voices, where people can make a cloned voice out of a few clips of someone's voice. So I think there should be some extra attention to what the voice is
Starting point is 00:16:18 capable of and what all of today's speech synthesis algorithms are capable of. We should really pay attention to creating some new regulations about voice use. We should be transparent, in my opinion: when something like that is being used, it should be explained why it's being used like that. Because in my opinion, it would be okay to raise some funds for a pro-social cause, but it wouldn't be okay to do some social manipulation via this voice. And, you know, I mean, of course that's the direction this will probably go. You could expect to see political ads where, if the database tells them that a particular
Starting point is 00:16:57 potential voter responds well emotionally to happy voices, then the candidate's voice would be tweaked to sound happy; if they would react to someone who sounds fearful, they might add a tremble to it. It's, you know, both exciting as marketers and terrifying as people. It could also have other applications, you know. Like, I don't know, you call a customer support center and then you encounter this trembling voice. You probably won't make all the complaints you have; you will probably be milder towards this vulnerable person. And I think more attention should be given to the specific features that are communicated out there.
Starting point is 00:17:34 You had co-authors on your papers. Who were they? On the first one, the vocal tract length paper, I have three co-authors: my supervisor, Christian Hildebrand; my co-supervisor, Emanuel de Bellis; and William Hampton, a former postdoc at my lab who is currently working in industry. The second paper is only me and my supervisor. And was my production coordinator right in briefing me that you just received your PhD? Yeah, I received my PhD a few months ago. And actually, at the moment,
Starting point is 00:18:06 I'm looking for research, behavioral science, and marketing positions in industry. I decided to leave academia, and I'm looking for my new role in industry this time. How can people reach you if they want to? They can reach me through my LinkedIn. It's Fotis Efthymiou, if they manage to spell this Greek surname correctly. And I would be very happy if they read my papers, and my email is in my papers. So I would be
Starting point is 00:18:41 happy to also receive questions, comments, or any other feedback that people have regarding my papers. We will put your LinkedIn and a link to the papers in our show notes. Dr. Efthymiou, thank you so much for your time. Thank you very much. Fotis Efthymiou is the co-author of two papers. The first one is published in the Journal of Interactive Marketing. It's called The Power of AI-Generated Voices: How Digital Vocal Tract Length Shapes Product Congruency and Ad Performance. The second one, in IEEE Transactions on Affective Computing,
Starting point is 00:19:14 that one is called Empathy by Design: The Influence of Trembling AI Voices on Prosocial Behavior. All right, that'll do it for the week. Don't forget to follow us on social media. You can find all of our links at todayindigital.com slash social, or you can tap the link at the top of the show notes. Today in Digital Marketing is produced by engageQ digital
Starting point is 00:19:34 on the traditional territories of the Snuneymuxw First Nation on Vancouver Island. Our production coordinator is Sarah Guild. Our theme is by Mark Blevis, ad coordination by RedCircle. I'm Tod Maffin. Have a restful weekend, and I will see you on Monday.
