Daybreak - India's AI voice agents can detect your stress, catch your bluffs, and never have a bad day
Episode Date: April 22, 2026

You pick up an unknown number. A bubbly voice starts selling you a credit card. You hang up in seconds. Except now, that voice may not be human. AI voice agents are already live across banks, e-commerce and healthcare platforms in India, with startups in the space raising over Rs 280 crore. But behind that perfectly polite pitch is a more complex rollout, from pilots and script tuning to adapting across languages and dialects. So, what’s driving this sudden funding spree, and how are companies actually deploying these AI callers in India? Tune in.

Daybreak is produced from the newsroom of The Ken, India’s first subscriber-only business news platform. Subscribe for more exclusive, deeply-reported, and analytical business stories.
Transcript
We're all familiar with this, right?
You get a call from an unknown number, you pick it up by accident.
A desperately bubbly voice immediately starts selling you, say, a credit card.
You hang up in less than three seconds.
Everyone has had this experience and everyone finds it annoying.
But because it's a process that still works and is pretty lucrative for banks, it continues.
Except over the last year or so, there is a new voice at the end of the line.
It's patient. It doesn't mind whether you cut the call in three seconds or ten.
When it answers any questions you may have, it's incredibly polite.
It's always on script, never late, and it never has a bad day.
If you hadn't guessed already, that voice is not human.
It's an AI voice agent.
And interestingly, in India, any bank, institution or company using them is actually under no obligation to tell you this.
These AI voice agents are already live across several Indian banks, e-commerce and healthcare platforms.
Think CRED, Practo, Flipkart, and even Snapdeal.
It's a real moment because the startups building them have raised more than 280 crore rupees.
But as The Ken's Manmay found while she reported on the story over several weeks,
the technology is more complicated than just a nice-sounding robot voice.
For example, the AI can detect stress in your voice, though it still can't reliably tell the difference between a stressed Maharashtrian or a stressed Tamilian.
If you bluff about your salary, it can catch you because it has access to your account data.
And no matter how good it sounds, it seems like customers can almost always tell that it isn't human.
Funnily enough, it's the perfection that gives it away.
It's an exciting proposition for any business that relies on inbound or outbound calls. But deploying them in India is no mean feat. Compliance issues, languages and dialects,
and questions of customer trust mean that this new, booming sector is navigating several
interesting grey areas. So what sparked the sudden funding spree in the space, and how is this
new tech being adopted in India? Manmay joined my co-host Snigdha in the studio to discuss
the rise of AI voice agents and what it means for India's BPO industry.
Thank you so much, Manmay, for joining us on Daybreak.
So, you know, there's an argument that voice AI actually may be more honest than a human sales caller,
because, you know, there's no pressure, it also does not have a bad day,
and also doesn't go off script, right?
At any point in your reporting, did you feel that maybe Ananya,
who's the AI voice caller in your story, may be a better option?
Hi. So, I did get to talk to Ananya for my story, and I got to talk to a couple of other voice AI agents as well.
And you're right in the fact that they don't really get tired. They never get cranky.
They never sound different when they've had two cups of coffee as opposed to, you know, very early in the morning.
What I do feel, though, is that while that may make them more effective at purely doing the job,
it doesn't necessarily make them better in terms of user experience.
See, when I speak to customers, people who maybe have an account at the bank
or people who are trying to book an appointment through PRACTO, for example,
these people are used to instinctively interacting with a human being.
And yes, for certain tasks, like maybe getting their insurance approved
or doing some minor checks, they're okay with voice AI agents being there,
but when somebody is selling you something,
they would really prefer to have a human
on the other end of the line.
And they can tell, by the way,
that it's not human.
Like, no matter how realistic it sounds,
I think the perfection of a voice AI agent
is really what gives it away
because they're not mad enough at you, basically.
That's so interesting that you say that, you know,
because sometimes when I am recording myself on Daybreak,
I notice these small little things,
maybe a breath taken in at the wrong place.
And earlier, I think I would find myself editing it out.
But now I'm like, no, I don't want to sound like an AI voice.
So, yeah, that makes a lot of sense.
One of the first things you mention in your story that really stands out is how drastically the amount of investment into these voice AI startups has gone up, right?
You said from seven crore rupees in 2023 to nearly 300 crore now.
Was there a specific trigger that caused this kind of insane amount of investment?
I would say that most of it has been triggered by how technology has evolved.
Voice was always sort of supposed to be the next step for AI agents after text.
And right now we are kind of at an inflection point in this sector because latency has improved
and speech quality has become decent.
So it's actually getting to be realistic
to imagine that a little AI robot
can handle your inbound and outbound calling.
Maybe you add a little supervision to it.
Eventually you can just check that off your mind
and the AI can handle all your calling systems for you.
So I think the main aspect is the technology
and it's also that a lot of really major companies,
for example, Google or Nvidia,
are themselves shifting a lot of resources into voice.
It makes sense.
And India, as we know, is a country with a lot of languages,
with a lot of dialects.
Maybe a chatbot which just speaks Hindi and English
wouldn't be able to reach as many people as a voice AI agent,
which can speak a lot of dialects would be able to.
So that's kind of why the funding, I believe, has increased in the sector.
You know, speaking of languages,
in some ways India seems like both
the perfect market and the worst market for this, because we have 22 official languages,
many thousands of dialects, and also some of the strictest telecom regulations in the world,
right?
How much of the pitch that these startups are making to investors actually holds up
under a stress test?
Okay.
So that's a pretty interesting question.
And in my story, in my research, I tried to look at it from a
bank's perspective, the banks hiring these voice AI companies, rather than from a startup perspective.
Because all the startups told me was that, yeah, we can deploy languages.
Maybe one day we learn to speak Tulu and let's just go ahead with it.
I did speak to someone from Kotak Mahindra.
And one of the interesting things they said was that their first question when they deploy a
voice AI agent is, can this thing get us into trouble?
So anything that can talk to customers has to follow RBI and TRAI rules,
they have to stick to approved scripts,
and they have to record and log every call.
And more importantly,
the voice AI agents have to have a provision
which lets a human step in instantly
if something feels off.
So the hands-off future, which we do imagine,
you know, it's not coming any time soon.
So if a voice AI tool can't survive
a compliance review or an internal audit,
the quality of the demo becomes pretty irrelevant.
Which is why, inside a bank,
voice AI is first evaluated as a control
and governance layer, not as a shiny new toy or a shortcut. I think, yes, we do have an
advantage in the fact that we have a lot of dialects, and people in the country would
respond to maybe a voice request better than they would to a text. But like you said, we do have
strict regulations. And when companies hire these voice AI agents, that's one of the first things
they check. Also, Manmay, another interesting thing is the emotion bit, right? Apparently, some of the
more advanced voice AI systems claim to detect customer emotions in real time, whether it's
frustration, hesitation or confusion, and then they adjust their tone accordingly. How close is that
actually to reality in what Indian startups are deploying today? And did it raise any
concerns for you when you were reporting on this? That's actually one of the more
interesting aspects I found because that was the first time I realized that,
This is exactly what my voice sounds like when it's stressed.
This is exactly what I sound like when I'm a little bit sad.
Because, again, there is nobody who would sit in front of me and tell me you sound stressed right now.
But a voice AI agent catches that, and it kind of modifies its tone.
I think that's more in practice for loan recovery voice AI agents because they are the ones who have to chase up people for their EMIs.
And if someone says something like, oh, no, I can't, because, you know, I didn't get my salary this month,
or I have to pay double rent this month, or something like that.
The AI agent is technically supposed to modify its tone and everything accordingly.
What I did find interesting was that it's not just looking out for your voice inflections.
For example, in the same case that I mentioned, in case of a loan recovery agent,
because it's being deployed by a bank, the agent also has access to data about whether
salary has really been credited in your account.
So yes, it can catch whether you're stressed out in your voice,
but it can also tell when you're bluffing because it has data.
So it's like if you're in an exam and you're caught cheating.
The invigilator can be a little bit sympathetic that you were stressed,
but you were caught cheating anyway.
So I don't know if I find it concerning that an AI agent can tell
whether you're stressed out and that it can catch voice inflections.
Primarily because it's still kind of a very new, very developing technology.
People express stress in very different ways.
And another interesting aspect I found in this one
was when I was speaking to someone who had worked in call centers for ages
and he was telling me that people culturally express stress
through their voice in very, very different ways.
A person from Maharashtra would sound different.
A person from a certain town would sound different.
A person from a different income bracket would sound different.
So if a voice AI can actually encapsulate that
through a very, very diverse range of people,
I'd say that's great tech.
I just don't know if it's here yet.
Manmay, in your story, you also quoted a former executive at a major business process firm saying something quite blunt: that voice AI does not actually get its own budget.
It has to displace something that already exists, right?
How much of the optimism in this market is actually built on that inconvenient truth being ignored?
To be completely fair to everybody who's funding voice AI, I don't quite think they're ignoring the truth.
One of the things I sort of learned when I started reporting this story,
and it's amazing how every story gives you perspectives from very different people,
is that when a VC firm or an angel investor is funding something,
it's not because they expect the company to last forever and ever.
They do expect the company to do very well in a short period of time.
And after that, maybe the company gets acquired, maybe it merges,
maybe it becomes a feature in a bigger company's process.
and, you know, that's completely fine too.
And the company makes money, the VC would make money, and that would be that.
And as for what the WNS executive was saying in my story,
I have found that sort of skepticism to be common across people who are hiring voice AI agents as well.
Like, for example, the bank which has deployed Ananya in its system.
A friend of mine works with a startup
that deploys such voice AI agents.
She introduced me to her boss,
and her boss pretty much told me
that the rule they follow is differentiate or suffocate,
meaning that your voice AI agent
has to either provide you a very different service
maybe it has to be a very efficient voice AI agent,
maybe it has to require a minimum amount of supervision,
maybe it has to, you know, do a lot of auditing
processes along with, you know, just calling customers and stuff.
But this isn't something which people are blindly optimistic about.
They are actually taking it with a pinch of salt and they are really evaluating all
the options they have before them.
Got it.
Also, you know, startups talk a lot about scalability, right?
But in your reporting, you describe something that is very different, because, you know,
there are so many pilots going on.
There's custom scripting.
You know, you're constantly tuning the model.
Then, of course, like you mentioned, there are so many compliances to meet.
At what point does that stop being a software business
and become, you know, a services business?
It's interesting that you would call it a services business,
because I remember in the edit meet, when I had pitched this story,
one of the feedbacks I got, when I was expressing a little bit of
skepticism about startups, was: why is this even a startup?
Why is this not just a feature in an existing company already?
And like you said, the process of deployment is complicated.
It's not just, you know, you join and suddenly your functions are taken over.
Like I spoke to Grey Labs, a voice AI startup, and one of their founders told me that they start with a pilot.
Okay.
You take an enterprise's existing call recordings.
You train an AI agent on those and you deploy that in a controllable way.
Once they are comfortable with how the bot sounds and behaves, they run a slightly larger experiment with maybe a thousand customers.
And to get to that level, for the company to be comfortable with how a bot sounds,
is a process in itself, because, and I think I've mentioned this point in my story, bots have to
sound different for different types of businesses. A bot which is maybe working with banking customers
has to sound different. A bot working with e-commerce has to be more familiar with regional dialects,
more familiar with processes of returning a product. And maybe a bot working in, say, a medical
field has to be more familiar with practices of confidentiality,
with not asking maybe weird questions like,
hey, where's your rash?
So, once a bot does that, once it's approaching a thousand customers,
and at that stage the agent is generating leads at roughly the same level as human callers,
then they deploy the voice AI solution fully.
So, yes, it is quite an extensive process that actually goes in there.
So also, Manmay, in your story, it's very clear that
there seems to be an overcrowding problem across this space, right?
A lot of these startups seem to be solving nearly the same problems for the same type of
clients.
What generally separates the ones which are likely to survive from the ones that are just burning
away investor money on the same pitch?
So, you're right when you say that there is an overcrowding problem in this sector.
I would make the argument that whenever a new technology does come up, there is always a phase
where there is an overcrowding problem in a sector.
I mean, that is exactly what happened to grocery startups.
And I think this is an example I have mentioned in the story,
that this is what happened when the idea of the cloud came up.
There were a lot of companies bringing in tech,
and only a few of them exist right now.
Most of them have been consolidated under big names.
So consolidation, I think, is the future.
And interestingly,
even though voice AI startups right now are saying that,
you know, we have this level of technology,
we have that level of technology,
we have already handled more than 200 million calls
in the past year, and this is how we're going to go forward,
I think what differentiates a company,
I'm not even saying a startup at this time,
what differentiates a company is the amount of distribution that they have.
And that distribution right now does lie with call centers,
with BPMs.
So eventually I believe there's going to be a point where the best startups,
the ones with the most adaptable technology,
the ones with calling adjacent technologies like handling workflows,
those startups may survive on their own a little bit.
Maybe 10 or 15 of them would maybe try something new,
five of them would scale it.
But most of the other voice calling technologies would get absorbed by BPMs,
would get absorbed by large-scale enterprise players.
and that's how it would function.
So, you know, in that case,
you obviously spoke to a lot of these
voice AI startup founders, right?
When they're raising a Series A in this space,
what is their end goal?
Is it actually to maybe have an IPO at some point,
or just to get acquired?
I did try asking that question,
both directly and indirectly.
And in both circumstances,
I was met with the statement of, bro, we just raised series A.
We've just come up with the technology.
We have just started our journey.
We have no idea where exactly we're going to end up.
But what I can say, though, is that most of these startups are built on really good tech.
And they are really, really enthusiastic about the work they do.
And objectively speaking, these are good companies.
Obviously, the market dynamics of a space with this many companies
are not something that one startup can control.
And eventually a startup would do whatever is most profitable for them, whether it's
being acquired, or whether it's diversifying into a new technology so that you become
a major player on your own.
That's really up to them.
But for now, they are beginning and they are very excited about it.
Speaking about excitement, you know, there was this Swedish fintech company that you spoke
about in your story that was also very excited about its voice AI assistant.
It announced,
and this made headlines,
that its AI assistant was doing the work of 700 customer service agents.
And then, very quietly, they went and started hiring humans again,
because they realized that the quality just wasn't there, right?
So what's happening in India?
Are we learning from these experiments that have already happened in the West?
Or do you see us repeating the same mistakes again?
Yeah, that's a pretty interesting story. It's about the company Klarna. I think it was a few years ago when they came up with
their own voice AI agent and they suddenly decided they don't need their people anymore. They fired
a lot of people and deployed voice AI. And obviously they had to backtrack after that. But I do see a
very healthy level of skepticism among, you know, adopters in Indian enterprises. And
As one of my startup founder friends told me,
nobody is firing entire floors of call center agents right now.
They don't see them getting fired in the future either.
There are functions which are shifting predominantly towards voice.
I can tell you that.
Like maybe credit card sales, for example,
is something where you do have a very fixed script.
You don't need a very particular read on the customer's situation.
And again, this is a volumes game.
You call a lot of people.
You get a chance of converting a lot of leads.
That is somewhere a voice AI agent would function very well.
So I can see some functions shifting away from human roles to maybe AI roles.
But yes, we aren't repeating the same mistakes.
And I don't think we'd ever be in a situation where we fire 700 people.
Then we have to bring them back with a very nice sorry.
We don't have to do that.
Right.
That sounds very hopeful.
Thank you for that ray of sunshine in these bleak times of AI,
where I'm thinking about,
am I going to lose my job because of voice AI?
But, okay, on a serious note,
in many countries,
there are very strict legal requirements
that require voice AI agents
to identify themselves to the caller, right?
We don't have that rule in India yet.
Did you encounter any conversation
in your reporting about whether customers
even have the right to know
who, or what, they're speaking to?
Okay, so we do have an interesting precedent in India, where there are obviously rules from the government saying that AI-generated
content has to be marked clearly as AI-generated content.
But I did speak to a lot of lawyers and IT experts for this story, and for a bunch of my other stories as well,
who make the argument that technically this isn't AI-generated. This is an AI bot speaking a script
which you have generated for it.
So there is a little bit of a gray area
as to whether a voice AI agent has to say that it's a voice AI agent.
For example, when Ananya is talking to you
and she's asking you, have you thought about getting a credit card?
She hasn't generated that sentence.
Like, we are telling her to ask that.
It's essentially a company replacing a human talking to you on the phone
with a bot talking to you on the phone.
Then again, do I feel like ethically people should know
that they're talking to a bot?
Yes, obviously.
Like, the more information you have, the better.
And there are people who are obviously very good at telling
that you're talking to an AI agent, like I explained before.
But we do exist in quite a few gray areas in this country.
And I do think we are coming up with new AI regulations, with AI rules to get there.
But it'll take a bit.
All right.
On that note, thank you so much, Manmay, for joining us.
Looking forward to your next story.
Thank you so much for having me and no, I don't think an AI agent could ever replace you.
