In The Arena by TechArena - Scaling Multilingual GenAI with Ishween Kaur

Episode Date: April 8, 2026

Ishween Kaur, Generative AI Lead at Salesforce, shares practical lessons on building multilingual AI systems, closing language gaps, and scaling with trust and real-world feedback....

Transcript
Starting point is 00:00:00 Welcome to Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allison Klein. Now, let's step into the arena. Welcome to the arena. My name's Allison Klein, and I am so excited to have Ishween Kaur with us. Ishween has an incredible background in the tech sector and is bringing her experience to Tech Arena. She currently serves as generative AI lead at Salesforce. Ishween, welcome to the show. Thank you. Thank you, Allison. It's a pleasure to be here. Ishween, why don't we just start? I know that you work at Salesforce.
Starting point is 00:00:42 Why don't you tell us a little bit more broadly about your background in tech and what brings you to this moment? Definitely. So as you have already mentioned my name, I'm Ishween Kaur, and I work on the AI platform at Salesforce. I work on the engineering side to build the systems that help enterprises build AI agents on the Salesforce platform. And one key thing is making it globally available, so making sure that AI works fairly for everyone and not just English speakers. I truly believe that technology should respect how people actually communicate. Coming from India myself and speaking different languages, this is quite near to my heart. And that is what brings me to Salesforce and AI. Yeah, this is such an exciting charter that you've got. I loved talking with you about this topic when we met before the
Starting point is 00:01:36 show, and I've been thinking about it. One thing that I think about is just like the path-finding innovation that that requires and really thinking through from a user perspective what the next generation of development really needs to entail specifically around language. Why is it so critical for innovation and your experience in a landscape like yours needs to have the core thinking that I just described and what other qualities is required for that innovation to thrive? Sure, that's a very deep question, Alison. So I'd say that Salesforce is a CRM landscape
Starting point is 00:02:16 and really focused on enterprises that want to build that deep, meaningful relationship with their customers. So for them, their customers are first, and for us, these are our customer-first companies. And what we are doing now is taking them that one step closer through personalization, that intelligent support. That's like instant and always available for them. So we are making these enterprise AI first by making AI directly into their platform.
Starting point is 00:02:46 So this ensures that customer-first approach while keeping trust as our top value, which is Salesforce topmost value. And to be honest, I'd say that AI has raised the bar. And customers now expect that instant, personalized and intelligent, which is incredible to observe, but in my true honest perspective, innovation isn't just about speed. And we have seen that, that it's also about balancing that speed with the trust and compliance.
Starting point is 00:03:15 So the companies went after shipping fast, but we need to ship fast reliably. And that's a real challenge now. And I believe that's what scale enterprises to make systems reliable. And that is going to come at the cost of speed. You talk about a really interesting interplay there. Compliance being something that's incredibly important for a lot of companies, the speed in which we are all being trained to do and then trust with the valued solution. And I think that we all know from AI that things can go wrong.
Starting point is 00:03:51 When you introduce multilingual and the nuance of language into that scenario, how do you ensure that that trust is maintained? So when we bring our big multilingual to actually have it globally available to our customers, what we want is to, like I mentioned, ship fast, but ship fast reliably, and that is what we want to strive for. And what I advocate is that, and it is from the hard-earned lessons because we are the early adopters of AI, that customer notice it immediately when you don't speak their language, when you don't speak the cultural nuances of their language, because idiom in English might sound offensive in Japanese. And today I actually bought a research with me to actually mention about that 5 billion people speak languages other than the English as a primary. language. And that's according to the
Starting point is 00:04:51 CSA research. So 76% of consumers prefer to interact in their native language. And yet only 25% of internet users are native English speakers. Isn't that incredibly not aligned with what we are baking? And I think that, you know, one of the things
Starting point is 00:05:09 that I think about is that we are biasing towards English to the point that it's really holding certain parts of the world back in terms of their full use of AI, just from a standpoint of what AI al-alms are trained on, right? And so how do you see organizations like Salesforce bringing this to customers with their own data, you know, training on their own environments? Yeah. So the approach is multilingual evaluation and not just the translation.
Starting point is 00:05:42 So we can use human in the loop validation with cultural test cases. We can have observability. We can have observability for both hallucination and toxicity specifically for those languages. And critically, we roll out in stages. So before making them generally available for audience, we are going to have our pilot, internal customers, localization experts to understand those cultural events, and enable only then those globally languages and actively foster to monitor and strive to improve that accuracy for the existing and the newly available ones. Now, I know that in this multicultural realm that we're talking about,
Starting point is 00:06:26 we're also talking about a vast landscape around the world. What did you learn between the interplay between distributed computing, underlying infrastructure, in the delivery of services without flinching for your clients? Oh, that's near to me because I have worked across distributed systems in data centers. storage systems and writing those complex algorithms to understand and now AI engineering. But I shared something that what I've learned over this course is that customer don't really care about the complexity or the jargon we speak. What they care about, the simple business impact that we can provide with focus on the
Starting point is 00:07:10 consistency, accuracy, reliability, and I'd say availability of your system. So distributed systems do fail in non-obvious ways, latency, spikes, partial failures or some regional outages, and we have seen those deterministic failure over the past decades. Now we are layering it with AI's probabilistic behavior. So we all know that NLMs hallucinate. They are non-deterministic by nature. But then reliability also comes with defensive designs, observability and guardrails, and they are not going to go away. Now, I know that your most recent work has been focused on Salesforce chatbots and
Starting point is 00:07:51 integration of LLM models into them. I think that we've all had bad experiences with earlier iterations of chatbots that can lead you into circular conversations or not satisfy from a customer service perspective. What have you seen in terms of the advancements of chatbots within Salesforce to meet the needs of customers more accurately? And why is that so important to customer ROI. That is important. I'll just answer the last part. It was that it is important to the customer's ROI to provide them personalization, to bring them closer to their customers. And definitely from my perspective and at Salesforce, I've seen the tangible ROI. Think about the traditional pinpoints, like long handle times, customers stuck in that endless loops, agents spending weeks
Starting point is 00:08:40 ramping up, and also the company's knowledge being buried in documents. And no one can literally find it. But NLMs are at least surfacing and addressing all of these. And the adoption number do tell the story because we have seen the reduced handle time in customer support, higher self-service resolution rates and faster onboarding for our agents from sandboxes to production. And then knowledge retrieval also at scale. So AI at fundamental level, we have moved from that scripted flows to that adaptive reasoning of agentic systems. And the shift that I have seen is, to be honest, massive because a customer now doesn't have to learn your system's language anymore. All they have to do is to lean on your system and trust it to match your words.
Starting point is 00:09:30 Now, I know that we're in the early days of enterprise scale AI adoption, and Salesforce is certainly a leader in this space. What have you learned along the way about integration of LLM capability to your customers worldwide? And where are your customers at in their own journeys of AI adoption? I'm going to be honest here. What I've learned is that integrating NLMs globally is less about the model and more about the systems around it. So only on the teams, so we assume that one good model will work everywhere. But the language, culture, compliance, and expectations started to vary widely.
Starting point is 00:10:13 And the biggest breakthrough came when we started to treat LLM as capability and really not a feature. So adding your guardrails. As I mentioned, observability is a big one. Monitoring them. Fallback paths is a big one to make your systems available and human in the loop workflows. So customers who were only adopters have seen those issues but have worked with us, trusted us to bring that breakthrough. to make these AI systems reliable and explainable and culturally aware, even if it's less impressive,
Starting point is 00:10:51 but at I think the global scale, it just trust that we have beats that novelty every time. That's awesome. Now, this is a topic that we've actually discussed a lot on Tech Arena before, and it's something that I actually built a platform around, but I've never had a chance to talk to someone like you about it. Can you tell me what risk? there are imbiased from one language or culture from an element,
Starting point is 00:11:17 how do you see that playing out with your customers worldwide? First of all, it's commendable to hear that your entire system has been baked around that. It's definitely not an easy sense to bring that up in the world. And personally, I do advocate for languages that are not equal culturally, and English first models have failed silently, and customers notice that. So as I mentioned earlier also about the CISO, as they research that 76% of consumers prefer to interact in their native language. And that's true because some examples do sound offensive in Japanese or in any other language,
Starting point is 00:11:54 which is fine in English. And what I've observed is that prompting alone is insufficient here. So you need evaluations, localization exports, and feedback loops to understand where the system is at, to understand its true picture of the system. Because fundamentally billions or millions of people are going to communicate with your system and it's going to impact them. Yeah, and I think of also just LLMs are defining what truth is.
Starting point is 00:12:26 And if that is the case, much like the Internet has been doing for a long time, we need to define whose truth. And is there room for cultural competence within that lens? I think that what you're onto here in terms of offensively, language is the tip of the sphere as we start peeling this back and seeing all of the ramifications of being trained on cultural history. I think it's a fascinating topic and one that frankly is not getting enough attention. I do want to know how the sales firm team has tackled this. And obviously,
Starting point is 00:12:59 a Salesforce instance is going to have a different lens than some other instances that are utilizing LLM. So what is the impact been in the work that you've done in terms of your customer response? Yeah. So I'd say I'd do like a two part to this, that the first approach is, as I mentioned, multilingual evaluation. So we use human in the looph, validations and cultural test cases, having observability for both, as you mentioned, hallucination, toxicity, and then critically lowering out in stages, being mindful of what goes when and before making it generally available, that will actually help understand the impact that it can have
Starting point is 00:13:41 before the impact that it creates. And we have been able to enable 17 plus languages and actively foster to incorporate more languages to strive to improve that accuracy also for the existing available ones. Now, one thing that I would like to highlight is that I was surprised and happy to observe projects like Swiss. for instance, which is a specific model for underrepresented European languages. And I believe we need more of those to build or bridge that gap among these culturally available families.
Starting point is 00:14:19 When you think about the results that you achieved, you know, I think that one of the things that I think about is it means something for model development moving forward. You talked about English first models. You talked about the need for human in the loop, which I've read a lot about and it makes total sense. What would you like to see from LLM providers in tackling how they make it easier for teams like yours moving forward? Yeah. So LLM providers do provide multilingual support and parity. And what I would like to see, it becoming better in those multilingual. with parity. Currently, the benchmarks are like 80%, 60% for the languages, and it creates that gap.
Starting point is 00:15:07 It's widening the gap of AI adoption. I would like to see cultural context embeddings, the transparent model behavior by language, and also stronger evaluation to link out of the box. So currently, we need to ourselves bake or understand those manual evaluations or evaluations based on existing models which are for English heavy domains. And the moment you start evaluating for other languages, you would see 5 to 15% drop in the quality. So we need that shift from that benchmark first also to real world first. Nice.
Starting point is 00:15:50 Now, what advice do you have for other practitioners? who might be a bit behind in terms of their adoption curve on tackling a multilingual, multicultural implementation of Gen AI, what are the key learnings that you've got that you think are out of the gates they need to be thinking about? Yeah, I love this. I'm just reflecting on everything that I've done so far to give a practical advice here. I'd say start with one language and one workflow. Keep it simple initially. Measure the trust before the purpose. performance of the language. And I'd say add observability early because when we add later stages, it becomes complex. Involve the real users first. And I'd say like assume that whatever you
Starting point is 00:16:38 are doing is wrong and design according to that feedback loop. What I want like early practitioners to understand is you don't need perfection to start. But having that intentionality to where you are starting and where you are going is what matters in this path because the companies who are succeeding aren't the ones actually who waited for that perfect model or the perfect data set to be out. They are the ones who just started learning from real users immediately and then iterating and experimenting in this world. I love that. I think that that's so true. And it's true beyond just what you're talking about. Always assume that what you're doing is could be wrong. I love that.
Starting point is 00:17:24 Where can folks find out more, Ashwin? I've loved this conversation, and I'm sure our audience has to, where can they find out more about your and your team's work at Salesforce and some of the broader topics that you're talking about and learn more from you? Definitely. So they can definitely find me on LinkedIn. I'm there. They can find Salesforce blogs.
Starting point is 00:17:43 They are out there to understand Salesforce learning. Realhead is a great place. If anyone is starting to learn on Salesforce and how to build with Salesforce, Agent Force, You can find me on YouTube at the date I am Eswin. And you know what? I'm actually thinking of Ellison to create a practical series. I myself come from distributed systems, software engineering, backend systems background. And I really want to break down and reflect to share and guide what's the shiny versus what actually matters when building these scalable AI platforms, either with or without AI.
Starting point is 00:18:16 So, yeah, I believe that would help a lot of software engineers who are trying to make a leap into AI engineering. I love it. I think it's such important wisdom to pass on. You really are doing some interesting path finding, and we would love to have you back on the show when you have more to share, Ashwin. Thank you so much for the time today. It was such a pleasure. Absolutely. It's pleasure for me. All the pleasure is my medicine. Thank you for having me. Thanks so much. Thanks for joining Tech Arena. Subscribe and engage at our website, Techorina.com. All content is copyright by Tech Arena.
