Daybreak - All you need to know about India's most-hyped GenAI company

Episode Date: October 7, 2024

Sarvam, a generative AI startup based out of Bangalore, managed to raise more than $50 million from investors like Peak XV and Khosla Ventures in less than 6 months after it was launched last year. Last month, Sarvam released a range of new multilingual products: AI agents, voice and text models, and a workbench aimed at legal professionals. Enterprise customers who used Sarvam's services are satisfied with the performance of its products. But developers have flagged issues with its voice-based models. Even the text model is primarily trained on synthetic data, which could lead to nonsensical answers if left untested. With increasing competition in this space, surely, Sarvam is going to address the product issues in later releases. Tune in. Don't forget to send us your recommendation for this Thursday’s Unwind segment. The theme is “your favourite murder mystery.” Send them to us on WhatsApp as a voice note or as a text message. The number is +9189711-08379. Daybreak is produced from the newsroom of The Ken, India’s first subscriber-only business news platform. Subscribe for more exclusive, deeply-reported, and analytical business stories.

Transcript
Starting point is 00:00:01 Hi, this is Rohin Dharmakumar. If you've heard any of The Ken's podcasts, you've probably heard me, my interruptions, my analogies, and my contrarian takes on most topics. And you might rightly be wondering why I am interrupting this episode too. It's for a special announcement. For the last few months, Sita Raman Ganeshan, my colleague and The Ken's deputy editor, and I have been working on an ambitious new podcast. It's called Intermission.
Starting point is 00:00:28 We want to tell the secret sauce stories of India's greatest companies. Stories of how they were born, how they fought to survive, how they built their organizations and culture, how they managed to innovate and thrive over decades, and most importantly, how they're poised today. To do that, Sita and I have been reading books, poring over reports, going through financial statements, digging up archives, and talking to dozens of people. And if that wasn't enough, we also decided to throw video into the mix. Yes, you heard that right. Intermission has also had to find its footing in the world of multi-camera shoots in professional studios, laborious editing, and extensive post-production. Sita and I are still reeling from the intensity of our first studio recording.
Starting point is 00:01:21 Intermission launches on March 23rd. To get an alert as soon as we release our first episode, please follow Intermission on Spotify and Apple Podcasts or subscribe to The Ken's YouTube channel. You can find all of the links at the-ken.com slash IM. With that, back to your episode. If you're like me, not too familiar with the inner workings of artificial intelligence or AI, our story today may be a little complex. But it is important to understand what is
Starting point is 00:02:00 happening with AI in India, so I thought I'd try and break it down for the both of us. Now, as you know, the generative AI space in our country is still at a baby stage. But there is one company that seems to be catching everybody's attention. Sarvam. This GenAI startup based out of Bangalore managed to raise more than $50 million from investors like Peak XV and Khosla Ventures in less than six months after it was launched last year. And within a year, the company has already launched five products. Now, let me quickly give you a sense of what they are. The first one is Sarvam Agents, which is
Starting point is 00:02:42 basically like a voice-enabled service for things like customer care and sales calls for other companies. It is available in 10 Indian languages. Then we have Sarvam 2B, which was launched just last month. Sarvam says that it is India's first foundational open source Indic small language model. A small language model, by the way, is an AI model that runs on fewer parameters than large language models or LLMs. These require just a few million or billion parameters. Now, if you're wondering, what the hell are parameters in AI? I got you. An AI expert described it perfectly in an article. He says, in essence, parameters are the knobs and dials that an AI model learns to adjust during training. They're like the ingredients in a recipe that determine
Starting point is 00:03:35 the final flavor of the dish. Okay, so now that you know what small language models and parameters are, let us get back to Sarvam's second product. Sarvam 2B can do things like translation and summarization in India's vernacular languages. Sarvam co-founder Vivek Raghavan said that it is going to be better at Indian language tasks than international models like Meta's Llama. Next in Sarvam's range of products, we have Shuka 1.0, which is basically an open source model. It is an audio extension on the Llama 8B model that will support Indian voice in and text out. Sarvam says that it is India's first open source audio language model.
Starting point is 00:04:20 You can think of it as a bit like Siri. The other two products are Sarvam Models and A1, which is made for lawyers to help them with things like drafting. Right. So now that we know exactly what Sarvam does, let me get straight to the point. The thing is, these products have fundamental glitches. My colleague Abirami spoke to seven people, including two developers who tested Sarvam's model.
Starting point is 00:04:48 For example, Shuka 1.0's accuracy in transcriptions is pretty low and so is its ability to distinguish between different speakers in an audio clip. So let's look at what is going on. Welcome to Daybreak, a business podcast from The Ken. I'm your host, Snigdha Sharma, and I don't chase the news cycle. Instead, every day of the week, my colleague Rahel Philipose and I will come to you with one business story that is worth understanding and worth your time. Today is Monday, the 7th of October.
Starting point is 00:05:48 Okay, first some essential background on the company. One of the reasons for the hype around Sarvam is the technical chops of its founding team. Its co-founder Vivek Raghavan worked on the development of the Unique Identification Authority of India that runs Aadhaar. He also worked on UPI or the Unified Payments Interface. Pratyush Kumar, the other co-founder, is a former faculty member of IIT Madras and the co-founder of AI4Bharat, which is the Institute's research lab. Also, everybody kind of knows that Sarvam has the support of its tech partners like Meta,
Starting point is 00:06:28 Microsoft and Nvidia. Plus, also from Nandan Nilekani, who is the architect of Aadhaar and also the co-founder of Infosys. Now, at this point, I'm sure there are many of you who are thinking, this is a young company that we're talking about. Maybe we should not be so hard on it. Sure, I see your point. But the thing is, the reason why Sarvam got that funding was because it is the ecosystem's first research and development or R&D intensive company. And yes, its initial corporate services have worked for early clients, like payment solutions platform Pine Labs and faith-tech company AppsForBharat,
Starting point is 00:07:10 both of which are also part of the Nilekani ecosystem. But like Abirami says, a closer look at the technical aspects of its recent product offerings may just keep your screen buffering for a long time. So let us try to understand what is the issue here. But before that, do you know what is synthetic data? Any data that is artificially generated using algorithms is synthetic data, and it can be used to validate mathematical models and also to train machine learning models.
Starting point is 00:07:48 Now, Sarvam reportedly uses 100% synthetic data in its yet to be released text model. An employee of the company told us that they believe in synthetic data and that it is the future. You see, synthetic data is quickly becoming the hottest commodity in the AI space. More and more new models, including Meta's text model Llama 3 and OpenAI's video model Sora, are using it. Arko Chattopadhyay, the founder of LLM deployment company Pipeshift, explained it to us. He said real-world data is just too noisy. So you tag some of the high-quality data and generate some more alongside to avoid waiting
Starting point is 00:08:32 to collect enough data. This is especially important for Indic languages, since it is impossible to get high-quality data off the internet. But it is rare to have a 100% synthetic data model. Larger models like Llama 3 do not reveal the amount of synthetic data that they use, but a Sarvam employee told us that it is likely to be about 50% of its total data set. Now, to get past the language barrier in a country like India, Sarvam fetches synthetic data for coding, reasoning and science. And as we know, these are domains whose data is not readily available in real-world dialects. In fact, an employee of Sarvam told us that synthetic data from its pipeline is better and more
Starting point is 00:09:19 diverse than AI4Bharat's data. So, using 100% synthetic data, is it really the right way forward? Researchers seem divided. It has been observed that even large language models, LLMs, start producing nonsensical data if they are fed with too much AI generated or synthetic data. And now let's come to Sarvam's audio model. The company claims to have built it frugally by training these AI models for just under 100 hours. But that just might be too frugal in the generative AI context.
Starting point is 00:09:59 You remember how early on I mentioned that Shuka 1.0 has very low accuracy in its transcription. Plus, one needs to specify the language before feeding the audio into the model. But the developer who tested these models pointed out that the audio they tested had multiple languages, making it very difficult to arrive at results. The thing is, unlike with large language models, audio models have not hit the data wall yet. So, the frugality approach doesn't really make sense. Sarvam claims it will make their models faster and also reduce the cost of running them.
Starting point is 00:10:38 One can also demo Sarvam's voice model through its website. Our reporter Abirami tried it and heard a female voice with a metallic twang pitching the firm's latest line of products. The prompt requested a voice response in any popular Indian language. The Ken chose Malayalam. But the prompt kept repeating itself. After trying it several times in varying volumes, with and without earphones,
Starting point is 00:11:03 and even switching to Hindi, there was no result. Of the 10 attempts that Abirami made, the model only gave a correct response twice. Varshul Gupta, who is the founder of a GenAI dubbing startup called Dubverse, told us that this may be because of the relatively low amount of voice data, which is 100 hours, that the model is trained on. He said, and I'm quoting, in my opinion, the data that's used to train Sarvam's audio model lacks in quality more than diversity. End quote.
Starting point is 00:11:37 You see, Sarvam's models were built in record time, in less than a year. And people who work in this field will know that collecting data is sometimes almost more difficult than collecting capital. Not to mention that collecting the amount of data required to effectively train models that translate not just between two languages but across 10 major Indian languages is a Herculean task. But with all the competition in this space, surely Sarvam is going to address these product issues in its later releases.
Starting point is 00:12:11 After all, it is known for constantly engaging with the developer community. Daybreak is produced from the newsroom of The Ken, India's first subscriber-focused business news platform. What you're listening to is just a small sample of our subscriber-only offerings. A full subscription unlocks daily long-form feature stories, newsletters and podcast extras.
Starting point is 00:12:39 To subscribe, head to the-ken.com and click on the red subscribe button on top of The Ken website. Today's episode was hosted by Snigdha Sharma and edited by Rajiv CN.
