Daybreak - The new tech job you’ve never heard of and why India’s leading it

Starting point is 00:00:01 Hi, this is Rohin Dharmakumar. If you've heard any of The Ken's podcasts, you've probably heard me: my interruptions, my analogies, and my contrarian takes on most topics. And you might rightly be wondering why I'm interrupting this episode too. It's for a special announcement. For the last few months, Sita Raman Ganeshan, my colleague and The Ken's deputy editor, and I have been working on an ambitious new podcast. It's called Intermission.

Starting point is 00:00:28 We want to tell the secret sauce stories of India's greatest companies. Stories of how they were born, how they fought to survive, how they built their organizations and culture, how they managed to innovate and thrive over decades, and most importantly, how they're poised today. To do that, Sita and I have been reading books, poring over reports, going through financial statements, digging up archives, and talking to dozens of people. And if that wasn't enough, we also decided to throw video into the mix. Yes, you heard that right. Intermission has also had to find its footing in the world of multi-camera shoots in professional studios, laborious editing, and extensive post-production.

Starting point is 00:01:15 Sita and I are still reeling from the intensity of our first studio recording. Intermission launches on March 23rd. To get an alert as soon as we release our first episode, please follow Intermission on Spotify and Apple Podcasts or subscribe to The Ken's YouTube channel. You can find all of the links at the-ken.com/im. With that, back to your episode. Welcome to the world of AI trainers.

Starting point is 00:01:47 This is a growing army of freelancers working behind the scenes to shape the way large language models think. They're hired by AI data training companies like Turing, Mercor and Deccan AI, and their job is to find blind spots in models built by the likes of OpenAI, Meta, Anthropic and Google, and, of course, fix them. That means fewer hallucinations, smarter, more coherent answers, and a model that gets just a little closer to sounding, well, human. It's a noble endeavor, and of course, it's also billable. Until recently, few outside the AI world had heard of Mercor or its ilk, but that changed in June,

Starting point is 00:02:26 when Meta paid over $14 billion to acquire a 49% stake in Scale AI, one of the earliest and biggest names in AI data training. The backlash was immediate. Within days, OpenAI and Google cut ties with Scale AI, and competitors rushed in to fill the gap. It was a feeding frenzy. Everyone wanted a slice of Scale's business. And for some, it paid off.

Starting point is 00:02:51 As Rukesh Reddy, the founder of Deccan AI, told us, within a week of the Meta deal, they were seeing crazy traction. But here's the thing: this is a murky business. AI has already automated away many entry-level software jobs. In return, it's created a new, if temporary, kind of white-collar gig work. Software engineers, STEM researchers, even voice actors now offer up their skills to the AI gods. And quietly, India has emerged as a major beneficiary of this

Starting point is 00:03:24 new labour arbitrage. Welcome to Daybreak, a business podcast from The Ken. I'm your host, Rahel Philipose, and I don't chase the news cycle. Instead, every day of the week, my colleague Snigdha Sharma and I will come to you with one business story that is worth understanding and worth your time. Today is Tuesday, the 15th of July. If you've been paying attention to how AI is changing the world, you'll know it's not just the technology that's evolving, but the very nature of work itself.

Starting point is 00:04:10 Jobs are becoming shorter, more fluid, and in some cases, entirely new. It's something I've been thinking about a lot while working on my upcoming podcast, 90,000 Hours, which drops on July 22nd. It's a show about how work is changing faster than ever before and what that means for the careers we are trying to build. And there's no better example of this shift than the quiet rise of a new kind of worker: the AI trainer. You see, the nature of AI training has fundamentally changed, and nowhere is that clearer than at the OG, Scale AI. It got a bit of a bad reputation back in the day for hiring a bunch of freelancers, or click workers as they were better known,

Starting point is 00:04:53 from countries like the Philippines and Kenya for basic kinds of data labelling. This could be as simple as drawing a box around an object and identifying it as, say, a car or a tree. Sounds like something straight out of the Apple TV+ show Severance, right? Well, there were reports of these poorly paid workers being mistreated, which led to a lot of backlash from labour groups. In fact, the company's subsidiary Remotasks was given an abysmal score of one out of ten by labour rights watchdog Fairwork in 2023. But that was essentially the nature of the work back then. It was pretty easy and straightforward. Today, that isn't the case

Starting point is 00:05:33 at all. Data is at the core of how an AI model functions. Essentially, a model is fed large amounts of structured and unstructured data and then makes connections within it to produce outputs. But the kind of data involved has been changing. The way these AI models are trained has changed just as rapidly as the models themselves. The AI trainers of today are far from simple button pushers. Instead, they actively evaluate prompts, push the model's limits to improve its reasoning, and train it on domain-specific, particularly STEM-centric, data. Reddy explained to us how challenging and fast-moving the work is. For instance, if a trainer starts off week one by feeding the chatbot

Starting point is 00:06:16 undergraduate-level physics problems and seeing it fail, by week five the chatbot will have reached a stage where it can solve even advanced PhD-level questions. This is the kind of thing that the likes of Mercor, Turing and Deccan AI are in the business of. Their added positioning as job platforms for skilled workers helps them attract the right kind of talent. For instance, Turing pivoted from being a job platform that placed high-quality software engineers as contractors with tech companies to a player in the AI data space after its partnership with OpenAI in the lead-up to ChatGPT's launch in 2022. In the process, India has very quickly emerged as the ultimate destination to hire this kind of talent.

Starting point is 00:07:00 That's because people here have strong STEM skills and are also decent at speaking English, the two prerequisites for this particular job. Stay tuned. Hi, this is Rahel, the co-host of Daybreak. I'm quickly pausing this episode to share something very exciting with you. We are hiring. We are looking for someone in the early stages of their journalism career. Maybe you're fresh out of journalism school,

Starting point is 00:07:27 or you've done a couple of internships and projects and are eager to take on your first full-time role in audio. I'm looking for a co-host to help launch a brand-new podcast from The Ken. This is a full-time role, and you'll get to work closely with me as well as the rest of the audio team, right from story pitches to interviews to shaping how the final show sounds. If you're curious, ambitious and absolutely love audio storytelling, we would love to hear from you. All the details will be in the show notes of this episode.

Starting point is 00:07:58 Though a lot of these AI data training companies are headquartered in the US, they hire most of their trainers from India, where talent is abundant, cheap and of good quality. They aren't just hiring coders anymore. They're looking for people with master's degrees and PhDs in STEM and the humanities. They're even hiring voice actors in some cases, people who can provide voice data in various accents and tones.

Starting point is 00:08:28 So the nature of the game is fundamentally different today, so much so that it's hard to figure out what exactly to even call this industry anymore. Reddy explained how terms like annotation and labelling are somewhat misleading. For a while, he says, everyone was calling it RLHF, short for reinforcement learning from human feedback, which is a technical term but conveys the distinction between this and bounding-box work. Now, he says, they call it GenAI data. But even that doesn't really give you the full picture.

Starting point is 00:08:58 Indians are now increasingly flocking to these jobs. And why wouldn't they? They're lucrative and flexible, two things that are very hard to find in today's job market. And the pay is great too. Depending on the company, trainers are paid anywhere from $12 for generalist tasks to $40 to $50 for specialized ones requiring domain expertise. People in the industry told us that in STEM, too, these AI training jobs pay a lot better than what researchers usually earn, even in the US and UK. Now, while it's easy to compare what AI data companies are doing today with what IT services companies have done for years, meaning hire cheap labour from India

Starting point is 00:09:36 and serve wealthier companies in the US, Reddy believes that this time around, the playbook is different. For starters, he says, most clients don't really care about the price. They're price-agnostic. They're already spending a ton of money on GPUs, and compared to that, data money is minuscule.

Starting point is 00:09:55 The cut taken by data companies also varies with each project because they don't pass on the data from freelancers to clients as-is. They usually weed out redundant data and organize it. Despite the extra work, the margins are higher than those of most IT services companies. So the focus is on hiring people who produce quality data. One way of doing this is by hiring from the best. Nearly 40% of Deccan AI's freelancers, sourced from its job platform Soul AI, are from the top 1% of Indian colleges across disciplines like engineering, medicine and law.

Starting point is 00:10:30 Another way is having a rigorous selection process that weeds out sub-par candidates. According to an employee at Turing, it too selects only 1% of those who apply to be trainers on its platform. Mercor, meanwhile, has a fully AI-based screening tool that evaluates candidates and gives them real-time feedback. Now, while this may seem like a pretty sweet deal for these freelancers, this line of work may not be the most sustainable. All these jobs are ultimately tied to LLMs, and the day the broader LLM bubble bursts, well, they're done for. But that's fine by these AI data freelancers.

Starting point is 00:11:06 They're more than happy to make a quick buck while they can. Some do have qualms about the eventual implications of the work they're doing, but clearly not enough to stop them from doing it. Daybreak is produced from the newsroom of The Ken, India's first subscriber-focused business news platform. What you're listening to is just a small sample of our subscriber-only offerings. A full subscription unlocks daily long-form feature stories, newsletters and podcast extras.

Starting point is 00:11:38 Head to the-ken.com and click on the red subscribe button at the top of the website. Today's episode was hosted by Rahel Philipose and edited by Rajiv Sien.
