The Good Tech Companies - Modulate’s New Voice Intelligence API: Smart Transcription, Emotion Detection & Deepfake Defense
Episode Date: August 29, 2025. This story was originally published on HackerNoon at: https://hackernoon.com/modulates-new-voice-intelligence-api-smart-transcription-emotion-detection-and-deepfake-defense. ... Unlock real-world speech AI. Try Modulate's Voice Intelligence API for advanced transcription, emotion detection & deepfake defense. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai, #voice, #voice-technology, #deepfake-detection, #whisper-audio-transcription, #transcription, #ai-voice-api, #good-company, and more. This story was written by: @modulate. Learn more about this writer by checking @modulate's about page, and for more stories, please visit hackernoon.com. Modulate has been developing voice-based AI tools for years. We've been able to analyze hundreds of millions of hours of real, conversational audio. We want to make tools that understand the ways real people socialize, conduct business, and learn about the world. We've recently been thinking: what if we could give everyone the tools to do the same?
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Modulate's New Voice Intelligence API: Smart Transcription, Emotion Detection, and Deepfake Defense.
By Modulate. In the last few years, there's been a wave of interest in voice-based AI, whether to understand us human beings or to interact with us directly. But organizations using this newest wave of AI face a challenge, because understanding voice is hard. We've spent years processing and analyzing real-world speech to give insights into user behaviors. Now, we're excited to announce early access so you can test out our underlying voice intelligence models and see just how powerful and flexible our tech can be. Read on to find out how to get involved.
The challenge of effective speech analysis.
We know speech analysis is not a matter of mere transcription. People inject emotion into the way they perform their speech, and that emotion carries deep significance. Sarcasm, friendly banter, and other nuanced speech patterns require a level of contextual understanding that even the best AIs have struggled to reach.
But even when it is a matter of mere transcription, that problem is hard enough on its own. Sure, plenty of companies have built transcription models that support nice, clean audio recordings made by someone trying to be understood: for instance, someone enunciating crisply to be heard by their home assistant, or intentionally altering their speech patterns to ensure an AI agent gets what they're trying to say. But accurately understanding speech the way we humans talk to each other, filled with sharp emotional turns, mumbled comments, background noise, and multiple speakers, often shouted into a half-decent microphone struggling to pick up the full range of frequencies, is another story entirely. From the beginning, Modulate's goal has been to crack the code here.
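For context, the clean-audio baseline this paragraph contrasts with looks something like the minimal sketch below, using the open-source Whisper package. The file name is a placeholder; on crisp, single-speaker recordings this kind of pipeline performs well, and it is exactly the messy, multi-speaker audio described above where it struggles.

```python
# Minimal sketch of a typical clean-audio transcription baseline using
# the open-source Whisper package (pip install openai-whisper; requires
# ffmpeg). "noisy_party_chat.wav" is a placeholder file name.
import whisper

model = whisper.load_model("base")  # "large-v3" is the strongest variant
result = model.transcribe("noisy_party_chat.wav")
print(result["text"])  # accuracy drops on overlapping speakers and noise
```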
We don't just want to make AI tools. We want to make tools that actually understand the ways real people socialize, conduct business, and learn about the world. And we've had tremendous success in doing so, helping top gaming platforms including Call of Duty and GTA Online recognize the difference between friendly banter and harmful intent, and working with global B2C brands to recognize frustrated callers or spot and prevent would-be fraud. We're extremely proud of the products we've built to unlock this value, including ToxMod and VoiceVault. And we've recently been thinking: what if we could give everyone the tools to do the same?
Introducing Modulate's Voice Intelligence API.
Under the hood of ToxMod and VoiceVault are unique, custom-built models for transcription, emotion modeling, deepfake detection, and much more. And the more we've learned, the more we've realized that these models exceed what's on the market today in crucial ways. Now, we're not just saying that as a brag about our machine learning team, though they are incredible. Our data is actually critical to our success. Thanks to our work in both gaming and enterprise, we've been able to analyze hundreds of millions of hours of real conversational audio, showcasing the full range of how people speak to each other, both professionally and socially.
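Modulate hasn't published the interface for this early-access API yet, so the sketch below is purely illustrative: the endpoint URL, auth header, field names, and response shape are all assumptions about what a hosted voice-intelligence request might look like, not documented behavior.

```python
# Hypothetical sketch of calling a hosted voice-intelligence API.
# Everything below (URL, auth, fields, response keys) is an illustrative
# assumption, not Modulate's documented interface.
import requests

API_URL = "https://api.example.com/v1/voice-intelligence"  # placeholder

def analyze_clip(path: str, api_key: str) -> dict:
    """Upload an audio clip and request transcription plus analysis."""
    with open(path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"audio": f},
            data={"features": "transcription,emotion,deepfake_detection"},
            timeout=60,
        )
    resp.raise_for_status()
    # e.g. {"transcript": "...", "emotion": "...", "deepfake_score": 0.02}
    return resp.json()

if __name__ == "__main__":
    print(analyze_clip("call_recording.wav", api_key="YOUR_KEY"))
```

Whether the real API bundles transcription, emotion, and deepfake signals into one call or exposes separate endpoints is likewise an open question until the docs ship.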
Take transcription as one example. Most modern transcription models are trained either on overly pristine datasets, built out of studio recordings or other similar environments, or by simply scraping everything they can find from platforms like YouTube or Spotify, which don't actually reflect real-world conversations so much as a certain type of performance. Top AI companies have been able to make great strides with these datasets, but they still tend to struggle on noisy conversations and variable audio quality. On these kinds of messy datasets, Modulate's transcription substantially outperforms: for instance, our word error rate (WER) beats that of OpenAI's latest Whisper large-v3 model by 40%, with roughly 15x faster inference to boot.
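For readers unfamiliar with the metric: WER counts the word-level substitutions, deletions, and insertions needed to turn a model's output into the reference transcript, divided by the number of words in the reference, so lower is better. Here is a minimal sketch using the open-source jiwer library, with made-up strings rather than Modulate's benchmark data:

```python
# How word error rate (WER) is computed, using jiwer (pip install jiwer).
# The transcripts below are invented examples, not benchmark data.
import jiwer

reference = "sorry I missed that the connection dropped for a second"
hypothesis = "sorry I miss that connection dropped for second"

# WER = (substitutions + deletions + insertions) / words in reference.
# Here: 1 substitution (missed -> miss) + 2 deletions (the, a) = 3/10 = 30%.
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
```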
This is why we're so excited: not just about the potential for VoiceVault and ToxMod alone, but because we believe our underlying models can massively improve AI systems across the board, helping AI agents and classifiers understand real human beings, in real conversations, like never before.
Try it out yourself.
If this gets you excited, we'd love to hear from you. We're in the process of opening up APIs to our underlying models. To join the waitlist and share more about how you hope to use next-level transcription, emotion analysis, deepfake detection, voice-based age estimation, or more, please fill out the quick
form here. Thank you for listening to this Hackernoon story, read by artificial intelligence.
Visit hackernoon.com to read, write, learn and publish.