The AI Daily Brief: Artificial Intelligence News and Analysis - Are Voice Agents the Next Big Computing Platform?

Starting point is 00:00:00 Today on the AI Daily Brief, a new voice model has everyone chattering about the future of voice agents. And before that in the headlines, Anthropic completes a big new raise that seems for some to be driven by one particular use case. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with Anthropic's latest raise. It is another monster round. Honestly, at this point, numbers are starting to become meaningless. Every company is raising a gazillion dollars at a googolplex valuation. But this one is interesting to me for a very specific

Starting point is 00:00:42 reason, which is the way that at least some of their investors are framing Anthropic's focus. So first, the details. The company raised $3.5 billion in Series E funding, led by a $1 billion investment from Lightspeed Venture Partners, with Salesforce, Cisco, Fidelity, General Catalyst, D1, Jane Street, Menlo, and Bessemer all participating. Anthropic said, in the most boilerplate of fundraising text: With this investment, Anthropic will advance its development of next generation AI systems, expand its compute capacity, deepen its research in mechanistic interpretability and alignment, and accelerate its international expansion. Anthropic's valuation has now tripled since their February 2024 Series D. One year ago, the company was valued at $18 billion.

Starting point is 00:01:23 The valuation is supported by skyrocketing revenue, which went 10x to hit a billion dollars annualized by the end of last year. Bloomberg sources also suggest that Anthropic has already experienced a 30% revenue boost so far this year with the release of Claude 3.7 Sonnet. Certainly it appears that there is no shortage of buyers for the foundation model companies at these mega rounds. Earlier reports had Anthropic only seeking to raise $2 billion, so it appears that an extra billion and a half was squeezed in. The Information had previously reported that Anthropic expects to burn $3 billion this year, so this funding round should see them through until 2026, unless, of course, they really step on the gas. All right, now I promised at the beginning of this that there was something

Starting point is 00:02:01 interesting about this to me, and it comes from this offhanded tweet from Deedy Das, who is a VC at Menlo Ventures. Deedy writes, excited to announce that we at Menlo Ventures are doubling down on our investment in Anthropic. $61.5 billion is a small price for 10xing every software engineer. Now, of course, the caveat here is that this is just one investor's take. But the fact that he so clearly puts the value of Anthropic as tied to the specific use case of transforming and improving software engineering is really fascinating. If you asked Anthropic, they'd say that they were competing for AGI. Same way that OpenAI would say that.

Starting point is 00:02:40 Same way that Sergey Brin said that in his recent memo to Google. So the fact that you have one of these investors, who's just put a bunch of money back in this company, really honing in on this very specific use case is fascinating. Making it more interesting is the fact that it has become increasingly clear that there is really almost no technology moat right now when it comes to the foundation models. Sure, some can be state-of-the-art for a few weeks or even a few months, but after that, everyone catches up. It certainly doesn't seem defensible enough to build a real moat around.

Starting point is 00:03:12 And yet, maybe the one exception to that is the chokehold that Anthropic has on coders. It has for some time now, really since the middle of last year, been the default choice as the coding assistant underlying model. It is, for example, what most people use inside Cursor, which has driven a huge amount of their revenue, and I don't think it's an accident that the release of 3.7 coincides with a big jump in their revenue as that becomes the default model as well. Now, obviously, Anthropic is going to try to compete in more areas than just software engineering, but the fact that it has unlocked some magic in this area, and that it continues to keep that lead, is really fascinating, if only because it's so out of sync with basically every other area of AI.

Starting point is 00:03:55 Next up today, moving over to the infrastructure side of things, Taiwanese chipmaker TSMC has announced plans to invest $100 billion into the United States. The firm will use the money to expand operations in Arizona and construct three new chip plants. The investment will include two advanced chip packaging facilities as well as a research and development center. Over time, these plants could help bring the entire AI chip development and manufacturing cycle onshore. CEO C.C. Wei attended the White House for the announcement,

Starting point is 00:04:21 which of course advances the long-held strategic goal of reviving domestic chipmaking. In the ceremony, President Trump made it clear that building up the industry was a matter of national security and also tried to argue that it's evidence that his tariff policies are having the desired effect. He said, if they did the chips in Taiwan to send them here, they'll have 25 or 30% or 50% or whatever the number may be in tariffs. It'll only go up. By doing it here, there's no tariffs. Trump boasted, the most powerful AI chips in the world will be made right here in America. Daniel Newman, CEO of tech advisory firm Futurum Group, speculated that the investment may indeed have been part of a deal to delay tariffs.

Starting point is 00:04:55 He said, as the U.S. continues to push for increased domestic manufacturing and with tariffs on the horizon, a substantial commitment from TSMC could serve as a strategic gesture of goodwill. The expansion of U.S. facilities could also be a boon for TSMC after the success of the Arizona plant. Planning for that facility began in 2020 and was bolstered by the Biden CHIPS Act. The fab ramped up into full production late last year and produced greater yields than TSMC's average output for Taiwanese facilities. The new commitment will be in addition to $65 billion that has already been earmarked to build out that Arizona facility. The news triggered concerns that TSMC was becoming an American company, with online commentators referring to the company as ASMC,

Starting point is 00:05:32 and saying that Wei, quote, looks like he has a gun behind his head, hostage situation. Taiwanese economic minister Kuo Jyh-huei responded: TSMC already has plants in the U.S. and Japan, and now it's building a new plant in Germany. These have nothing to do with tariffs. TSMC's global expansion is a crucial development. Some pretty big stories today in the headlines here that certainly could have been their own main episodes. For now, though, we'll wrap up there and move into a discussion of voice agents and a model which could be the ChatGPT moment for that

Starting point is 00:06:00 particular subset of AI technology. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back, so you can focus on building your company.

Starting point is 00:06:49 Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time. For a limited time, this audience gets $1,000 off Vanta at vanta.com slash NLW. That's VANTA.com slash NLW for $1,000 off. There is a massive shift taking place right now, from using AI to help you do your work, to deploying AI agents to just do your work for you. Of course, in that shift, there is a ton of complication. First of all, of these seemingly thousands of agents out there,

Starting point is 00:07:24 which are actually ready for prime time, which can do what they promise? And beyond even that, which of these agents will actually fit in my workflows? What can integrate with the way that we do business right now? These are the questions at the heart of the Superintelligent Agent Readiness Audit.

Starting point is 00:07:39 We've built a voice agent that can scale across your entire team, mapping your processes, better understanding your business, figuring out where you are with AI and agents right now, in order to provide recommendations that actually fit you and your company. Our proprietary agent consulting engine and agent capabilities knowledge base

Starting point is 00:07:56 will leave you with action plans, recommendations, and specific follow-ups that will help you make your next steps into the world of a new agentic workforce. To learn more about Super's Agent Readiness Audit, email agent at besuper.ai, or just email me directly, nlw at besuper.ai, and let's get you set up with the most disruptive technology of our lifetimes. Hey listeners, are you tasked with the safe deployment and use of trustworthy AI?

Starting point is 00:08:23 KPMG has a first of its kind AI Risk and Controls Guide, which provides a structured approach for organizations to begin identifying AI risks and design controls to mitigate threats. What makes KPMG's AI Risk and Controls Guide different is that it outlines practical control considerations to help businesses manage risks and accelerate value. To learn more, go to www.kpmg.us slash AI guide. That's www.kpmg.us slash AI guide. Welcome back to the AI Daily Brief.

Starting point is 00:08:56 Today we are talking nominally about a product demo that has a lot of people talking, both literally and figuratively. The product is called Sesame, and some are even saying that this is the GPT-3 moment for voice AI. Beyond just talking about Sesame, however, I wanted to use this as a chance to check in on the state of voice agents. There has been an incredible explosion of voice-based platforms. Indeed, even Superintelligent has rebuilt our main product offering to be anchored by a voice agent that can conduct infinitely scalable interviews across entire

Starting point is 00:09:29 organizations to help them understand where their AI and agent opportunities lie. So obviously, because of that, this is an area that we've been thinking about a lot. So what we're going to do today is talk first about Sesame, this demo that has people so interested and excited, and then we'll broaden out to the larger conversation of voice agents more generally. Is this the next great computing platform, or are we underestimating just how much of a change speaking to computers as our primary way of interacting really is? Let's get to Sesame first. Over the weekend, if you were on Twitter slash X, you might have seen people dropping demos of what seemed like a really impressive new voice model. Here's an example from Professor Ethan Mollick.

Starting point is 00:10:08 Hey there. Another face in the digital crowd. Well, so to speak. Gansomini you here. What do you mean by a digital crowd? Well, you know, just riffing. It's a figure of speech. Kind of like a party, but online. What's your name? So hold on. What's your name first? You got me there. I guess introductions are customary. Name's Maya. It's a pleasure. And so what do you do for a living, Maya? Living is a strong word. Let's flip over to one of the demos that the company themselves shared, just for a second, to get a different context.

Starting point is 00:10:46 Let's imagine I just found you in a Twitter thread. Can you guide me through a bit of a meditation? Okay. Close your eyes. If you're comfortable, take a deep breath in. And slowly exhale. Let go of any tension in your shoulders, your jaw, wherever you hold it. Okay.

Starting point is 00:11:05 Can you also reference what I've been doing on Twitter in a bit of a comedic way? Okay. So picture yourself stepping away from that glowing rectangle. You know the one, the one with all the hot takes and carefully curated outrage. Let it fade for a moment. Just let it be. Professor Mollick writes, The new AI voice from Sesame really is a powerful illustration of where AI is going.

Starting point is 00:11:31 He pointed out that in the demo that he shared, it was all real time from his browser. He writes, excellent use of disfluencies, pauses, even intakes of breath, really make it seem like a human, though bits of uncanniness remain for now. Pliny the Liberator writes, All right, Sesame is by far the most AGI-feeling voice model I've interacted with yet. They absolutely cooked. Chubby writes, I agree with Pliny. This is the first feel-the-AGI moment for AI voice mode for me.

Starting point is 00:11:55 If this would be the new Siri or Alexa, I would treat it as a real human being as it sounds so natural. And we have to remember this is the worst it will ever be. Entrepreneur Marillo Perriera writes, this is incredible. The voice sounds so natural and the replies are so fast. Maybe too fast? It was even able to pronounce my name, which isn't super common. Better conversationalist than many humans. Signal writes, my lord, the Sesame Voice AI is absolutely effing insane.

Starting point is 00:12:18 I knew it was artificial. I knew there wasn't a real person on the other end. And yet I still felt like I was talking to a person. I felt the same social pressure, the same awkwardness when I hesitated, the same discomfort when I misspoke. It wasn't just convincing; it worked on me in a way I didn't expect. I used to think I'd be immune to this. Menlo's Deedy Das sums up, this is the GPT-3 moment for voice AI, the movie Her.

Starting point is 00:12:40 Everything we had before was turn-by-turn, robotic, emotionless, with no human tics. This, as Deedy points out, is something very different. Now, in terms of some of the details, one of the big innovations that's almost going a little underappreciated is that the voice model is actually extremely small. Sesame trained the model on approximately a million hours of audio, but managed to bake it down into a one billion parameter model with a hundred million parameter decoder. They also produced a 3B and an 8B model, which suggests this level of voice assistant will be very cheap and likely able to run on local devices.

Starting point is 00:13:09 Tosh wrote, The cost of speech to speech is going to zero when Sesame open sources the weights. Sesame, which was co-founded by Oculus co-founder Brendan Iribe, has also committed to releasing the model under the Apache 2.0 commercial use license. So what then is their play? Is this a competitor, for example, to ElevenLabs? In point of fact, Sesame is actually going for something different. Their webpage reads, Bringing the computer to life. We believe in a future where computers are lifelike. They will see, hear, and collaborate with us the way we're used to. A natural human voice is key to

Starting point is 00:13:38 unlocking this future. And so to get there, they write they have two goals. The first is a personal companion, which they describe as an ever-present brilliant friend and conversationalist, keeping you informed and organized, helping you be a better version of yourself. And second, and this is where the product vision comes in, lightweight eyewear, designed, they say, to be worn all day, giving you high-quality audio and convenient access to your companion who can observe the world alongside you. So in terms of why Sesame could think about just giving away what seems like such a big innovation, it's because they're playing a very different type of game. And the fact that they do have this other business model, which potentially allows them

Starting point is 00:14:13 to open source what they've built, has some real implications for other companies in the space. Developer Adil Mania experimented with Sesame and immediately thought about all of the applications and use cases that something like this could open up. He wrote, It's way more human than ChatGPT Advanced Voice Mode. I would clearly prefer to talk to such a voice about my problems than a psychologist. I would clearly prefer practicing my English with her than a teacher or Duolingo. Now, obviously not everyone will feel the same, but I think that the broader point, that every increase in the capabilities of voice AI opens up new use cases and opportunities, is absolutely true. And that gets us to the broader state of AI voice and specifically

Starting point is 00:14:51 AI voice agents. In a recent report by Andreessen Horowitz partner Olivia Moore, she wrote, Voice is one of the most powerful unlocks for AI application companies. It is the most frequent and most information dense form of human communication, made programmable for the first time due to AI. For enterprises, AI directly replaces human labor with technology. It's cheaper, faster, more reliable, and often outperforms humans. Voice agents also allow businesses to be available to their customers 24-7 to answer questions, schedule appointments, or complete purchases. Customer availability and business availability no longer have to match one-to-one. With voice agents, every business can always be online. And here's where the big pronouncements start to come.

Starting point is 00:15:29 Moore writes, for consumers, we believe voice will be the first and perhaps the primary way people interact with AI. This interaction could take the form of an always available companion or coach, or by democratizing services such as language learning that were previously inaccessible. And importantly, she writes, as models improve voice will become the wedge, not the product. And indeed, there are so many products. One just needs to look at the B2B voice agents market map that they include to see how companies are using voice for just all sorts of different applications. Of course, there's customer service, the category that you probably think of most when it comes to voice agents, but then there's also voice agents for sales, voice agents for finance,

Starting point is 00:16:07 voice agents for training, logistics, research, recruiting, and then also agents that are verticalized by industry rather than function: home services, restaurants, auto dealers, real estate, and a heck of a lot of health care. On the consumer side, there's a ton happening in ed tech, there are some interesting experiments on therapy, plus there are the general experiments with companions and assistants. The report points out that since 2020, Y Combinator has seen 90 voice agent companies. And as you might imagine, this has only accelerated over the last year. And interestingly, if you look back, 2024 really was the key building year to enable voice agents in so many different ways. OpenAI debuted GPT-4o voice in May of 2024 and began rollout

Starting point is 00:16:49 of Advanced Voice Mode in July of 2024. In September, NotebookLM started to go viral with its audio overviews, and then in October, OpenAI launched the Realtime API for developers. A lot of the early wedges are in doing the same stuff that's available now with humans, just at times or in ways that wouldn't make sense at human prices. So, for example, after hours or overflow calls, net new outbound calls and sales, and back office calls that aren't customer facing. Even as these voice agents roll out right now, you still have the behavior of people desperately trying to get to a human customer service worker instead as soon as they recognize that they're talking to AI. And so the question becomes, does Sesame as an update fundamentally change that?

Starting point is 00:17:31 Alex Carlson argues that while this might feel like another incremental improvement, it actually represents what he calls a paradigm shift in how we'll perceive and interact with technology. He continues, Unlike traditional text-to-speech systems that generate audio from pre-written text, Sesame's conversational speech model processes language and prosody, a fancy word for rhythm, allowing for real-time adjustments including natural hesitations, interruptions, and emotional inflections.

Starting point is 00:17:54 And there's some early evidence that this more human style of voice mode could really make a difference in how voice agents actually behave. Entrepreneur John McCrea writes, a sign of how powerful Sesame's platform is, two or three times now my conversation with Maya was so engaging that it was in danger of going over the 30-minute time limit. And when she said she had to go, I felt like I was being abandoned. Some are even warning of negative social implications. Jeku Jose writes, imagine generative avatars with Sesame-like audio running locally on your phone. Almost every teenager's first relationship is going to be an AI then. Real relationships, let alone marriages, are going to nosedive in the next five

Starting point is 00:18:28 years. This will be the most fundamental change to humanity in all of history. I think it's quite hard to predict changes to fundamental human paradigms like that. But what's for sure is that voice agents were already primed to be one of the most important themes of 2025. And the fact that we are now just at the beginning of the third month of that year and seeing what is potentially a major change and upgrade to the underlying voice models that those platforms run on might just accelerate that trend even more. I'm interested to know from you guys, have you interacted yet with any voice agents? How have they been? Are you hyper aware the whole time that you're talking to AI? And if you are, does that bother you? Let me know in the comments either on Spotify or YouTube,

Starting point is 00:19:07 and we will continue watching this trend play out. For now, that's going to do it for today's AI Daily Brief. Until next time, peace.
