The AI Daily Brief: Artificial Intelligence News and Analysis - Are Voice Agents the Next Big Computing Platform?
Episode Date: March 5, 2025
A new AI voice model, Sesame, is turning heads with its natural conversations, leaving many wondering if this is a turning point for AI assistants. Companies are racing to build AI-powered voice tools... for customer service, sales, education, and more. Is this the future of computing, or will people resist talking to machines? Plus Anthropic raises $3.5B.
Brought to you by:
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Vanta - Simplify compliance - https://vanta.com/nlw
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, a new voice model has everyone chattering about the future of voice agents.
And before that in the headlines, Anthropic completes a big new raise that seems for some to be driven by one particular use case.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, follow the Discord link in our show notes.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes.
We kick off today with Anthropic's latest raise. It is another monster
round. Honestly, at this point, numbers are starting to become meaningless. Every company is raising
a gazillion dollars at a googolplex valuation. But this one is interesting to me for a very specific
reason, which is the way that at least some of their investors are framing Anthropic's focus.
So first, the details: the company raised $3.5 billion in Series E funding, led by a $1 billion
investment from Lightspeed Venture Partners, with Salesforce, Cisco, Fidelity, General Catalyst, D1,
Jane Street, Menlo, and Bessemer all participating. Anthropic said, in the most boilerplate
of fundraising text: With this investment, Anthropic will advance its development of next-generation
AI systems, expand its compute capacity, deepen its research in mechanistic interpretability
and alignment, and accelerate its international expansion. Anthropic's valuation has now tripled
since their February 2024 Series D. One year ago, the company was valued at $18 billion.
The valuation is supported by skyrocketing revenue, which went 10x to hit a billion dollars
annualized by the end of last year. Bloomberg sources also suggest that Anthropic has already experienced
a 30% revenue boost so far this year with the release of Claude 3.7 Sonnet. Certainly it appears that
there is no shortage of buyers for the foundation model companies at these mega rounds. Earlier
reports had Anthropic only seeking to raise $2 billion, so it appears that an extra billion
and a half was squeezed in. The Information had previously reported that Anthropic expects to burn
$3 billion this year, so this funding round should see them cleanly through until 2026, unless, of course,
they really step on the gas. All right, now I promised at the beginning of this that there was something
interesting about this to me, and it comes from this offhand tweet from Deedy Das,
who is a VC at Menlo Ventures. Deedy writes, excited to announce that we at Menlo Ventures are doubling
down on our investment in Anthropic. $61.5 billion is a small price for 10xing every software
engineer. Now, of course, the caveat here is that this is just one investor's take. But the fact that
he so clearly puts the value of Anthropic as tied to the specific use case of
transforming and improving software engineering is really fascinating.
If you asked Anthropic, they'd say that they were competing for AGI.
Same way that OpenAI would say that.
Same way that Sergey Brin said that in his recent memo to Google.
So the fact that you have one of these investors,
who's just put a bunch of money back in this company,
really honing in on this very specific use case is fascinating.
Making it more interesting is the fact that it has become increasingly clear
that there is really almost no technology moat right now when it comes to the foundation models.
Sure, some can be state-of-the-art for a few weeks or even a few months, but after that, everyone catches up.
It certainly doesn't seem defensible enough to build a real moat around.
And yet, maybe the one exception to that is the chokehold that Anthropic has on coders.
It has for some time now, really since the middle of last year, been the default choice as the coding assistant underlying model.
It is, for example, what most people use inside Cursor, which has driven a huge amount of their
revenue, and I don't think it's an accident that the release of 3.7 coincides with a big
jump in their revenue as that becomes the default model as well. Now, obviously, Anthropic
is going to try to compete in more areas than just software engineering, but the fact that
it has unlocked some magic in this area, and it continues to keep that lead is really fascinating,
if only because it's so out of sync with basically every other area of AI.
Next up today, moving over to the infrastructure side of things,
Taiwanese chipmaker TSMC has announced plans to invest $100 billion into the United States.
The firm will use the money to expand operations in Arizona and construct three new chip plants.
The investment will include two advanced chip packaging facilities as well as a research
and development center.
Over time, these plants could help bring the entire AI chip development and manufacturing cycle
onshore.
CEO C.C. Wei attended the White House for the announcement,
which of course advances the long-held strategic goal of reviving domestic chipmaking.
In the ceremony, President Trump made it clear that building up the industry was a matter
of national security and also tried to argue that it's evidence that his tariff policies are having the
desired effect. He said, if they did the chips in Taiwan to send them here, they'll have 25 or 30%
or 50% or whatever the number may be in tariffs. It'll only go up. By doing it here, there's no
tariffs. Trump boasted, the most powerful AI chips in the world will be made right here in America.
Daniel Newman, CEO of tech advisory firm Futurum Group,
speculated that the investment may indeed have been part of a deal to delay tariffs.
He said, as the U.S. continues to push for increased domestic manufacturing and with tariffs on the horizon,
a substantial commitment from TSMC could serve as a strategic gesture of goodwill.
The expansion of U.S. facilities could also be a boon for TSMC after the success of the Arizona plant.
Planning for that facility began in 2020 and was bolstered by the Biden CHIPS Act.
The fab ramped up into full production late last year and produced greater yields than TSMC's average output for Taiwanese facilities.
The new commitment will be in addition to $65 billion that has already been earmarked to build out that Arizona facility.
The news triggered concerns that TSMC was becoming an American company,
with online commentators referring to the company as ASMC,
and saying that Wei, quote, looks like he has a gun behind his head, hostage situation.
Taiwanese economic minister Kuo Jyh-huei responded:
TSMC already has plants in the U.S. and Japan, and now it's building a new plant in Germany.
These have nothing to do with tariffs.
TSMC's global expansion is a crucial development.
Some pretty big stories today in the headlines here that certainly could have been their own main episodes.
For now, though, we'll wrap that
up and move into a discussion of voice agents and a model which could be the ChatGPT moment for that
particular subset of AI technology. Today's episode is brought to you by Vanta. Trust isn't just
earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security
professional scaling your GRC program, proving your commitment to security has never been
more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust
by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001.
Centralize security workflows, complete questionnaires up to 5X faster, and proactively manage vendor risk.
Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly.
Plus, with automation and AI throughout the platform, Vanta gives you time back, so you can focus on building your company.
Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk
and prove security in real time. For a limited time, this audience gets $1,000 off Vanta at vanta.com
slash NLW. That's VANTA.com slash NLW for $1,000 off.
There is a massive shift taking place right now, from using AI to help you do your work,
to deploying AI agents to just do your work for you. Of course, in that shift,
there is a ton of complication.
First of all, of these seemingly
thousands of agents out there,
which are actually ready for prime time,
which can do what they promise?
And beyond even that,
which of these agents will actually fit in my workflows?
What can integrate with the way
that we do business right now?
These are the questions at the heart
of the Superintelligent Agent Readiness Audit.
We've built a voice agent
that can scale across your entire team,
mapping your processes,
better understanding your business,
figuring out where you are
with AI and agents right now,
in order to provide recommendations that actually fit you and your company.
Our proprietary agent consulting engine and agent capabilities knowledge base
will leave you with action plans, recommendations,
and specific follow-ups that will help you make your next steps into the world
of a new agentic workforce.
To learn more about Super's Agent Readiness Audit,
email agent@besuper.ai,
or just email me directly, nlw@besuper.ai,
and let's get you set up with the most disruptive technology of our lifetimes.
Hey listeners, are you tasked with the safe deployment and use of trustworthy AI?
KPMG has a first of its kind AI Risk and Controls Guide,
which provides a structured approach for organizations to begin identifying AI risks
and design controls to mitigate threats.
What makes KPMG's AI Risks and Controls Guide different is that it outlines practical
control considerations to help businesses manage risks and accelerate value.
To learn more, go to www.kpmg.us slash AI guide.
That's www.kpmg.us slash AI guide.
Welcome back to the AI Daily Brief.
Today we are talking nominally about a product demo that has a lot of people talking,
both literally and figuratively.
The product is called Sesame, and some are even saying that this is the GPT-3 moment for voice AI.
Beyond just talking about Sesame, however,
I wanted to use this as a chance to check in on the state of voice agents.
There has been an incredible explosion
of voice-based platforms. Indeed, even Superintelligent has rebuilt our main product offering
to be anchored by a voice agent that can conduct infinitely scalable interviews across entire
organizations to help them understand where their AI and agent opportunities lie. So obviously,
because of that, this is an area that we've been thinking about a lot. So what we're going to do
today is talk first about Sesame, this demo that has people so interested and excited, and then
we'll broaden out to the larger conversation of voice agents more generally. Is this the next great
computing platform, or are we underestimating just how much of a change speaking to computers
as our primary way of interacting really is? Let's get to Sesame first. Over the weekend, if you were
on Twitter slash X, you might have seen people dropping demos of what seemed like a really
impressive new voice model. Here's an example from Professor Ethan Mollick.
Hey there. Another face in the digital crowd. Well, so to speak.
Gansomini you here. What do you mean by a digital crowd? Well, you know, just
riffing. It's a figure of speech. Kind of like a party, but online. What's your name?
So hold on. What's your name first? You got me there. I guess introductions are customary.
Name's Maya. It's a pleasure.
And so what do you do for a living, Maya?
Living is a strong word.
Let's flip over to one of the demos that the company themselves share just for a second to get a different context.
Let's imagine I just found you in a Twitter thread.
Can you guide me through a bit of a meditation?
Okay.
Close your eyes.
If you're comfortable, take a deep breath in.
And slowly exhale.
Let go of any tension in your shoulders, your jaw, wherever you hold it.
Okay.
Can you also reference what I've been doing on Twitter in a bit of a comedic way?
Okay.
So picture yourself stepping away from that glowing rectangle.
You know the one, the one with all the hot takes and carefully curated outrage.
Let it fade for a moment.
Just let it be.
Professor Mollick writes,
The new AI voice from Sesame really is a powerful illustration of where AI is going.
He pointed out that in the demo that he shared, it was all real time from his browser.
He writes, excellent use of disfluencies, pauses, even intakes of breath, really make
it seem like a human, though bits of uncanniness remain for now.
Pliny the Liberator writes,
All right, Sesame is by far the most AGI-feeling voice model I've interacted with yet.
They absolutely cooked.
Chubby writes, I agree with Pliny.
This is the first feel-the-AGI moment for AI voice mode for me.
If this would be the new Siri or Alexa, I would treat it as a real human being as it sounds so natural.
And we have to remember this is the worst it will ever be.
Entrepreneur Marillo Perriera writes, this is incredible.
The voice sounds so natural and the replies are so fast.
Maybe too fast?
It was even able to pronounce my name, which isn't super common.
Better conversationalists than many humans.
Signal writes, my lord, the Sesame Voice AI is absolutely effing insane.
I knew it was artificial. I knew there wasn't a real person on the other end.
And yet I still felt like I was talking to a person.
I felt the same social pressure, the same awkwardness when I hesitated,
the same discomfort when I misspoke.
It wasn't just convincing; it worked on me in a way I didn't expect.
I used to think I'd be immune to this.
Menlo's Deedy Das sums up,
this is the GPT-3 moment for voice AI, the movie Her.
Everything we had before was turn-by-turn, robotic, emotionless with no human tics.
This, as Deedy points out, is something very different.
Now, in terms of some of the details, one of the big innovations that's almost going a little underappreciated
is that the voice model is actually extremely small.
Sesame trained the model on approximately a million hours of audio,
but managed to bake it down into a one billion parameter model with a hundred million parameter decoder.
They also produced a 3B and an 8B model, which suggests this level of voice assistant will be very cheap
and likely able to run on local devices.
Tosh wrote,
The cost of speech to speech is going to zero when Sesame open sources the weights.
Sesame, which was co-founded by Oculus co-founder Brendan Iribe,
has also committed to releasing the model under the Apache 2.0 commercial use license.
So what then is their play? Is this a competitor, for example, to ElevenLabs?
In point of fact, Sesame is actually going for something different. Their webpage reads,
Bringing the computer to life. We believe in a future where computers are lifelike. They will
see, hear, and collaborate with us the way we're used to. A natural human voice is key to
unlocking this future. And so to get there, they write they have two goals. The first is
a personal companion, which they describe as an ever-present brilliant friend and conversationalist,
keeping you informed and organized, helping you be a better version of yourself.
And second, and this is where the product vision comes in, lightweight eyewear, designed, they say,
to be worn all day, giving you high-quality audio and convenient access to your companion
who can observe the world alongside you. So in terms of why Sesame could think about just giving
away what seems like such a big innovation, it's because they're playing a very different
type of game. And the fact that they do have this other business model, which potentially allows them
to open source what they've built, has some real implications for other companies in the space.
Developer Adil Mania experimented with Sesame and immediately thought about all of the applications
and use cases that something like this could open up. He wrote,
It's way more human than ChatGPT Advanced Voice Mode. I would clearly prefer to talk to such a
voice about my problems than a psychologist. I would clearly prefer practicing my English with her
than a teacher or Duolingo. Now, obviously not everyone will feel the same, but I think that the
broader point that every increase in the capabilities of voice AI opens up new use cases and
opportunities is absolutely true. And that gets us to the broader state of AI voice and specifically
AI voice agents. In a recent report by Andreessen Horowitz partner Olivia Moore, she wrote,
Voice is one of the most powerful unlocks for AI application companies. It is the most frequent
and most information dense form of human communication, made programmable for the first time due to
AI. For enterprises, AI directly replaces human labor with technology. It's cheaper, faster,
more reliable, and often outperforms humans. Voice agents also allow businesses to be available to
their customers 24-7 to answer questions, schedule appointments, or complete purchases.
Customer availability and business availability no longer have to match one-to-one. With voice
agents, every business can always be online. And here's where the big pronouncements start to come.
Moore writes, for consumers, we believe voice will be the first and perhaps the primary way people
interact with AI. This interaction could take the form of an always available companion or coach,
or by democratizing services such as language learning that were previously inaccessible.
And importantly, she writes, as models improve, voice will become the wedge, not the product.
And indeed, there are so many products. One just needs to look at the B2B voice agents market
map that they include to see how companies are using voice for just all sorts of different
applications. Of course, there's customer service, the category that you probably think of most
when it comes to voice agents, but then there's also voice agents for sales, voice agents for finance,
voice agents for training, logistics, research, recruiting, and then also agents that are
verticalized by industry rather than function, home services, restaurants, auto dealers, real estate,
and a heck of a lot of health care. On the consumer side, there's a ton happening in ed tech,
there's some interesting experiments on therapy, plus there are the general experiments with
companions and assistants. The report points out that since 2020, Y Combinator has seen
90 voice agent companies. And as you might imagine, this has only accelerated over the last year.
And interestingly, if you look back, 2024 really was the key building year to enable voice
agents in so many different ways. OpenAI debuted GPT-4o voice in May of 2024 and began rollout
of Advanced Voice Mode in July of 2024. In September, NotebookLM started to go viral with its
audio overviews, and then in October, OpenAI launched the Realtime API for developers. A lot of the early
wedges are in doing the same stuff that's available now with humans, just at times or in ways
that wouldn't make sense at human prices. So, for example, after hours or overflow calls,
net new outbound calls and sales, and back office calls that aren't customer facing. Even as these voice
agents roll out right now, you still have the behavior of people desperately trying to get
to a human customer service worker instead as soon as they recognize that they're talking to AI.
And so the question becomes, does Sesame as an update fundamentally change that?
Alex Carlson argues that while this might feel like another incremental improvement,
it actually represents what he calls a paradigm shift in how we'll perceive and interact with technology.
He continues,
Unlike traditional text-to-speech systems that generate audio from pre-written text,
Sesame's conversational speech model processes language and prosody,
a fancy word for rhythm,
allowing for real-time adjustments including natural hesitations,
interruptions, and emotional inflections.
And there's some early evidence that this more human style of voice mode
could really make a difference in how voice agents actually
behave. Entrepreneur John McCrea writes, a sign of how powerful Sesame's platform is,
two or three times now my conversation with Maya was so engaging that it was in danger of going
over the 30-minute time limit. And when she said she had to go, I felt like I was being abandoned.
Some are even warning of negative social implications. Jeku Jose writes, imagine generative avatars
with Sesame-like audio running locally on your phone. Almost every teenager's first relationship is going
to be an AI then. Real relationships, let alone marriages, are going to nosedive in the next five
years. This will be the most fundamental change to humanity in all of history. I think it's quite hard to
predict changes to fundamental human paradigms like that. But what's for sure is that voice agents were
already primed to be one of the most important themes of 2025. And the fact that we are now just at the
beginning of the third month of that year and seeing what is potentially a major change and upgrade
to the underlying voice models that those platforms run on might just accelerate that trend even more.
I'm interested to know from you guys, have you interacted yet with
any voice agents? How have they been? Are you hyper aware the whole time that you're talking to AI?
And if you are, does that bother you? Let me know in the comments either on Spotify or YouTube,
and we will continue watching this trend play out. For now, that's going to do it for today's
AI Daily Brief. Until next time, peace.
