Cheeky Pint - The world of voice AI, with Mati Staniszewski of ElevenLabs
Episode Date: April 14, 2026
Mati Staniszewski is the co-founder of ElevenLabs, the research company making audio accessible across languages and voices. He sits down with John to discuss the "voice Turing Test" and why AI has conquered text but still struggles with conversational speech. They discuss the future of human-computer interaction, including why we still can't get our phones to read a PDF properly and the massive potential for voice agents in everything from farming to healthcare. Mati also opens up about ElevenLabs’ rapid ascent to an $11 billion valuation and gives a behind-the-scenes look at how Ukraine is using their tech for digital government services.
Timestamps
(00:00:27) How audio models work
(00:08:52) ElevenLabs business model
(00:17:50) The conversational Turing Test
(00:21:01) Link by Stripe
(00:26:02) Cascaded vs speech-to-speech
(00:31:53) Universal translation
(00:51:41) Designing an AI-native org
Transcript
Mati Staniszewski co-founded ElevenLabs in 2022 and has since scaled it to the $11 billion leader in AI audio.
He's credited with capturing the humanness of speech through realistic emotional inflection,
and they're now expanding into everything from agentic workflows to music.
Thanks for doing this.
Thanks for having me.
Maybe a place to start is, like, I know how an LLM works at a high level.
Describe to me how an audio model works.
Like if we were, Karpathy-style, looking to build a toy one from scratch,
how does it work?
In the early days, you would try to replicate it exactly how the human body does it.
So you would try to build a machine, an analog machine,
that would effectively reproduce a vocal tract.
Then that progressed into trying to create effectively like digital signals for speech.
Bell Labs was one of the first,
to try to create a structured set of signals
that will represent the speech.
And that is a first precursor to what we would do today.
Then you would try to stitch in phonemes, effectively the different sounds of how we humans speak,
and then try to concatenate them together.
That's another important part in that equation where, based on the most probable
next word, you would effectively pull the phonemes from your library of
phonemes and bring them together.
And then down to modern history, where now we effectively use neural nets similar to those
in other domains.
So you predict the next sound based on, of course,
the context of the previous sounds, if it's a streaming speech.
If it's, let's say, a context of audio,
you will use combination of predicting of the phonemes,
but you also use the contextual text element of that work.
And here, credit to my co-founder, Piotr,
who effectively came with that new idea of how you can now
create voice models, which are both reliable, high quality,
quick, where you would bring a lot of the
ideas from transformer models, from diffusion models, into the speech space.
Yes. So that prediction of the next token on the phoneme space wasn't something that was possible.
You spoke briefly about this, like how you kind of operate on the text,
on the waveform space. There's also mel-spectrogram space. So usually you do text,
mel-spectrogram, waveform.
So it was a spectrogram space?
It's like a visual representation of how the speech sounds across pitch, across energy,
and then you transform that into a waveform.
Got it. So like when WaveNet came along and the Tacotron
models, they would effectively use text to mel-spectrogram, so that visual representation,
and then how you decode and encode that into the waveform to bring it across.
And Piotr figured out how to abstract some of those steps and decode and encode them a lot,
a lot better.
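The "visual representation" mentioned here can be made concrete with a small sketch. The code below goes in the analysis direction only (waveform to log mel-spectrogram), using NumPy. It is not ElevenLabs' method: the frame size, hop, number of mel bands, and the HTK-style mel formula are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention, assumed here).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    bank = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_spectrogram(wave, sample_rate, n_fft=512, hop=128, n_mels=40):
    # Slice the waveform into overlapping windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # Power spectrum per frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool FFT bins into mel bands, then compress with a log.
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel + 1e-10)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
    print(mel_spectrogram(tone, sr).shape)
```

Each row of the result is one time frame and each column one mel band, which is the "picture" of pitch and energy over time. A text-to-speech system of the kind described would predict something like this from text and then use a separate decoder (a vocoder) to turn it back into a waveform; that decoding step is not shown.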
So that predicting of the next phoneme was one of the big pieces.
And second big piece was, how do you bring that context into the equation?
So what I mean by context is, the voice actor was reading a textual copy.
You would know that, okay, this is a dialogue sequence.
I need to produce a dialogue.
If it's a happy sentence,
I might need to pronounce as a happy sentence.
But kind of what happens before and after
comes into the equation,
and you need to bring that across.
And then there's a last big piece.
So voice model has the sound of how you intonate
the given fragment.
But the second big part is the voice itself
of the characteristics of accents, of style,
of prosody across that voice.
So when you actually try to vocalize something,
when you create that voice model, you turn text into audio,
you need the text, you also need the voice reference
of how you want it to be spoken.
So here is kind of the second big innovation.
So apart from context, it's how you decode
and encode those features.
So when Bell Labs came with their initial representation
of speech, the big piece there was you would have effectively
hard-coded parameters for that speech.
With ElevenLabs models.
Hard-coded parameters for an enthusiastic speaker,
British accents.
Exactly, exactly, that kind of stuff,
like the set of pitch elements that you can select,
set of energy spectrograms you can select from.
And in our approach, effectively,
you would give the model open-ended ability
to select what those parameters should be.
So it's not going to be British, Polish, Spanish, English speaker,
but the model will deduce them itself.
The same for the other sets of parameters
that are not hard-coded, whether it's the enthusiasm,
whether it's the sadness, et cetera.
You're saying kind of Britishness
is an emergent property
in your voice models.
Exactly.
Yeah, and those two big parts.
Encoding and decoding of how you create the voice.
It was a super hard problem before, and we figured that out too.
Then how you construct that, in the sense of how you get the context across
so you can predict the next phonemes.
So how you bring them together in a reliable and stable way
while doing it quick.
And these were kind of the two first big innovations
in the voice models that continue to today.
But okay, so if LLMs reason about text
and, you know, word parts,
with tokens as the way they think about the world,
what is the equivalent of a token in the voice model?
You mentioned phonemes a bunch.
Like, what is that representation?
So we do, we store the voice embedding effectively for the speaker.
So you need that reference when you produce and create the speech.
Yes.
Of course, in the input to the voice model, you still get the text,
and you bring the speaker encoding.
And then when you produce speech, you do operate on the waveform or on that,
or effectively on the phoneme level of that speech.
And then when we kind of go to the opposite,
so of course, what is a phoneme, to fill in my understanding?
It's like a syllable deconstructed even to smaller elements.
And these are effectively like the human sounds you can produce.
Got it.
So these would be like the most close to that representation.
But of course in our models, now it's going to be a combination of not only operating on phoneme level,
you also operate on the text level, you operate kind of
on both in sync because when you are predicting the context,
you need to understand how that sentence will get constructed,
and especially if it's more of a streaming real-time use case
and like a voice agent setting,
you need both parts to work across.
So it's similar to how you would operate on the token level
on the text side, we operate on the token level on the audio side.
It feels like a big part of the magic of Eleven was your voices
were much more human-sounding.
How did you accomplish that?
So I'll kind of give you a quick synopsis of how we think about the models on the text to speech side today.
In any model, you need architecture, you need compute, you need data.
So architecture innovations were one thing.
The data part was the second big thing.
With audio, you will have a lot of audio data available, but frequently you will not have it annotated in the right way.
You won't have which speaker is speaking when.
Sometimes the what is annotated, but the how isn't.
So, like, as we are speaking now,
what's the emotions that we use,
what are the actions that we use?
So we would invest a lot internally on effectively creating our own data labelers,
our own team to be able to create those data sets that will be better.
And that's a combination of, of course, like semi-automatic techniques,
and then manual techniques.
And actually a lot of the models that we did afterwards
actually spun out from a lot of that research, too.
So speech-to-text model,
initially was a model we did for ourselves
because the models on the market just weren't good
to annotate that data.
And then another brilliant researcher on our team
was kind of able to construct it
so we could spin it out as a model that we brought to the customers.
So you've just been doing useful stuff in voice,
and that has emerged with a whole bunch of products
that you might not have expected
because you find you're building useful stuff.
Exactly, exactly.
And that's kind of the combination on the data side: being able to do it
automatically, and creating a team that's coached on voice,
on how to describe it, because most of the labelers out there
just aren't as well versed in understanding audio and voice.
That helped us a lot to bring that back.
And then, of course, deploying those models in production,
seeing how customers interact with them,
having them annotate all of the data
helped us to refine those models over time.
A very interesting thing on the side.
So we spoke about the speech representation.
The first guy who created the speech representation
is a guy called Kempelen, von Kempelen.
So he created this analog machine that
would represent effectively a human vocal tract and try to produce that sound.
He'd spent decades on that and it kind of started producing vowels.
But that's the same person that created a chess machine,
the first viral, let's say chess machine, that would kind of simulate playing chess.
This is the Mechanical Turk?
It was called the Turk.
Yeah, yeah.
Exactly, but the kind of crazy thing behind it was that it was operated by a human.
And it was all a fluke.
And that's where the Mechanical Turk name comes from,
which actually we use a lot in kind of data labeling production to make that work there.
Yeah, yeah.
And so we kind of jumped right in.
But if you describe the Eleven business today, people think of you as the speech company.
How should they actually think of your business?
To the extent you can describe the big areas, text to speech, speech to text, voice agents,
just like break down the business for us.
Cool.
So in a nutshell, I describe ElevenLabs as a research and product deployment company.
We build foundational audio and voice models,
and then build a platform for businesses
to transform how they communicate with their customers,
with their employees.
And that will apply through AI agents
from customer support, sales, hiring, training,
all the way through to marketing and storytelling
for our creative tools.
And in that set, we've created all types of foundational audio models,
so text-to-speech models for producing,
speech-to-text models that work over 100 languages and happily beat others on benchmarks,
all the way through to conversational models of how you loop them together, to music, to other
domains of audio.
And then, of course, beyond the models, when you actually bring them to production, that's
where the second level of the platform comes in, where that meets the businesses on the
specific use case.
So on the agent-specific example, it would be how you now connect those models to the knowledge
base, to telephony, to the integrations
that you need to perform the actions,
how you evaluate and monitor the agent
that it behaves in the right way,
how you build the right safeguards.
On the creative side, on the marketing side,
it's how do you create a good ad
so you can create a good video voiceover
for one of the campaigns,
how you create an article that's narrated
with a specific voice that represents the brand in a good way.
So that's where we combine the models
and understanding of the customers we work with
into one policy platform.
Every platform company has this question about how far they go into applications.
So how do you think about where you go horizontal and power the whole ecosystem versus where you develop applications?
Because you can imagine there being a whole ecosystem of closed captioning tools that grow up that, again, are built on the 11 Labs tech.
It's not necessarily a space that you would have to go after yourself.
I think the big difference, to your question: today we see ourselves as a platform where, if you're building a horizontal use case
in your business, it's a great place to come.
If you have a lot of domain specificity,
that's where I see a lot of kind of application companies
forming over time,
and that's specifically not the spaces we will go into.
And I think it also is interesting
when the tech is moving as quickly as it is here.
It's one thing, you know, with SaaS
where you get these like vertical-specific providers,
but I would imagine one of the biggest risks for you guys
in being intermediated is if there's,
you know, like in this example, a closed captioning service that is on a two-versions-old
version of ElevenLabs and hasn't upgraded. That's a problem because you want people to be using
the latest and greatest model that you've developed, and you'll be kind of deploying new capabilities
every week. And I presume that's part of your thinking: just when it's moving that
quickly, you need to go direct in a lot of cases.
That's right. In closed captioning, like, you know,
here already now we know that our service is going to be able to tackle like 99.9% of the cases
that customers have.
And then there's like added benefit of we work with healthcare customers
where we will create custom models for those customers
where it will get that transcription perfectly.
The context is the tricky thing in closed captions
where like we talk a lot about a lot of technical stuff on this.
Yeah, for sure.
And that's where you need like effectively like a dictionary
of where they do the tag beforehand,
which as we work with the businesses,
we know we need to embed in that creation process.
We're talking about kind of products here.
And one thing I know is that LLMs are amazing,
and you have the usage stats of ChatGPT and Gemini
and all the popular LLMs where they're working on,
people use them a ton.
It feels like there's a big product overhang when it comes to voice,
where the leading edge voice models are incredibly capable.
And yet I was driving home the other day,
and I needed to read a PDF when I was driving.
And so I said, okay, I'll just have my phone read the PDF to me.
And you can kind of try and hack it with the iOS screen reader, but it doesn't really work with the scrolling.
And then in theory you can upload to Gemini, but you're trying to get it to not summarize it, and it actually
just hung when I tried to press the, like, read-this-to-me button.
And so there was no way I could get my phone to read me something, which seemed like a fairly basic,
you know, feature.
And, you know, all cars advertise voice control.
And yet it sucks. And separately, you know, if you want to, you know, input something to the navigation,
Just no car has a good version of that yet.
maybe Tesla does, and an ant.
And so, why does it seem like with LLMs and Claude Code and everything,
we are using all the capabilities of the intelligence,
whereas with voice, we're like living 10 years ago somehow?
Well, I'm thinking whether I agree with the premise that we are 10 years behind.
In the lived experience of people today, like they're using Siri's transcription,
which has gotten better, and it's still way behind the leading edge.
Yeah, like, there's definitely a
piece of like, I think the technology in many of those cases is ready, there's a deployment gap
to what you are saying.
It's like an automotive or some of the big companies are not adopting that quickly now
for bringing that into the production.
But plenty of different problems that you need to fix along the way.
I mean, the quality of voice models for them to actually sound good, like this is only like
last three years thing.
Yeah, there's three years.
It's a three years thing.
All right software updates now.
So that's three years for the first voice model that can narrate.
Yes.
Text, they think.
Two years ago, you can start seeing the real-time version of that.
And not really, like, it's, I think the real break was like a year ago, where you
could start seeing that in production.
And then I think over 2025, the big piece that hasn't been possible is how you connect now
the real-time voice interaction with something which I think you're referring to, like, it has
context of what you want to do, what is the material that you want to read, how does it connect
to set of your preferences from the past and gets that across?
I think that's like only recently became possible.
and where we've seen, like, kind of the big adoption
across the enterprises leading on the technical side.
I think this year, it should be on the automotive side too,
or some of the applications.
Okay, so you think we'll start seeing kind of great voice models in cars this year?
This year for the on-cloud use cases. Like on-car, in-car,
so without connectivity, not yet.
There's deployment, of course, gap of, like, how you bring that into the gaps.
But I think, like, the next two years, three years,
How about the PDF reading use case?
That should work.
Yeah.
But how should I have done it?
So, back in the day. I'll preempt this with a story, to cue ElevenReader.
But we had this problem.
We had so many audiobook authors coming to ElevenLabs.
So in 2023, we released our first software.
We had a lot of creators and then a lot of audiobook authors or book authors
that couldn't afford professional narration and wanted to create an audiobook.
However, none of the companies accepted AI audiobooks.
You couldn't sell AI audiobooks on Audible or something.
Exactly. Audible would, like, block AI content.
So we had no choice.
Like we need to create an avenue for them to bring them.
Because there was no distribution for AI audiobooks.
Exactly. So we created ElevenReader.
And that kind of came with functionality where you can upload your PDF.
You can upload your text and have it read out loud with a number of incredible voices.
So whether it's Sir Michael Caine, all the way through to estates,
like working together with the Richard Feynman estate,
where you can have that.
And so you were working with the Sir Michael Caines of the world?
Exactly.
And then you can actually read it out of that.
And that kind of works extremely well.
So that works.
Now, how can you do it?
Actually, I do want everything read to me by Michael Caine.
It's a great voice.
Shouldn't you guys have a consumer app where I can just do the common voice things?
Like, I want to be able to have an Eleven app on my phone,
and then if I upload a PDF to it, it can do the common things that I would like,
such as have it read it to me.
Yeah, that's exactly ElevenReader.
So that works.
Okay.
The phone makers allow third-party keyboards.
Do you think they, do they allow third-party transcription engines, will they, do you think?
The phone makers you said, right?
Like Apple and Google.
Yeah, they are...
So the OS makers.
Yeah, not all of them.
Android, with Android you can work through it.
It's like, you know, variations like Nothing Tech and others.
But yeah, I feel like if you had a popular Eleven app that allowed for transcription, people would use it a bunch,
and maybe eventually Apple would say, oh, we should allow third-party transcription
engines if that's what people want.
I mean, it seems like they might be going in that direction, right?
Recently they announced that they'll open up the LLM ecosystem.
Hopefully they will do the same with voice ecosystem, which is kind of similar.
But again, I think it's rational to do when it's moving so quickly.
Yeah.
The voice assistant paradigm is one of the oldest
UI paradigms in computing.
Like "Open the pod bay doors, HAL" from 1969.
Yeah.
I will claim it's not working yet.
So Siri doesn't have the intelligence.
And then on Gemini and ChatGPT and those apps, I mean, I want to use the voice mode, but I don't know about you.
It just doesn't work.
And so, like, sometimes I'll be using my phone and I'll use the iOS keyboard transcription to type in the field and then, like, say a bunch of stuff and then send it off.
But this suggests to me that consumers really want voice mode that works.
and yet it's just not working yet for the major LLM apps or for anyone.
Why isn't it working yet?
It is pretty hard to do, because you want two things.
You want to be able to say things that you want,
but you want sometimes for it to execute it,
sometimes to wait for you to finish and add something in the sentence.
Sometimes you want it to be interactive,
so it asks you questions back to clarify and get some of the additional detail.
And all of that is actually pretty hard.
Like that's where kind of the magical,
like ideal version of a voice agent for us comes in,
where you need the speech-to-text element,
you need the transcription side,
then you need kind of the turn-taking mechanism.
So like, when do you finish a sentence,
when is it likely, based on silence,
based on the context,
and then sometimes you want it to speak back and clarify,
or at least give you the text back to clarify,
and then maybe execute a set of instructions.
So that problem is still very hard research.
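To make the silence half of that turn-taking decision concrete, here is a minimal sketch of an energy-based end-of-turn detector. It covers only the "based on silence" signal and ignores the contextual one, which is the hard research part described above. The function name, frame length, silence window, and energy threshold are all hypothetical values picked for illustration.

```python
import numpy as np

def end_of_turn_index(wave, sample_rate, frame_ms=20, silence_ms=600,
                      energy_threshold=1e-3):
    """Return the sample index where the speaker is judged to have finished,
    or None if no end of turn is detected. Thresholds are illustrative."""
    frame_len = int(sample_rate * frame_ms / 1000)
    needed = silence_ms // frame_ms  # consecutive quiet frames required
    quiet_run = 0
    heard_speech = False
    for start in range(0, len(wave) - frame_len + 1, frame_len):
        frame = wave[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        if energy >= energy_threshold:
            # Loud frame: the speaker is (still) talking.
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            # Quiet frame after speech: count toward the silence window.
            quiet_run += 1
            if quiet_run >= needed:
                return start + frame_len
    return None

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr // 2) / sr
    speech = 0.5 * np.sin(2 * np.pi * 200 * t)        # 0.5 s stand-in for speech
    wave = np.concatenate([speech, np.zeros(sr)])      # followed by 1 s of silence
    print(end_of_turn_index(wave, sr))
```

A fixed silence window like this is exactly what makes naive voice modes feel wrong: it cuts people off mid-thought or waits too long. That is why the conversation points to context, not just silence, as the other input.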
So I agree with the claim that, like,
this orchestration side
has not like passed a true conversational agent Turing test,
where it behaves as you would expect from another person,
where you can say,
The simple way of saying what I'm saying
is that we passed the Turing test with text LLMs a long time ago,
but we're actually nowhere near that on voice LLMs.
So it's kind of interesting how that's a final frontier.
Yeah, I feel like it's going to work in like specific domains,
like in a customer support call,
it passes the voice Turing test, works well.
And like, let's take another like spectrum of that,
an interactive gaming experience,
like a truly interactive as you would have with another human,
in that game, it's so hard and further out there.
We haven't passed it yet there.
Yes, yes.
Yeah, but I think that's a combination of, like,
even like a simpler version of within that,
sometimes you might give a response immediately back.
Sometimes you need a tool call to get additional information
from the database or how you orchestrate that.
So, like, that's probably the most common thing we see
as we work with some of the companies out there
is you want those systems to orchestrate extremely well
where it's a conversational use case, pretty simple.
You can route the agent to speak with,
but if you need to authenticate,
if you need to pull additional information from the database,
what do you do, how do you handle that graciously?
That's right.
And to that extent, I'll agree.
That's just getting, getting there.
And we'll hopefully see that.
Our goal is to pass the voice Turing test in all those cases,
or the Turing test for all conversational agents
outside of voice too,
and I hope we'll all be there in the next year or so.
For subscription businesses,
a lot of revenue is lost
in that last few seconds before the checkout.
Someone has to get up or find their wallet,
or they mistyped their card number,
or they hit an error,
and they just give up and you lose the sale.
For a company like ElevenLabs,
adding hundreds of thousands of subscribers,
even a tiny bit of friction like that
would really add up.
That's why ElevenLabs uses Link from Stripe.
Customers save their details once,
and then they can check out in seconds
across more than a million businesses
with saved credentials.
So, if you want a faster checkout for your customers,
you should turn on Link from Stripe.
Are you guys working on personalized voice transcription? It feels like part of the way we're making it hard for ourselves is, when I speak to Siri, I have a bit of an accent, and so it sometimes has a hard time understanding me, but my accent doesn't change.
And so it could just get good at listening to John.
But my understanding is it's not.
It's just, like, running the global voice recognition model.
And I'm guessing it's the same for ElevenLabs: you're
running the global voice recognition model, but again, you have an accent. And so,
like, if you walked up to someone in a coffee shop and said two words,
they might have a hard time understanding it because they're not putting it through their
Mati Polish-accent filter. And so where's this going with, like, actually interpreting
the person that you know to exist on the other side?
Yeah, I have a very tricky one to detect.
So my voice is frequently used in the tests.
Ah, you're part of the test suite.
Yeah. For text to speech, for like, yeah, yeah, it's pretty tricky.
But again, trying to parse your voice in a global model is just making life hard.
It's like, have a Mati-specific model.
Yeah.
So on the speech to text by transcription, exactly.
Like the big part now that we are bringing in is you have two parts.
One, effectively like a person or a voice specific detection, which is true for the accent side,
but it's also true for a crowded room.
So that's where we have an incredible research team that's able to continually keep the accuracy high,
but also add things like speaker detection,
of course, noise reduction.
But the second part is also keyword detection.
So there's specific words that you would want to say in those settings
that you want to effectively monitor for.
So we spoke about, you know, like let's say I'm going to the coffee shop
and order things.
The set of actions, like the coffee shop would expect me to do.
There's information theory.
It's like they can just listen out for the coffee words.
Exactly.
And then try to like match it to the closest proximity.
So like both things will help. In a setup where you have my voice perfectly,
you can decode and encode on that.
If you don't have my voice, or even if you want to double amplify it,
we already support effectively a keyword detection,
which is useful for both real-time settings and async settings.
So back to, like, Cheeky Pint transcription,
you could effectively pre-generate that from the previous podcast
and look for a set of words that you would use traditionally in that.
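The "match it to the closest proximity" idea can be sketched as a post-processing step on a transcript. This is only an illustration using Python's standard `difflib`; it is not how ElevenLabs' keyword detection works, and the function name, the coffee-shop vocabulary, and the similarity cutoff are assumptions.

```python
import difflib

def bias_to_keywords(transcript_words, keywords, cutoff=0.75):
    """Snap each transcribed word to the closest expected keyword when it is
    similar enough; otherwise keep the word unchanged."""
    lowered = {k.lower(): k for k in keywords}
    corrected = []
    for word in transcript_words:
        match = difflib.get_close_matches(word.lower(), list(lowered), n=1, cutoff=cutoff)
        corrected.append(lowered[match[0]] if match else word)
    return corrected

if __name__ == "__main__":
    # Hypothetical vocabulary a coffee shop would listen out for.
    menu = ["espresso", "cappuccino", "latte"]
    heard = ["one", "expresso", "and", "a", "capuccino"]
    print(bias_to_keywords(heard, menu))
```

For a podcast, the keyword list could be pre-generated from earlier episodes, as suggested above. A production system would more likely bias the recognizer itself during decoding rather than patch the text afterwards.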
And so how hard, okay, so you do the keyword detection already, but how hard is the: I want to get superhuman transcription performance by feeding it an hour of Mati audio before it listens to Mati, so that it should be able to do a much better job transcribing? Is that just a really hard research problem?
No, solvable. We think we can roll it out in one of the next versions, which is, like, hopefully in the next month.
Oh, so you think this year you're doing person-specific transcription?
Person-specific transcription.
Like, we can already diarize speakers extremely well.
So, like, if we are speaking, it can of course disambiguate who is speaking when.
Yes.
Which is like, in transcription side, apart from accuracy, diarization is one of the harder problems.
And we do that extremely well.
And now it's going to be like effectively what you're saying, like fine-tuning based on the speaker that I want to listen to.
Yes.
Which we know will be important.
I mean, like in healthcare setup, such an important part.
You're in an operating room, you're a doctor, you want to say a command, then you want to
really be able to listen to that one specific person.
You have a hardware device at home, let's say it's a pilot that helps you control the TV.
Here too, you will want that to listen to you versus, let's say, the family roaming around.
Or maybe you want it to listen to everyone.
So, like, you could decide it, but in many cases, you want to be able to specify that.
Okay, that's really exciting.
It's great because there's, like, still so many unsolved research problems.
Yeah, there's just breakthrough after breakthrough coming
in the domain of voice models.
How about on the flip side,
when it comes to speech generation,
you know the Zoom "touch up my appearance" feature?
Yeah, yeah.
I've always thought about that
in the context of voice,
where should you offer a de-accenting filter for voices?
Or like, even there's one podcast that I like to listen to,
but the voice is a little mumbly,
and I always thought they should put it through a de-mumbling filter.
Just to, like, make the enunciation a little better.
But all these things, again, are like Photoshopping
an image.
There's no reason that the, like, have you thought about voice to voice, basically, rather
than voice to text or text to voice?
Yeah, so there are kind of two big parts.
One, on the speech generation side, similarly, there's so much innovation still there. There's like a wider
piece, and that's where we released a v3 model that is kind of solving that
for the first time, which is, like, can you control speech?
So you can have the text to speech, you generate something that sounds emotionally great.
Previously, until the end of last year,
effectively you would rely on model to decide
what's the best performance.
You could regenerate it, but ultimately model decides
the best performance.
So that's where the controllability came in,
where we can finally give it cues of, say it in a slower way,
or change how you deliver the dramatic pause,
or kind of any cues that you give.
And to be able to do that, you need the architectural changes
and the data that we kind of created over time,
where you annotated what was said and how it was said,
so you can actually train the model to do that.
So today, finally, you can have both speech generation or an entire voice agent experience
with what we call expressive mode, where the agent knows the emotions on the other side.
So if the person is stressed, it can react and be reassuring.
And that's generating an LLM response on the reassuring side and a response in that set of emotions too.
And that break was super hard to do.
And that, of course, stretches to a lot of what you said.
It could be some version of speech enhancement, either real time or
in a post-processing setup, to change how that's delivered.
And that's relatively recent innovation.
And it's like we know it can still be so much better.
Like the edge cases of how you want to describe it is pretty large.
So that's one.
And then the second part of the question, which is a huge question, that's speech-to-speech models.
So as you said, our approach, as you think about voice agent, conversational agents,
is effectively a cascaded approach.
You use transcription, so speech-to-text, then an LLM, then text-to-speech,
and orchestrate all of that together.
And then you have speech-to-speech, which kind of goes directly from speech to a
speech response on the other side.
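The cascaded approach described here can be sketched as three pluggable stages with the intermediate text kept around for inspection. This is a toy illustration, not ElevenLabs' orchestration layer: the `CascadedAgent` class and the `fake_stt`, `fake_llm`, and `fake_tts` stand-ins are hypothetical, and real stages would call actual speech and language models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CascadedAgent:
    """Speech in, speech out, with text as the visible intermediate layer."""
    speech_to_text: Callable[[bytes], str]
    language_model: Callable[[str], str]
    text_to_speech: Callable[[str], bytes]
    trace: List[Dict[str, str]] = field(default_factory=list)

    def respond(self, audio_in: bytes) -> bytes:
        user_text = self.speech_to_text(audio_in)      # stage 1: transcription
        reply_text = self.language_model(user_text)    # stage 2: intelligence layer
        audio_out = self.text_to_speech(reply_text)    # stage 3: speech generation
        # Because every hop passes through text, each turn can be logged and audited.
        self.trace.append({"user_text": user_text, "reply_text": reply_text})
        return audio_out

# Toy stand-ins so the sketch runs without any real models (all hypothetical).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")

def fake_llm(text: str) -> str:
    return f"You said: {text}"

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")

if __name__ == "__main__":
    agent = CascadedAgent(fake_stt, fake_llm, fake_tts)
    print(agent.respond(b"hello"))
    print(agent.trace)
```

The design point is that each stage is swappable and the text in `trace` is inspectable. A speech-to-speech model collapses the three stages into one call, which saves the hand-offs but leaves nothing comparable to log.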
And when you say speech-to-speech, is the idea that it doesn't go through text as an encoding
in the intermediate stage? Oh, interesting.
Okay.
For performance reasons, for accuracy reasons?
You usually go for latency.
Okay.
It's faster to run a model that does not have to transcribe and then generate.
Exactly.
It's quicker, but on the flip side, you lose reliability.
Yes.
You lose, like, all visibility into the parts of the pipeline.
And emotionality, we think you can deliver both on both sides extremely well, and maybe
you can make it more controllable too.
So today we are optimizing heavily on a cascaded approach.
I'm sorry, a cascaded approach is...
Is the speech-to-text?
Going through the text layer.
And as we work with like all of the businesses and enterprises, they will need that visibility
into what happens.
They will want to execute certain tasks on top of that.
They want a good visibility into each of the steps and great accuracy of all the models.
But beyond that, they can abstract away what's the LLM layer, what's the intelligence layer,
the integrations are easier in that system.
So that's like where we are betting a lot of the research work of how you can make that great,
and we think we can make that great.
And speech-to-speech, as you think about maybe more of like a companion version of the applications,
that's where that will flourish because maybe the hallucinations aren't as important,
but the latency matters a little bit more, and maybe hallucinations are even a feature.
And maybe in the future, just to finish that part,
you will have, like, some version of a combination of the models.
That for, like, low complexity, easy models, you will have speech-to-speech.
And for, like, higher complexity, you will have the cascaded.
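The combination described here could be sketched as a simple router. This is only an illustration under stated assumptions: the `Turn` fields and the routing rule are hypothetical, chosen to mirror the trade-offs mentioned in the conversation (latency for speech-to-speech, visibility and task execution for cascaded).

```python
from dataclasses import dataclass


@dataclass
class Turn:
    needs_tools: bool   # hypothetical flag: the agent must execute a task
    needs_audit: bool   # hypothetical flag: the business wants per-step visibility


def choose_pipeline(turn: Turn) -> str:
    """Route higher-complexity turns to the cascaded pipeline, the rest to speech-to-speech."""
    if turn.needs_tools or turn.needs_audit:
        return "cascaded"
    return "speech_to_speech"


print(choose_pipeline(Turn(needs_tools=False, needs_audit=False)))
print(choose_pipeline(Turn(needs_tools=True, needs_audit=False)))
```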
Okay.
So I was going to ask about this.
Like, by the way, there is research on how the invention of writing changed humans' brains
and just, like, changed the neural pathways in ways beyond kind of the actual written language.
Do you observe that speech-to-speech models think differently
than cascaded models?
Like, it sounds like they're dumber.
They are definitely dumber.
You need smaller model, you cannot.
But that's interesting, right?
That, like, forcing models to reason about text,
I mean, I know they just have much more in there as well,
but they're smarter.
Yeah, but it's like, you know, like, if you are going speech-to-speech,
usually, you will use smaller models, so it's still quick.
Yeah, yeah, yeah.
I see, so it's also just a model-size thing.
Yeah, yeah.
Okay, but are there interesting differences beyond, like,
correlates like size?
What I can say,
it's like slightly different to your question.
The people interacting through voice
and the performance we see for like how they interact
with the business changes
just by nature of interacting with voice.
A good example, you can contact ElevenLabs
and register for your interest,
you go through the form,
and at the end of that, we supplemented that
instead of going through the form process,
you can speak with our agent and leave more details.
And what happened are two things.
One, people were actually much more keen
to leave the forms through speaking with the agent,
so we would go through the form a lot easier.
But second, they would be a lot more open-ended
in terms of what the use cases are.
So they would start giving us information
about the wider set of use cases,
the complexity of the use case.
So the writing out was tedious and tricky.
This is like an open-ended adventure game.
You could ask follow-up questions, you can clarify.
But people were just more at ease
and could trust the system while doing that,
that it's working.
And that kind of helped us.
And then three, which maybe is like more of a technological barrier,
it also works across all languages.
So now we have leads from like all parts of the world coming in
and leaving their details.
So we did that use case.
And now we have a few different companies building their
AI SDR versions of that too to help
them capture the leads coming in from banks all the way to
actually one of the automotive companies that leaves that
where people are just more keen to speak through voice.
So I want to ask about this kind of a second order effect.
You have, you know, you talked in the past about how growing up in Poland, I guess the dubbing of TV shows, they were cheap and so they would only have one voice actor for a TV show. So no matter all the parts, male and female, they're like, I love you. I love you too. You know, there's like one voice actor doing all of them. And now, you know, thanks to better voice models, you'll be able to just have like really good voices, AI generated for all the dubbing. Because again, it's not like it's taking jobs from great dubbing that was happening previously. It was like awful dubbing,
you know, happening in Poland previously.
So that's like one example of the second order effects.
What are the other second order effects you're seeing
of ubiquitous, good, text to speech, speech to text?
It seems like across a broad array of languages
because whatever in English,
just this didn't exist in Polish or Irish or, you know, pick your language.
One, like breaking down the language barrier,
you know, the kind of the inspiration came from the movie side,
but it also applies in any communication
setup. Like, in the future, could I travel to another country and speak Polish
or speak English,
and that language is then being understood in the local native language?
Like from Hitchhiker's Guide to the Galaxy, this version of the Babel fish, yeah.
Exactly.
That you can actually understand the world.
And voice, of course, will be an interaction layer, but similarly, all of us will have our own
kind of extension and voice agents that can help on our behalf.
And there are very clear and great examples of that, of people that lost their voice and
can get it back for the first time.
We see that everywhere, whether that's people that lost it due to ALS or throat cancer that can get it back.
Just recently, there was an example of a patient that had a Neuralink.
I worked with them to bring the voice back, so that person could speak with their own voice
with the family around.
We worked with the lady that lost her voice before she got married.
And then finally, the technology became possible.
We were able to recreate that voice.
and for the first time she could replicate the marriage ceremony
and speak the vows together,
which was such a heartfelt moment,
probably like the most important from all the work that we do.
When you guys talk about voice agents,
is a voice agent just the idea that you have some long-running
or persistent agent that is going out and interacting with the world through voice?
And so customer service would be one example of it.
you know, in the other direction, your claw going and making you a restaurant reservation
and actually calling up the restaurant. Is that kind of how I should think about voice agents?
That's right. It's exactly, whether it's like the reactive side of being able to interact with
the customer or the proactive to call it back. We recently had a very interesting one.
Topical because it was a Guinness-related one where there was a developer developing a Gindex
effectively. Oh, I saw that. They were calling all the pubs in Ireland, checking the price of a pint.
Yeah, you could like ask that or report information.
That was built.
The Gindex is built with ElevenLabs technology.
It was built with ElevenLabs too.
So like people could actually, could do both sides.
Could proactively reach out, reactively reach out, all was captured for voice.
And then kind of 3,000 different entities could report their prices and get that across.
Have you, by the way, hooked up your OpenClaw to ElevenLabs?
Is the OpenClaw-ElevenLabs combo something that a lot of people
at ElevenLabs are doing?
So, as you know, OpenClaw will, like, kind of look for the most popular tools frequently
where it tries to hook up.
So ElevenLabs is one of the recommended ones.
It's the top option for voice.
Can you tell me a bit about the business of voice models where I think people have an intuition
around big LLMs where there are these very expensive training runs.
And yes, they kind of depreciate quickly, but there's so much usage that all of the models
trained to date have paid off their training runs
and then some, and then there's this kind of ever larger
capex going into, I mean, a lot of it is inference these days,
but also training.
And so you have some intuitions from the LLM world.
I'm curious just how I should think about voice there.
One, how expensive is training the voice models?
Is the expense in the researchers?
Is the expense in the training runs?
And I mean, the economics is presumably kind of simple,
it's just per usage, but yeah, talk us through the business.
Yeah, definitely cheaper than the LLM and image/video models,
significantly smaller models.
Yeah.
Okay, so the models are smaller.
Smaller, smaller.
What's a parameter count for a leading edge voice model?
A few billion to low tens of billions of parameters.
Yeah.
So...
And for context, I think the...
I mean, kind of like, you know, CPUs moved away eventually from gigahertz as like
the metric as they moved to more cores.
I think we've mostly moved away from just raw parameter count,
but I think the leading edge of LLMs are in the hundreds of billions of parameters.
I think the leading ones, yes, but of course, you know,
you have the variations that you will use at lower scale.
So capex is still pretty high.
We've, of course, raised recently a half a billion at 11 billion valuation.
Makes sense.
Makes sense.
To continue being able to build the best models in the world.
Researchers, you know, of course, you want the best people in the world.
I think we have those people working in audio, and my co-founder, who is leading that work.
So that's that's definitely a big piece of like not financially, but even like how you keep them ambitious
about a deployment so you kind of continue building leading models helps you attract more talent and and building that
And then on the how we serve it's of course inference
It's correlated with how the models are used and and for us like we've seen incredible incredible growth
across the work. Mostly this is charged, if it's input text or text-to-speech, it's usually
per text token; if it's voice agent or transcription, then it's per minute. And we see that
kind of being the bigger part, but usually, like, broadly it's on a per-token basis. And of course as we
work with businesses, it's like an annual agreement. The bigger the spend, the bigger the commit,
the bigger the discount to get it across. The way we usually do it is, like, when we have a new model,
we try to give it at cost to a lot of the customers.
so they can experience the best.
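The billing structure described here (per text token for text-to-speech, per minute for voice agents and transcription, with discounts that grow with the annual commit) can be sketched as follows. Every number is a made-up placeholder: the rates and the discount tiers are hypothetical and are not ElevenLabs' actual prices.

```python
def discount_for_commit(annual_commit_usd: float) -> float:
    """Return a discount fraction for an annual commit (hypothetical tiers)."""
    tiers = [(1_000_000, 0.30), (100_000, 0.15), (10_000, 0.05)]  # (minimum commit, discount)
    for minimum, discount in tiers:
        if annual_commit_usd >= minimum:
            return discount
    return 0.0


def usage_cost(text_tokens: int, audio_minutes: float,
               price_per_token: float, price_per_minute: float,
               discount: float = 0.0) -> float:
    """Cost of token-billed usage plus minute-billed usage, after discount."""
    gross = text_tokens * price_per_token + audio_minutes * price_per_minute
    return gross * (1.0 - discount)


# Placeholder rates: $0.00002 per text token, $0.10 per audio minute.
discount = discount_for_commit(50_000)
cost = usage_cost(1_000_000, 500, 0.00002, 0.10, discount)
print(round(cost, 2))
```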
It's still usually, like, not as reliable.
The newest thing is often the most expensive,
whereas you make the newest thing
the most economically attractive one?
We try to make it attractive so the customers are,
like, you know, like, it's more expensive for us
than any previous generation.
We don't like, the quality is higher,
so we try to keep the prices still competitive to that.
I see, you subsidize it, but it's inherently more expensive.
Exactly.
Exactly, exactly.
And over time, we might do some tricks to optimize it,
but, like, we want the customers to, like,
experience, because of research, the big thing that we've seen is the reliability of the model
in the early days might not be there.
And then, two, people don't even know what's possible with that model.
So you kind of want the widest set of distribution so people can show the world what's
possible.
So you can have it, of course, as the distribution mechanism, learn yourself what to improve,
what to change, and then get it out there.
Are the voice models just getting bigger and bigger?
Like, will we have voice models in the hundreds of billions of parameters, or have we found,
like it seems like for certain types of model architecture, there's like an upper limit on, like,
the natural size.
Have we found that upper limit for voice models?
It feels like for specific use cases, like, say, audiobook narration, you probably found that
size.
You probably don't need to stretch it too much bigger to make the quality that much higher.
But for certain use cases, that will probably grow.
The thing that's, you know, like I hesitated on the question is,
In a cascaded approach, you probably will not see dramatic size changes.
You inherently want the models to be quick and reliable.
You want to orchestrate them in a smart way.
In a fused approach, probably that will get into like tens, hundreds billion-parameter models
because you kind of combine, of course, the LLM side and the voice side.
So that will get bigger.
But on the just voice, I think it will keep being small.
Okay.
But there are certain domains where, yeah, we'll see bigger models.
That's interesting.
Yeah, yeah.
It is amazing how it does seem fun from a research point of view,
how there are still these various unsolved aspects
and how you guys are just making technical breakthroughs
and then releasing them down the product pipeline.
That's like a really fun stage of a company's lifecycle.
For sure.
It's like fun because it feels like we can do innovations on both sides.
There's like so much on research side, so much on product side.
And then like the kind of, you know, ultimately the biggest path
is how we deploy it to the customers,
where Lytentat, SMB, will have very different dynamic
than the enterprise.
It's not a vendor-SaaS relationship
where you just give the product out there
for the biggest companies out there,
but you are more of a partner in their AI transformation part.
So you want the resources to work alongside them
to work on frequently, very new use cases
that were impossible to help create
and bring those voice agents to production.
So that's like a big, big shift.
But the biggest
focus is how we bring the conversational agents out there
to the businesses around the world.
So when you say bring conversational agents
is the biggest priority.
Is this for customer service type use cases?
Like what are the most popular use cases
for conversational agents?
Yeah, like we want to be a partner for like full
interactions between businesses and their customers
or their audience.
I'm saying their audience because that will apply in support.
Support is the easiest one because that's where it's most
ready.
But like, and that's maybe the biggest
difference to how we see ourselves to some of the other companies in the space, is this can
also apply to sales. You can have the proactive side of reaching back. You can have AI SDR versions
of that. And then you can have all the way to the marketing use cases where we are your partner
for working on even outside of like the conversational agent space of how you create a great marketing
campaign. Yes.
And so how does this break down between, you know, we had Des Traynor from Intercom on here, and they have Fin, their agent, and it's a thing in the website that you can go talk to.
And he described a very similar phenomenon that you described, which is you start maybe thinking, oh, this will help me answer customer support queries.
But it becomes like a generic UI for the website, where it's a box you can type in to go do things and understand things.
And so why wouldn't you read the docs and design your integration that way, you know, whatever?
And so will I have like one for text and then one for voice?
Will you guys do text too?
Will just how does that?
Because it seems like this is also succeeding at the text level with Fin and Sierra and all these things.
The places where we know we will be able to provide the biggest value is like where ultimately today will have either a big portion or most of their interactions coming for voice.
So if that kind of intersection
is there, that's where we can provide higher value.
And of course, like, if you need a text chatbot there,
that's like, if you fix the voice agent,
you'll have fixed text piece inherently as well.
But the place where we do optimize today
is going to be like, how do you select the right voice
for the right customer interaction, how you pull that
in the pretty complex case of what you mentioned earlier,
of like how you orchestrate that to pause or look
for something deeper into the docs,
how it can be an extension of the entirety
of the business, so not only in support, but across the entirety of a user journey.
But the bottom line is like, we want to be able to provide you across entirety of the interactions.
Voice is usually a big part of those interactions.
And yes, we need to solve the integrations, we need to solve the knowledge, we need to solve
text as part of that.
But like we wouldn't, for example, go into what I think will happen in a lot of those
cases, like very deeply into reasoning version of those use cases, where you maybe need to like
the multi-touch.
Yeah, yeah. And a lot of complex actions.
A lot of like financial analysis.
That would be not something we optimize for.
Can we talk about your revenue ramp?
You're just one of the fastest growing startups period of the past few years.
What's your most recently announced revenue figure?
Most recently announced was end of 2025.
Whatever number you want to give us.
So most recently announced was 350 at the end of 2025.
But the best proof of the technology working.
So recently we announced our work with Deutsche Telekom and T-Mobile, with Revolut,
with Klarna, with Meta, with IBM,
a wide set of use cases.
And this quarter was kind of one of the best
for enterprise growth,
where we had the first quarter hit $100 million
in an additional ARR growth, which is crazy.
In net new ARR.
In net new ARR.
Okay, so if you're saying this quarter
was $100 million in the end of the year,
I'm no mathematician,
but it's up in the $450 million range.
And that's versus this time last year,
that's a several-fold increase.
Just what's working?
Like from the outside, I would assume that there's really strong cohort growth within accounts,
and then you seem to have self-serve and enterprise businesses that both contribute a lot.
I don't know how big self-serve is, but as a user, I like to be able to fiddle with ElevenLabs
and not have to go talk to sales.
But maybe you can just talk about what worked to reach 450 million plus of ARR so quickly.
Yeah, so exactly. So over 50% is now sales-led enterprise.
Yeah.
And you know, like, I think largely that the technology that, like, powers a lot of their agentic interactions just became reliable at the same time as high quality over last year, year and a half.
So, like, that's, you know, frequently, you know this extremely well.
You will start the account and then, of course, it continues expanding.
And we see there's definitely a land-and-expand motion at ElevenLabs.
And what does that expand look like?
Is it like new departments?
Is it just the usage starts taking off?
When a customer expands.
Both, but usually the first part, too,
it's like, we try to make it very easy for our customers.
Maybe that kind of against ourselves,
where we give the technology at pretty attractive economics,
because we so much believe in the technology providing value.
So you can actually try it and test it.
And then within that one department.
And you think you'll make it up in usage, basically.
Exactly.
That the usage, the kind of commit continues increasing
because you know it's providing value.
And then it's so much easier to make that a choice.
And then, of course, cross-department pollination
is there too.
And it's like, you know, our work with Deutsche Telekom
started on the marketing side.
So we did Magenta work and podcast generation.
And then it kind of expanded to customer support.
And then it expanded to us working on an agent
across the entirety of the network so people can call in
and have the agent.
So you could see those step changes,
step changes across. But we are now 400, 470 people as a company. So we keep on growing.
But some of the things that stay consistent is small teams. So we have less than 10 people teams for
each of the product or research initiatives or even as you think about sharding some of our
go-to-market strategy. Those will be smaller teams, understanding the industry in depth,
understanding the market in depth and going independently and going quickly. So that definitely
contributed largely to that.
Two, especially on the biggest enterprises,
what we found works is,
and it's like we have the full spectrum,
self-serve, PLG motion,
that helps drive distribution,
drive, kind of, awareness of ElevenLabs.
And on the completely other end of the spectrum,
we have the high-touch forward-deployed engineering
working side by side with the customers
to customize the entirety of their work together.
Why did you guys do self-serve?
Because I presume you have a lot of competitors,
where they have tech
and it's behind a contact-sales form
and you have to go talk to an SDR
and then talk to an AE, blah, blah, blah, blah.
And you guys just offer the tech available on this.
And I'm a huge believer in this.
I mean, a huge part of Stripe's growth
has been driven by the fact
that we just made Stripe available to anyone
and built a lot of product around that adoption pattern.
But so many companies seem to skip it.
So I'm curious how you guys came to...
So many reasons. So many reasons.
I think, you know, the quick ones
that come to mind is feedback.
You have immediate understanding of how good your technology is.
Two, which is an extension of that.
We stand behind our tech.
We believe it is the best in the world for models, for voice agents, for deployment.
So we want people to experience that.
And I think you do that the same in Stripe, where the best version of the technology is available to everyone.
It's just so attractive to actually try it out.
We always try to make everything we build for the highest-end use cases, bring it back to the ecosystem free.
Frequently, the newest of the use cases, you know, for enterprise, you will need reliability,
you need compliance, you need the scale, which we deliver.
So frequently, as you develop new technology, it might not be ready for a lot of those
parameters, but it's definitely ready for developers and SMBs.
And we love what they are doing because they are showing us the future and effectively helping
us find a trajectory of where ElevenLabs should go.
I'm totally convinced on.
I'm just always amazed that more companies don't pursue it, where it feels like they're
really shooting themselves in the foot, by not.
Like, did you guys self-serve on Stripe?
Or did you...
We self-serve on Stripe?
Yeah, for example, you know, ElevenLabs is a huge company.
And yet, you started on Stripe on a self-serve basis.
You kind of, like, initially, and it's like, you know, we were two of us at the beginning.
You try to see what's working in the industry, but you try to think from first principles.
So you want to try it out.
You want to understand how it works.
So the more friction elements before you're trying it out, the less you trust whether
it's available, whether there will be additional payment that's hidden behind, some
of those steps, so you don't want to go through them.
So it's so much.
Speaking of Stripe, do you have any Stripe feedback for us?
Anything you want us to fix?
My most common feedback until recently is, like,
why don't you give us a pay-as-you-go, usage-based billing type version?
But one of our finance leads, Machek, I know, was speaking with your team,
and that was day before.
Yeah, yeah.
He was great.
He was like thinking about it for a long time.
He's great.
He said, like, you guys should buy Metronome.
You should buy Metronome.
And then the next day, the Metronome acquisition
was announced. So now you have it. So that's, that is my most common feedback and we'll be launching.
Oh, that's a good announcement for this, for this, for this podcast. We'll be launching
usage-based billing to everyone.
Oh, sorry, I'm shocked you. Oh, as in previously. Pay as you go. Pay as you go. Okay. Previously,
previously you had it on an enterprise basis, but everything on the self-serve basis was like plans.
So we had the subscriptions, yeah. Subscription plans, you can go over them. Yeah. But now we are
launching a full pay-as-you-go experience. So you can just try out voice engine, which is
effectively this all orchestration loop all the way through to any of the models directly.
Going back to self-serve, I think a new thing in AI is that all self-serve products
should have pay-as-you-go as an option.
Maybe you want to have like a subscription with some unlimited tiers, but I don't know if you had
the experience of like you're using Claude and like you're typing away your queries
and eventually you hit some rate limit and it's like, sorry, you've hit your usage limit
and you want to be able to do the thing that you can do in Claude Code, which is just pay per
API, it's like, I'll pay for it. And it's kind of very funny as a consumer to not have the
option to pay more, to use the product more. And so, yeah, I think every AI product will need,
you know, they probably want to have some all-you-can-eat, mostly all-you-can-eat
subscription with limits. And then the ability to pay for overage. So it sounds like that's what you're
Yeah, exactly. That's what, that's what we're doing. The only thing I want to ask you about is,
I feel like all CEOs of larger companies today are trying to figure out how,
do all these AI advancements change the nature of the organization and how do you
redesign your organization a bit around all this new intelligence and so
that could be about what the scaling factor is of like the number of people you need to do the work
but it also should be like do you need more senior people because they're better able to direct the AIs
and the AIs maybe can do the work of what previously would have been junior people? Do you need more
junior people because they're going to be more AI native in how they work do you want
smaller teams, do you want bigger teams? How do you actually go through the process engineering of,
your finance team should be using Claude extensively? But like, finance teams do not historically,
you know, have a lot of home-built software. And so there's all these questions that are floating
around. And you have very rapidly built a much more AI-native company. And so I'm curious
what lessons we should all be learning from ElevenLabs as a large business recently built. And so without
the baggage of decades of, you know, how we've always done it.
Yeah. Yeah, we started in 2022, which was just, like, a year when the two topics of the day
were crypto and metaverse. So just before, and then, of course, the AI flow started.
Exactly, exactly. But we could like have the privilege of like, kind of scaling through the
world when it was all happening. For us, what works. And we like really believe in that being
the big part of the future. The first is small teams, like keeping the teams small and super flat.
So like, can you have, both me and my co-founder
will have over 15 direct reports each that we'll work with.
And most of those people will have that same scale
of direct reports.
Okay, so your span of control is way larger
than in the traditional company.
Normal would be eight.
You have double that.
And obviously, that's an exponential.
Exactly.
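The "exponential" remark can be made concrete with a small calculation. This is a sketch under a simplifying assumption that is not from the conversation: every manager at every layer has the same number of direct reports. The span of 8 is the "normal" figure quoted above, and 16 is "double that".

```python
def people_managed(span: int, layers: int) -> int:
    """People below one leader when every manager has `span` direct reports.

    Layer k below the leader holds span**k people, so the total is a
    geometric sum that grows exponentially with the number of layers.
    """
    return sum(span ** k for k in range(1, layers + 1))


for span in (8, 16):
    print(span, people_managed(span, layers=3))
# prints:
# 8 584
# 16 4368
```

Under this assumption, doubling the span from 8 to 16 lets the same three layers cover roughly seven times as many people.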
And of course, you know, there are some teams
which in the short term might not do that.
But ultimately, that's where we think is going to be headed.
It's like roughly 10 team size within each of those work items.
And startups, no offense.
But like startups often have pretty wacko management ideas.
Like there was a funny tweet, Lord Grant me the confidence of a, you know,
early stage startup founder blogging about their management theories.
But like, you think this is not a startup effect.
This is an AI effect where basically...
No, it's definitely a little bit of startup effect too.
It's a...
I think it out, it's like...
Hindsight, hindsight benefit.
I'm canceling our stripe changes.
Yeah.
No, no, it's like...
I need to pre-end it.
I'm like, you know, it's a...
The hindsight of this may be working.
We'll see in next five to 10 years.
So much flatter org.
Much flatter org.
So it works for us.
It might not work for all the companies.
And there are some parts where, like, go to market.
We still are trying to figure out what's the best way.
But smaller teams, flatter org.
And I think there are two paradigms, but like generally people being more technical.
Or if not technical, even in non-technical teams, having a technical resource.
So, you know, we will have a person in ops or in talent that will, we have effectively a tech lead for that team.
Yes.
That helps them automate a lot of that.
that work and helps up level the rest of the team too.
Yes. So there are kind of two parts that are helping. Okay, so talking through this in
talent or something like that, is it that you are building your own software for other
companies might have bought software like a workday or a greenhouse or something?
Is it that they are using the existing software you have better? Is the process
that would be spreadsheets in a traditional company are built with software? How do
you kind of use the software in these sorts of organizations?
Yeah, like sometimes, but we still use a lot of like the traditional vendors.
Like, one pattern is, of course, LLM-ifying everything, like making the data explorable for you to be able to interact with it.
Of like who's in the pipeline, what worked, who does the best references, like all of that, all of that works.
So you can double down on that.
But two, it's frequently things that you manually do that a lot of the car, like there's a gap between where the agents are today versus what you could do if you have the technical skill set.
And a good example, it's like, how do you scrape all the right profiles to be able to reach out to the right candidates?
So you're like, analyze whether it's, you know, how much I should want to say, but, but the, like, try to detect specific things that we know worked.
So you'll bring that across to the, to the people.
On go-to-market side, like, there's just so many things you can do with additional amplifiers.
It goes from understanding what case studies are relevant and creating a good pre-read for you before you go to the meeting, through creating the AI SDR experience that we spoke about, to creating an entire deck experience.
So you have a pre-populated deck with the right numbers that is customized to that customer, which you want still the person to go through and develop, but ultimately is in there.
So there's plenty of those additional things that you know will amplify the work of the people around,
potentially replace some of those easier tasks that are done.
And then there's like, you know, we wanted for people to explore the culture at ElevenLabs.
We created a voice agent that people can speak with and see what's the culture, but also get
prepped for the interviews.
I think across many of those teams, like additional benefit of what they can do.
Interesting piece.
So, of course, in Ukraine, with the ongoing war, they need to rethink a lot of how their development,
their systems, their support work for the citizens across the country.
And people are in the war zone.
They don't have the same access to the information.
They cannot rely on the same phone lines.
They cannot rely on the same physical services around the country.
So they've developed effectively a central...
Are your employees in Ukraine?
We had a few, but they reached out because they were developing their central app called Diia.
They developed it over the years, but now they were
doubling down on how this can be a way of supporting the citizens.
And of course, there's an easy part of how you create a first agentic government, where you have help with the benefits and what's happening on the front line or education.
So that's delivered to everyone or healthcare so you can book your checkup or appointment.
So like how you create all of that.
And of course, we traveled to Kyiv.
We worked with them on bringing that and making that available for voice so everybody can access it.
But the thing we learned while being there was that the model we spoke about, where you have technical resources in each of the teams,
they actually have the same in every one of the ministries.
So every ministry had technical resources
working on creating that agentic version of their work.
And then there was like a central digital transformation team
that would like assemble this all together
to deliver that for the central citizen support,
which I thought was brilliant.
That's very tech-forward of Ukraine.
So tech-forward.
Like the most advanced set of work we've seen.
So we got a little bit validated like,
okay, maybe technical resources in each of the teams
is a good idea.
And that works heavily for us.
And, you know,
you mentioned some of the other parts, like, do you hire the senior or younger?
The main thing we try to filter for, of course, the culture piece is so important.
You can scale people, but scaling culture is much harder.
So, like, you want to optimize for that being right.
And in our case, it's first principles, taking ownership, striving for excellence, but staying humble.
And the main thing that's kind of in that ownership part that I think works well for the AI world is agency.
Like if you have that agency to explore, regardless of where you are in the experience
cycle, it's going to be a tremendous amplifier to your work.
My biggest takeaway from all this has been that around agency, where I feel like high
agency people are the winners of the advances in AI and within organizations, low agency people will
lose out.
Yeah, completely agree.
Probably the thing that Piotr and I are most proud of as we scale ElevenLabs is the people that are
at ElevenLabs. It's been like just the culture and seeing the expansion of the culture,
where culture builds the company now rather than any single person or any single product
builds the company.
That was probably the biggest validation and happiness.
And there is kind of the other angle of that where I think people are like striving
to be incredible in their craft and their work, but at the same time have fun in a lot of
their work, and that kind of combination of agency and just enjoying what you do is probably
the best thing we've been able to do to date at ElevenLabs.
Well, it sounds like a really fun stage, like we were saying.
Interesting research breakthroughs, really fast-growing business.
So I'm sure you're enjoying it.
Mati, thank you.
John, thank you so much.
