a16z Podcast - ElevenLabs CEO: Why Voice is the Next AI Interface

Starting point is 00:00:00 We don't want to become the same as the previous generation of editing suites. So instead, let's solve it on the research level where it will know based on the voice exactly how it should speak with the speed. To be able to cater to all those different use cases, you need such a big array of different voices, different languages, different accents, different styles. So we launched the voice marketplace where you could create your voice and then share it. And when the voice is shared, you earn money in return. Today we have almost 10,000 voices.

Starting point is 00:00:27 We paid $10 million back to the people in the community. There's some crazy stories from the voices. Just speaking through exactly the technology, showing the examples, and kind of avoiding this initial knee-jerk reaction. Voice is rapidly becoming the next interface. Today we'll hear from Mati Staniszewski, co-founder and CEO of ElevenLabs, on fully licensed AI music and real-time voice, as well as how small autonomous teams and global

Starting point is 00:01:00 hiring power the company's product velocity. We discuss the ethics of AI audio, the voice marketplace paying creators over $10 million, and the shift from creator brand to enterprise platform. Plus, why speed is the moat in the race to define the future of sound. Let's get into it. I'm excited to welcome our first speaker, Mati, co-founder and CEO of ElevenLabs. All right, so good to have you here, Mati.

Starting point is 00:01:36 Thanks so much for having me here. Great to see everyone and good morning. That was the walk-out music generated by ElevenLabs, was it? It was. We expand continuously across the audio space. So we started with voices, then created an orchestration of how to build voice agents. And now we also created a fully licensed music model. So it can produce amazing music to go alongside with it.

Starting point is 00:01:56 Awesome. Well, talk about that. I've had the opportunity and also the luck to get to know you from the very early days when ElevenLabs got started, and to partner over the last three years, to just see your execution everywhere from product launches to shipping new lines and models like you just mentioned, everything from text-to-speech models, speech-to-text, and then you started doing music, sound effects, and now the AI agent platform. I'm very curious.

Starting point is 00:02:25 First, I'm still in awe of the shipping speed after all the three years. But I want to ask, how do you actually maintain both the speed and quality when you have such an expansive product roadmap? So first of all, we partnered almost three years ago. And so it's great to hear all the kind notes. But also, they didn't realize when we partnered,

Starting point is 00:02:45 the infrastructure team was three people. And of course, now I'm ElevenLabs founder. We love number 11. And the infra team is 11 people. So we've seen the growth of the other side as well. And I hear that the companies here raised $66 billion in total fundraising, so the number 11 is everywhere here. But I think the start of, I think first piece, I think the smartest person I got to know as

Starting point is 00:03:08 my co-founder, Piotr, who has been the research brain for creating a lot of the models and then being able to assemble what we think are the most incredible researchers in the voice space to really create the first text-to-speech model that could understand the context in a better way and turn that into the emotion, intonation, then find a way to capture the characteristics of the voice. So you have the voice sound with the right style, with the right age, with the right gender, dialect, everything in one. And then the researchers, of course, now expanded that to speech-to-text, music and other work. So that's our foundation. And then the way we structure it to be able to ship quickly, especially with so many things happening in AI space, is a lot of small teams.

Starting point is 00:03:47 So today we have roughly 20 product teams, each 5 to 10 people in size, which with full independence can go ahead and ship products. Of course, that sometimes carries issues of duplicative work or sometimes people going at different speeds, but at the positive end, the ownership of each of the teams is extremely high, so people know that this is down to them to really deliver and ship, and it allows us to move extremely quickly. We bucket our work into creative space, so creative platform where we help with narrations,

Starting point is 00:04:18 voiceovers, dubs for creators and creatives in the media and entertainment space, and then on the agents side where we help people create voice agent experiences, conversational agent experiences, across customer experience, all the way through to immersive media. Great. ElevenLabs has labs in the name, very similar to many of the other big labs, which means you're doing your first-party R&D and model development, but also building all these 20 products. How do you think about balancing both, like keep progressing on the model research, but at the same time not delaying sort of the product launches?

Starting point is 00:04:50 It's very tricky. I'm sure many of you have the same thing. Do you build the product when you don't know if the research innovation will displace the product you just built? We had this in the early days too. So one of the simple examples was we had a model at work and one of the most common requests was, could we do different speeds for voices?

Starting point is 00:05:08 So could you have an additional slider to modify the speed of how audio gets generated and how quickly it speaks? And we were very against this idea: no, we don't want to do any sliders, any toggles. We don't want to become the same as the previous generation of editing suites. So instead, let's solve it on the research level where it will know based on the voice

Starting point is 00:05:27 exactly how it should speak with the speed. And we resisted this for, I think, a good nine months, and we couldn't solve it on the research side. And then on the product side it was a super simple solve that got all the users across. And now the approach we take in looking at this is if we think

Starting point is 00:05:41 the research work will take more than three months, then the product team can do anything they want: start adding other models, adding some of the extensions. Of course, sometimes the timeline is tricky to predict, but roughly that's the guidance we have from our internal research team,

Starting point is 00:05:55 what are the initiatives we hope to ship this quarter, what are long-term initiatives? And then for anything long-term, you can use any other work to close that gap and make it better. I guess first you kind of have to figure out if the research commitment is going to meet the timeline first and then go on to align with the product teams. That makes a lot of sense.

Starting point is 00:06:16 As everyone is moving to San Francisco and building in person and locked in, in the same space, ElevenLabs has always been building globally and having people more distributed. But you're now having centers, I guess, in different locations from London, Warsaw, San Francisco, to New York, and other places. How do you think about building this global expansion

Starting point is 00:06:37 and finding talent globally versus, I guess, the trade-offs of building in the same place? Yeah, so me and my co-founder are Polish. We started between Warsaw and London at the time. And I think ElevenLabs wouldn't have existed if we weren't starting from Europe. It's a very peculiar thing, but in Poland, if you watch a movie in Polish language,

Starting point is 00:06:56 like a foreign movie in Polish language, all the voices, whether that's a male voice or a female voice, get narrated with one single character. No emotions, no intonation. As you can imagine, it's pretty terrible, and it's still happening today for most of the content out there. I've had a similar experience growing up in China that we have a lot of Western movies dubbed in Chinese monotone.

Starting point is 00:07:17 So bad. So bad. And it's like, in Poland, of course, post-communist country, it's a cheaper way to do it. You don't have to hire as many people. You have one monotone audiobook reading of a movie. And that was kind of where the company started. And we started initially in Europe. And we realized that if we wanted the best people to solve what was a research problem at the time, we need to hire wherever they are. And we couldn't lock ourselves to just San Francisco or look at the West Coast. We knew that we need to find them across Europe, across Asia, and bring them into the company. So we started fully remote

Starting point is 00:07:51 and started looking at those people. And then on engineering, we also were very against this traditional hiring method of looking at LinkedIn, looking at traditional background, and trying to figure out, could we go and figure out a different method to hire people. That led to some very interesting hires. So we hired a person that had an incredible open-source text-to-speech model and was working in a call center at the same time as a recipient of the calls to make money. And he's now on the team, one of the most brilliant researchers we have,

Starting point is 00:08:19 doing all the data processing. But the same pattern kind of followed. And of course, the early team was very distributed. And then as we started scaling, so beyond 30 people, we realized that for the new people joining, there's benefit of them having a space to be next to others, to get deeper into the culture, understand what are all the products that are happening in the company. We started the hubs where you can go into London and Warsaw and San Francisco, where you can work with others in person. And that's how we try to marry those two. If you are early in your career, we try to hire you in the hub so you can immerse yourself in the company. If you are used to remote work, completely fine. But then if you want, you can always come and join us in the

Starting point is 00:08:56 hub. And that worked really well. Currently, we continue hiring very untraditional backgrounds in some of the places in the company. And then fusing that with very traditional backgrounds, which can teach the others. And sales, for example, we've done some of those experiments too, where that combination worked really well. The lesson is you can really find talent everywhere. It's just how hard and how you look for them. And I think in Europe also, this was an interesting one. In the U.S., people are very keen and excited to work. And if you go for any social event, like you want to talk about work.

Starting point is 00:09:28 And in Europe, I didn't have this feeling where it's like most people don't want to do that. It's like the cultural piece is different. But then you do have the pockets of people that actually thrive at it too. They just don't have the companies where they could do that in. So I feel like our team from Europe is the most motivated and passionate set of people that we are lucky to have. Yeah, I can attest to that, given I've met some of them, very hardcore, very good work ethic for sure. And you have also maintained a pretty flat org structure

Starting point is 00:09:54 and have people own quite laterally a lot of responsibilities. Can you talk about the rationale behind that? And I guess there was also no title policy. Yeah, so we removed titles a year ago, and it's going well. It still works. And I do think that, but I thought a lot of AI companies kind of do it too already with a member of technical staff being like the usual piece you have for engineering and then in a lot of the go-to-market,

Starting point is 00:10:17 you are just go-to-market, not VP of sales or other roles. I think it's actually a pretty common pattern. But in our case, we had a small team approach where you have an extremely small amount of people, usually 5 to 10. And we wanted to make it very clear that every team, we create those teams.

Starting point is 00:10:38 You have six months to prove it. If it's proven, the team will stay and continue working. But it really is that the moment you join, you can have any impact on the company. So you can have any role in that team. The tenure will not define your position in the hierarchy. If you are smart and quick and passionate, you can elevate yourself very quickly,

Starting point is 00:10:56 which really helped. And also it's a common layer to the external world where everybody looking at ElevenLabs knows that the go-to-market team is the go-to-market team. There's no positioning to the same extent. What this allows us to do is I think when we speak with a lot of our partners, with all of our customers.

Starting point is 00:11:17 They also know that they are getting the best people always, and we can also send people to different conferences, different events, regardless of that positioning. I think the tricky thing in the flat structure is not only positives. The way we currently have, it's a set of leads effectively for the subdivisions,

Starting point is 00:11:37 so the research, creative work, agents work, go-to-market, self-serve and sales-led, and of course, ops, only that's the layer of leads, and then under that, there's pretty flat, small team approach across the world, but then you really want the leads to be able to carry the complexity around the team, so suggest things between one team to another, if they see that there's something valuable

Starting point is 00:12:01 between them happening, so I think picking those people that can context switch between is super important, and then letting the team fully focus on that, and then having, which is, which was interesting learning, where if you put a person into all the Slack channels and give them transparency, they actually get frequently distracted because then they read all the messages. You can still choose not to read them, but they still do. So you kind of need to cut the access to a lot of those pieces to force the attention. And that kind of works. All those

Starting point is 00:12:32 small things work really well. Maybe we can borrow some of that lesson too. Let's switch gears a little bit. You're on the front line seeing a lot of the creative work, whether it's from art, music, or advertising, that are starting to adopt AI tools. And in the beginning, that was not the case. There was a lot of resistance. And now we're just seeing the adoption and the welcoming of using more of the generative AI tools,

Starting point is 00:12:59 including AI audio. And you have done some really smart things from the marketplace payouts to working with these creative industries since day one, actually. I remember how much you stress like we have to find a way to work with them. and sort of observing sort of market shift over time.

Starting point is 00:13:18 So the question is, how do you actually adapt to these changes and find the ways to work with the industry in the infancy in the beginning? And how did you navigate some of the challenges in that? So I think the first piece is actually spending time with the industry and trying to understand what are their priorities, their incentives. Of course, it's sometimes tricky. Sometimes you then end up being star-struck. We had an honor and pleasure to work with Jarrett

Starting point is 00:13:48 on some of his incredible work and learn from him on what is important and which parts of the production process you can actually use AI, which ones you want to keep, where is it actually helpful. So I think that's the super important thesis across all the partnerships in the space.

Starting point is 00:14:08 In our case, we try to figure out how to do that on the voice space, which is, of course, with that technology, one, how will the voice acting space look like in the future? And then, two, of course, to be able to cater to all those different use cases, you need such a big array of different voices, different languages, different accents, different styles. So we launched a voice marketplace where you could create your voice and then share it. And when the voice is shared, you earn money in return. Today we have almost 10,000 voices. We paid $10 million back to the people in the community.

Starting point is 00:14:44 There are some crazy stories from the voices. One of our first voices was, let's say, a deep Spanish voice. And the magic of the technology is that the same voice now is available in all different languages in the same way. So it's 30 different languages at the time. Now it's 70. But 30 languages at the time. And we had the Spanish voice join us.

Starting point is 00:15:02 And it wasn't picking up in Spain. Nobody really liked it as much. And then it picked up in an English-speaking country, that same voice because of that deepness. And now it's our top three voice for all the use cases. So hidden message: you can all register to our voice marketplace and maybe earn some money too. So that's the, I think the second important thing,

Starting point is 00:15:23 it's like figuring out how we can be part, how we can bring the industry together to disrupt together rather than just disrupt. And with labels, I think I'm still learning how to interact. So we worked with labels, Merlin and Kobalt, so fourth majors, to bring their music into the music model

Starting point is 00:15:42 so we can do it in a licensed way so you can generate that and give commercial rights so you are fully protected. It was a hard process. It took us 18 months to figure out the agreement that works and in the end I think the main thing

Starting point is 00:15:55 was adding sort of forcing functions or forcing timings to find effectively a trigger of like, okay, this is when we do it, and we either do it together or we do it separately. And those forcing functions really help add urgency. Then we needed to move that forcing function a few times, but it still worked to a large extent to go after that. And then two is, of course, you know, finding the compromise wasn't,

Starting point is 00:16:25 wasn't easy. But then in our case, working with the labels, it was kind of protecting what they care about. And they, of course, also care about how they continue doing well by their members, by their artists that they work with. So we would spend a lot of time working with their members, speaking about how we think about technology, what's going to happen in the next couple of years, and that really helped.

Starting point is 00:16:51 So just speaking through exactly the technology, showing the examples and kind of avoiding this initial knee-jerk reaction that AI is bad has been tremendous. And maybe tying back to the earlier question as you are navigating like this landscape, how do you think about bringing the right talent that can head and lead some of these functions and these are mostly unknown territories

Starting point is 00:17:16 of how to navigate it? Like where have you been seeing success in bringing the right people? So here for the spaces that are kind of completely new to us, and, like, legal is another example, we would always kind of bring at least one or two people that were in that space that kind of have interacted with the same parties

Starting point is 00:17:36 full time in the past, but then would actually adjust that with a lot of consulting people that would help us in a specific conversation. So in this case, in music, we had music lawyers that worked very closely with us that consult across a few of them. And the good thing is that they know all the players, and they effectively were this bridging gap between both of us, so we could speak the same language. And then that was really helpful. Yeah, and you have had a very specific taste for people that are risk tolerant enough and also understand the commercial business opportunities to, you know, help guide the right chain of actions in each of those domains. I found that very fascinating.

Starting point is 00:18:29 100%. I mean, legal, I don't know how many of you are trying to find a first legal counsel or have a number of those. For us this was, I think, one of the trickiest roles to hire for, because you are hiring into the space you know very little about. And then we had the first couple of legal people that were clearly not a fit, so we separated. Then we hired a third person, and that person came from, like, a number of Fortune 500 companies, and they never worked in startup space, never worked in venture. And what resulted is, like, every conversation was pointing out the risks that we see. So, like, anything we wanted to do was, like, the number of risks that this

Starting point is 00:19:12 could carry. And it was really tricky to work, because it's like you kind of get risks, but you want the risk advice of, like, okay, and this is where we should draw the line, but everything was back the decision. And now we hired a person working previously in a number of companies as a counsel, and, don't poach them, they understand the risk equation a lot better, where they are not only, like, a counterpart to figuring out what the risks are, but also, like, okay, this is what other companies do, this is what we should potentially do. And then they are like a true thought partner, and a tremendous change. For sure. ElevenLabs started as more of a creator brand, everywhere from the individual creators to

Starting point is 00:20:00 the creators that are building businesses. But now you have been having a lot of success moving into enterprise, not just started from the AI agent platform, but even with the text-to-speech, speech-to-text models. How have you been navigating that transition? Because that's one of the very common places

Starting point is 00:20:17 where a lot of really great consumer, creator brands fell down, but you have had so far a pretty smooth transition. So when we launched, we had a lot of early inbound when we started the classic PLG, a lot of inbound from enterprise. And I remember speaking with the a16z team

Starting point is 00:20:37 when they joined us, where our initial take was, of course, we want to be engineering company, we don't want salespeople, we would like to reinvent that and have like engineers do the sales. We did hire one traditional salesperson and one non-traditional salesperson,

Starting point is 00:20:53 like an engineer. We told them, like, do sales now. And that really, as you can imagine, didn't work out in the specific case. but we learned our lesson, and we now do invest in a combination of that. It's 80% sales, 20% engineering, so still a little bit of that. But it was a super important lever of understanding who are the customers, what they care about, and working deeply with them to bring it back.

Starting point is 00:21:18 And then that kind of working with them was kind of opening of what we need to actually do on the product and research side. Munjal from Hippocratic is here. He was one of the earliest incredible use cases in the healthcare space, where they would create effectively agents that would take inbound calls that are calling the hospitals to take and schedule appointments, and beyond that they would do all the other parts of outbounding to the patients to remind them about taking medicine or reminding them about the appointment that's happening

Starting point is 00:21:49 and to be able to do that, that suddenly shifts from using one foundational model into combining the speech-to-text, the LLM, the text-to-speech to orchestrate them together, then the integrations you need to build, then you actually need to deploy. And they were one of the earliest, that was 2023, but then we've seen this repeated pattern across a number of other customers in the customer experience space and many others. And we decided to invest more into helping with the entire orchestration. So instead of just doing text-to-speech, we can help combining our research to make this whole combination more

Starting point is 00:22:27 fluid. But then if you think about enterprise, you do need to build the combination of knowledge base inside a system. You need to help deploy that with telephony providers, whether it's Twilio, it's SIP trunking. How do you do that in a templatized, easier way? And then, of course, the biggest gap that's

Starting point is 00:22:43 the most common, it's easy to do a demo, but how do you actually build it to production? How do you test, how do you version control, how do you evaluate, monitor over time, fine-tune over time based on the results. And all of that has been a big part. And underlying all of that

Starting point is 00:22:59 and we spoke a little bit with Matt before coming here, the foundation needs to be there, which is the security, the compliance, serving the customers across that will rely on that infrastructure. That's something that we want to shine through at ElevenLabs where if you are using the software

Starting point is 00:23:15 it's going to always be reliable and always the four nines or five nines, hopefully one day we'll be there, which is tricky in the AI space. That's the goal. Of course, the one obvious difference between PLG and sales is that the cycle to work through and identify the right customers is much longer. And I think that's where eagerness from our internal team was interesting to observe, where you had a lot of people that didn't work in an enterprise setting

Starting point is 00:23:48 and then you had other side of the company that did, and the side that didn't was very skeptical about going enterprise and, like, waiting the six months or 12 months to results. And in early days, we needed to shield them from that information and like, trust us, we'll do this and it will work. But they were very skeptical. And then, of course, after 12 months, it worked out. But that was probably the hardest culturally of how you kind of still keep everyone jumping on the same train. That's exactly right. A lot of companies actually, at least I observed, sort of slowed down after they start adopting more of the enterprise sort of product launching and, like, building for the customer's request that started to, thank you so much, to delay sort

Starting point is 00:24:35 of the product launches. Is that something you're seeing or is there still like a good balance of like we still want to be able to put out demos and POCs and early teasers quickly but at the same time we'll get to you know deliver a very robust and reliable product? So there are two parts. The first part is, so we have a difference on the team structure and then we have a difference on the external product structure. On the external product structure,

Starting point is 00:25:02 we want to ship very quickly. But of course, if you are shipping to enterprise, you want to make sure that it's stable and reliable. So we delineate very clearly what's alpha, what's not alpha. And then we go through that transition through that period. And then as we work with the customers, they can, and our partners, they can decide whether they want the access

Starting point is 00:25:20 to alpha in the first place, and when they do, that's clearly shown that this is an alpha product that might not be as stable. And so they get a choice. And I think that choice has been the most important lever, like, do you want it or not? And some are incredible on doing that innovation and showing some of the work or experimenting with that work. Deutsche Telekom with John here is creating some of the incredible new podcast experiences. And that came from, like, testing early models of turning a text into a more NotebookLM style of a podcast with incredible voices that you can select for German-speaking voices, English-speaking voices that sound good. And then there's a second, which is the team structure piece. And that's something that we didn't

Starting point is 00:26:08 do until later when we had more than 100 of us, is that we delineate inside the company products that are pre-product market fit and post-product market fit. On the post-product market fit, you are working for the long term. You test and evaluate a lot before. You only deploy when that's truly ready. The pre-product market fit, your mission is to ship until you think we've hit the product market fit. And usually we give the six months period of, like, proving it out. If not, we kill the product, and we've killed products in the past. This way, but that's like the main important piece of, like, okay, until we know there's a big potential user base, we will continue iterating. I have been able to observe some of those, I guess, hard decisions in the moment,

Starting point is 00:26:54 but it's the right decision later on to let go some of the products. This is one of my favorite questions. My partner, Martin Casado, always says companies go through three phases. There's the product phase, there's the sales phase, and there's the scaling phase. And given you have been through some of those phases, what has been the hardest transition for you as a CEO? There are a lot of them. Of course, I have my co-founder next to me across each of those. I know him for 15 years, he's my best friend since high school, so I have, like, the most luck to have that combination. Of course, you, Jennifer, and all the partners, to help us through those transitions, which has been incredible. But I think

Starting point is 00:27:35 the recent, or like a recent realization was, we are now a 350-people company. And of course, that means our go-to-market team and the incentive structure around that has evolved pretty strongly. And what wasn't clear to me, and now in hindsight, it's obvious, is that in early days, everybody would just operate on a passion basis.

Starting point is 00:27:58 They would just operate what they think is best for the company. As our go-to-market team enlarged, we realized that the incentive structure really matters if you are building that machine. And that transition where you shift from a lot of the people that are helping create that machine are part of that machine, those incentive structures will eventually drive the behaviors,

Starting point is 00:28:22 which might be slightly different to what you had in mind if you don't make it extremely clear. And in some ways, the quota, the commissions are effectively a lagging indicator of strategy. And then strategy is kind of leading of what will happen in the future. So you need to find a way to resolve those two together where you want to make sure the quota and commissions and the strategy that you want to drive are closer together and the kind of the disparity as close as possible.

Starting point is 00:28:53 And so here, for me, the biggest realization was that we are becoming a bigger company because there are clear behaviors that happen based on the commissions. And then two, to actually resolve those, we need to be very upfront in terms of making it explicit that sometimes even if the commission suggests this and you think it's the wrong thing, come back to us, let's speak about it and let us score.

Starting point is 00:29:16 So now we are explicit with all our sales teams that if they are seeing a deal, that let's say might be competitive in nature, and our pricing table would suggest that they can go very low and earn higher commission, but they think it's wrong, it's better to come to us. We are happy to still grant commission, but kill the deal and go out for us.

Starting point is 00:29:34 We had this case recently where one of our foundational-model competitors came to us, wanting to license our models for demos, and of course, the incentive would suggest that you should sell to them, but luckily we didn't. Yeah. You granted commission, though. In early days, you can definitely allow. It's just that now it's in the policy, so you cannot sell to the foundational model companies.

Starting point is 00:29:59 So it's clear to all internally. That was incredible, Mati. Thank you so much for sharing all the lessons and learnings with us. Let's give a round of applause to Mati. Thank you. As a reminder, the content here is for informational purposes only. Should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security

Starting point is 00:30:23 and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures. Thank you.
