Silicon Valley Girl: AI, Tech and Career Growth - $6.6B AI CEO: How to Make Your First $10,000 with AI | ElevenLabs CEO & Co-Founder, Mati Staniszewski

Starting point is 00:00:42 We paid about $5 million to the entire community. Meet Mati, CEO and co-founder of ElevenLabs, a company that has grown into a $6.6 billion leader in the voice AI space, shaping how we talk, work, and even earn money. They've created an entire voice marketplace.

Starting point is 00:01:04 Now, anyone can clone their voice and earn passive income. Can you name some opportunities that you see that can make people a good amount of money so they can make a living, like 10k a month, something that's immediate? Business and you just want to make good money, I would try to take those voice agents and go out to, let's say, a local doctor's office. And ElevenLabs built the world's most realistic voice tech. The question is, can they control what happens next? Most of those companies just don't know this is possible.

Starting point is 00:01:29 You don't have to be the coder. You just need to... If my voice is authorized to use my credit card to buy anything and then somebody just uses the resemblance of it. I think it's going to happen. Hey guys, welcome to Silicon Valley Girl. We have one of the guests today whose product I've been using for a while now. So I'm going to ask a little technical questions as well.

Starting point is 00:01:50 But please welcome Mati from ElevenLabs. Thank you so much. Thank you so much, Marina. Great to see you again. And thanks for having me. Yeah, thank you. I feel like you're one of the pioneers of this AI industry because when I ask people what apps they're using, when I'm talking about apps that I'm using, I always mention ElevenLabs because it's been a lifesaver. I wanted to start with a question about the role of voice in AI. So what it feels like to me is that in 2023, you know, when we started adopting ChatGPT, it was all text. And then these voice capabilities became more and more powerful. It understands what I'm saying.

Starting point is 00:02:23 Now it understands my accent. If I mispronounce something, it still gets me. Do you feel like we're moving into the era where voice is our main tool to interact with AI? I mean, 100%. I do think that voice will be one of the key interfaces to the technology around us. And that shift is happening, like you said. A few years back, you wouldn't even dream of this being possible. And now I think it's becoming a reality where it allows you to transfer so much more information than text.

Starting point is 00:02:52 You can get the emotionality, the inflection pattern, the imperfections reflected in the voice, which of course makes it easier, if it's an input, for the technology to understand a lot more about the setup or what you are trying to achieve. And then if you hear it back as well, I think it's a lot better and more pleasurable experience. How do you see voice transforming businesses? Do you have any cases where people are using voice to generate leads or convert leads? There's definitely a few different areas, whether it's the more classic customer support use cases where, instead of having an old IVR system or no system,

Starting point is 00:03:33 you can now deploy a voice agent that will take the calls instead and will both delight the customers on the other side because it understands you, it's quick, it's good, but then also just performs better. And then outside of customer support, we are seeing that across the entire life cycle of the user journey. In some places it adds an experience that wasn't possible before. A simple case is inside of the product

Starting point is 00:04:00 or even outside of the product, and you might have seen back in the day there were those widgets for chat. Now you could have a voice agent that helps you navigate through the product experience. So it becomes like your partner, programmer, product person that helps you navigate through that life cycle. And you also mentioned,

Starting point is 00:04:19 so of course some of the big pieces are inbound and outbound. We actually use it ourselves at ElevenLabs too, where of course we do have a standard flow, we have people that will answer, reply, and take a phone call too. But if you want to go quicker, you can speak directly with our agents

Starting point is 00:04:36 to understand our product offering, understand our pricing, understand what you can do with the product, which helps you accelerate through the pipeline, depending and sometimes self-disqualify if you are not the right fit for our product offering, and sometimes helps you accelerate, okay, this is exactly the set of use cases I can do,

Starting point is 00:04:54 this is how I can deploy it, and then it routes you to other people. So it doesn't actually convert? In some cases it does. As a quick step back, we have a few different tiers. We have like a business tier and an enterprise tier. So it does convert immediately sometimes to the business tier program.

Starting point is 00:05:10 Because it's a preset. Because it's preset. It's self-serve. On the enterprise side, we still run a KYC check so it doesn't do that immediately. But on the business one, it does. And then we've seen some of those voice agents, also from a lot of the technology and platform we built, help in completely different, non-commercial aspects too.

Starting point is 00:05:31 Quick follow-up question about the sales process. Have you measured the conversion percentage into sales with the AI voice salesperson? We did, and I don't remember the number off the top of my head, but given the alternative before would have been just waiting, it was just a net new amount of leads

Starting point is 00:05:52 and we receive so much inbound of using a lot of the products, which we are lucky to have, that it helped us just convert so many more leads that we would have otherwise taken weeks, months, or maybe never gotten into. How can I set this up for my company? The easiest one would be to register on our platform. So that part of offering, and we have two key offerings,

Starting point is 00:06:13 is our agentic platform offering. We jump into the platform, and we help you abstract two elements. The first one is all the research or experience complexity. So we help you connect the speech, the LLM elements, the text to speech elements. So the agent speaks in a smooth and a quick way. So it's a very low latency, a reliable part on that side. And then there's a second part where you will need to spend a little bit more time on bringing your business logic in place. So example could be what's the knowledge base of how your business

Starting point is 00:06:46 operates or what are the questions you want to be asked, what are the materials you want to surface? So you would bring that into the platform. Then we have a set of workflows that you can set up. Effectively imagine, like, if this happens, this happens, or if this happens, I want this function to trigger. This could be if someone is calling me and I want to schedule an appointment. We have a predefined workflow for you to be able to do this. So it can look into your calendar. Okay, I'm selling a course basically.
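The "if this happens, I want this function to trigger" idea can be sketched in a few lines of Python. This is only an illustration of the pattern and not the ElevenLabs platform or its API: the `AgentWorkflow` class, the `book_appointment` function, the keyword matching, and the sample knowledge-base text are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable

Condition = Callable[[str], bool]
Action = Callable[[str], str]


@dataclass
class AgentWorkflow:
    """Hypothetical agent: trigger rules first, then a knowledge-base lookup."""
    knowledge_base: dict[str, str]
    rules: list[tuple[Condition, Action]] = field(default_factory=list)

    def when(self, condition: Condition, action: Action) -> None:
        """Register an 'if this happens, trigger this function' rule."""
        self.rules.append((condition, action))

    def handle(self, utterance: str) -> str:
        text = utterance.lower()
        # 1. The first matching rule triggers its function.
        for condition, action in self.rules:
            if condition(text):
                return action(text)
        # 2. Otherwise answer from the business knowledge base.
        for topic, answer in self.knowledge_base.items():
            if topic in text:
                return answer
        # 3. Nothing matched: hand over to a person.
        return "Let me connect you with a person who can help."


def book_appointment(_text: str) -> str:
    # Placeholder for a real calendar lookup (assumption, not a real integration).
    return "I can offer Tuesday at 10:00. Shall I book it?"


# Made-up sample content, for illustration only.
agent = AgentWorkflow(knowledge_base={"pricing": "Pricing depends on the course you pick."})
agent.when(lambda t: "appointment" in t or "schedule" in t, book_appointment)

print(agent.handle("Can I schedule an appointment?"))
print(agent.handle("What is your pricing?"))
```

A real deployment would replace the keyword checks with the platform's own intent handling; the point here is only the ordering of rules, knowledge base, and human handover.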

Starting point is 00:07:13 Like what LinguaTrip does, we sell courses. So I basically want to be able to sell courses to people. Can I do it in different languages, using my voice? You can. Wow. So you could. And so it's selling the courses and the people would call in, buy the course, and off they

Starting point is 00:07:30 go, and maybe they're onboarded with the agent later on to help. How do they buy over the phone? Do you send them a link, ask for their email or they just, yeah? Depends. But the simplest would be what you suggest, which is we do have an omni-channel solution where you effectively get a link as part of that and you can leave additional details, or you have a follow-up by email with, like, a checkout or subscription for the course. So both of those would be possible. Or you could, depending on how that

Starting point is 00:07:55 website is set up, you could effectively embed the agent on your website. So it helps you redirect to the subscription page. It guides you through it and they check out themselves live with the agent that helps them fill in the form. But like you said, one of the great things on the function side is that you can switch languages, you can hand over. That's fascinating for my business. So, I mean, you've been pioneering a lot of that language learning work and I think this would be amazing because both it would switch the language and would switch it with your own voice if that was your own voice. It continues speaking in that same manner. And then of course the last piece is all the integration. So we support integrations. I didn't really have had it. Congratulations. Thank you.

Starting point is 00:08:38 It's one of the big. So maybe that's a good cue for me as well. Because when we started the company, we of course started from pioneering the research on the speech side. So text to speech, voices, and then we expanded to speech to text, the orchestration models, now music. But as we think about the research, it's always how we can push the audio frontier forward. I love how you found this new opportunity and now it's a bigger chunk of your business as far as I understand. How much would it cost for a business like mine, a small business, to have AI answer the calls and sell? I think, and of course it depends on the volume, but I think what hopefully will happen is that both you will see more people coming through and if we set it up in the right way,

Starting point is 00:09:18 maybe this will mean even opening up the channel, which over time hopefully means even more calls. But I think to start, it would be in order of hundreds of dollars per month. It's also IP calling, right? Is integrated in that? Yes. So we integrate with Twilio or telephony systems. So you can bring any phone number that you already have. And it works.

Starting point is 00:09:39 I don't know, do you currently accept any of the calls coming through the telephone too, or is it all on the website? We mostly try to navigate them to WhatsApp because a lot of people who are calling, they don't speak English, so they don't feel comfortable. But if we advertise that it's, you know, Marina's voice, AI, nobody's judging your accent. Because I feel like when people even talk to me, if they're non-native speakers, the first thing they do, they're like, I'm sorry, my English is not as good as yours. And I'm like, it doesn't matter. But I feel like even using English to make a phone call is such a huge barrier for non-native speakers. And I feel like if you understand that you're talking to AI, it just makes it so much easier. That's true.

Starting point is 00:10:17 It doesn't judge. You can make all the mistakes, which is maybe, you know, like there's a completely other aspect. You've, of course, been helping people learn languages for a long time. But maybe there's even an aspect where they could practice speaking the language with you, which would be, you know, kind of a slightly different deployment of course, but completely possible, where you can give them tips, improve, and effectively create a Marina's Duolingo that people have a dynamic experience with, which is another kind of incredible area that's growing in that tech space. Yeah, let's talk about that part. So we talked about

Starting point is 00:10:53 deploying ElevenLabs to work as a sales agent. Let's talk about, like, I have this number here where you paid $2 million in royalties to people who kind of share their voices with ElevenLabs. Can you talk about that? How can people start making money by sharing their voice with ElevenLabs? So it's one of the efforts we launched in the early days where we effectively created a voice marketplace, a voice ecosystem, where every person can create their own voice. You go through an authentication flow, and you need to record roughly 30 minutes or more of you speaking. Then you have a perfect replica of your own voice that speaks in the language you recorded, plus all the languages we support.

Starting point is 00:11:35 So you have usually 30 or so different variations now; with the new model we are releasing it will be 70. So you have the voice that's now available for your own use. And then if you decide, you can share it to our marketplace. And if you share it to our marketplace, for a specific period of time, with specific conditions of what you are sharing it for, then other people can use it across the ElevenLabs ecosystem. And when your voice is being used, you get paid back as a result.

Starting point is 00:12:02 This way, we have now almost 10,000 voices that people shared and created. What is incredible is it spans so many different languages, accents, different styles. So, like, now if you log in to the platform, you just have this incredible platform. And we pay the voice owners back. So it was, I think, $2 million at the beginning of the year that we paid back. And now, I think the last time I checked, it was a few months ago,

Starting point is 00:12:30 we paid back $5 million to the entire community. How much does an average voice creator make? It depends. Of course, you know, it's, like, probably in total approaching close to $10 million and we have close to 10,000 voices. So that would be like, you know, if you take the average. But I think it's especially given a lot of the voices are kind of new and it takes a little bit of time before they get attention.
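The average hinted at above can be worked out directly from the rough figures in the conversation. The sketch below assumes the round numbers as spoken ($5 million last checked, "approaching close to $10 million" in total, and close to 10,000 voices); the `average_payout` helper is a hypothetical name used only for this illustration, and the result is a cumulative per-voice mean, not a monthly income.

```python
def average_payout(total_paid_usd: float, voice_count: int) -> float:
    """Mean cumulative payout per shared voice."""
    if voice_count <= 0:
        raise ValueError("voice_count must be positive")
    return total_paid_usd / voice_count


# Round figures as quoted in the conversation (approximations, not audited data).
last_checked = average_payout(5_000_000, 10_000)
approaching = average_payout(10_000_000, 10_000)

print(f"Using $5M paid out:  ${last_checked:,.0f} per voice")
print(f"Using ~$10M in total: ${approaching:,.0f} per voice")
```

A simple mean like this hides the spread: as discussed in the conversation, new voices take time to get attention while a few unique voices become very popular, so typical earnings can sit well away from the average.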

Starting point is 00:12:57 You also, to actually make it successful, ideally you try to engage some of the community around that they can see the voice, whether it's the Discord, the Reddit, some of the other forums, it definitely helps break through that initial. And if not, over time, we also try to surface new voices and get them out in the audiences. So it really depends. I think it will be a lot of people in like a few hundred dollars per month category. And that's probably what you could expect if you do a little bit of that effort and what you could earn. However, you know, it's, I think it's true that it's if your, if your voice sounds very similar to other voices, it's very how much. Yeah. Yeah, it's interesting. How many voices like in general do you

Starting point is 00:13:37 then how many can you distinguish? Yeah. But if you have a unique voice, if you're a creator, right? Exactly. Then it can be incredible. Our first voice, one of our first voices that got shared, and it was a Spanish voice that had a very deep way of speaking, deep prosody. And that voice became one of the most popular,

Starting point is 00:14:00 not in Spanish, but in English-speaking countries, and became like our top-ten voice, where it was just such a unique and different experience. Let's talk about the nuances of cloning your voice. Because for example, so what happens sometimes in my team, we clone my voice using all the different mics that I have, but sometimes we insert it, and it's still slightly different from the video, because the way we use it is that, you know, we recorded something here.

Starting point is 00:14:25 I recorded some brand deal or whatever, and then I start traveling, and they're like, could you re-record this phrase? So we just take a piece from the video, redo it, with a phrase that the brand asked for, but then we insert in the video, it's slightly different. Like the,

Starting point is 00:14:40 it sounds in a different way. Are there any ways to fix it? Yes, of course, we, like, ask it to remake it again, but it's still not exactly what we recorded. No, it's, it's of course a tricky problem where

Starting point is 00:14:56 when you create a voice, you most likely take the voice throughout the entire video, and then you create that voice, and then it is effectively the average of how you spoke around that video, but in a given scene, you will have maybe changed the intonation pattern a little bit or the emotional pattern slightly off that average. The ideal way would be for us to do more of the conditioning

Starting point is 00:15:20 on, like, what you do pre and post in the video. So we take more of that as input, and we try to morph it in a slightly better way. And then there's a second thing. Sometimes even though I know you'll try to clean up the voice and then add the background sounds, background effects, they might just by the process be mixed in and then, no, it doesn't smooth entirely.

Starting point is 00:15:41 So from our side, what we hope to do over time is that as you insert those videos, we can precondition it on a few seconds before and after, and it will sound better. So that's something we are working on. So I upload the video. So we are working on that, not yet. It's not applied.

Starting point is 00:15:54 But it's going to be the big piece. We definitely need to bring it there. I think in the short term, what you mentioned is what we see as the most common pattern, which is redoing and regenerating. But the other thing you could try is, instead of taking a longer audio sample across the video, just take even a few seconds, which I know sounds like maybe it will be a worse result. But if you just take a few seconds from that fragment and create that lower quality version,

Starting point is 00:16:23 it actually could sound pretty good. Okay. Thank you. So where do you see all of this going with people recreating their voices? Will everybody have a clone in two or three years? Because we could have thought about ElevenLabs when I heard about it like two or three years ago, right? I couldn't think about a salesperson using my voice. Now we have it. What do you think is going to happen in two years?

Starting point is 00:16:46 What is this new use case that this all is going to unlock? Interesting question. Of course, we are seeing like kind of entirely new ways of interacting with voices. So I do think, yes, you will have your digital AI voice. And I think even step further, you will have your own digital voice agent that does things for you, that you want to make sure it's authenticated people know you operate. So, you know, like we spoke about the example of how people can call in. You can configure a voice agent.

Starting point is 00:17:14 But I think the other side will be also true. I'll have my voice agent. Calling the bank. Because they use voice authentication, right? It's going to change. I think that's not the best mechanism for the future. Anymore. Not anymore.

Starting point is 00:17:26 But like, say, you want to book a restaurant or follow up about an appointment in healthcare and you want to make sure that they know your most recent details or that it's confirmed. I think you will want an authenticated version of a voice agent. I'm saying authenticated because, like you say, most of the verification, if they don't, will fail, and you want to know that it's a permissioned voice. So you'll need to start embedding watermarks and metadata around it. But I think to kind of go back to your question of where it all evolves, I think there will be an interesting pattern where, and I think it will happen on both sides, as a user but also as a business, you will be able to serve so many different voices to your customers

Starting point is 00:18:09 or you as a customer can decide what voice speaks to you. So to speak to specific examples, we are working with a company in Korea and Japan, it's a multinational company there, which has very different age groups calling in, a set of older patients and then a much younger set of people. And they want to serve, depending on the data, the number that is calling in, a different voice to that group, both in terms of how it speaks, how it sounds, but also the style in which it speaks. Of course, it's a, you know, it's a generalization, but roughly they wanted that if you,

Starting point is 00:18:50 if an older person is calling in, the voice speaks much slower, much calmer, less emotionality. If it's a younger person, much quicker, a lot higher amplitude of emotions. And I think this same pattern will start happening across everything where if you are calling in a specific region, you might have an accent of that region. If you are calling a restaurant that's maybe representing a specific cuisine, you get a voice of that cuisine speaking with you. And maybe there are like variations of all those different types, which can work. And then separately as a person calling in to any of those services, you could

Starting point is 00:19:24 pre-select that too. So if you are calling a bank and you enjoy always speaking with a voice of this specific style, then you can select it, and that voice will be the voice of your preference. We've seen this happen in a company also in Asia where they created effectively a travel agent or like a Google Maps competitive product, where you can select a voice that narrates your directions. And one of the voices they selected became viral,

Starting point is 00:20:02 and everybody wants to use it now in the travel directions because it just made for such a better experience. So if I extrapolate in the future, I think there will be a lot more both personalization but also selection that you can choose into. I think 100% true. You will have your own authenticated voice that you can use for your voice agent, for your content. That has all the information.

Starting point is 00:20:19 That has all the information that you can use. That's very interesting. I like that part. Like having my voice call and be authorized to use my data. How do you talk about impersonation with voice? Like if my voice is authorized to use my credit card to buy anything and then somebody just uses the resemblance of it, will there be any metadata that could be detected by other systems?

Starting point is 00:20:41 And how would it, what would it look like? Yeah, so I think, first of all, I think it's going to happen. Like I think the assumption we should be going with is that you will have good actors, good technology, trying to avoid it, but then there will be also more permissive technology and bad actors trying to abuse it, as with any technology shift. And already now, there is a lot of open source technology,

Starting point is 00:21:08 other commercial technology, which doesn't have the same safeguards, that could clone your voice and create a mimic that sounds like you. So I think any system that we think about devising in the future kind of needs to assume that you can create a clone of a voice and make it a perfect replica. Now, of course, as I think about ElevenLabs, we can, and we do, add safeguards as you create a voice,

Starting point is 00:21:35 so you cannot do that. Or if you do, we detect it and moderate and can flag it internally if we are not sure. So whether it's being able to trace everything back to the account or moderate what text was used, whether it was trying to do a scam. But to the core of your question, as you think about the future,

Starting point is 00:21:52 the ideal system, and it would require cooperation from a number of parties, would have three different layers. And the first layer is, instead of trying to check for AI, you actually check for human. That's easy for me to say. Of course, there's like, how do you check for humanness? But a simpler step or original step could be that you, on the devices that you use, so on my telephone or on my laptop,

Starting point is 00:22:18 I am encoding that this is my phone, my laptop. When I'm calling from it, it's being decoded on the other side. They know that this is the device I use, so most likely this is me. That's the first layer. Second layer is actually what we spoke about earlier, and that's possible: you watermark authenticated AI. So if I'm using a specific tooling, the tools that can add this watermark are known, and I watermark that within the content.

Starting point is 00:22:44 It's not super straightforward, especially in audio, because if you add a watermark in content, it can affect the quality of the content itself. but it's roughly, roughly good. And that's the second layer. So you check for authenticated AI. And then the third layer is by default, it's AI, and you assume it's AI. So if it didn't pass the first or second layer,

Starting point is 00:23:05 and you see content that hasn't been authenticated or proven to be human, it's AI by default, and you don't trust it. And then you can add more mechanisms on top of that third layer where you try to explicitly check or add additional signal, like, ah, this is real. But that would be a mindset shift, where today if you look at content,

Starting point is 00:23:24 you're like, oh, maybe this is AI. It should be the opposite where it's like, oh, no, this is definitely AI. Is it maybe human or is it maybe AI that was created with the creator's permission? And then you have those cases in between that will be interesting as you of course create the content. You mentioned that sometimes if you need to re-record, you might create an AI voice with, of course,

Starting point is 00:23:44 with your permission. But then do you do that across the clip? And maybe you do that 1% or 5% of the content is AI voice. maybe in a future it will be 30 or 50% and at what stage would you say this is like your AI delivery or human delivery?
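The three layers described earlier in the conversation (check for a known human device, then check for an authenticated-AI watermark, otherwise treat the content as untrusted AI by default) can be sketched as a simple decision function. This is an illustrative assumption and not an ElevenLabs mechanism: the `Call` record, the `Trust` labels, and the plain set lookups are hypothetical, and a real system would use cryptographic device attestation and robust audio watermark detection rather than string matching.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trust(Enum):
    KNOWN_HUMAN_DEVICE = "layer 1: call came from a registered device"
    AUTHENTICATED_AI = "layer 2: watermark from a known tool"
    UNTRUSTED_AI = "layer 3: assume AI and do not trust"


@dataclass(frozen=True)
class Call:
    device_id: Optional[str]         # hypothetical decoded device marker
    watermark_issuer: Optional[str]  # hypothetical tool that watermarked the audio


def classify(call: Call, registered_devices: set[str], known_tools: set[str]) -> Trust:
    # Layer 1: check for the human first, via a device known to belong to them.
    if call.device_id is not None and call.device_id in registered_devices:
        return Trust.KNOWN_HUMAN_DEVICE
    # Layer 2: check for a watermark added by a known, permissioned tool.
    if call.watermark_issuer is not None and call.watermark_issuer in known_tools:
        return Trust.AUTHENTICATED_AI
    # Layer 3: anything that passed neither check is AI by default.
    return Trust.UNTRUSTED_AI


devices = {"marina-phone"}
tools = {"permissioned-voice-tool"}

print(classify(Call("marina-phone", None), devices, tools).name)
print(classify(Call(None, "permissioned-voice-tool"), devices, tools).name)
print(classify(Call(None, None), devices, tools).name)
```

The ordering is the design choice that matters: the default branch is distrust, so new or unknown content never gains trust by accident.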

Starting point is 00:24:46 You're a founder in AI. How do you sleep at night when everything is moving so fast? What are your main fears or what keeps you up at night? I think there are two parts to it. I think the first part that I need to mention is that it's such an incredible opportunity with the shift. Like, it's the biggest shift, or maybe a bigger shift than the internet, and we at ElevenLabs are so

Starting point is 00:25:08 happy and lucky to be part of that shift and be leading on the voice frontier so I think that and I think that the team and all of us are feeling that that we have unique opportunity that never happens in your life, that you can create a technology, define how it will be used, and hopefully create value across, whether it's voice agents and how a voice interface will look in the future,

Starting point is 00:25:30 whether it's making content global, whether it's making content available in audio. But of course, with all of that, as you think about being at the frontier, it also makes us carry some of the responsibility for how we define that. So a lot of our parts will stem from that. I think the first one is we still think there's innovation, on the research level that you can bring into the space, at least one or two big ones in audio.

Starting point is 00:25:55 And we've been able to do it so far in text to speech, speech to text, recently in music, but we still want to continue leading and continue being better than some of the biggest labs in the world, whether it's some of the new AI companies or the incumbents. We think we have that opportunity, and that is motivating, but of course, definitely causes less sleep at night.

Starting point is 00:26:16 The team is super hardworking too, which makes for shorter nights. Then from the risks perspective, we spoke about some of those. We do feel like it's our responsibility to make sure that we avoid some of those risks. So we are trying to invest a lot of time in developing safeguards around it. Then of course, the third one with a lot of the technology is how the economy or how the jobs in that economy will change. And we would like to do it in a way which brings a lot of the people in that economy together with the change, rather than it's change that will just affect it and disrupt it. But how,

Starting point is 00:26:50 how can some of the people that want to be part of it be part of that disruption too? The voice ecosystem that we built is part of that reason. But, of course, I think we need to keep hiring amazing people, keep pushing ahead. As well, so much is happening. I still think it's very early. I may be biased and self-serving here, but it's still very early. You mentioned jobs that are being replaced with voice technologies. What do you think are the jobs that are at most risk, I guess, like customer support,

Starting point is 00:27:20 And what should these people be doing now to not get replaced in a couple of years? I think the trope, and I think it's very true, is that people will be replaced by people that use AI. So I think this is the key message, that you should effectively go into trying a lot of those tools and products, so you stay at the frontier. And then the people in any of those jobs that use AI, I think, can actually benefit a lot too. And even in customer support, of course, a lot of that will shift. But for example, what we are seeing is that the simple manual tasks of appointment-taking or processing a simpler refund, all of that is very manual, very recipe-based in most cases. But then as you go to the more complex parts, you need a human expert to help close that gap.

Starting point is 00:28:13 And in that part of the process is actually even more in need, whether it would be debugging a harder problem that you have in the product, whether it's understanding your what happens after the appointment, there's a specific thing you receive and you want to decide whether you need the X or Y help, which of course needs to go for some of the regulation too. But for all of those, the pattern is that the expertise is even more valued. And of course over time, I think the AI will start shifting and taking more of that. So there will be like some percentage that goes across. But that'll be my main piece of like if you understand how the AI works, you can become more of the expert and better knowledgeable yourself and help. And that's also true in a creative space too. I think so much you can iterate so much more frequently.

Starting point is 00:29:06 You can produce to the wider audience. You have to move faster and faster. That's what I'm feeling with this. You can definitely do faster iterations. You have to run to stay where you are. I don't know if you get this feeling, but for me it's like the world is speeding up every single day. I do think it's speeding up, but at the same time, I think it's not zero sum, where speeding up in this category doesn't take away from another category.

Starting point is 00:29:32 I think the entire economy is just growing as well with a lot of that adoption. So there will be more creative opportunity than there ever was before. And yes, to be part of that creative opportunity, you probably need to move faster with a lot of the innovation than you might have needed to before. But I think still a wide set of people can and will benefit. But of course, you know, a lot of the repetitive, manual, non-talented intelligence,

Starting point is 00:30:02 like basic intelligence-based work, will be replaced with, well, the AI workflows. And the best way to avoid this is by learning a lot of the AI tooling, so you yourself are better. And maybe just to finish off and to summarize the customer support piece, thinking about it slightly differently and outside of customer support: frequently, if you have domain expertise, whichever domain that is, then that's where you can deliver even more value.

Starting point is 00:30:31 So combining your domain expertise with AI is much higher value and output. And if you don't have domain expertise, then you probably want to gain that domain expertise, which would be... Yeah, I've seen a lot of graphs in future of jobs reports, and there's this section like "your expertise plus AI," and it goes like this in terms of demand.

Starting point is 00:30:58 What would be the tools that you would recommend everyone start using now? Name the top three AI tools. Top three AI tools. Okay, outside of ElevenLabs, which you do need to try and use, I would say I really like Black Forest Labs for their image work. I mean, Midjourney has been cranking out for so many years, but Black Forest Labs I really like as kind of the new iteration. I think they have good realism,

Starting point is 00:31:26 and I think they will go through a set of additional iterations that are great. From the classic ones, I mean, Anthropic's Claude Code I think is incredible, where I think

Starting point is 00:31:40 it helps you be another level of engineer, or even if you're not an engineer, try to be a little bit more of an engineer. And then the last one,

Starting point is 00:31:48 I really like Lovable. But similarly, I mean, v0 is great, but given we are in Europe, I feel Lovable

Starting point is 00:31:59 deserves the... They're from Sweden, right? They're from Sweden, yeah. But all of them, I mean, it's just so incredible to see our go-to-market teams try them, whether it's Lovable, v0, or Replit. I think now Figma also launched theirs; I haven't tried it yet. But it's fun to see how people that haven't traditionally been on the engineering front are closer, and they understand the product pain points,

Starting point is 00:32:24 they understand their use case all better. So there's both this path of prototyping and showing the clients, which is amazing, but then also, by extension, they are effectively getting closer to what is behind the scenes on the product side too. Yeah. And when you mentioned Lovable, do you build something for yourself or for ElevenLabs? Both. So on the go-to-market side, we frequently will do a demonstration to a customer. Let's say we were doing the use case that you mentioned. We could build a prototype on a mock-up website of what the checkout would look like, how the agent would interact with you.

Starting point is 00:32:56 that type of use case all the time, whether it's on the conferences or with the client calls. But also on the personal side, I recently tried with my two nieces, they are five and seven years old, so I have the best job of being Fanoncle or trying to be. And they were speaking about how they could potentially create a story generator for themselves,

Starting point is 00:33:22 where you would type in the character names and the story would be created. You're an entrepreneur. You've started this company, spotted this opportunity. Do you see any other areas, aside from voice, where people should be doubling down? Because one of the founders I had on this podcast, actually the co-founder of Hugging Face, told me that in the next five years, you have to be an entrepreneur or you're done. So a lot of people are learning how to become an entrepreneur.

Starting point is 00:33:47 Can you name some opportunities that you see that can make people a decent amount of money so they can make a living, like 10K a month, something that's immediate, something where you see a gap in the market? It will be voice-specific, but I think it's so, so early that I think it's a huge one. There's definitely a lot of the infrastructure being built for the voice agents. We build it, but other companies do too. And I think there is a big gap between voice agents and then actually deploying them in a lot of those businesses. And you don't have to have the engineering expertise to deploy those voice agents.

Starting point is 00:34:21 The platforms now frequently will support a relatively self-serve manner of doing it, so you can easily take that voice agent and deploy it in specific domains. And most of the businesses in the world still don't know about it. If it's not a venture-scale business and you just want to make good money, I would try to take those voice agents and go to, let's say, a local doctor's office and help them schedule appointments, or to the dentist, so they can take appointments more easily. They can then focus more on the work instead of a nurse doing that in between, or missing appointments. That's actually one of the most common.

Starting point is 00:34:58 I don't know the percentage, but so frequently those appointments don't get booked because there's no one on the phone who can take them. You can go to local mechanics and help them take appointments. And I think all of these require a slight variation of the domain piece that you need to know. All of those businesses are in the thousands to tens of thousands of dollars per month if you get to a few. The infrastructure is there. You just need to bring it to those domains.

Starting point is 00:35:26 Yeah, it's like B2B, automate businesses with AI. Yeah, and small businesses all around the world. And you don't have to be a coder. You don't have to be the coder. You just need to spend the time, call them and ask, or go to them. And I think there's just this category which might not be taken up by some of the biggest companies, which will focus on bigger enterprise elements. This is the classic small and medium businesses rather than the enterprise segment.

Starting point is 00:35:54 And at the same time, most of those companies just don't know this is possible. So for the next year or two, it's just an incredible opportunity to do it. And of course, you know, start in English-speaking markets, but I think the same is true for so many of the countries and languages, given so much of that work isn't always localized. I think in our case, we're doing a pretty good job there. You can bring it to a local market and do exactly the same work. Absolutely love it. Thank you.

Starting point is 00:36:23 Thank you. So if you were starting a company today and you were a brand new entrepreneur, what would be your advice for anyone who's starting out? The first advice would be that you deeply understand your user and the problem that you're trying to fix. I think that would be the first piece. It's like, do I know the problem and do I know people have that problem? Because you started ElevenLabs because you didn't like the trans...

Starting point is 00:36:47 The translation of... This is a super crazy piece that in Poland, if you watch a movie, all the characters, whether it's a male or female character, are narrated with one single voice. With no intonation, right? No intonation, it's flat. Exactly, exactly. I think it was the same as post-Soviet times in Russia.

Starting point is 00:37:07 And it still continues today. You know, it was kind of obvious when we started looking into the audio space and then realized that this is still a problem, something we grew up with, something that you ask any Polish person, or most Polish people, and they will tell you how bad of an experience that is, as you can likely imagine. It's pretty bad. And it will change. If you think about the future, you will have all the different original voices represented. So if the movie is streamed, you just hear exactly the same language. Of course, it expands from the dubbing to just voiceovers and speech,

Starting point is 00:37:46 because so much of the content is unavailable in audio in the first place, and now a lot of voice agent stuff, but it was a very clear problem. And as I think about starting a company, or if I were to start a company again, I would try to obsess about the problem. And then the second one is, do people actually have that problem?

Starting point is 00:38:02 Is it, like, a burning problem? And dubbing was a good example, where we thought dubbing was the biggest problem, but before we actually solved the dubbing, we realized from a lot of conversations with users that there are so many other problems that they would like to fix first. The most common one is actually one you mentioned,

Starting point is 00:38:18 where people just wanted to repair lines after recording, or just be able to deliver a voiceover without speaking. And that was the most common thing after we tried to reach out to people, well before we had it ready. It's like, hey, we are almost finished

Starting point is 00:38:33 with our dubbing product, would you like to dub your movies? And most likely we would get some small percentage of replies, and then inside of those replies, it would be: yes, this would be interesting, but actually, if you're going to help me with just

Starting point is 00:38:46 my voice and do it. Yeah, that would be much, much better. So then we were like, okay, there's this incredible opportunity that's a smaller component of the technology we're going to build, which we should do first instead. And we did. And then we validated that again. And people were like, yes, that's something we would love. And then, given we started from creators on social media, after we heard this, we realized

Starting point is 00:39:14 that there are actually other people not on social media, but also on voiceovers. The biggest group for us was book authors initially. Everybody just couldn't record. Audiobooks. Exactly. Because that's like a few days in the studio. A few days in the studio, very expensive. So many people get tired with the voice.

Starting point is 00:39:29 So it's never as expected initially. So it takes more than that. And then that turned out to be one of the first biggest ones. But you actually built the dubbing product first. And you realized nobody wanted to pay for it. Yeah. So we did the prototype. We did a prototype.

Starting point is 00:39:44 We did a, yeah, a little bit of a, you know, like a stitch-up. It did have a lot of our own research, but it wasn't months of work. We created a prototype. While we were building the prototype, we were reaching out to customers. Like, we are working on this, do you want it? We had a good waiting list. Then we tried to show them how it looks, and they were like, oh, this quality isn't as good. If you could actually help me with this and this instead, it would be better,

Starting point is 00:40:19 which is the same technology, because people notice that if you dub, you can hear the voice of the person in the other language. It still sounds the same. And it turned out that the problem was even earlier. It was like, oh, just my voice is great. I love that. I love how you started with the surface, then you went deeper and built the whole technology that solved so many problems

Starting point is 00:40:35 that were on the surface as well. Yeah. So I think, yeah, for the entrepreneurs building today, the first piece is that they understand the problem. And of course, I'm in a very lucky position where I have known my co-founder now for 15 years and know him inside out. And he's the genius behind a lot of the work we do. But I think that would be my second piece: you want to really pick your co-founders and the early team as carefully as you can, as these will be the people you will spend most of the nights and years ahead with.

Starting point is 00:41:06 Success depends on all of that. The culture depends on that. And then similarly, we're very happy to have some of the best early joiners. One of them is a person on the growth side we trusted inside out. And two of our engineers turned out to be some of the most hardworking and smart engineers we have, which set the culture bar very high. Nice, nice. Okay, I'm going to wrap up with this question as a person who's been advocating learning languages. Will people still learn languages in three years if they can have their AI

Starting point is 00:41:39 authorized voice speaking any language, joining any Zoom call? The only thing that's left is maybe one-on-one conversation, but then maybe we have a device that translates everything. That's an interesting one. I think they will, but the primary purpose will not always be understanding others. It will frequently be for just developing yourself, as more of an enjoyable thing you want to do for your own sake. Like horse riding, right?

Starting point is 00:42:07 From a necessity to a hobby, right? To more of a hobby. And of course, there are parts where by learning a language, you learn the culture, and your perspective kind of opens. I think that still will be true. Or if you're moving to another country. If you're moving to the country?

Starting point is 00:42:24 I mean, like if you want to move to the U.S., you would still learn some English, right? Hopefully you will not need to do it, and you will still be able to understand a culture at a level that you never could before. So like in The Hitchhiker's Guide, it will be like a Babel fish variation: a headphone, maybe a device, maybe Neuralink. But even in those cases, there will be sound processing

Starting point is 00:42:43 time involved, because you need to finish speaking for the device to pick it up and then translate it. So speaking the language natively will be better. But yes, I do think most of that need will disappear for you to be able to interact and understand, which I think will be a beautiful thing. And then hopefully you can learn it for other purposes. It's interesting how the whole industry might disappear or might transform completely, but it's happening not just to language learning, it's happening to every... 100%. But I think it will stay. It will definitely morph a little, but some of that will definitely stay. Thank you so much, Mati. It was very inspiring and very practical. I love that. And thank you so much for being an early user and for all the

Starting point is 00:43:30 feedback as well. Thank you, and I'm hoping we're going to integrate the sales part. I'm excited about it. Amazing. Let's make it happen. I'm going to talk to my team right now. Let's go. Thanks. Thank you.
