Latent Space: The AI Engineer Podcast - Personalized AI Language Education — with Andrew Hsu, Speak

Episode Date: July 11, 2025

Speak (https://speak.com) may not be very well known to native English speakers, but they have come from a slow start in 2016 to emerge as one of the favorite partners of OpenAI, with their Startup Fu...nd leading and joining their Series B and C as one of the new AI-native unicorns, noting that “Speak has the potential to revolutionize not just language learning, but education broadly”.

Today we speak with Speak’s CTO, Andrew Hsu, on the journey of building the “3rd generation” of language learning software (with Rosetta Stone being Gen 1, and Duolingo being Gen 2). Speak’s premise is that speech and language models can now do what was previously only possible with human tutors (provide fluent, responsive, and adaptive instruction), and this belief has shaped its product and company strategy since its early days.

https://www.linkedin.com/in/adhsu/
https://speak.com

One of the most interesting strategic decisions discussed in the episode is Speak’s early focus on South Korea. While counterintuitive for a San Francisco-based startup, the decision was influenced by a combination of market opportunity and founder proximity via a Korean first employee. South Korea’s intense demand for English fluency and a highly competitive education market made it a proving ground for a deeply AI-native product. By succeeding in a market saturated with human-based education solutions, Speak validated its model and built strong product-market fit before expanding to other Asian markets and eventually, globally.

The arrival of Whisper and GPT-based LLMs in 2022 marked a turning point for Speak. Suddenly, capabilities that were once theoretical (real-time feedback, semantic understanding, conversational memory) became technically feasible. Speak didn’t pivot, but rather evolved into its second phase: from a supplemental practice tool to a full-featured language tutor. This transition required significant engineering work, including building custom ASR models, managing latency, and integrating real-time APIs for interactive lessons. It also unlocked the possibility of developing voice-first, immersive roleplay experiences and a roadmap to real-time conversational fluency.

To scale globally and support many languages, Speak is investing heavily in AI-generated curriculum and content. Instead of manually scripting all lessons, they are building agents and pipelines that can scaffold curriculum, generate lesson content, and adapt pedagogically to the learner. This ties into one of Speak’s most ambitious goals: creating a knowledge graph that captures what a learner knows and can do in a target language, and then adapting the course path accordingly. This level-adjusting tutor model aims to personalize learning at scale and could eventually be applied beyond language learning to any educational domain.

Finally, the conversation touches on the broader implications of AI-powered education and the slow real-world adoption of transformative AI technologies. Despite the capabilities of GPT-4 and others, most people’s daily lives haven’t changed dramatically. Speak sees itself as part of the generation of startups that will translate AI’s raw power into tangible consumer value. The company is also a testament to long-term conviction: founded in 2016, it weathered years of slow growth before AI caught up to its vision. Now, with over $50M ARR, a growing B2B arm, and plans to expand across languages and learning domains, Speak represents what AI-native education could look like in the next decade.

Timestamps
00:00 Introductions & Thiel Fellowship Origins
02:13 Genesis of Speak: Early Vision & Market Focus
03:44 Building the Product: Iterations and Lessons Learned
10:59 AI’s Role in Language Learning
13:49 Scaling Globally & B2B Expansion
16:30 Why Korea? Localizing for Success
19:08 Content Creation, The Speak Method, and Engineering Culture
23:31 The Impact of Whisper and LLM Advances
29:08 AI-Generated Content & Measuring Fluency
35:30 Personalization, Dialects, and Pronunciation
39:38 Immersive Learning, Multimodality, and Real-Time Voice
50:02 Engineering Challenges & Company Culture
53:20 Beyond Languages: B2B, Knowledge Graphs, and Broader Learning
57:32 Fun Stories, Lessons, and Reflections
1:02:03 Final Thoughts: The Future of AI Learning & Slow Takeoff

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

Transcript
Starting point is 00:00:05 Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my co-host Swyx, founder of Smol AI. Hello, hello. We're back in the studio with Andrew Hsu of Speak. Welcome. Thank you for having me. I have to start with this. I didn't prep you on this at all, but you were a Thiel Fellow in 2011? First class. First class.
Starting point is 00:00:24 Is that the one with SBF? No, he was, I think, several years later, actually. Yeah, yeah. What was it like? Just talk about the... That's a good question. I haven't been asked that one in a while. It was a really crazy idea at the time and very controversial. And I think the first few years of the fellowship were definitely let's just find 20 people under 20 and give them $100,000 to drop
Starting point is 00:00:51 out of college. And it could be, it was no holds barred. You could do anything. You could be doing some crazy research idea, a startup, anything. And I actually met my current co-founder at Speak. He was in the second year of the fellowship and made many, like, very close friends from the first few years. But, I mean, for me, it was life-changing. I had a very unusual path where I was actually, I did finish college, unfortunately. I was in grad school at the time because I went to school really early. Yeah, I was like, aren't you too old, you know, to you like some young. I was, I was 19 at the time and in grad school. It was a very accelerated path, but I think, like, I knew at the time that I was going to leave grad school and do startups anyway, and the timing lined up really well.
Starting point is 00:01:38 Yeah, yeah. Vitalik, I think. He was also in a later year. Yeah. Damn. Okay, anyway. But the first two years, I mean, there are some crazy successes. You know, Dylan from Figma.
Starting point is 00:01:50 I mean, yeah, like a lot of people. Awesome. Well, you know, feel free to bring in those stories as and when. Cool. Only you know those kinds of people. You are now CTO and co-founder of Speak. I would say, from, you know, from a very early stage, like one of the most successful and prominent OpenAI partners that
Starting point is 00:02:07 anyone would know is like doing well. And like teaching English to Koreans is like your rough remit at the time. How did that all come about? It's funny that you say that because despite our current sort of revenue scale and objectively, I think, how successful we are, we've always operated in a market, at least initially on the other side of the world and been much, much more popular in the sort of Eastern world and a bunch of Asian markets and relatively unknown in the West. So it hasn't really felt like we've had that sort of awareness until, you know, the past few years, really. But brief story is that my co-founder and I, back in 2016, were fascinated by the promise of AI. And we spent a year of sabbatical,
Starting point is 00:02:55 basically learning everything we could. We talked to Karpathy back then, actually, when he was like just finishing grad school and did a lot of sort of self-study research. And we were just so convinced, I think fundamentally, that speech models were going like this, language models were going like this. And in the five to 10 year span, they would become superhuman. And we were utterly convinced of this future. And we saw that the way people learn things and specifically learn languages, which was a very sort of human-based thing, if you really care about fluency, that would completely change and we'd be able to build language tutors that were pure software, pure AI. So that was kind of the genesis story of Speak. It took much, much longer than we expected to build a great
Starting point is 00:03:45 product and find good PMF. The first few years were very painful. And I think without this really compelling vision of the future, we would have quit. We actually like never pivoted. Last year, we brought the entire company to Taipei. We do this company trip every year. And we played our original YC application video on screen. And it was really funny because the things we were saying in that video were the exact same things that I still say today about the long-term vision and what we're building towards. So that was really cool to see. Can you summarize the long-term vision again? It was that as speech models and language models become superhuman, that would let us create an AI language tutor that would help you become fluent faster than any human could. And I think we're like 80 to 90
Starting point is 00:04:33 percent of the tech is here now. And you have this big focus on like speaking. Obviously it's in the name of the company. Yeah, that's right. And I think the speech models were maybe a little delayed compared to the text models. Did you ever think about, okay, maybe speech is just not going to work for this use case or like what were kind of like the valleys of, you know, discomfort and then what were maybe some of the pivotal releases and models that you were like, okay, it's going to work. It might take a little longer, but it's going to work. So we've always done custom speech stuff. The first act of the company, if you will, was before LLMs, right, before 2022, when Whisper came out, when ChatGPT came out, in the years before that, you know, like roughly two to three years,
Starting point is 00:05:13 is when we feel like we found PMF in South Korea and then started growing still only in that market, still only teaching English. And we developed custom speech recognition models and users were speaking into the app all day. So we had a ton of this non-native English speaker data. And we would use that to fine-tune models, understand our users better. We still do that today. And it's important for us for the core recording loop in many of our lessons, that it's extremely fast. So we're very latency sensitive. There's many other sort of product surfaces within the app today that are more LLM-powered, where it's more open-ended, real tutoring, where we actually give you feedback on what you said in the semantics and so on.
Starting point is 00:05:54 So that stuff is more like Whisper-powered, more LLM-powered. But we've always had like a very fast core ASR loop that's been fully custom. I just onboarded to the app earlier today. Like other apps, there's kind of like this tutor conversation that you do for onboarding. I'm guessing that is mostly LLM-based and then you're kind of judging the person to respond. So I selected Spanish and the conversation was in Spanish via text to start. And then from there, it started to create lessons for me. Yeah.
Starting point is 00:06:20 Was that all unlocked from LLMs where now you can kind of have these conversations and then bring people into the speech flow? Yeah. So before that, so we call that magic onboarding. And it was a new thing we built that was more conversational. We wanted it to feel more like you were talking with a tutor. And they were sort of learning things about you. And we would use that later to personalize the experience. Before that, we had a much more traditional app onboarding.
Starting point is 00:06:44 There's still a lot of open questions, interesting questions around what is the proper onboarding UX? Because a lot of people start using Speak, and they're not in a situation where they can actually speak out loud. So we have, like, you know, fallback outlets and so on. But it's something we're, like, super actively experimenting with. Is there a structured output behind that? You know, anything that you found implementing magic onboarding. I think people always want to improve onboarding.
Starting point is 00:07:06 What's the uplift? Or was there one? We still don't know yet. The interesting thing is that, in general, because it's speaking-based, which is a much higher barrier than just, like, tapping a multiple-choice button, what we see is that install to sign up rate is a decent amount lower, but trial start rate is higher.
Starting point is 00:07:26 It's still something that we have an active experiment that's running, and we're trying to be super agile about testing many different sort of formats of this. I don't think I have the final answer yet, but I think the intent, the real vision that we're going for here is that as soon as you download the app from the app store, maybe you see it in an ad, the first thing that the first interaction when you have like a fresh open of the app should feel pretty futuristic. It should feel like, okay, this is like the new AI native next-gen way of learning language to fluency. And that's kind of been always our ambition. Like we wanted to build something that wasn't possible before without LLM and like AI technology.
Starting point is 00:08:09 Yeah, I think I wanted to go back on the onboarding soon, but there's a general idea of like when you replace a form with a voice bot that you need to have some kind of state machine under the hood, the thing to drive, like, what else don't I know about you? Let me proactively ask that. And I'm just wondering if you had any insights there or is it literally just a state machine?
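[Editor's sketch] The "state machine behind a voice bot" idea in the question above can be illustrated with a minimal slot-filling loop. This is only a sketch under stated assumptions: the slot names (`target_language`, `goal`, `level`), the question wording, and the helper `run_onboarding` are hypothetical and are not taken from Speak's implementation. The machine simply asks for the first piece of information it does not have yet.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical onboarding slots and the question asked for each one.
# These names are illustrative assumptions, not Speak's actual schema.
QUESTIONS = {
    "target_language": "Which language do you want to learn?",
    "goal": "What are your goals for learning it?",
    "level": "How would you describe your current level?",
}


@dataclass
class OnboardingState:
    answers: dict = field(default_factory=dict)

    def next_slot(self) -> Optional[str]:
        """Return the first slot with no answer yet, or None when done."""
        for slot in QUESTIONS:
            if slot not in self.answers:
                return slot
        return None

    def next_question(self) -> Optional[str]:
        """Return the question for the next missing slot, if any."""
        slot = self.next_slot()
        return QUESTIONS[slot] if slot else None

    def record(self, utterance: str) -> None:
        """Store the learner's reply against the slot currently being asked."""
        slot = self.next_slot()
        if slot is None:
            raise ValueError("onboarding already complete")
        self.answers[slot] = utterance.strip()


def run_onboarding(replies):
    """Drive the machine with a list of (already transcribed) replies."""
    state = OnboardingState()
    asked = []
    for reply in replies:
        question = state.next_question()
        if question is None:
            break
        asked.append(question)
        state.record(reply)
    return state, asked
```

A fully conversational design would drop the fixed question table and instead give the model a goal in the prompt, which is the direction the answer that follows describes.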
Starting point is 00:08:31 We try both, actually. Right now, I think probably what you saw is a state machine, but I think that... Trust the AGI. Yeah, right. I think that things should move in a direction where it's much more of a natural conversation. There is a general sense of a
Starting point is 00:08:47 goal in the prompt that you can specify. And part of the hard thing here is all the guardrails, right? Like, once you start talking about what you had for breakfast yesterday, right? And trying to be like antagonistic to the system, then things start like really going off the rails. So for a bunch of these experiences, we're pretty careful about the fallbacks and we have a lot of evals around that. But I think where it should end up is just feeling like you have a quick three to five minute conversation with your tutor. And then it knows a lot about you. And then you create your account, et cetera. And you create memories? Yeah. So we store what you're saying. We summarize in the experience. The way it works is
Starting point is 00:09:27 the tutor will ask you some sort of question like what are your goals around learning English or the language. And then we will basically use a separate LLM prompt to summarize. So it's not the like full transcript for what you said that you see. It's more of like an abstracted, okay, here's what you care about. And we think that's a better product experience. What were some of the other key tenets of the product? Obviously, language learning is like one of those consumer markets where like dozens of companies always trying to get started. And you get these old companies like, you know, Babbel and you got Duolingo. So speaking, the act of speaking was like a big part of it.
Starting point is 00:10:02 I think this memory stuff is great. I think if you try some of the other apps, it's like they always started to re-ask you the same things that you got wrong before, but you're not really ordering. Is there anything else that is maybe not as obvious from the outside in the design of the app and the product? that is like you think it's really different. I would say from a macro level, this is actually a pretty new product category, AI powered language learning. And all these apps that you mentioned,
Starting point is 00:10:27 Duolingo, Babbel, et cetera, they're more of like the Gen 2 of language learning. So if you think about Gen 1 was Rosetta Stone, if you remember, right? CD-ROMs in airports. And then Gen 2 was basically mobile. So you have these very casual, massively popular mobile apps like Duolingo
Starting point is 00:10:44 that I think the comp there, is probably closer to a mobile game, something that feels productive, something that's very engaging, very gamified. And Duolingo is very leaning into it, the gamification. And they've done an amazing job of that, to be clear. Yeah, they might be the world's best people at it. Yeah.
Starting point is 00:11:01 And our view is that LLMs and AI now enable Gen 3 of language learning, which is something that is very AI-native, very focused on functional fluency, which is why we do all these role plays and let you practice Spanish by talking to your Uber, driver. We don't teach vocabulary and grammar. We teach sentence patterns and we try to get you to just repeat and drill and drill, almost like you're in a gym until it's automatic because that's what speaking is, right? It has to be spontaneous and automatic. In terms of the other aspects of
Starting point is 00:11:31 the design, though, we went through many, many iterations over the first few years of starting the company. This is kind of what I was mentioning about. It was like really painful in the first four or five years. And in fact, the current version of the Speak app is not the first thing that we launched. We had something that we called internally the red app, which was like a red app icon, still a similar logo. And it was more around packs of content instead of courses where you could sort of choose any topic that you wanted to learn. It was from many different languages for learning. It was essentially like not a very directed experience and it didn't really work. It was free. It was a very basic thing. But we in 2018 tore everything down
Starting point is 00:12:15 and realized that we had to really fully change what we were doing. And that's when we decided to focus on South Korea, specifically on teaching English. We built a bunch of new lesson types, and we created our courses so that the experience was much more on rails. We realized people don't want to choose. They're already using some of their motivation on a daily basis just to open the app. They don't want to make another choice after that, right?
Starting point is 00:12:40 Just tell me what to do, right? Like, you know, give me a big button. and then I can tap it and just start a video lesson or whatever. We also, pretty critically, I think, abandoned the free version and just went straight premium. And we kind of sidestepped the motivation question that way because we knew that there were a ton of users that really wanted to learn English and were already really motivated. So we wanted to basically filter for these users.
Starting point is 00:13:05 So I wouldn't say there was one silver bullet. It was kind of the combination of many learnings over three or four years. And then that started really growing in South Korea. And from there, I guess like phase two was really 2022 when LLMs came out and Whisper came out. And that allowed us to go from this more supplemental speaking practice tool to more full-featured language tutoring where we could use LLMs like 3.5 Turbo back then to give you direct feedback on your wording. And on, you know, like, that was kind of a weird thing to say, a native speaker would say it this way or use a different word or whatever. I always do a poor job of doing this, but can we get some headline numbers, like, just to get a sense of scale?
Starting point is 00:13:51 Because I think maybe some audiences don't know. Where are you at now in terms of your reach? So we're now the biggest English app in South Korea. Yeah. We do billboards, big celebrity campaigns, that sort of scale. Like we're, you know, very popular there. I think like 6% of the Korean population has tried us. We're well on the way
Starting point is 00:14:09 in a bunch of other Asian markets like Japan, Taiwan. So the Asian markets are currently our mainstay. We also teach English in 40 more countries. We're coming to the U.S. as well. Launching, I mean, we have Spanish and French live and several more languages are coming this year.
Starting point is 00:14:27 That's a huge focus of the company right now. In terms of revenue scale, well over 50 million ARR, it's a pretty simple business model. it's like mostly consumer. The B2B stuff is super, super exciting, and that's also growing really fast, and I think it'll be a really meaningful part of the business.
Starting point is 00:14:44 When did you start B2B? About a year ago. It was like very much a side bet slash experiment at first, and then it just started working. Of course it's going to work. Yeah, and now it's like, okay, you know, this is part of the future, right? This is a real thing.
Starting point is 00:15:00 Yeah, so that's exciting. What's the B2B raised between learning language and like real-time AI translation? At Google I/O, that's like one of those like Google Beam things for, like, you know, for conferencing, they do real-time translation, like
Starting point is 00:15:12 Yeah, yeah, yeah. So people always ask this, right? They're always like, what happens when the Babel fish comes, right? When the real-time translation comes. And Babel fish is the Hitchhiker's Guide, right? Yes, exactly. The counterexample that I always have that I think is quite illustrative
Starting point is 00:15:28 is in German the verb is at the end of the sentence right so if you're trying to do real time translation from German to English as an example, you can't actually make any progress on the English until you hear the whole German sentence and you know what the verb is at the end, right? So like the minimum latency there is the full sentence. And that's like an example of the technical blocker for like why it'll never be truly, truly perfect. But also I think besides that, if you talk to all of our
Starting point is 00:15:58 users in Asia, they don't want a translator. The reason that they are trying to learn English is to make themselves a better person to connect with other people. They want to be able to look you in the eye and speak English, speak the same language as you. So it's actually like a very different thing. I think what will end up happening is that we will build a real-time translation feature into Speak and have it integrated into the learning experience. And also like there's always that human side, right?
Starting point is 00:16:24 Like I'm dating a Romanian woman. Yeah. His wife is trying to learn Italian. Like there's always that. Yeah, absolutely. That's going to keep happening. I want to double click on Korea. I think it's like a very insightful, smart decision.
Starting point is 00:16:38 Maybe people only know Korea through K-pop. Actually, I think a lot of Americans learn Korean because of K-pop. That's a side thing. But like, you could have done Taiwan, you could have done China. I remember starting a documentary about how China was crazy about English or mad about English. I think that was the title of the documentary. Was it obvious? Were you sure when you went into Korea?
Starting point is 00:16:58 Was it just a test? We visited a bunch of Asian countries when we were thinking about how do we relaunch things, how do we focus in? And we almost chose Taiwan, actually. Yeah. But I think it was a little bit serendipitous. So our first employee is Korean and was my co-founder's college roommate, actually. When my co-founder visited Seoul to check out the market,
Starting point is 00:17:19 he asked S.J to come along as essentially a translator and to, like, you know, facilitate. And I think that just went really well. And it was just very obvious from being on the ground in the market that Korea is pretty obsessed with learning English. And there is every human-based solution possible, right? You know, like English academies, classes, skyscrapers full of classrooms, stuff like that. And our logic was basically, if we can really make headway and win this market, that is chock-full of these human competitor products and all these people that fundamentally care about fluency, then we probably have something pretty real and strong
Starting point is 00:18:07 PMF that we could win other markets with. So that was the original logic. And so far it's been working. It's retroactively obvious, which is the best kind of obvious. But like, it's so counterintuitive that you would be the team to do this and not a Korean team, right? Yeah. They would be, they would know because they had personal experience of like, I started in Korean, I learned English. Here's how you do it. Yeah. In hindsight, it's super weird. Right. Like, we were definitely, sitting in an office here in San Francisco, operating with users in a market all the way on the other side of the world,
Starting point is 00:18:39 it would not have worked without Sun Jae. I have to give them a lot of credit here because we paid a lot of attention to the specific wording of button text in the app and localized strings. We had a lot of reports from users pretty early on that they were shocked that it was an American company. They thought, right?
Starting point is 00:18:59 Because you can always tell. There's always some weird wording or whatever, but there wasn't in Speak. And I think that probably had like a large sort of intangible effect. Yeah. Focus, attention to detail. Tech stack. This was 2018.
Starting point is 00:19:13 What were you rolling? You just did ASR and there was no LLM. So BERT, maybe? We actually had really no LLM component of it. So all of the content. Oh yeah. Another thing we did that I forgot to mention was we decided we needed to fully own all the content.
Starting point is 00:19:30 So the way that we teach, all in-house, all sort of thought from first principles. We built this thing called the Speak Method, which is basically like a pedagogical philosophy around teaching sentence patterns that you drill and then sort of combine into higher order patterns. And all of that was in-house with, you know, our content team and our teachers.
Starting point is 00:19:54 And we built a lot of internal tooling to make this possible. There's just a lot of operational overhead, I would say. This is something we've struggled with to scale to many more languages, and that's like a big research effort within the company right now. We're building a mobile product, right? My co-founder and I have always just loved apps and been big iPhone users. So we cared a lot about the app being native, feeling great, being high performance. The DNA of the company was always consumer. Frankly, my co-founder and I had never worked in a real company. I dropped out of grad school, had a few failed startups, and then eventually started Speak. And he had never
Starting point is 00:20:34 worked in a real company either, just startups in the past. So we didn't know anything about enterprise, enterprise workflows, or like what sort of software real companies used. So I think, frankly, consumer was the only path. I don't think we could have done anything else. We just didn't know enough. And I think that has served us well, though, in terms of just really caring about the craft of it and wanting to build something that felt not 90 to 95% but 95 to 100% in terms of polish. Was it hard to build an engineering team that did that at the time? Because ML engineering was very academia driven back then and then you have like the more consumer stuff that it's maybe more nascent. And it's mobile. I'm now realizing that our story is very weird. So in addition to the
Starting point is 00:21:28 market on the other side of the world. Our first iOS engineer that we hired through a YC referral was in Slovenia. If you don't know where Slovenia is, look it up on Google Maps, but it's you know, it's like a pretty obscure little country. And then we needed to hire a backend engineer and one of his best friends
Starting point is 00:21:45 was a great backend engineer and we hired him. And then this happened four more times all in the same city. And then we were like, okay, we should probably just open a physical office. So for... Slovenia. Yes. So for several years, we had an engineering office in Slovenia.
Starting point is 00:22:00 What? And then a few people here in San Francisco. And we still do. Now we have 90% of our core product development team in San Francisco here. Office in FiDi, we're really only hiring here. But for the first like several years, that, you know, that was like another very interesting sort of cultural aspect of the company, I guess. I think a lot of early stage founders have to do that.
Starting point is 00:22:24 That's the only people they can afford or whatever. Yeah. What are your tips that make that remote stage work? For us, it wasn't really a price thing. I think we legitimately thought he was the best person that we interviewed. And then it just kind of happened that way when you roll it out. Yeah, it's not a price. It's more about remote work, right?
Starting point is 00:22:45 Like, distributed team, early stage. A lot of people say, like, no, you have to move everyone to SF or your startup would die. Yeah. I don't think that we were good at remote work. I don't think that my personality or my co-founder's personality is inherently very good at async, just to be perfectly frank. I actually think that almost like in spite of it, we made it work, it was a little bit brute force. Like I would just sync with them every single day.
Starting point is 00:23:12 Right. And there was pain because the time zone overlaps, like it was like exactly the most inconvenient. But I think for several years we did that. We got really good at the cadence of it. I think they were excellent engineers as well. So it worked out. But if I had to do it over again, I probably wouldn't do it. It's hard to say, yeah.
Starting point is 00:23:32 Should we move to phase two on the LLM side? That's when opening I started opening up. And when did they invest? This was 2022. So that was also when Whisper dropped. And Whisper was a really exciting moment for us. It was actually since we started the company and made that prediction of, okay, in five or 10 years, speech models, language models will become
Starting point is 00:23:54 superhuman level. Whisperer was really that magic moment for us where we were like, oh, shit, I think what we predicted is here. And I pretty distinctly remember this moment in the office when we got access to the model. And we were testing it on an audio clip of like a very beginner English learner in Korea saying something. And it was, if you close your eyes as a human, you'd have no idea what they were saying. There were four of us in the room. We all closed her eyes and none of us had any idea and the model got it right. So, I mean, superhuman. I think that was the moment that we had been waiting on. And at the same time, LMs were on the ascendancy. Chat ChitpT would come out, I think on Thanksgiving of 2022 and 3.5 Turbo came out.
Starting point is 00:24:43 And I think like we kind of realized very quickly that all the pieces we're clicking now, right? Like we have what we need at our fingertips now to go from something that was listen and repeat where the user would see something on screen, hear a reference of the teacher saying the thing, and then they would just repeat the thing, right? It was like very simple. Still a great product, by the way. You know, still grew to like several million error in South Korea. So clearly there was like a big market need for that.
Starting point is 00:25:12 Free whisper. Yes. Wow. This is from like 2019 through 2022. Yeah. And then that's the grind. You need it to hang in there. Yeah.
Starting point is 00:25:20 And again, I think that if we, there were many moments when things weren't working from 2017 through 2019, we were looking in the mirror and we were like, why are we doing this? This is crazy. But I think we were so convinced about the vision. We just like couldn't believe that the vision would not come true. So we stuck with it. So fast forward to 2022, the pieces started coming together. We realized that we could start building something that felt more like a
Starting point is 00:25:47 language tutor. That could give you feedback. That could start explaining to you why you did something wrong. And that was act two of Speak, a true English tutor. This is something that a lot of founders struggle with today. It's like, I'm kind of building something hoping that the models get better later. Yeah. How did you feel once the models got better? Did you feel like, okay, I am ahead of the curve because I built all this history of building product and like doing all this work? Or did you almost feel like, okay, we spent all this money and time building these models, and now we're just going to use Whisper. It was purely positive for us.
Starting point is 00:26:23 We still kept using our custom ASR system because it was streaming real-time, really fast, really well fine-tuned. Whisper wasn't streaming, so it was a different use case. We used it for the more spontaneous stuff. And I think in almost every way, we were just really excited because pretty directly, as the frontier of model intelligence improved, it would just unlock things on our roadmap that were locked before, if that makes sense.
Starting point is 00:26:53 And we still really operate in that mode today where we take a model and then we try to think about, okay, how do we saturate model capability by building product on top of it? And then it happens again, right? And then we build and saturate the model capability again. I think that's a really cool paradigm
Starting point is 00:27:09 to, like, you know, think about. But all the LLM stuff basically allowed us to build a tutor for English, and we still didn't have real-time voice, for example, right? But the barriers are coming down now. Obviously, it's a really hot topic. We're actively building out a real-time voice platform
Starting point is 00:27:25 that we can build a lot of more verticalized specific lesson experiences on top of that I'm super, super excited about. I don't think they're going to replace our current lessons. They're going to be more immersive, just a different thing, probably for more advanced learners. Still language learning, though, not broadening out from language.
Starting point is 00:27:44 Yeah, so I think that language learning is interesting because it is so universal. 99% of people you know have certainly tried to learn a language. And it's so hard, right? Becoming fluent just has a huge failure rate. And it's something people are willing to pay for. So I think that has been just like a pretty amazing beachhead for us. And I think we'll be doing language learning for a long time. There's a huge, huge, huge company to be built here.
Starting point is 00:28:11 But our even longer term ambition is really this idea that even beyond language, we think AI will reinvent how people learn anything, right? It already has for me, right? I use ChatGPT to learn things every 10 minutes. And I think I'm just naturally like a very curious person. So whenever I'm thinking about something, I want to know more about it.
Starting point is 00:28:31 And then I'll naturally go to ChatGPT and then I'll learn about it. It's unlocked this like entirely new dimension of learning. And I'm spending way more time learning as well as an adult, which is really cool. And I want to bring that in a more sort of structured, systematic way to everyone. So I think that that's like the vision beyond language. I'm curious to sort of double click on just the tech side. We talked a little bit about the content that you own and develop in house.
Starting point is 00:28:59 We talked a little bit about the onboarding memory. I assume that you have conversational memory as you go, right? And any other major pieces of the puzzle that really unlocked it for you? So there's a few things I can talk about. I think one thing is, in order to go from teaching English to teaching a bunch more languages, we needed to really figure out more direct AI content generation. That was a pretty, right, because it's hard to scale, like, our little studio in L.A. where we shoot a lot of the video lessons.
Starting point is 00:29:30 All of the scripts were written manually before by our content team. But we want, like, 100x more content, right? and 10x more languages. Eventually 100x more language pairs, which is how we think about it. It's like what's your native language and then what language are you learning. And really, the only way to do that
Starting point is 00:29:50 is to make it more AI generated. And, you know, very much like an AI-native company, we want to be on the frontier here. We want to keep a small team and to have as much leverage as possible through these types of tools. So that's a big active area where we're building out,
Starting point is 00:30:07 I think using, you know, people overuse the word agent, but we have a tutor agent, we have a curriculum writing agent, we have a giant LLM-based pipeline that creates a curriculum, scaffolds it in the right way, writes the lessons themselves. That's a big active area that will basically help us to scale to a lot more markets and a lot more languages. So that's like one big thing. Another big thing is we care a lot about fluency, obviously. Specifically, we want to be able to quantify how fluent you are. So if you're learning Spanish, it's like, okay, what does it mean to be fluent, right? There's some real world test for that.
Starting point is 00:30:47 We care about real world fluency. Your ability to go to Mexico City and go to a street taco stand and actually order, right? That's a very functional fluency in one aspect. You might be really good at that, but be completely unable to, like, talk about your family, right? So the frontier of fluency is very jagged, but we're very pragmatic. and we care a lot about meeting user goals and helping them become fluent at what they care about. And we're thinking a lot about,
Starting point is 00:31:16 okay, how do you quantify that? How do you actually store a knowledge graph of everything you know about Spanish in terms of the vocabulary you know or you don't know, the patterns that you know or you don't know, the mistakes you made using Speak over the last month that are clustered? You said the magic word of knowledge graphs. Is that live?
Starting point is 00:31:38 Is that experimental? There are aspects of it that are live. And it's a very sort of multi-dimensional system where we think of it as there are many aspects of fluency, right? There's many subscores. And we have a few of them that are currently live and we're actively developing other aspects of it. And then all of those will fold up into a more holistic fluency score.
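Editor's sketch: the idea of several fluency subscores folding up into one holistic number can be illustrated with a weighted average over whichever subscores are already measured. The dimension names, the weights, and the 0-100 scale below are assumptions made for illustration; the episode does not describe how Speak actually computes its score.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Subscore:
    name: str               # hypothetical fluency dimension, e.g. "vocabulary"
    value: Optional[float]  # assumed 0-100 scale; None if not yet measured
    weight: float           # assumed relative importance of this dimension


def holistic_score(subscores: List[Subscore]) -> float:
    """Fold the measured subscores into one weighted-average score."""
    measured = [s for s in subscores if s.value is not None]
    total_weight = sum(s.weight for s in measured)
    if total_weight <= 0:
        raise ValueError("no measured subscores to aggregate")
    return sum(s.value * s.weight for s in measured) / total_weight


# Two dimensions are "live"; pronunciation is not measured yet and is skipped.
learner = [
    Subscore("vocabulary", 60.0, 2.0),
    Subscore("sentence_patterns", 40.0, 1.0),
    Subscore("pronunciation", None, 1.0),
]
score = holistic_score(learner)
print(round(score, 1))
```

Skipping unmeasured dimensions rather than treating them as zero is one possible design choice: it lets a partial picture produce a usable number while more subscores are still being built.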
Starting point is 00:32:00 The idea is that eventually once we have a complete enough picture, everything will fold up into a number that we call the Speak score that is a very sort of holistic measure of just like, how good are you at Spanish? Right. And obviously, 54, it's kind of meaningless by itself, but it does give you a general sense, right? Like being at 54 versus being at five is very different, right? And I think everyone can kind of like intuitively understand that. And it's surprising, like I would have grounded it more in real world. Like we will get you to pass this exam that is a standard that is like the ESL standard or whatever.
Starting point is 00:32:39 So the way that we think about that is we don't really teach for the test. I think it's possible in the future that we'll do a test prep product. But in general, we care about real world proficiency in various functional situations. So the way that we think about it is if you're at this level, then these are the things you can do. Right. So it is exactly that. We have that a lot in Italy. I grew up in Italy, so English is my second language.
Starting point is 00:33:03 and there's a lot of people that pass a lot of tests and, like, yeah, have high grades in all the classes, and then they travel to the U.S. and it's, like, I feel like the hard part is, like, being in the conversation, you know? I think, like, when I started, my writing and reading were, like, much higher than my conversation, which, like, doesn't really help you if you're, like, traveling somewhere. That's me for Chinese, because my parents spoke Mandarin to me growing up, so I can understand, like, a non-trivial amount, but I'm very bad at speaking.
Starting point is 00:33:32 I heard there's a good language learning product. I had one question on the course generation. How do you eval that product? When you're asking the AI to generate courses, how do you figure out the courses are going to be good? We rely very heavily on our content team. And we are trying to build out an eval suite. It's really hard.
Starting point is 00:33:55 The illustrative example here is that as we try to hire and train new content writers on our content team, it's so nuanced. There's many different aspects of training them in the Speak method and how to write the right types of lessons and articulating why this form of lesson, which is subtly different from this other form of lesson, is better. Right?
Starting point is 00:34:22 So we try as hard as we can to articulate that. So I think like forming like a set of evals, using model-graded evals like that, that's one piece of it. And I also think, like, in the future, a really good curriculum or lesson writer agent will probably be, like, reinforcement fine-tuned on a lot of our internal data as well. That's something we're experimenting with, but it's still pretty early. This seems like a great example of, like, you know, AI removing jobs, which is like, oh, you're creating the courses with AI, you don't have a person.
Starting point is 00:34:53 But it's actually like, instead of one person creating two courses, they're like reviewing 50 courses that they generate. That's kind of how you're seeing the content team. The way that we see it, really not just for our content team members, but also I think it's perfectly applicable to engineering, is that it's leverage. It just allows you to do 100x in the same amount of time. We still need human review of the syllabus, the curriculum, the specific lines, etc. But the hope is that this will allow us to launch 100x more courses. A lot of language is colloquial. I think the way that you put it on one of our episodes one time was the Italian that is taught in school is not the Italian that Italians speak.
Starting point is 00:35:32 Yeah. How much of that do you adjust for informal versus formal? Entirely. That's one of our fundamental tenets, which is that we don't teach textbook English or textbook language. We try very hard to teach, not Gen Z slang, we don't go quite that far, but slang. We try to teach very casual, conversational language
Starting point is 00:35:56 that is actually what real people use. And like you said, that's usually very, very different. Like if you pick up like a typical English textbook in Korea, it's all really traditional and weird formulations. And it's not how people actually speak. Yeah. I know you're going to release Italian soon. So I can give you a hand on that.
Starting point is 00:36:16 I know in the US, there's not that many dialects. There's like accents, but like most of the language, like the words that people use are similar. Because I know, for example, Spanish is like, you know, Spanish spoken in Argentina is like very different than Spanish spoken in Mexico. How do you kind of adjust for that? Or maybe you don't. So I would say that, for example, currently we teach American English, standard American English. We don't really teach other accents or other dialects.
Starting point is 00:36:43 For now, given how small we are, we just have to be pragmatic and teach in the direction that most people want and most of our users know. So we've made those decisions like on the content side for American English, Spanish, every language that we're teaching. But I do expect that in the future, we're going to get a lot more sharply differentiated. Like, if you want to learn British English, then we'll teach you British English. We'll teach you how to pronounce it, etc. I think all of that feels like something that a superhuman language tutor should be able to do. I just think it'd be very funny if all the Koreans had like a very distinct southern accent. It'd be great to make that happen.
Starting point is 00:37:19 Yeah. I do think about this because, you know, obviously there's a moving of the goalpost. Like now that we have this, now we want the next thing. Yeah. And obviously, people who speak English as a second language always have an accent. Like, a lot of people think I don't have an accent, but if you know any Singaporeans, you know I'm Singaporean. How important is accent training, right? Like, I think, like, actually, that does help a lot for people.
Starting point is 00:37:44 Yeah. And you cannot tokenize accents yet. Yes, that's right. So I have two main thoughts on this. I think the first one is that communication and your ability to speak spontaneously and get a concept across, an idea across, is almost fully orthogonal
Starting point is 00:38:03 to pronunciation. You can be really bad at pronunciation, but still communicate effectively. So a lot of the current core product experience is about just speak as much as possible, make mistakes, don't worry about screwing something up on the accent or the pronunciation side. The important thing is that you literally move your mouth and you make the sounds,
Starting point is 00:38:24 right? And it turns out there's like a really key psychological barrier there where people are just not willing to do this in front of a human, even if it's a human that is a teacher that you're paying, right? So a lot of the core message of our marketing campaigns in many of our like biggest markets is along the lines of like you can make mistakes in this private space with Speak. And I think psychologically that's extremely powerful. And then you can go and get it right more confidently in the real world after you practice with Speak.
Starting point is 00:38:54 Now, having said that, people do care about their pronunciation and their accent, right? So we have, for English only right now, a pronunciation coach that is basically like a fine-tuned version of wav2vec, which is a Meta model. But we basically fine-tune it on a bunch of our own phonetic transcripts, like fine-tuning data. It works pretty well. It's currently for single words. We're going to expand it to full sentences, to more languages, et cetera. But I think that just if you look at like the pure market opportunity, our sense is that we really want to push people to just speak very freely as much as possible.
Starting point is 00:39:37 You know, just get that volume up. Yeah. Yeah. In terms of immersing language learning in the real world, one of the more interesting approaches that people keep trying is to have, let's say like a Chrome extension or something on top of a page. I think Toucan was doing this. There's a bunch of those.
Starting point is 00:39:53 Yeah. Yeah. And then there was another one I saw recently, which is like watch a YouTube video and it'll transcribe for you, but randomly mask out words. I saw that too. Yeah, yeah. Yeah. That was like a Show HN on Hacker News. Yeah. Do those work? There's kind of the question of is, you know, is that the right product? Right. I don't think. So basically the difference is your content or real world content, right? Obviously, you want real world content. I think that for work, right? So for Speak for Business, for the,
Starting point is 00:40:19 for the B2B product, another part of the vision is really like, what should a superhuman language tutor be able to do? It should probably be able to handle kids as well as a Samsung employee that wants to transfer to the U.S. office and wants to use Speak for work, right? So our view there is that it's the same product. It's a different distribution mechanism, right, consumer versus B2B. And I think that we will eventually build something like a Mac app. Maybe it'll be integrated with the browser in some way. We're not really sure yet. But obviously, in order to apply it to your day-to-day, there needs to be some way to hook into your actual sort of work documents, whatever.
Starting point is 00:41:02 That's a whole can of worms. We are actively thinking about it. But I think my sense is that it's not clear to me that any of these products have really taken off. And I think that there's many other approaches that are possible. I don't have the answer. But like another example, very hypothetical future world is maybe OpenAI, you know, the new Jony Ive thing, will come out with some hardware that will be listening to you all day. And then we can, you know, give you some sort of like very deep analysis that is integrated with the Speak app at the end of the day or like, you know, the end of the week, whatever. I don't know. Okay, rumor time, since you brought that up. I'm sure you don't, actually, they haven't told you anything, but what? I don't know anything. What's it going to be? I don't know anything. It's like a very, like, this is the number one topic at all the parties I go to now. Really? Yeah. What's the most compelling idea you've heard? Okay. So there's, there's people that say Jony hates wearables. Yeah, I've heard that too. I'm like, if it's not a wearable, then you
Starting point is 00:41:58 just made a second phone. In that case, just make a phone. Yeah, I thought they said it was, I mean, didn't Sam say that he wanted to do a phone in the past? That was in the far, that was like in the far past. He says a lot of things. He'll say a lot of things. Yes. Okay, anyway, I think wearable makes sense. I think the race is to capture context. I have a wearable. Yeah, we have a wearable here too. Yeah. Transcribe everything. That transcribes everything? Yeah, that's cool. Yeah, it's a previous episode of ours with, I can hook you up if you want. Yeah. But yeah, I think like it's something that a lot of people are interested in, obviously, because it's a huge bet by them. And yeah, it's curious. Okay, you mentioned video. I just wanted to
Starting point is 00:42:38 double click on that a little bit. I'm sure engagement is very high for video because people love to watch video. I thought that Speak would be one of those places where like you just kind of leave it in your pocket, you take a walk, learn to speak. Probably that's not true. What we've done so far is part of the course experience is a teacher video. We've tested other more audio-forward types as well. We found that, of course, like you said, video is very engaging, but at the same time, we have a lot of users that do want to be able to walk around with the phone locked in their pocket. So doing something that is more like voice mode with optional visuals, I think, is really good. I think there's a huge opportunity for a better way to learn things like listening comprehension.
Starting point is 00:43:29 So I took German in grad school for two years, and I thought I was getting somewhere, but any time I listened to a native German speaker, it's so fast. It's completely on a different level. And I think you can imagine a plethora of really cool experiences that feel kind of like you're listening to a podcast, but it's all AI generated. It's fully controllable. It's integrated with the app. You know, there's something there for sure. Yeah. Don't want to do AI podcast, man. We're cooked. It's okay. We'll document our own ending. I mean, I think when that happens, we just end the show. Like, why not? To zoom out a little bit, in the pretty near future, multimodal models will cross the threshold where they will be able to generate images a lot faster
Starting point is 00:44:18 than they currently are maybe somewhat close to real-time even, right? And audio at the same time, text at the same time. And you can imagine just like a very powerful multimodal tutor that can kind of do it all at once where there's an audio track. And then if the teacher is teaching something, with the right timing, it chooses, okay, at this point, I'm about to introduce a new concept. So I'm going to show the word on screen so that the user can see how it's spelled. right there's a lot there you can do generative UI a lot of nuance there where it's easy to do it badly
Starting point is 00:44:52 but to do it well requires a fair amount of reasoning and mental modeling of what the user knows, which feeds into what you need to show at what time. So that's probably going to have to be like a pretty parallel set of systems. Have you spent any time looking at this, like, you know, like Veo 3 where you do video plus audio at the same time, on how you can tweak the audio part versus the video part? Because I can imagine you might work on a video part and then you want to change the audio generation model. I don't actually know how the model works inside on like how much, we haven't really looked at the video stuff much. We basically think that we're very bandwidth constrained, right? So we're just scaling and trying to hire as fast as possible like
Starting point is 00:45:34 everyone else is. And as a result, we're really focusing on just like the most in reach highest impact things. I do think that the barriers are coming down very fast for. all of this sort of stuff. I'm just so excited about multimodality and where things are going here, because imagine if you're learning Spanish, being able to look at an image that the model generates for you, and then doing Q&A on it, right? Like a beach scene, and then the model will ask you, like, how many people are running on the beach? And then you have to sort of respond in the target language that you're learning. Very traditional language learning exercise, but you can imagine it being fully generative, which is really cool. Awesome.
Starting point is 00:46:14 Lots of stuff like that. The engineer in me worries about inference costs, but I think you can just kind of sweep that under the rug. See if it works first, and then you can worry about costs. You mentioned a real-time voice platform. I just want to give you the platform to talk more about that. Just like you mentioned, for example, that you're a very heavy user of the real-time API from OpenAI,
Starting point is 00:46:36 and you build a bunch of tooling around it. Yeah, so we last year had early access to the real-time API, and there's a very obvious sort of use case for language learning. I think one common theme that has just been pretty awesome since LLMs came out is that language learning as an application is just a really good fit for LLMs, all these model types in almost every way, which has been just really great for Speak specifically. For real time, I think the audio piece promises to really infuse almost every surface in the app. You can imagine this is the primary way that you talk to your tutor, right?
Starting point is 00:47:16 And an additional complication is that it needs to be multilingual, and there needs to be code switching. So that's a pretty frontier problem right now, right? So like I should be able, if I'm learning Spanish, to speak both English and Spanish and vice versa from the model. That's a pretty hard TTS problem today. It's actually like only a few models are able to speak two languages in the same sentence and then pronounce them properly.
Starting point is 00:47:39 You can't have a router model, like a tiny little router model, guess which language first, and then route? Well, the problem is that there's, you could have a subword in a single sentence in a different language. Yeah. So you can't just concatenate either. Yeah. Because it won't sound right.
Starting point is 00:47:57 Yeah. It won't sound natural. That's not how humans do it. So this seems to be like a, like a very native, controllable audio, you know, function. Yeah. But we are in the process of building a variety of experiences on top of the real-time API. I want to clarify that actually nothing is in production yet, mostly for price reasons, frankly.
Starting point is 00:48:16 The pricing model of the real-time API makes more sense for something like a customer support agent where you're very directly replacing somebody that you would pay hourly otherwise. And that's how you're seeing the pricing model for a lot of these initial agents work out. For us, we want our users to be able to do these real-time role plays and have these conversations for many hours a day, right, if they want.
Starting point is 00:48:40 Right? Getting cost under control is definitely a pretty key consideration right now. But we are pretty close. Maybe actually, even by the time that this episode is released, we'll have something live. But we have what I think is a really cool application of the real-time API, which is basically a new instructional lesson, where it's the model actually teaching you something,
Starting point is 00:49:02 like a new language concept. And it's intended to sort of augment slash play the same role as our current video lessons, which are the instructional lesson type. And it's interactive, obviously, at certain points in the three to five minute lesson, you're interacting with the real-time API. It's semi on guardrails.
Starting point is 00:49:21 There was a lot of scaffolding we needed to build to basically, number one, switch between the interactive and non-interactive portions of this lesson properly, if that makes sense, right? There's some portions where you're just listening or looking, and then some portions where you're actively in a short conversation, and we kind of swap back and forth. And we have like a bunch of sort of custom architecture and infra around that.
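Editor's sketch: one way to picture a lesson that swaps between listening portions and short live conversations is as a list of timed segments, where a live voice session is only needed during the interactive ones. The segment kinds, durations, and function names below are hypothetical; this is not Speak's architecture, only an illustration of the scheduling idea and of why it matters when the live portion is the expensive part.

```python
from dataclasses import dataclass
from typing import List, Tuple

VALID_KINDS = ("passive", "interactive")  # assumed labels for this sketch


@dataclass(frozen=True)
class Segment:
    kind: str       # "passive" (listen/look) or "interactive" (live conversation)
    seconds: float  # planned duration of the segment


def live_windows(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Return (start, end) times during which a live voice session is needed."""
    windows: List[Tuple[float, float]] = []
    t = 0.0
    for seg in segments:
        if seg.kind not in VALID_KINDS:
            raise ValueError(f"unknown segment kind: {seg.kind}")
        end = t + seg.seconds
        if seg.kind == "interactive":
            if windows and windows[-1][1] == t:
                # Back-to-back interactive segments share one session window.
                windows[-1] = (windows[-1][0], end)
            else:
                windows.append((t, end))
        t = end
    return windows


def live_fraction(segments: List[Segment]) -> float:
    """Fraction of total lesson time spent in a live session."""
    total = sum(seg.seconds for seg in segments)
    live = sum(end - start for start, end in live_windows(segments))
    return live / total if total > 0 else 0.0


lesson = [
    Segment("passive", 60),
    Segment("interactive", 30),
    Segment("interactive", 15),
    Segment("passive", 45),
    Segment("interactive", 30),
]
print(live_windows(lesson))
print(round(live_fraction(lesson), 3))
```

In this made-up three-minute lesson, the live session is open for 75 of 180 seconds, so any per-minute cost of the live portion applies to well under half of the lesson.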
Starting point is 00:49:46 And then there's also making the cost make sense, or at least like semi-make sense. And then there's a bunch of WebRTC infrastructure. We're at sort of, you know, not huge, but non-trivial scale either. So we definitely just, it'll cost us millions of dollars if we do something wrong. Yeah. Yeah. Do you do inference in Korea because of the latency and all that? It's something that we have been increasingly paying attention to for all the real-time paths.
Starting point is 00:50:15 I would say two or three years ago, when real-time stuff was still quite nascent, users didn't really care as much. But I think now the standards have risen, right? Like latency has to be low. Everyone cares. Do you have a hard latency budget for responses or do you just kind of work it out? So, for example, right, like you have a knowledge graph that you're accessing, you have content that you're retrieving, there's a lot of stuff there. And then like, you know, maybe you're using a reasoning model, probably not.
Starting point is 00:50:44 But like that all eats into the budget. I will say that from the like real time engineering side, everyone talks about, okay, submit user request to get agent audio response like first bytes, right? First audio bytes. What's that latency? And then we try to get that as low as possible. I would argue that's actually like a vanity metric because what you don't take into account is how the VAD works. How do you do turn detection to detect when the user is finished speaking?
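Editor's sketch: the distinction being drawn in this part of the conversation is between the commonly quoted latency (request submitted to first audio bytes) and what the user actually experiences (they stop talking, then the first audio arrives), with turn detection accounting for the gap. The field names and the millisecond timestamps below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnTimestamps:
    """Timestamps for one conversational turn, in milliseconds."""
    user_speech_end_ms: int    # when the user actually stopped talking
    request_submitted_ms: int  # when turn detection fired and the request was sent
    first_audio_byte_ms: int   # when the first byte of agent audio arrived


def request_latency_ms(t: TurnTimestamps) -> int:
    """The commonly quoted number: request submitted to first audio byte."""
    return t.first_audio_byte_ms - t.request_submitted_ms


def endpointing_delay_ms(t: TurnTimestamps) -> int:
    """Time spent deciding that the user has finished speaking."""
    return t.request_submitted_ms - t.user_speech_end_ms


def perceived_latency_ms(t: TurnTimestamps) -> int:
    """What the user feels: end of their speech to first agent audio."""
    return t.first_audio_byte_ms - t.user_speech_end_ms


turn = TurnTimestamps(
    user_speech_end_ms=10_000,
    request_submitted_ms=11_000,
    first_audio_byte_ms=11_400,
)
print(request_latency_ms(turn), endpointing_delay_ms(turn), perceived_latency_ms(turn))
```

With these made-up numbers the quoted latency is 400 ms, but the user waits 1,400 ms, because one full second went to deciding that the turn was over.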
Starting point is 00:51:14 Right? Because that can easily add like another second if you do it badly. And nobody talks about that for some reason, right? Like what you need to measure is actually from when does the user stop talking to when does the model's first audio come. And usually that number is much larger. That is a very domain-specific problem. You can use like the semantic VAD on the real-time API for regular English conversation. And that will basically classify at every token how likely it is that you're done speaking as a sort of normal conversational English speaker, like in this conversation. That's fine, but it doesn't work at all for language learners, right? If I am trying to respond in a language that I'm learning,
Starting point is 00:51:53 I'm going to be hesitating halfway through for 10 seconds, or more, right? So it needs to be fully custom, probably. This is something that we're also actively working on, but that is actually like the dominating factor in perceived latency. Coding. Do you use Cursor, Windsurf, and other autonomous agents? It's kind of all of the above. Yeah. So I think like as the CTO, I view it as part of my responsibility to really set expectations, push everyone on the team, show them what's possible. We've been trying everything. Yeah. And I think we try to basically set the expectation that the frontier is moving so fast,
Starting point is 00:52:37 it's deeply non-intuitive. If you tried coding tools six months ago, and they weren't that great, especially if it's not TypeScript or Python, right? It's like more collapsed to the most popular languages. Yeah, it is. We try to set a culture in the engineering team
Starting point is 00:52:55 where usage of these tools as much as possible and as the default path is the expectation. And in hiring, we are now explicitly asking about this a lot, thinking about what are the types of people that are going to be better, higher agency at trying these types of tools. It's so important.
Starting point is 00:53:16 Before we zoom out, anything we missed about Speak that you really want to highlight or something that people underrate about it? One thing that I've always been really excited about is that I feel like a lot of the foundational pieces that we're building around the knowledge graph, for example, a lot of these concepts should be applicable to not just learning language, but also other things in the future.
Starting point is 00:53:39 We're already starting to see the very beginnings of this on the B2B side, where a lot of it is more like management skills and hospitality skills, communication skills, more like true L&D for enterprise, less like core, pure English proficiency. So I think you're, you know, that's like obviously immediate neighborhood. but you can imagine many academic subjects, math, biology, etc., you know, it's going to also work for. I'm super excited about that. If I knew my employer was giving me a language tool, but then it was evaluating me on my management skills while learning language, I might use it less. Just, you know, you want to separate that out.
Starting point is 00:54:21 Yeah, very fair. I agree overall that the knowledge graph problem is very important. We have a whole track on it for the conference. And I think that the amount of data can be so high. And actually, like, you want to generate relevant triplets. I assume you use the normal subject-predicate-object type. It's a bit more custom than that because it's a bit more domain specific around the way that we conceptualize the vocabulary you know and the sentence patterns and so on.
Starting point is 00:54:47 So it's more specifically around like language learning concepts, if you will. But what I think we can extract from Speak, or generalize as a framework, is what I keep calling sort of like the Bloom's two sigma problem type thing, like the level-adjusting tutor. Like where are you at? Let me adjust my thing to where you're at. And then I'll put you up to the next level. And I think the knowledge graph is a part of it. But I don't know if that's all of it. I've never seen a working example. We are approaching that problem from a few different angles. I think part of it is knowledge graph. Part of it is being very careful in how we structure the curriculum so that you're placed at the right level,
Starting point is 00:55:25 so that the learning path itself, which has a foundational backbone, because beginner-to-intermediate-stage learners actually all need to know a bunch of similar concepts. It isn't really until you get to intermediate and more advanced where that starts to, like, more sharply diverge, and A0 through B1, I would say,
Starting point is 00:55:46 there's a pretty well-defined, like, sort of linear path, actually. A lot of the deep thinking that we've done around how do we structure the pedagogy is also super useful in terms of just like matching people to the right level. And then you can take this backbone and then basically modify it based on the knowledge graph, on your system's knowledge of what the user is like bad at versus good at. I think a lot of startups or especially ed tech, like that is the core engine. Like, you know, once you do that, you can kind of teach anything.
Starting point is 00:56:19 Totally. Yeah. We have a few more broader fun questions. Yeah. So speak.com. Yeah. Domain. I looked it up, voice.com got bought for $30 million.
Starting point is 00:56:30 Oh, my. Really? When? 2019. Okay. So I don't know if you want to share how much you paid for it, but I think it was a lot less. I figured it would be a lot less, but I'm curious if my estimate was 100K. But it was more than that.
Starting point is 00:56:43 More than that. Wow. Okay. I'm not going to say any more about the numbers. So what was the, yeah, what was the story? Was it easy? Did you use a broker?
Starting point is 00:56:50 We had, you know, Dharmesh Shah from HubSpot, who sold chat.com to OpenAI, and he has a lot of very... That was like a hundred-million deal or something, right? That was very big. Oh, wait, no, that was AI.com. Chat.com. Oh, chat.com. Okay.
Starting point is 00:57:02 Yeah. We bought it several years ago. It felt very expensive for us at the time. It was a little bit of a crazy move, but I think we were very convinced that we needed a super strong consumer brand that was scalable globally. And that was just always our ambition. Like, we want to be the way the next billion people learn languages. And we need speak.com. So we don't regret it. It's such a, such a great word,
Starting point is 00:57:29 makes for great swag, very nice decision. You had a couple other fun questions. Any fun Korean celebrity stories, because you work with so many influencers? We have a bunch baking right now, but I think it was, you know, something more generally that has just been so fun on the journey. So we would visit Seoul every year. Yeah. And seeing Speak go from nothing to the first time we saw somebody on the street using Speak, to now our main teacher in the app is like a mini celebrity. People come up to her on the street as she's just walking around Seoul and recognize her from the app, which is really cool. Now we do a lot of advertising.
Starting point is 00:58:11 We do billboards, TV commercials. We work with big influencers and so on. So just like seeing the scale of that has me kind of like in awe. It's like really cool just to see something that used to be nothing. I wanted you to name drop like BLACKPINK or I don't know. Look, there's some stuff baking right now. Okay, all right. We talked about the Thiel Fellowship.
Starting point is 00:58:31 Yeah. On your LinkedIn, you have this hole between 2012 and 2016, which you talked about, you did some startups. Any of them that you want to share? Like, ideas that you worked on that you thought were cool. Maybe it was just early, but, you know, yeah, what people should revisit. I've always been interested in learning and education. One of the other failed startups that I did in that time was, it feels silly to even talk about this because it amounted to nothing.
Starting point is 00:58:58 But it was called Bloom, you know, the Bloom's two sigma problem. It was actually like named after that. And we were trying to basically build like a better adult learning platform and have really cool interactive JavaScript widgets for various concepts that you could learn. Didn't find PMF. I was young and didn't really know anything about business at the time either. But I think that the common thread through actually like everything that I've been interested in since leaving grad school has been, how do we build software, build tools that help people learn things more effectively and better and faster.
Starting point is 00:59:34 And now I feel very lucky to be in this position because obviously AI is the ultimate version of that. And it's been completely transformative for me personally because I just get a lot of just inherent fun and pleasure out of being able to think of a concept and then, oh, now I can talk to this omniscient LLM that can tell me more about it. And I'm really good at asking the right follow-up questions that I want to know. So that's been completely transformative for me. Do you get a lot of like people using Speak for therapy? Like, you know, because it's not meant to be that, but since you have inference, they will use it. In 2023, when we first launched our AI role plays using GPT-4.
Starting point is 01:00:20 Back then, people were way more concerned about safety, right? And obviously, the models now are much better at refusals and draw sharper lines between what's appropriate and not. But we did see a lot of our first users start to put in pretty questionable custom scenarios. You can probably guess. And, you know, like, this was something we expected, but I think seeing the logs in person, it's like very different.
Starting point is 01:00:46 Got it. Some shocking stuff in there. Last couple questions. One on Andrej, you talked about him in your machine learning journey. He's also working on ed tech now. I don't know if you've ever had conversations with him. No, I haven't. He's also interested in language learning, by the way.
Starting point is 01:01:01 One thing that I think we didn't really realize early on, or like fully internalize at least, was just like how deep the market is. Say more? It was so universal that we really struggled to do some of the basic startup stuff around defining your like ideal customer profile and like, you know, segmenting your users, because our users were everyone. Like we had parents using it with their kids.
Starting point is 01:01:26 We had really old people using it. We had people using it for work. So that was kind of like mind-boggling. You still did customer segmentation or are you saying it doesn't matter? I'm saying it was hard to do. Like we tried. And we have a sweet spot in Korea. It's like 25 to 45 more professional, more white collar.
Starting point is 01:01:43 but it's very long tail on either side. Yeah, I think it's a huge market. And I think it's a very special moment in time right now where it's obvious that a lot of the tech is here. I think it's really good for humanity if we make a lot of progress here. So I'm really excited for his company too. We started asking about the Thiel Fellowship,
Starting point is 01:02:05 so maybe we can wrap with one of Thiel's favorite questions, which is, what's something you believe in today that most people will not agree with you on? I think that people, if you recall, expected the world to kind of explode when GPT-4 came out. And, you know, like everything would change. And I think if you, like, go to another state outside of the Bay Area, probably even in California, outside of the Bay Area, and then you ask somebody how much their life has materially changed,
Starting point is 01:02:33 it's like pretty close to zero. Real world inertia is enormous. Obviously, AI is probably the most transformative technology we've ever built. But I think in a very real sense, the world hasn't changed that much either. And that's a really weird thing. Right. So I think we need more builders. We need more people building applications. It's weird to me that Speak is actually like one of not that many net-new consumer AI-native applications at scale. Like there should be way more. I would love for there to be way more. Consumer is hard. Yeah. I'm intimidated. But like, you know, it was just like there was never any
Starting point is 01:03:09 alternative for us. Yeah. You didn't have a choice, but also you're very smart. But also maybe you have some growth hack things that you can advise people on that people could learn. But yeah, I agree. I think the general take actually is this is what we want, which is slow takeoff, short timeline.
Starting point is 01:03:26 That's fair. Right? This is the two by two that everyone always talks about in AI safety. You've seen slow takeoff and like maybe don't complain. We have a heads up. Or, you know, Dario's right and like half of us lose our jobs in the next two years. Yeah. It's so hard to appreciate it.
Starting point is 01:03:42 Yeah. Sometimes I get AI anxiety and then I just... You get anxiety? Yeah. Okay. And I just focus on our users. It's a perfect place to wrap. Thank you so much for taking the time. Yeah. Thank you so much. This is great.
