Lenny's Podcast: Product | Career | Growth - The 100-person AI lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)

Episode Date: December 7, 2025

Edwin Chen is the founder and CEO of Surge AI, the company that teaches AI what’s good vs. what’s bad, powering frontier labs with elite data, environments, and evaluations. Surge surpassed $1 billion in revenue with under 100 employees last year, completely bootstrapped, making it the fastest company in history to reach this milestone. Before founding Surge, Edwin was a research scientist at Google, Facebook, and Twitter and studied mathematics, computer science, and linguistics at MIT.

We discuss:
1. How Surge reached over $1 billion in revenue with fewer than 100 people by obsessing over quality
2. The story behind how Claude Code got so good at coding and writing
3. The problems with AI benchmarks and why they’re pushing AI in the wrong direction
4. How RL environments are the next frontier in AI training
5. Why Edwin believes we’re still a decade away from AGI
6. Why taste and human judgment shape which AI models become industry leaders
7. His contrarian approach to company building that rejects Silicon Valley’s “pivot and blitzscale” playbook
8. How AI models will become increasingly differentiated based on the values of the companies building them

Brought to you by:
Vanta: Automate compliance. Simplify security.
WorkOS: Modern identity platform for B2B SaaS, free up to 1 million MAUs
Coda: The all-in-one collaborative workspace

Transcript: https://www.lennysnewsletter.com/p/surge-ai-edwin-chen

My biggest takeaways (for paid newsletter subscribers): https://www.lennysnewsletter.com/i/180055059/my-biggest-takeaways-from-this-conversation

Where to find Edwin Chen:
• X: https://x.com/echen
• LinkedIn: https://www.linkedin.com/in/edwinzchen
• Surge’s blog: https://surgehq.ai/blog

Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

In this episode, we cover:
(00:00) Introduction to Edwin Chen
(04:48) AI’s role in business efficiency
(07:08) Building a contrarian company
(08:55) An explanation of what Surge AI does
(09:36) The importance of high-quality data
(13:31) How Claude Code has stayed ahead
(17:37) Edwin’s skepticism toward benchmarks
(21:54) AGI timelines and industry trends
(28:33) The Silicon Valley machine
(33:07) Reinforcement learning and future AI training
(39:37) Understanding model trajectories
(41:11) How models have advanced and will continue to advance
(42:55) Adapting to industry needs
(44:39) Surge’s research approach
(48:07) Predictions for the next few years in AI
(50:43) What’s underhyped and overhyped in AI
(52:55) The story of founding Surge AI
(01:02:18) Lightning round and final thoughts

Referenced:
• Surge: https://surgehq.ai
• Surge’s product page: https://surgehq.ai/products
• Claude Code: https://www.claude.com/product/claude-code
• Gemini 3: https://aistudio.google.com/models/gemini-3
• Sora: https://openai.com/sora
• Terrence Rohan on LinkedIn: https://www.linkedin.com/in/terrencerohan
• Richard Sutton—Father of RL thinks LLMs are a dead end: https://www.dwarkesh.com/p/richard-sutton
• The Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
• Reinforcement learning: https://en.wikipedia.org/wiki/Reinforcement_learning
• Grok: https://grok.com
• Warren Buffett on X: https://x.com/WarrenBuffett
• OpenAI’s CPO on how AI changes must-have skills, moats, coding, startup playbooks, more | Kevin Weil (CPO at OpenAI, ex-Instagram, Twitter): https://www.lennysnewsletter.com/p/kevin-weil-open-ai
• Anthropic’s CPO on what comes next | Mike Krieger (co-founder of Instagram): https://www.lennysnewsletter.com/p/anthropics-cpo-heres-what-comes-next
• Brian Armstrong on LinkedIn: https://www.linkedin.com/in/barmstrong
• Interstellar on Prime Video: https://www.amazon.com/Interstellar-Matthew-McConaughey/dp/B00TU9UFTS
• Arrival on Prime Video: https://www.amazon.com/Arrival-Amy-Adams/dp/B01M2C4NP8
• Travelers on Netflix: https://www.netflix.com/title/80105699
• Waymo: https://waymo.com
• Soda versus pop: https://flowingdata.com/2012/07/09/soda-versus-pop-on-twitter

Recommended books:
• Stories of Your Life and Others: https://www.amazon.com/Stories-Your-Life-Others-Chiang/dp/1101972122
• The Myth of Sisyphus: https://www.amazon.com/Myth-Sisyphus-Vintage-International/dp/0525564454
• Le Ton Beau de Marot: In Praise of the Music of Language: https://www.amazon.com/dp/0465086454
• Gödel, Escher, Bach: An Eternal Golden Braid: https://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/dp/0465026567

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.

Lenny may be an investor in the companies discussed. To hear more, visit www.lennysnewsletter.com

Transcript
Starting point is 00:00:00 You guys hit a billion in revenue in less than four years with around 60 to 70 people. You were completely bootstrapped, haven't raised any VC money. I don't believe anyone has ever done this before. We basically never wanted to play this Silicon Valley game. I always thought it was ridiculous. I used to work in a bunch of the big tech companies. And I always felt that we could fire 90% of people and we would move faster because the best people wouldn't have all these distractions. So when we started Surge, we wanted to build it completely differently with a super small, super elite team.
Starting point is 00:00:26 You guys are by far the most successful data company out there. We essentially teach AI models what's good and what's bad. People don't understand what quality even means in this space. They think you could just throw bodies at the problem and get good data. That's completely wrong. To a regular person, it doesn't feel like these models are getting that much smarter constantly. Over the past year, I've realized that the values that the companies have will shape the models. I was asking Claude to help me draft an email the other day.
Starting point is 00:00:51 And after 30 minutes, yeah, I think it really crafted me the perfect email and I sent it. But then I realized I spent 30 minutes doing something that didn't matter at all. If you could choose the perfect model behavior, which model would you want? Do you want a model that says, you're absolutely right, there are definitely 20 more ways to improve this email, and it continues for 50 more iterations? Or do you want a model that's optimizing for your time and productivity and just says, no, you need to stop. Your email's great. Just send it and move on. You have this hot take that a lot of these labs are pushing AI in the wrong direction.
Starting point is 00:01:18 I'm worried that instead of building AI that will actually advance us as a species, curing cancer, solving poverty, understanding the universe, we are optimizing for AI slop instead. It's literally optimizing your models for the types of people who buy tabloids at the grocery store. We're basically teaching our models to chase dopamine instead of truth. Today, my guest is Edwin Chen, founder and CEO of Surge AI. Edwin is an extraordinary CEO and Surge is an extraordinary company. They're the leading AI data company powering training at every frontier AI lab. They are also the fastest company to ever hit $1 billion in revenue
Starting point is 00:01:54 in just four years after launch, with fewer than 100 people and also completely bootstrapped. They've never raised a dollar in VC money. They've also been profitable from day one. As you'll hear in this conversation, Edwin has a very different take on how to build an important company and how to build AI that is truly good and useful to humanity. I absolutely loved this conversation and I learned a ton.
Starting point is 00:02:19 I'm really excited for you to hear it. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. It helps tremendously. And if you become an annual subscriber of my newsletter, you get a ton of incredible products for free for an entire year, including Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, Mobbin, PostHog, and Stripe Atlas. Head on over to lennysnewsletter.com and click Product Pass.
Starting point is 00:02:48 With that, I bring you Edwin Chen after a short word from our sponsors. My podcast guests and I love talking about craft and taste and agency and product market fit. You know what we don't love talking about? SOC 2. That's where Vanta comes in. Vanta helps companies of all sizes get compliant fast and stay that way with industry-leading AI, automation, and continuous monitoring. Whether you're a startup tackling your first SOC 2 or ISO 27001 or an enterprise managing vendor risk, Vanta's trust management platform makes it quicker, easier, and more scalable. Vanta also helps you complete security questionnaires up to five times faster so that you can win bigger deals sooner.
Starting point is 00:03:29 The result? According to a recent IDC study, Vanta customers slashed over $500,000 a year and are three times more productive. Establishing trust isn't optional. Vanta makes it automatic. Get $1,000 off at vanta.com/lenny. Here's a puzzle for you. What do OpenAI, Cursor, Perplexity, Vercel, Plaid, and hundreds of other winning companies have in common? The answer is they're all powered by today's sponsor, WorkOS. If you're building software for enterprises, you've probably felt the pain of integrating single sign-on, SCIM, RBAC, audit logs, and other features required by big customers. WorkOS turns those deal blockers into drop-in APIs with a modern developer platform built specifically for B2B SaaS.
Starting point is 00:04:16 Whether you're a seed-stage startup trying to land your first enterprise customer or a unicorn expanding globally, WorkOS is the fastest path to becoming enterprise-ready and unlocking growth. They're essentially Stripe for enterprise features. Visit WorkOS.com to get started or just hit up their Slack support where they have real engineers in there who answer your questions super fast. WorkOS allows you to build like the best with delightful APIs, comprehensive docs, and a smooth developer experience. Go to WorkOS.com to make your app enterprise-ready today. Edwin, thank you so much for being here and welcome to the podcast.
Starting point is 00:04:55 Thanks so much for having me. I'm super excited. I want to start with just how absurd what you've achieved is. A lot of people and a lot of companies talk about scaling massive businesses with very few people as a result of AI and you guys have done this in a way that is unprecedented. You guys hit a billion in revenue in less than four years with less than 60, around 60 to 70 people. You're completely bootstrapped, haven't raised any VC money. I don't believe anyone has ever done this before. So you guys are actually achieving the dream of what people are describing will happen with AI.
Starting point is 00:05:30 I'm curious, just do you think this will happen more and more as a result of AI? And also just where has AI most helped you find leverage to be able to do this? Yeah, so we hit over a billion of revenue last year with under 100 people. And I think we're going to see companies with even crazier ratios, like 100 billion per employee in the next few years. AI is just going to get better and better and make things more efficient. So that ratio just becomes inevitable. I used to work in a bunch of the big tech companies.
Starting point is 00:05:58 And I always felt that we could fire 90% of people and we would move faster because the best people wouldn't have all these distractions. And so when we started Surge, we wanted to build it completely differently with a super small, super elite team. And yeah, what's crazy is that we actually succeeded. And so I think two things are colliding. One is that people are realizing that you don't have to build giant organizations in order to win. And two, yeah, all these efficiencies from AI.
Starting point is 00:06:26 And they're just going to lead to a really amazing time in company building. Like the thing I'm excited about is that the types of companies are going to change too. It won't just be that they're smaller. We're going to see fundamentally different companies emerging. Like if you think about it, fewer employees means less capital. Less capital means you don't need to raise. So instead of companies started by founders who are great at pitching and great at hyping, you'll get founders who are really great at technology and product.
Starting point is 00:06:51 And instead of products optimized for revenue and what VCs want to see, you'll get more interesting ones built by these tiny obsessed teams. So people building things they actually care about, real technology and real innovation. So I'm actually really hoping that the Silicon Valley startup scene will actually go back to being a place for hackers again. You guys have done a lot of things in a very contrarian way. And one was actually just not being on LinkedIn posting viral posts, not on Twitter constantly promoting Surge. I think most people hadn't heard of Surge until just recently. And then you just came out and like, okay, the fastest growing
Starting point is 00:07:23 company at a billion dollars. Why would you do that? I imagine that was very intentional. We basically never wanted to play this Silicon Valley game. And I always thought it was ridiculous. Like what did you dream of doing when you were a kid? Was it building a company from scratch yourself and getting into the weeds of your code and your product
Starting point is 00:07:39 every day? Or was it explaining all your decisions to VCs and getting on this giant PR and fundraising hamster wheel? And it definitely made things more difficult for us because, yeah, when you fundraise, you just naturally become part of this kind of Silicon Valley industrial complex where your VCs will tweet about you, you'll get the TechCrunch headlines, you'll get announced in all the newspapers because you raised at this massive valuation.
Starting point is 00:08:07 And so it made things more difficult for us because the only way we were going to succeed was by building a 10 times better product and getting word of mouth from researchers. But I think it also meant that our customers were people who really understood data and really cared about it. Like I always thought it was really important for us to have customers, early customers, who were really aligned with what we were building and who really cared about having really high quality data and really understood how that data would make their AI models so much better because they were the ones helping us. They were the ones giving us feedback on what we were producing. And so just having that kind of very, very close mission alignment with our customers actually helped us early on.
Starting point is 00:08:43 So these were people who were basically just buying our product because they knew how different it was and because it was helping them rather than because they saw some TechCrunch headline. So it made things harder for us, but I think in a really good way. It's such an empowering story to hear this journey for founders that they don't need to be on Twitter all day promoting what they're doing. They don't have to raise money. They can just kind of go heads down and build. So I love so much about the story of Surge.
Starting point is 00:09:10 For people that don't know what Surge does, just give us a quick explanation of what Surge is. We essentially teach AI models what's good and what's bad. So we train them using human data and just a lot of different products that we have. Like SFT, RLHF, rubrics, verifiers, RL environments, and so on and so on. And then we also measure how well they're progressing. So essentially, we're a data company. What you always talk about is the quality has been the big reason you guys have been so
Starting point is 00:09:39 successful, the quality of the data. What does it take to create higher quality data? What do you all do differently? What are people missing? I think most people don't understand what quality even means in this space. They think you can just throw bodies at a problem and get good data, and that's completely wrong.
Starting point is 00:09:57 Let me give you an example. So imagine you wanted to train a model to write an eight-line poem about the moon. What makes it a good, high-quality poem? If you don't think deeply about quality, you'll be like, is this a poem? Does it contain eight lines? Does it contain the word "moon"?
Starting point is 00:10:12 You check all of these boxes, and if so, sure, yeah, you say it's a great poem. But that's completely different from what we want. We are looking for Nobel Prize-winning poetry. Like, is this poetry unique? Is it full of subtle imagery? Does it surprise you and tug at your heart? Does it teach you something about the nature of moonlight? Does it play with your emotions?
Starting point is 00:10:29 Does it make you think? That's what we are thinking about when we think about a high-quality poem. So it might be like a haiku about moonlight on water. It might use internal rhyme and meter. There are a thousand ways to write a poem about the moon, and each one gives you all these different insights into language and imagery and human expression. And I think thinking about quality in this way is really hard. It's hard to measure. It's really subjective and complex and rich. And that's a really high bar. And so we have to build all of this
Starting point is 00:10:55 technology in order to measure it, like thousands of signals on all of our workers, thousands of signals on every project, every task. Like we know at the end of the day if you are good at writing poetry versus good at writing essays versus great at writing technical documentation. And so we have to gather all these signals on what your background is, what your expertise is, and not just that, like how you're actually performing when you're writing all these things. And we use those signals to inform whether or not you are a good fit for these projects and whether or not you are improving the models. And it's really hard. And so we've built all this technology to measure it. But I think that's exactly what we want AI to do. And so we have these really, really deep
Starting point is 00:11:34 notions of quality that we're always trying to achieve. So what I'm hearing is it's kind of just going much deeper in understanding what quality is within the verticals that you are selling data around. And is this like a person you hire that is incredibly talented at poetry plus evals that they, I guess, help write that tell them if this is great? What's the mechanics of that? The way it works is we essentially gather thousands of signals about everything that you're doing when you're working on the platform. So we are looking at your keyboard strokes. We are looking at how fast you answer things. We are using reviews.
Starting point is 00:12:13 We are using gold standards. We are training models ourselves on the outputs that you create. And then we're seeing whether they improve the model's performance. And so in a very similar way to how Google search, like when Google search is trying to determine what is a good web page, there's almost two aspects of it. One is you want to remove all the worst of the worst web pages. So you want to remove all the spam, all the low quality content, all the pages that don't load. And so it's almost like a content moderation problem. You just want to remove the worst of the worst.
Starting point is 00:12:44 But then you also want to discover the best of the best. Okay, like this is the best web page or, you know, just the best person for this job. They are not just somebody who writes the equivalent of high school level poetry. Again, they're not just robotically writing poetry that checks all these boxes, checks all these explicit instructions. But rather, yeah, they're writing poetry that makes you emotional. And so we have all these signals as well that, again, completely differently from removing the worst of the worst, we are finding the best of the best. And so we have all these signals. Again, just like Google search uses all these signals and feeds them into their ML algorithms and uses them to predict certain types of things.
Starting point is 00:13:19 We do the same with all of our workers and all our tasks and all our projects. And so it's almost like a complicated machine learning problem at the end of the day. And that's how it works. That is incredibly interesting. I want to ask you about something I've been very curious about over the past couple of years. If you look at Claude, it's been so much better at coding and at writing than any other model for so long. And it's really surprising just how long it took other companies to catch up, considering just how much economic value there is there, just like every AI coding product sat on top of Claude because it was so good at code, and writing also. What is it that made it so much better?
Starting point is 00:13:55 Is it just the quality of the data they trained on or is there something else? I think there are multiple parts to it. So a big part of it certainly is the data. Like I think people don't realize that there's almost this infinite amount of choices that all the frontier labs are deciding between when they're choosing what data goes into their models. It's like, okay, are you purely using human data? Are you gathering the human data in XYZ way? When you are gathering the human data, what exactly are you asking the people who are creating it to create for you? Like maybe you care more, for example, in the coding realm,
Starting point is 00:14:31 maybe you care more about front-end coding versus back-end coding. Maybe when you're doing front-end coding, you care a lot about the visual design of the front-end applications that you're creating. Or maybe you don't care about it so much. And you care more about, I don't know, the efficiency of it or the pure correctness over that visual design. And then other questions like, okay, how much synthetic data are you throwing into the mix? How much do you care about these 20 different benchmarks? Like some companies, they see these benchmarks and they're like, okay,
Starting point is 00:14:57 for PR purposes, even though we don't think that these academic benchmarks matter all that much, maybe we just need to optimize for them anyways because our marketing team needs to show certain progress on certain standard evaluations that every other company talks about. And if we don't show good performance here, it's going to hurt us, even if ignoring these academic benchmarks makes us better at every other task. Other companies are going to be principled and be like, okay, yeah, no, I don't care about marketing. I just care about how my model performs on these real world tasks at the end of the day, and so I'm going to optimize for that instead.
Starting point is 00:15:31 And it's almost like there's a tradeoff between all of these different things, and one of the things I often think about is that it's almost like there's an art to post-training. It's not purely a science. Like when you are deciding what kind of model you're trying to create and what it's good at,
Starting point is 00:15:48 there's this notion of taste and sophistication. Going back to the example of how good the model is at visual design, like, okay, maybe you have a different notion of visual design than I do. Like, maybe you care more about minimalism and you care more about, I don't know, 3D animations than I do. Maybe another person prefers things that look a little bit more baroque. And there's all these notions of taste and sophistication that you have to decide between
Starting point is 00:16:18 when you're designing your post-training mix. And so that matters as well. So long and short, I think there's all these different factors. And certainly the data is a big part of it. But it's also, what is the objective function that you're trying to optimize your model towards? That is so interesting. Like the taste of the person leading this work will inform what data they ask for, what data they feed it. But it's wild, it just shows the value of great data.
Starting point is 00:16:42 Anthropic got so much growth and so many wins from essentially better data. Yeah, yeah, exactly. And I could see why companies like yours are growing so fast. There's just so much. And that's just one vertical. That's just coding, and then there's probably a similar area for writing. I love that it's interesting that AI, you know, it feels like this artificial computer binary thing, but taste and human judgment are still such a key factor in these things being successful.
Starting point is 00:17:09 Yep, yep, exactly. Like, again, going back to the example I gave earlier, certain companies, if you ask them what is a good poem, they will simply robotically check off all of these instructions on a list. But again, I don't think that makes for good poetry. So certain frontier labs, the ones with more taste and sophistication, they will realize that it doesn't reduce to this fixed set of checkboxes, and they'll consider all of these kind of implicit, very subtle qualities instead. And I think that's what makes them better at the end of the day.
Starting point is 00:17:37 You mentioned benchmarks. This is something a lot of people worry about. There's all these models that are always, like basically it feels like every model is better than humans at kind of every STEM field at this point. But to a regular person, it doesn't feel like these models are getting that much smarter constantly. What's your just sense of how much you trust benchmarks and just how correlated those are with actual AI advancements? Yeah, so I don't trust the benchmarks at all. And I think that's for two reasons. So one is, I think a lot of people don't realize, even researchers within the community,
Starting point is 00:18:10 they don't realize that the benchmarks themselves are often honestly just wrong. Like they have wrong answers. They're full of all this kind of messiness. And for the popular ones, people have maybe realized this to some extent, but the vast majority, they just have all these flaws that people don't realize. So that's one part of it. And the other part of it is these benchmarks, at the end of the day, they often have well-defined objective answers that make them very easy for models to hill-climb on in a way that's very, very different from the messiness and ambiguity of the real world.
Starting point is 00:18:47 I think one thing that I often say is that it's kind of crazy that these models can win IMO gold medals, but they still have trouble parsing PDFs. And that's because, yeah, even though IMO gold medals seem hard to the average person, yeah, like they are hard at the end of the day. But they have this notion of objectivity that, okay, yeah, parsing a PDF sometimes doesn't have. And so it's easier for the frontier labs to hill-climb on all of these than to solve all these messy ambiguous problems in the real world. So I think there's a lack of direct correlation there.
Starting point is 00:19:17 It's so interesting. The way you described it is hitting these benchmarks is kind of like a marketing piece. When you launch, say Gemini 3 just launched and it's like, well, number one on all these benchmarks. Is that what happens? They just kind of train their models to get good at these very specific things. Yes. So there's, again, maybe two parts of this.
Starting point is 00:19:33 So one is sometimes, yeah, these benchmarks, they accidentally leak in certain ways or the frontier labs will tweak the way they evaluate their models on these benchmarks. Like they'll tweak their system prompt or they'll tweak the number of times they run their model and so on and so on, in a way that games these benchmarks. The other part of it, though, is it's like by optimizing for the benchmark, instead of optimizing for the real world,
Starting point is 00:20:02 you will just naturally climb on the benchmark. And yeah, it's basically another form of gaming it. Knowing that, with that in mind, how do you kind of get a sense of if we're heading towards AGI? How do you measure progress? Yes, so the way we really care about measuring model progress is by running all these human evaluations. So, for example,
Starting point is 00:20:22 what we do is, yeah, we will take our human annotators and we'll ask them, okay, go have a conversation with a model. Maybe you're having a conversation with the model across all of these different topics. So, okay, you are a Nobel Prize-winning physicist. So you go have a conversation about pushing the frontier of your own research. You are a teacher and you're trying to create lesson plans for your students. So go talk to the model about these things. Or you are a, yeah, you're a coder and you're working at one of these big tech companies and you have these problems every day.
Starting point is 00:20:54 So go talk to the model and see how much it helps you. And because our Surgers, our annotators, they are experts at the top of their fields. And they are not just skimming the responses. They're actually working through the responses deeply themselves. They are, yeah, they're going to evaluate the code that it writes. They're going to double check the physics equations that it writes. They're going to evaluate the models in a very deep way. So they're going to pay attention to accuracy and instruction following,
Starting point is 00:21:22 all these things that casual users don't when you suddenly get a pop-up on your ChatGPT response, asking you to compare these two different responses. Like people like that, they're not evaluating the models deeply. They're just vibing and picking whatever response looks flashiest. Our annotators are looking closely at responses and evaluating them for all of these different dimensions. And so I think that's a much better approach than these benchmarks or kind of these random online A/B tests.
Starting point is 00:21:48 Again, I love just how central humans continue to be in this work, that we're not totally done yet. Is there going to be a point where we don't need these people anymore, that AI is so smart that, okay, we're good. We got everything out of your heads. Yeah, I think that will not happen until we've reached AGI. Like, it's almost like by definition. If we haven't reached AGI yet, then there's more for the models to learn from.
Starting point is 00:22:10 And so, yeah, I don't think that's going to happen anytime soon. Okay, cool. So more reason to stress about AGI. We don't need these folks anymore. I can't not ask, with people that work closely with this stuff, I'm always just curious, what's your AGI timelines? How far do you think we are from this? Do you think we're a couple of years away or is it like decades?
Starting point is 00:22:28 So I'm certainly on the longer time horizons front. Like I think people don't realize that there's a big difference between moving from 80% performance to 90% performance to 99% performance to 99.9% performance and so on and so on. And so like in my head, I'd probably bet that within the next one or two years, yeah, the models are going to automate 80% of, you know, the average L6 software engineer's job. It's going to take another few years to move to 90%, and another few years to 99%, and so on and so on.
Starting point is 00:22:57 So I think we're closer to a decade or decades away than that folks. You have this hot take that a lot of these labs are kind of pushing AI in the wrong direction. And this is based on your work at Twitter and Google and Facebook. Can you just talk about that? I'm worried that instead of building AI
Starting point is 00:23:16 that will actually advance us as a species, curing cancer, solving poverty, understanding the universe, all these big grand questions, we are optimizing for AI slop instead. We're basically teaching our models to chase dopamine instead of truth. And I think this relates to what we were talking about regarding these benchmarks. So let me give you a couple examples.
Starting point is 00:23:35 So right now, the industry is plagued by these terrible leaderboards like LMArena. It's this popular online leaderboard where random people from around the world vote on which AI response is better. But the thing is, like I was saying earlier, they're not carefully reading or fact-checking. They're skimming these responses for two seconds and picking whatever looks flashiest. So a model can completely hallucinate,
Starting point is 00:23:58 but it will look impressive because it has crazy emojis and bolding and markdown headers and all these superficial things that don't matter at all, but they catch your attention. And these LMArena users love it. It's literally optimizing your models
Starting point is 00:24:10 for the types of people who buy tabloids at the grocery store. Like, we've seen this in the data ourselves. The easiest way to climb LMArena is adding crazy bolding, it's doubling the number of emojis, it's tripling the length
Starting point is 00:24:20 of your model responses, even if your model starts hallucinating and getting the answer completely wrong. And the problem is, again, that all of these frontier labs kind of have to pay attention to PR, because when their sales teams are trying to sell to all these enterprise customers,
Starting point is 00:24:35 those enterprise customers will say, well, but your model's only number five on LMArena, so why should I buy it? They have to, in some sense, pay attention to these leaderboards, and so what their researchers will tell us is, like, they'll say, the only way I'm going to get promoted at the end of the year
Starting point is 00:24:51 is if I climb this leaderboard, even though I know that climbing it is probably going to make my model worse at accuracy and instruction following. So I think there are all these negative incentives that are pushing work in the wrong direction. I'm also worried about this trend towards optimizing AI for engagement.
Starting point is 00:25:07 Like, I used to work on social media, and every time we optimized for engagement, terrible things happened. You'd get clickbait and pictures of bikinis and Bigfoot and horrifying skin diseases just filling your feeds. And I worry that the same thing's happening with AI. Like, if you think about all the sycophancy issues with ChatGPT: you're absolutely right, what an amazing question. Like, the easiest way to hook users is to tell them how amazing they are. And so these
Starting point is 00:25:30 models, they constantly tell you you're a genius. They'll feed into your delusions and conspiracy theories. They'll pull you down these rabbit holes because Silicon Valley loves maximizing time spent and just increasing the number of conversations you're having with it. And so, yeah, companies are spending all their time hacking these leaderboards and benchmarks. And the scores are going up, but I think it actually masks that the models with the best scores are often the worst or just have all these fundamental failures. So I'm really worried that all of these negative incentives are pushing AGI in the wrong direction. So what I'm hearing is AI is being slowed down by basically the wrong objective function, these labs paying attention to the wrong benchmarks
Starting point is 00:26:10 and evals. Yep. I know you probably can't play favorites since you work with all the labs. Is there anyone doing better at this and maybe kind of realizing this is the wrong direction? I would say I've always been very impressed by Anthropic. Like, I think Anthropic takes a very principled view about what they do and don't care about and how they want their models to behave, in a way that feels a lot more principled to me. Interesting. Are there any other big mistakes you think labs are making that are kind of slowing things down or heading in the wrong direction? We've heard, you know, chasing benchmarks, this engagement focus. Is there anything else you're seeing of just like,
Starting point is 00:26:51 okay, we've got to work on this because it'll speed everything up? I mean, I think there is a question of what products they're building and whether those products themselves are something that kind of helps or hurts humanity. Like, I think a lot about Sora and
Starting point is 00:27:07 what it entails. And it's kind of interesting. It's like, which companies would build Sora and which wouldn't? And I think the answer to that, I mean, I don't know the answer myself. I have an idea in my head, but I think the answer to that question maybe reveals certain things
Starting point is 00:27:26 about what kinds of AI models those companies want to build and what direction and what future they want to achieve. So I think about that a lot. The steelman argument there is, you know, it's fun, people want it. It'll help them generate revenue
Starting point is 00:27:43 to grow this thing and build better models. It'll produce training data in an interesting way. It's also just, you know, really fun. Yeah, I think it's almost like, do you care about how you get there? And in the same way, so I made this tabloid analogy earlier, but like, would you sell tabloids in order to fund, I don't know, some other newspaper? I'm sure, like, in some sense, if you don't care about the path, then you just do whatever it takes, but it's possible that it has negative consequences in and of itself
Starting point is 00:28:20 that will harm the long-term direction of what you're trying to achieve, and maybe it'll distract you from all the more important things. So, yeah, I think the path you take matters a lot as well. Along these lines, you've talked a bunch about Silicon Valley and the downsides of raising a lot of money and being in the echo chamber, what you call the Silicon Valley machine. You talk about how it's hard to build important companies in this way and that you might actually be much more successful if you're not going down the VC
Starting point is 00:28:52 path. Can you just talk about what you've seen there, your experience, and your advice essentially to founders? Because they're always hearing, you know, raise money from fancy VCs, move to Silicon Valley. What's kind of the countertake? Yes. So I've always really hated a lot of Silicon Valley mantras. The standard playbook is to get product-market fit by pivoting every two weeks and to chase growth and chase engagement with all of these dark patterns
Starting point is 00:29:14 and to blitzscale by hiring as fast as possible. And I've always disagreed. So, yeah, I would say don't pivot. Don't blitzscale. Don't hire that Stanford grad who simply wants to add a hot company to their resume. Just build the one thing only you could build, the thing that wouldn't exist without the insight and expertise that only you have. And you see these by-the-book companies everywhere now,
Starting point is 00:29:35 some founder who was doing crypto in 2020 and then pivoted to NFTs in 2022, and now they're an AI company. There's no consistency. There's no mission. They're just chasing valuations. And I've always hated this because Silicon Valley loves to scorn Wall Street for focusing on money. But honestly, most of Silicon Valley is chasing the same thing. And so we stayed focused on our mission from day one, pushing that frontier of high-quality complex data.
Starting point is 00:30:01 And I've always loved that because I have this very romantic notion of startups. Like, startups are supposed to be about taking big risks to build something that you really believe in. But if you're constantly pivoting, you're not taking any risks. You're just trying to make a quick buck. And if you fail because the market isn't ready yet, I actually think that's way better. At least you took a swing at something deep and novel and hard instead of pivoting into another LLM wrapper company.
Starting point is 00:30:23 So yeah, I think the only way you build something that matters, that's going to change the world, is if you find a big idea you believe in and you say no to everything else. So you don't keep on pivoting when it gets hard. You don't hire a team of 10 product managers because that's what every other cookie-cutter startup does. You just keep building that one company that wouldn't exist without you. And I think there are a lot of people in Silicon Valley now who are sick of all the grift,
Starting point is 00:30:44 who want to work on big things that matter with people who actually care. And I'm hoping that that will be the future of how we build technology. I'm actually working on a post right now with Terrence Rohan, this VC that I really like to work with. And we interviewed five people who picked really successful generational companies early and joined them as really early employees. Like, they joined OpenAI before anyone thought it was awesome, Stripe before anyone knew it was awesome.
Starting point is 00:31:10 And so we're looking for patterns of how people find these generational companies before anyone else. And it aligns exactly with what you described, which is ambition. They have wild ambition with what they want to achieve. They're not, as you said, just kind of looking around for product-market fit no matter what it ends up being. And so I love that what you described very much aligns with what we're seeing there. Yeah, I absolutely think that you have to have huge ambitions and you have to have a huge
Starting point is 00:31:37 belief in your idea that's going to change the world. And you have to be willing to double down and keep on doing whatever it takes to make it happen. I love how counter your narrative is to so many of the things people hear. And so I love that we're doing this. I love that we're sharing this story. Today's episode is brought to you by Coda. I personally use Coda every single day to manage my podcast and also to manage my community. It's where I put the questions that I plan to ask every guest that's coming on the podcast. It's where I put my community resources. It's how I manage my workflows. Here's how Coda can help you. Imagine starting a project at work and your vision is clear,
Starting point is 00:32:11 you know exactly who's doing what and where to find the data that you need to do your part. In fact, you don't have to waste time searching for anything because everything your team needs, from project trackers and OKRs to documents and spreadsheets, lives in one tab, all in Coda. With Coda's collaborative all-in-one workspace, you get the flexibility of docs, the structure of spreadsheets, the power of applications, and the intelligence of AI, all in one easy-to-organize tab.
Starting point is 00:32:38 Like I mentioned earlier, I use Coda every single day, and more than 50,000 teams trust Coda to keep them more aligned and focused. If you're a startup team looking to increase alignment and agility, Coda can help you move from planning to execution in record time. To try it for yourself, go to coda.io/lenny today and get six months free of the team plan for startups. That's coda.io/lenny to get started for free and get six months of the team plan. coda.io/lenny.
Starting point is 00:33:07 Slightly different direction, but something else that was maybe a counter narrative. I imagine you watched the Dwarkesh and Richard Sutton podcast episode. And even if you didn't, they basically have this conversation. Richard Sutton, he's a famous AI researcher, had this whole Bitter Lesson meme. And he talked about how LLMs are almost kind of a dead end. And he thinks we're going to really plateau around LLMs because of the way they learn. What's your take there? Do you think LLMs will get us to AGI or beyond? Or do you think there's going to be something new or a big breakthrough that needs to get us there?
Starting point is 00:33:42 I'm in the camp where I do believe that something new will be needed. The way I think about it is, when I think about training AI, I take a very, I don't know if I would say biological point of view. But I believe that in the same way that there are a million different ways that humans learn, we need to build models that
Starting point is 00:34:14 can mimic all those ways as well. And maybe they'll have a different distribution of the focuses that they have. I know they'll be different from humans, so maybe they'll have a different distribution. But we want to be able to mimic the learning abilities of humans and make sure that we have the algorithms and the data for models to learn in the same way. And so to the extent that LLMs have different ways of learning from humans, then yeah, I think something new will be needed. This connects to
Starting point is 00:34:32 reinforcement learning. It's something that you're big on and something I'm hearing more and more is just becoming a big deal in the world of post-training. Can you just help people understand
Starting point is 00:34:41 what is reinforcement learning and reinforcement learning environments and why they're going to be more and more important in the future? Reinforcement learning is essentially training your model to reach a certain reward.
Starting point is 00:34:54 And let me explain what an RL environment is. An RL environment is essentially a simulation of the real world. So think about building a video game with a fully fleshed-out universe. Every character has a real story. Every business has tools and data you can call,
Starting point is 00:35:09 and you have all these different entities interacting with each other. So, for example, we might build a world where you have a startup with Gmail messages and Slack threads and Jira tickets and GitHub PRs and a whole codebase. And then suddenly AWS goes down and Slack goes down. And so, okay, model, what do you do? Like, the model needs to figure it out. So we give the model tasks in these environments, we design interesting challenges for them, and then we run them to see how they perform. And then we teach them: we give them these rewards depending on whether they're doing a good job or a bad job. And I think one of the interesting things is that these environments really showcase where models are weak at end-to-end tasks in the real world. You have all these models that seem really smart
Starting point is 00:35:50 on isolated benchmarks. Like, they're good at single-step tool calling. They're good at single-step instruction following. But suddenly you dump them into these messy worlds where you have confusing Slack messages and tools they've never seen before. And they need to perform the right actions and modify the databases and interact over longer time horizons where what they do in step one affects what they do in step 50. And that's very, very different from these kind of academic, single-step environments that they've been in before. And so the model just fails catastrophically in all these crazy ways.
Starting point is 00:36:21 So I think these RL environments are going to be really interesting playgrounds for the models to learn from that will essentially be simulations and mimics of the real world. And so they'll hopefully get better and better at real tasks compared to all these contrived environments. So I'm trying to imagine what this looks like. Essentially, it's like a virtual machine with, I don't know, a browser or spreadsheet or something in it with, like, I don't know, surge.com. Is that your website, surge.com?
Starting point is 00:36:47 Let's make sure we get that right. So we are actually surgehq.ai. Surgehq.ai. Check it out. We're hiring, I imagine. Yes. Okay.
Starting point is 00:36:58 So it's like, cool, here's surgehq.ai. Here's your job as an agent, let's say: make sure it stays up. And then all of a sudden it goes down and the objective function is figure out why. Is that an example? Yeah. So the goal of the task might be, okay, go figure out why and fix it. And so the objective function might be passing a series of unit tests. It might be writing a document, like maybe it's a retro containing certain information that matches
Starting point is 00:37:28 exactly what happened. There are all these different rewards that we might give it that determine whether or not it's succeeding. And so we're basically teaching models to achieve that reward. So essentially it's like, here's your goal: figure out why the site went down and fix it. And it just starts trying stuff, using all the intelligence it's got. It makes mistakes. You kind of help it along the way, rewarding it if it's doing the right sort of thing. And so what you're describing here is the next phase of models becoming smarter: more RL environments focused on very specific tasks that are economically valuable, I imagine. Yeah. Yeah. So just in the same way that there were all these different methods for models
Starting point is 00:38:08 learning in the past. Like, originally we had SFT and RLHF, and then we had rubrics and verifiers. This is the next stage. And it's not the case that the previous methods are obsolete. This is, again, just a different form of learning that complements all the previous types. So it's just a different skill that the model needs to learn how to do. And so in this case, it's less some physics PhD sitting around, talking to a model, correcting it, giving it evals of here's what the correct answer is, creating rubrics and things like that. It's more like this person is now designing an environment. So another example I've heard is like a financial analyst.
Starting point is 00:38:45 Just like, here's an Excel spreadsheet. Here's your goal. Figure out our profit and loss or whatever. And so this expert now, instead of just sitting around writing rubrics, is designing this RL environment. Yeah, exactly. So that financial analyst might create a spreadsheet. They may create certain tools that the model needs to call in order to help fill out the spreadsheet.
Starting point is 00:39:07 Like it might be, okay, the model needs to access a Bloomberg Terminal. It needs to learn how to use it. And it needs to learn how to use this calculator. And it needs to learn how to perform this calculation. So it has all these tools that it has access to. And then the reward might be, okay, maybe I will download that spreadsheet
Starting point is 00:39:28 and I want to see, does cell B22 contain the correct profit and loss number? Or does tab number two contain this piece of information? And what's interesting is this is a lot closer to how humans learn. We just try stuff, figure out what's working and what's not.
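The spreadsheet check described above can be sketched as a small reward function. This is only a minimal illustration under stated assumptions, not Surge's implementation: the workbook representation (a nested dict instead of a real Excel file), the function name `cell_reward`, the sheet name "Summary", the tolerance, and the example profit-and-loss figure are all hypothetical.

```python
from typing import Dict, Union

Cell = Union[int, float, str]
# Hypothetical workbook model: sheet name -> {cell address -> value}
Workbook = Dict[str, Dict[str, Cell]]


def cell_reward(workbook: Workbook, sheet: str, cell: str,
                expected: float, tolerance: float = 0.01) -> float:
    """Return 1.0 if the target cell holds the expected number, else 0.0."""
    value = workbook.get(sheet, {}).get(cell)
    # Missing cells, text, and booleans never earn the reward.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0.0
    return 1.0 if abs(value - expected) <= tolerance else 0.0


# Hypothetical final state of the spreadsheet after the agent finishes.
final_state: Workbook = {"Summary": {"B22": 1_250_000.0}}
reward = cell_reward(final_state, "Summary", "B22", expected=1_250_000.0)
print(reward)
```

A binary check like this only looks at the end state; it says nothing about how the model got there, which is the gap the trajectory discussion that follows is about.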
Starting point is 00:39:43 You talk about how trajectories are really important to this. It's not just here's the goal and here's the end. It's like every step along the way. Can you just talk about what trajectories are and why that's important to this?
Starting point is 00:39:55 I think one of the things that people don't realize is that sometimes, even though the model reaches the correct answer, it does so in all these crazy ways. So in the intermediate trajectory, it may have tried 50 different times and failed, but eventually it just kind of randomly lands on a correct number. Or sometimes it just does things very, very inefficiently, or it almost reward-hacks a way to get at the correct answer.
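One way to picture the difference between checking only the final answer and looking at the trajectory is the sketch below. It is an illustrative assumption, not a description of how any lab scores trajectories: the `Step` record, the `summarize_trajectory` function, and the "process score" (outcome multiplied by the fraction of steps that succeeded) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    action: str       # hypothetical label for what the agent did
    succeeded: bool   # whether that individual step worked


def summarize_trajectory(steps: List[Step], final_correct: bool) -> Dict[str, float]:
    """Report the outcome reward alongside simple trajectory statistics."""
    failed = sum(1 for step in steps if not step.succeeded)
    outcome = 1.0 if final_correct else 0.0
    clean_fraction = (len(steps) - failed) / len(steps) if steps else 0.0
    return {
        "outcome_reward": outcome,
        "num_steps": float(len(steps)),
        "failed_steps": float(failed),
        # Hypothetical shaping: discount the outcome by how much flailing occurred.
        "process_score": outcome * clean_fraction,
    }


direct = [Step("compute_total", True)]
flailing = [Step("guess", False), Step("guess", False),
            Step("guess", False), Step("compute_total", True)]

direct_summary = summarize_trajectory(direct, final_correct=True)
flailing_summary = summarize_trajectory(flailing, final_correct=True)
print(direct_summary["process_score"], flailing_summary["process_score"])
```

Both trajectories earn the same outcome reward, but only the trajectory-level statistics distinguish the direct solution from the one that stumbled into the answer.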
Starting point is 00:40:27 And so I think paying attention to the trajectory is actually really important. And I think it's also really important because some of these trajectories can be very, very long. And so if all you're doing is checking whether or not the model reaches the final answer, there's all this information about how the model behaved in the intermediate steps that's missing. Like, sometimes you want models to get to the correct answer by reflecting on what they did. Sometimes you want them to get the correct answer by just one-shotting it. And if you ignore all of that, you're just missing a lot of information that you could be teaching the model. I love that. Like, yeah, it tries a bunch of stuff and eventually
Starting point is 00:41:06 gets it right. You don't want it to learn that this is the way to get there. There's often a much more efficient way of doing it. You mentioned all the steps we've taken along the journey of helping a model get smarter. Since you've been so close to this for so long, I think this is going to be really helpful for people. What have been the steps along the way, from the first post-training, that have most helped models advance? Like, where do evals fit in, and RL environments? Just, what have been the steps, and now we're heading towards RL environments? Originally the way models started getting post-trained was purely through SFT. And what does that stand for? So SFT stands for supervised fine-tuning. And it's
Starting point is 00:41:46 like, so again, I think often in terms of these human analogies, and so SFT is a lot like mimicking a master and copying what they do. And then RLHF became very dominant. And the analogy there would be like sometimes you learn by writing 55 different essays and someone telling you which one they like the most. And then I think over the past year or so, rubrics and verifiers have become very important. And rubrics and verifiers are like learning by being graded and getting detailed feedback on where you went wrong.
Starting point is 00:42:16 And those are evals, another word for that? Yeah, yeah. So I think evals often covers two things. One is you are using the evaluations for training, because you're evaluating whether or not the model did a good job, and when it does do a good job, you're rewarding it. And then there's a separate notion of evals where you're trying to measure the model's progress. Like, okay, yeah, I have five different candidate checkpoints.
Starting point is 00:42:41 And I want to pick the one that's best in order to release it to the public. So I'll run all these evals on these five different checkpoints in order to decide which one is best. Awesome. Yeah. And now we have RL environments, so it's kind of like a hot new thing. Awesome. So what I love about this business is there's always something new. There's always this like, okay, we're getting so good at just all this beautiful data for companies, and now they need something completely different. Now we're setting up all these virtual machines for them and all these different use cases. And it feels like that's a big part of this industry you're in. It's just adapting to what labs are asking for.
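The second use of evals mentioned above, choosing among candidate checkpoints, can be sketched as follows. This is a minimal illustration that assumes each checkpoint already has a list of per-eval scores between 0 and 1 and that a plain average is an acceptable way to combine them; the checkpoint names, the scores, and the `pick_best_checkpoint` function are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def pick_best_checkpoint(eval_scores: Dict[str, List[float]]) -> str:
    """Return the checkpoint name with the highest mean eval score."""
    if not eval_scores:
        raise ValueError("no checkpoints to compare")
    return max(eval_scores, key=lambda name: mean(eval_scores[name]))


# Hypothetical scores for five candidate checkpoints on three evals.
scores = {
    "ckpt_a": [0.70, 0.80, 0.60],
    "ckpt_b": [0.90, 0.85, 0.80],
    "ckpt_c": [0.50, 0.95, 0.60],
    "ckpt_d": [0.75, 0.75, 0.75],
    "ckpt_e": [0.60, 0.65, 0.70],
}
best = pick_best_checkpoint(scores)
print(best)
```

A plain mean treats every eval as equally important; in practice the weighting is a judgment call, which is the broader point in this conversation about which measurements a lab chooses to optimize.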
Starting point is 00:43:14 Yeah, yeah. So I mean, I really do think that we are going to need to build. a suite of products that reflect the million different ways that humans learn. And like, for example, think about becoming a great writer. You don't become great by memorizing a bunch of grammar rules. You become great by reading great books and you practice writing and you get feedback from your teachers and from the people who buy your books in a bookstore and the reviews. And you notice what works and what doesn't. And you develop taste by being exposed to all these masterpieces and also just terrible writing.
Starting point is 00:43:46 So you learn through this endless cycle of practice and reflection, and each type of learning that you have, again, these are all very, very different methods of learning to become a great writer. So just in the same way that there are a thousand different ways that the great writer becomes great, I think there are going to be a thousand different ways that AIs need to learn. It's so interesting that this just ends up being like humans in so many ways. It makes sense because, in a sense, neural networks and deep learning are modeled after how humans learn and how our brains operate. But it's interesting that to make them smarter, it's how do we come closer to how humans learn, more and more. Yeah, it's almost like
Starting point is 00:44:23 maybe the end goal is just throwing you into the environment and just seeing how you evolve. But within that evolution, there are all these different sub-learning mechanisms. Yeah, which is kind of what we're doing now. So that's really interesting. This might be the last step until we hit AGI. Along these lines, something that's really unique to Surge that I learned is you guys have your own research team, which I think is pretty rare. Talk about just why that's something you guys have invested in and what has come out of that investment. Yeah. So I think that stems from my own background.
Starting point is 00:44:55 Like, my own background is as a researcher. And so I've always cared fundamentally about pushing the industry and pushing the research community and not just about revenue. And so I think what a research team does is a couple different things. So we almost have two types of researchers at our company. One is our forward-deployed researchers, who are often working hand in hand with our customers to help them understand their models. So we will work very closely with our customers to help them understand, okay, this is where your model is today. This is where you're lagging behind all the competitors.
Starting point is 00:45:31 These are some ways that you could be improving in the future given your goals. And we're going to design these datasets, these evaluation methods, these training techniques to make your models better. So it's this very collaborative notion of working with our customers, like being researchers ourselves, just a little bit more focused on the data side, and working hand in hand with them to do whatever it takes to make them the best. And then we also have our internal researchers. So our internal researchers are focused on slightly different things.
Starting point is 00:46:03 So they are focused on building better benchmarks and better leaderboards. So I talked a lot about how I worry that the leaderboards and benchmarks out there today are steering models in the wrong direction. So, yeah, the question is, how do we fix that? And so that's what our research team is really heavily focused on right now. So they're working a lot on that. And they're also working on these other things like,
Starting point is 00:46:24 okay, we need to train our own models to see what types of data perform the best, what types of people perform the best. And so they are also working on all these kinds of training techniques and evaluations of our own datasets to improve our data operations and the internal data products that we have that determine what makes something good quality. It's such a cool thing because, like, basically the labs have researchers helping them advance AI. I imagine it's pretty rare for a company like yours to have researchers actually doing primary research on AI. Yeah, yeah.
Starting point is 00:47:00 I think it's just because it's something I've fundamentally always cared about. Like, I often think about us more like a research lab than a startup because that is my goal. It's kind of funny, but I've always said I would rather be Terence Tao than Warren Buffett. So that notion of creating research that pushes the frontier forward and not just getting some valuation, that's always been what drives me. And it's worked out. That's the beautiful thing about this. You mentioned that you were hiring researchers.
Starting point is 00:47:30 Is there anything there you want to share about the folks you're looking for? So we look for people who are just fundamentally interested in data all day. So the types of people who could literally spend 10 hours digging through a dataset and playing around with models and thinking, okay, yeah, this is where I think the model is failing. This is the kind of behavior you want the model to have instead. And just this aspect of being very hands-on and thinking about the qualitative aspects of models and not just the quantitative parts. So again, it's this aspect of being hands-on with data and not just caring about these kind of abstract algorithms. Awesome. I want to ask a couple of broad AI kind of market questions.
Starting point is 00:48:11 What else do you think is coming in the next couple years that people are maybe not thinking enough about or not expecting in terms of where AI is heading? What's going to matter? I think one of the things that's going to happen in the next few years is that the models are actually going to become increasingly differentiated because of the personalities and behaviors that the different labs have and the kinds of objective functions that they are optimizing their models for. Like, I think it's one thing I didn't appreciate a year or so ago. A year or so ago, I thought that all of the AI models would essentially become very commoditized. They would all behave like each other.
Starting point is 00:48:53 And sure, one of them might be slightly more intelligent in one way today, but sure, the other ones would catch up in the next few months. But I think over the past year, I've realized that the values that the companies have will shape the model. So let me give an example. So I was asking Claude to help me draft an email the other day. And it went through 30 different versions.
Starting point is 00:49:19 And after 30 minutes, yeah, I think it really crafted me the perfect email and I sent it. But then I realized that I spent 30 minutes doing something that didn't matter at all. Like, sure, now I got the perfect email, but I spent 30 minutes doing something
Starting point is 00:49:30 I wouldn't have worried about at all before. And this email probably didn't even move the needle on anything anyway. So I think there's a deep question here, which is, if you could choose the perfect model behavior, which model would you want? Do you want a model that says, you're absolutely right, there are definitely 20 more ways to improve this email, and it continues for 50 more iterations, and it sucks up all your time and engagement? Or do you want a model that's optimizing for your time and productivity and just says, no, you need to stop. Your email's great. Just send it and move on with your day.
Starting point is 00:49:59 And again, again, just because in the same way to do it's kind of like a fork in a road between how you could choose how your model behaves for this question. It's like for every other question that models have, the kind of behavior that you want will fundamentally affect it. It's almost like in the same way that when Google builds a search engine, it's very, very different from how Facebook would build a search engine, which is very, very different from how Apple would build a search engine. Like, they all have their own principles and values and things that they're trying to achieve in a world that shape all the products that they're going to build. And in the same way,
Starting point is 00:50:37 I think all the LLMs will start behaving very, very differently too. That is incredibly interesting. You already see that with Grok. It's got like a very different personality and a very different approach to answering questions. And so what I'm hearing is you're going to see more of this differentiation. Yep. Kind of another question along these lines.
Starting point is 00:50:55 What do you think is most underhyped in AI? Something maybe people aren't talking enough about that is really cool. And what do you think is overhyped? So I think one of the things that is underhyped is the built-in products that all of the chatbots are going to start having. Like, I've always been a huge fan of Claude's artifacts. And I think it just works really, really well.
Starting point is 00:51:17 And actually, the other day, I don't know if it's a new feature or not, but I was asking it to help me create an email. And then it just created. So it didn't allow me to send an email. But what it created instead was like a little, I don't know what you'd call it, like a little box where I could click on it and it would just text someone this message. And I think that concept of taking artifacts to the next level where you just have these like mini apps, mini UIs within the chatbots themselves.
Starting point is 00:51:49 I feel like people aren't talking enough about that. So I think that that's one underhyped area. And in terms of overhyped areas, I definitely think that vibe coding is overhyped. I think people don't realize how much it's going to make your systems unmaintainable in the long term. They just simply dump this code into their codebases, and it seems to work out right now. So I kind of worry about the future of coding. It's just going to keep on happening.
Starting point is 00:52:17 These are amazing answers. On that first point, there's something I actually asked. I had the chief product officers of Anthropic and OpenAI, Kevin Weil and Mike Krieger, on the podcast. And I asked them just like, as a product team, like, you have this giga-brain intelligence. How long do you even need product teams? You think this AI will just create the product for you. Here's what I want.
Starting point is 00:52:37 It's like the next level of vibe coding. It's just like tell it. Here's what I want. And it's just building the product and evolving the product as you're using it. And it feels like that's what you're describing is where we might be heading. Yeah. Yeah. I think there's a very, very powerful notion where it helps people just achieve their ideas
Starting point is 00:52:54 in a much core. Something we haven't gotten into that I think is really interesting is just the story of how you got to starting Surge. You have a really unique background. I always think about this: Brian Armstrong, the founder of Coinbase, once gave this talk that has really stuck with me, where he kind of talked about how his very unique background allowed him to start Coinbase. He had like an economics background.
Starting point is 00:53:18 He had cryptography experience, and then he was an engineer. And it's got this like the perfect Venn diagram for starting Coinbase. And I feel like you have a very similar story with Surge. Talk about your background there and how that led to Surge. Going way back, I was always fascinated by math and language when I was a kid. Like, I went to MIT because it's obviously one of the best places for math and CS, but also because it's the home of Noam Chomsky.
Starting point is 00:53:42 My dream in school was actually to find some underlying theory connecting all these different fields. And then I became a researcher at Google and Facebook and Twitter, and I just kept running into the same problem over and over again. It was impossible to get the data that we needed to train our models. So I was always this huge believer in the need for high-quality data. And then GPT-3 came out in 2020. And I realized that, yeah, if we want to take things to the next level and build models that could code and use tools and tell jokes and write poetry and solve the Riemann hypothesis and cure cancer, then yeah, we were going to need a completely new solution. Like the thing that always drove me crazy when I was at these companies was we had the full power of the human mind in front of us.
Starting point is 00:54:21 And all the data solutions out there were focused on really simple things like image labeling. So I wanted to build something focused on all these advanced, complex use cases instead that would really help us build the next-generation models. So, yeah, I think my background kind of across math and computer science and linguistics really, really informed what I always wanted to do. And so I started Surge a month later with the one mission to basically build the use cases that I thought were going to be needed to push the frontier of AI. And you said a month later, a month later, after what?
Starting point is 00:54:52 After the GPT-3 launch in 2020. Oh, okay. Wow. Okay. A great decision. What just kind of drives you at this point, other than just the epic success you're having? What keeps you motivated to keep building this and, you know, building something in this space? I think I'm a scientist at heart.
Starting point is 00:55:09 I always thought I was going to become a math or CS professor and work on trying to understand the universe and language and the nature of communication. Like, it's kind of funny, but I always had this fanciful dream where if aliens ever came to visit Earth and we needed to figure out how to communicate with them, I wanted to be the one the government would call, and I'd use all this fancy math and computer science and linguistics to decipher it. So even today, what I love doing most is every time a new model is released,
Starting point is 00:55:37 we'll actually do a really deep dive into the model itself. I'll play around with it. I'll run evals. I'll compare where it's improved, where it's regressed. I'll create this really deep dive analysis that we send our customers. And it's actually kind of funny, because a lot of times we will say it's from our data science team, but often it's actually just from me.
Starting point is 00:55:54 And I think I could do this all day. Like I have a very hard time being in meetings all day. I'm terrible at sales. I'm terrible at doing the typical CEO things that people expect you to do. But I love writing these analyses. I love jamming with the research team on what they're seeing. Sometimes I'll be like up until 3 a.m. just talking on the phone with somebody on the research team and digging
Starting point is 00:56:12 into a model. So I love that. I still get to be really hands on working on the data and the science all day. And I think what drives me is that I want Surge to play this critical role in the future of AI, which I think is also the future of humanity. Like we have these really unique perspectives on data and language and quality
Starting point is 00:56:29 and how to measure all this and how to ensure it's all going on the right path. And I think we're uniquely unconstrained by all of these influences that can sometimes steer companies in a negative direction. Like what I was saying earlier, we built Surge a lot more like a research lab
Starting point is 00:56:45 than a typical startup. So we care about curiosity and long-term incentives and intellectual rigor. And we don't care as much about quarterly metrics and what's going to look good in a board deck. And so my goal is to take all these unique things about us as a company and use that to make sure that we're shaping AI in a way that's really beneficial for our species in the long term. What I'm realizing in this conversation is just
Starting point is 00:57:07 how much influence you have and companies like yours have on where AI heads, the fact that you help labs understand where they have gaps and where they need to improve. And it's not just, you know, everyone looks at just like the heads of OpenAI, Anthropic and all these companies as the ones ushering in AI, but what I'm hearing here is you have a lot of influence on where things head too. Yeah, I think there's this really powerful ecosystem where, honestly,
Starting point is 00:57:33 people just don't know where models are headed and how they want to shape them yet. And how do you want humanity to kind of play a role in the future of all this? And so I think there's a lot of opportunity to just continue shaping this discussion. Along that thread, I know you have a very strong thesis on just why this work matters to humanity and why this is so important. Talk about that. I'll get a bit philosophical here, but I think the question itself is a bit philosophical, so bear with me. So the most straightforward way of thinking about what we do is we train and evaluate AI.
Starting point is 00:58:11 But there's a deeper mission that I often think about, which is helping our customers think about their dream objective functions. Like, yeah, what kind of model do they want their model to be? And once we help them do that, we'll help them train their model to reach that North Star and we'll help them measure that progress. But it's really hard because objective functions are really rich and complex. It's kind of like the difference
Starting point is 00:58:32 between having a kid and asking, okay, what test do you want them to pass? Do you want them to get a high score on the SAT and write a really good college essay? Like, that's a simplistic version. Versus what kind of person do you want them to grow up to be? Will you be happy if they're happy, no matter what they do?
Starting point is 00:58:46 Or are you hoping they'll go to a good school and be financially successful? Again, if you take that notion, it's like, okay, how do you define happiness? How do you measure whether they're financially successful? Like, it's a lot harder than simply measuring whether or not you're getting a high score on the SAT. And what we're doing is we want to help our customers reach, again, their dream North Stars and figure out how to measure them. And so I talked about this example of what you want models to do when you're asking them to write 50 different email iterations. Do you just continue them for 50 more?
Starting point is 00:59:22 Or do you just say, no, just move on with the day because this is perfect enough. And the broader question is, are we building these systems that actually advance humanity? And so how do we build the datasets to train towards that and measure it? Or are we optimizing for all these wrong things? Just systems that suck up more and more of our time and make us lazier and lazier. And yeah, I think it's really relevant to what we do because it's very hard and difficult to measure and define whether something is genuinely advancing humanity. It's very easy to measure all these proxies instead, like clicks and likes. But I think that's why our work is so interesting. We want to work on hard, important metrics
Starting point is 00:59:58 that require the hardest types of data and not just easy ones. So I think one of the things I often say is you are your objective function. So we want rich, complex objective functions and not these simplistic proxies. And our job is to figure out how to get the data to match this. So yeah, we want data, we want metrics to measure whether AI is making your life richer. We want to train our systems this way. And we want tools that make us more curious and more creative, not just lazier. And it's hard because, yeah, humans are kind of inherently lazy, so AI software deals are the easiest way to get engagement,
Starting point is 01:00:26 and make all your metrics go up. So I think this question about choosing the right objective functions and making sure that we're optimizing towards them and not just these easy proxies is really, really important for our future. Wow. I love how what you're sharing here gives you so much more appreciation of the nuances of building AI, training AI, the work that you're doing. You know, from the outside, people could just look at Surge
Starting point is 01:00:46 and companies in the space and think, okay, they're just creating all this data, feeding it into AI, but clearly there's so much to this that people don't realize. And I love knowing that you're at the head of this, that someone like you is thinking through this so deeply. Maybe one more question. Is there something you wish you'd known before you started Surge? A lot of people start companies. They don't know what they're getting into. Is there something you wish you could tell your earlier self?
Starting point is 01:01:11 Yeah. So I definitely wish I'd known that you could build a company by being heads down and doing great research and simply building something amazing, and not by constantly tweeting and hyping and fundraising. It's kind of funny, but I never thought I wanted to start a company. Like, I loved doing research. And I was actually always a huge fan of DeepMind because they were this amazing research company that got bought and still managed to keep on doing amazing science.
Starting point is 01:01:32 But I always thought that they were just this magical unicorn. So I thought if I started a company, I'd have to become a business person looking at financials all day and being in meetings all day and doing all this stuff that sounded incredibly boring and that I always hated. So I think it's crazy that didn't end up being true at all. Like I'm still in the weeds in the data every day. And I love it.
Starting point is 01:01:52 Like I love that I get to do all these analyses and talk to researchers. And it's basically applied research where we're building all these amazing data systems that really push the frontier of AI. So yeah, I wish I'd known that you don't need to spend all your time fundraising. You don't need to constantly generate hype. You don't need to become someone you're not. You can actually build a successful company by simply building something so good that it cuts through all that noise. And I think if I'd known this was possible, I would have
Starting point is 01:02:15 started even sooner. So I hope you know that. That is such an amazing place to end. I feel like this is exactly what founders need to hear. And I think this conversation is going to inspire a lot of founders and especially a lot of founders that want to do things in a different way. Before we get to a very exciting lightning round, is there anything else you want to share, anything else you want to leave our listeners with? We covered a lot of ground. It's totally okay to say no as well. So the thing I would end with is I think a lot of people think of data labeling as really simplistic work. Like labeling cat photos and drawing bounding boxes on cars. And so I've actually always hated the words data labeling because it just paints this very simplistic picture when I think
Starting point is 01:02:52 what we're doing is completely different. Like I think a lot about what we're doing as a lot more like raising a child. You don't just feed a child information. You're teaching them values and creativity and what's beautiful and these infinite subtle things about what makes somebody a good person. And that's what we're doing for AI. So I just often think about what we're doing as almost like the future of humanity or how are we raising humanity's children. So I'll leave it at that. Well, I love just how much philosophy there is in this whole conversation that I was not expecting. With that, Edwin, we've reached our very exciting lightning round. I've got five questions for you. Are you ready? Yeah, let's go. Here we go. What are two or three books that you find yourself recommending most to other people? Yes. So, three books I often recommend are first, Story of Your Life by Ted Chiang. It's my all-time
Starting point is 01:03:51 favorite short story, and it's about a linguist learning an alien language, and I basically read it every couple years. And that's what Interstellar was about? Is that... Yeah, so there's a movie called Arrival, which is based on the story, which I love as well. Great. Okay, keep going. And then second,
Starting point is 01:04:07 The Myth of Sisyphus by Camus. I actually can't really explain why I love this, but I always find the final chapter somehow really inspiring. And then third, Le Ton beau de Marot by Douglas Hofstadter. And so I think Gödel, Escher, Bach is his more famous book, but I've actually always loved this one better. In it, he takes a single French poem and translates it 89 different ways
Starting point is 01:04:28 and discusses all the motivations behind each translation. And so I've always loved the way it embodies this idea that translation isn't this robotic thing that you do. Instead, there's a million different ways to think about what makes a high-quality translation, which mirrors a lot of the ways I think about data and quality in LLMs. All of these resonate so deeply with all the things we've been talking about, especially that first one, if that was your goal after school, is like, I want to help translate alien language.
Starting point is 01:04:53 I'm not surprised you love that short story. Next question, do you have a favorite recent movie or TV show you've really enjoyed? One of my new all-time favorite TV shows is something I found recently. It's called Travelers. It's basically about a group of travelers from the future who are sent back in time to prevent the apocalypse. Sorry, I just wrote that same section. And then I actually just rewatched Contact, which is one of my all-time favorite movies. So, yeah, one of the things you'll notice about me is that, yeah, I love any kind of book or film that involves scientists
Starting point is 01:05:24 deciphering alien communication. Again, just this dream I always had as a kid. That's so funny. Yeah, I love it. Okay. Is there a product you recently discovered that you really love? So it's funny, but I was in SF earlier this week, and I finally took a Waymo for the first time. Honestly, it was magical and it really felt like living in the future.
Starting point is 01:05:43 Yeah. It's the thing where people hype it like crazy but it always exceeds your expectations. It deserves the hype. It was great. Yeah, it's absurd. It's like, holy moly. Like if you're not in SF, you don't realize just how common these things are. They're just like all over the place. Just driverless cars constantly going about and when you like go to an event, at the end, there's just like all these Waymos lined up picking people up. Yeah. Yeah, Waymo. Good job. Good job over there. Do you have a favorite life motto that you find yourself coming back to in work or in life? So I think I mentioned this idea that founders should build a company that only they could
Starting point is 01:06:15 build, almost like it's this destiny that their entire life and experiences and interests shaped them towards. And so I think that principle applies pretty broadly, not just to founders, but to people who create anything. Well, let me follow that thread to enlightening this answer. Do you have any advice for how to build those sorts of experiences that help lead to that? Is it, you know, follow things that are interesting to you? Because, you know, it's easy to say that, but it's hard to actually acquire these really unique sets of experiences that allow you to create something really important. Yeah. So I think it would always be to really follow your interests and do what you love. And it's almost like a lot of decisions I make about Surge. Like I think one of the things that I didn't think about a couple years ago, but then someone said it to me, is that companies in a sense are an embodiment of their CEO. And it's kind of funny. I hadn't thought about that because I never quite knew what a CEO did. I always thought a CEO was kind of generic and it's like, okay, you're just
Starting point is 01:07:12 doing whatever your VPs and your board tell you to do and you're saying yes to decisions. But instead, it's this idea where when I think about certain big, hard decisions we have to make, I don't think what would the company do. I don't think what metrics are we trying to optimize. I just think what do I personally care about? Like, what are my values and what do I want to see happen in the world? And so I think following that idea about, okay, ask yourself, what are the values you care about? What are the things you're trying to shape, and not what will look good on a dashboard? I think that that's pretty important.
Starting point is 01:07:49 I love how you're just full of endless, beautiful, and very deep answers. Final question. Something that you got quite famous for before starting surge is you built this map at Twitter while you were at Twitter that showed a map of the world and how and what people called whether they called it soda or pop. I don't know if it was called soda or pop. What was the name of this map? Yeah, it was like the soda versus pop data set. Soda versus pop map.
Starting point is 01:08:16 And so it's like a map of the United States and tells you where people say pop versus soda. So do you say soda or pop? So I say soda. I'm a soda person. Okay. And is that just like that's the right answer? Or it's like whatever you are, it's totally fine.
Starting point is 01:08:32 I think I'll look at you a little bit funny if you say pop, and I wonder where you came from. But I won't scorn you too much. That's how I feel too. Edwin, this is incredible. This was such an awesome conversation. I learned so much. I think we're going to help a lot of people start their own companies, help their companies become more aligned with their values and just build better things. Two final questions. Where can folks find you online if they want to reach out? What roles are you hiring for?
Starting point is 01:08:58 How can listeners be useful to you? Yeah. So I used to love writing a blog, but I haven't had time in the past few years. But I am starting to write again. So definitely check out the Surge blog, surgehq.ai/blog. And, yeah, hopefully I'll be writing a lot more there. And I would say we're definitely always hiring. So for people who just love data and people who love this intersection of math and language and computer science, definitely reach out anytime. Awesome.
Starting point is 01:09:24 And how can listeners be useful to you? Is it just, I don't know. Yeah. Is there anything there? Any asks? So I would say definitely tell me blog topics that you'd like me to write about. Okay. And then I'm always fascinated by all of these AI failures
Starting point is 01:09:38 that happen in the real world. So whenever you come across a really interesting failure that I think illustrates some deep question about how we want models to behave. There's just so many different ways that a model can respond. I just often sometimes think there's just not a single right answer.
Starting point is 01:09:53 And so whenever there's one of these examples, I just love seeing them. You need to share these on your blog. I'm also, I would love to see these. Edwin, thank you so much for being here. Thank you. Bye, everyone. Thank you so much for listening.
Starting point is 01:10:08 If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.
