Lenny's Podcast: Product | Career | Growth - AI Engineering 101 with Chip Huyen (Nvidia, Stanford, Netflix)

Starting point is 00:00:00 A question that gets asked a lot is: how do we keep up to date with the latest AI news? Why do you have to keep up to date with the latest AI news? If you talk to the users, understand what they want or don't want, and look into the feedback, then you can actually improve the application way, way, way more. A lot of companies are building AI products. A lot of companies are not having a good time building AI products. We are in an idea crisis. Now we have all these really cool tools to do everything from scratch.

Starting point is 00:00:24 It can help you design. It can help you write code. It can help you build your website. So in theory, we should see a lot more. But at the same time, people are somehow stuck. They don't know what to build. With all this AI hype, the data is actually showing that most companies try it, it doesn't do a lot, and they stop.

Starting point is 00:00:37 What do you think is the gap here? It's really hard to measure productivity. So I do ask people to ask their managers: would you rather give everyone on the team very expensive coding agent subscriptions, or get an extra headcount? Almost everyone, the managers, would say headcount. But if you ask someone at the VP level, or someone who manages a lot of teams, they would say the AI assistant. Because as managers, you

Starting point is 00:00:59 are still growing. So for you, having one more headcount is big. Whereas for executives, maybe you have more business metrics that you care about. So you actually think about what actually drives productivity metrics for you. Today, my guest is Chip Huyen. Unlike a lot of people who share insights into building great AI products and where things are heading, Chip has built multiple successful AI products, platforms, and tools. Chip was a core developer on Nvidia's NeMo platform and an AI researcher at Netflix. She taught machine learning at Stanford. She's also a two-time founder and the author of two of the most popular books in the world of AI, including her most recent book, AI Engineering, which has been the most read book on the

Starting point is 00:01:39 O'Reilly platform since its launch. She's also gotten to work with a lot of enterprises on their AI strategies, and so she gets to see what's actually happening on the ground inside a lot of different companies. In our conversation, Chip explains a lot of the basics, like what exactly does pre-training and post-training look like? What is RAG? What is reinforcement learning? What is RLHF? We also get into everything she's learned about how to build great AI products,

Starting point is 00:02:03 including what people think it takes and what it actually takes. We talk about the most common pitfalls that companies run into, where she's seeing the most productivity gains, and so much more. This episode is quite technical, more technical than most conversations I've had, and is meant for anyone looking for a more in-depth conversation about AI. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. And if you become an annual subscriber of my newsletter, you get a year free of 16 incredible products,

Starting point is 00:02:31 including Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, and Mobbin. Head on over to lennysnewsletter.com and click ProductPass. With that, I bring you Chip Huyen, after a short word from our sponsors. This episode is brought to you by Dscout.

Starting point is 00:02:50 Design teams today are expected to move fast, but also to get it right. That's where Dscout comes in. Dscout is the all-in-one research platform built for modern product and design teams. Whether you're running usability tests, interviews, surveys, or in-the-wild fieldwork, Dscout makes it easy to connect with real users and get real insights fast. You can even test your Figma prototypes directly inside the platform. No juggling tools, no chasing ghost participants.

Starting point is 00:03:17 And with the industry's most trusted panel plus AI-powered analysis, your team gets clarity and confidence to build better without slowing down. So if you're ready to streamline your research, speed up decisions, and design with impact, head to dscout.com to learn more. That's D-S-C-O-U-T dot com. The answers you need to move confidently. Did you know that I have a whole team that helps me with my podcast and with my newsletter? I want everyone on that team to be super happy and thrive in their roles.

Starting point is 00:03:47 JustWorks knows that your employees are more than just your employees. They're your people. My team is spread out across Colorado, Australia, Nepal, West Africa, San Francisco. My life would be so incredibly complicated if I had to hire people internationally, pay people on time and in their local currencies, and answer their HR questions 24-7. But with JustWorks, it's super easy. Whether you're setting up your own automated payroll, offering premium benefits, or hiring internationally, JustWorks offers simple software and 24-7 human support from small business experts for you and your people. They do your human resources right so that you can

Starting point is 00:04:22 do right by your people. JustWorks. For your people. Chip, thank you so much for being here and welcome to the podcast. Hi, Lenny. I've been a big fan of the podcast for a while, so I'm really excited to be here. Thank you for having me. I want to start with this table slash chart that you shared on LinkedIn a while ago that went super viral, and I think it went super viral because it hit a nerve with a lot of people. Let me just read this, and we'll show it on YouTube for people who are watching. So it's this very simple table you shared of what people think will improve AI apps and what actually improves AI apps. What people think will improve AI apps: staying up to date with the latest AI news, adopting the newest

Starting point is 00:05:03 agentic framework, agonizing over what vector databases to use, constantly evaluating which model is smarter, fine-tuning a model. And then you have what actually improves AI apps: talking to users, building more reliable platforms, preparing better data, optimizing end-to-end workflows, writing better prompts. Why do you think this hit such a nerve with people? And if you had to boil it down, what do you think people are missing about building successful AI apps? A question that gets asked a lot is: how do we keep up to date with the latest AI news?

Starting point is 00:05:35 And I'm like, why do you need to keep up to date with the latest AI news? And I know it sounds very counterintuitive, but there's so much news out there. A lot of people also ask me questions like, how do I choose between two different technologies? Like, maybe recently, MCP versus the agent-to-agent protocol, and it's like, which one is better, this or that? And there's a series of questions I usually ask them. First, how much improvement could you get from the optimal solution versus a non-optimal solution, right?

Starting point is 00:06:07 And sometimes they're like, actually, it's not much, right? And I'm like, okay, if it's not much improvement, then why do you want to spend so much time debating something that doesn't make much difference to your performance? And another question I ask is, if you adopt a new technology, how hard would it be to switch it out for something else? And sometimes they're like, oh, I think it would be a lot of work to switch it out. And I'm just like, hmm, let's say it's a new technology.

Starting point is 00:06:34 It hasn't been tested by a lot of people. And if you adopt it, you would be stuck with it forever. Do you actually want to adopt it? Right. Maybe you want to think twice about overcommitting to new technologies that haven't been well tested. I love that your broader advice is just simple. Like, to build successful apps, talk to users, prepare better data,

Starting point is 00:06:56 write better prompts, optimize the user experience. Versus just like, what is the latest and greatest? What's the best model to use right now? What's happening in AI? Let me follow this thread of this idea of fine-tuning and basically post-training. There's all these terms that people hear in AI, and I think this is going to be a really good opportunity for people to learn what we're actually talking about. Since you actually do these things, you build these things, you work with companies doing these things.

Starting point is 00:07:20 And there's a few terms I want to sprinkle in through the conversation. But let's start with this one. What's the simplest way for someone to understand the difference between pre-training and post-training, and then just how fine-tuning fits into that, just what fine-tuning actually is? Disclaimer: I don't have full visibility into what these big, secretive frontier labs are doing.

Starting point is 00:07:40 But from what I've heard, right? So I think one approach is supervised fine-tuning, where you have demonstration data. You have a bunch of experts say, okay, here's a prompt, and here is what the answer should be like. And you just train the model to emulate what the human expert would do. And that's also what a lot of the open source models are doing, except they do it by distillation. So instead of having human experts write really good, great-sounding answers to prompts,

Starting point is 00:08:14 they get very popular, famous, good models to generate the responses, and then train a smaller model to emulate them. So sometimes you see people, and I really appreciate the open source community, by the way, but going from being able to train models that can emulate an existing good model, it's very different from

Starting point is 00:08:36 being able to train good models that outperform existing good models. So it's a big step there. So yeah, so we have supervised fine-tuning, and another thing that's very big, I'm not sure you've had guests talking about it already, is reinforcement learning. It's, like, everywhere.

Starting point is 00:08:52 Let's pull on that, because I definitely want to spend time on that. That's such a cool topic that's emerging more and more in my conversations. But just to summarize the things you just shared, which I think is really, really important stuff. So the idea here is that a model is essentially this algorithm, a piece of code that someone writes, and, say, the frontier labs are feeding it just, like, the entire internet of content. And basically it's trying to test itself on predicting, across all that data, the next word. Essentially, token is the correct way to think about it, but a simpler way to think about it is, like, the next word in text.

Starting point is 00:09:25 And as it gets it wrong, it adjusts these things called weights, essentially. Is that a simple way to think about it, even though that's just very surface level? So I think of language modeling as a way of encoding statistical information about language, right? So let's say that we both speak English. So we kind of get a sense of what is more statistically likely. Like, if I say "my favorite color is," then you would think, okay, the next word should be a color. Like, the word "blue" would be much more likely to appear than the word "table," right?

Starting point is 00:09:58 Because statistically, "blue" is more likely to follow "my favorite color is." So it's a way of encoding information. So when a language model is trained on a large amount of data, it sees a lot of languages, a lot of domains. So when a user gives it a prompt, it can come up with the next most likely token. By the way, it's not a new idea. It's actually a very, very old idea, like from a 1951 paper on the entropy of English, I think by Claude Shannon.
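To make the "encoding statistical information" point concrete, here is a minimal sketch of a word-level bigram model. It is only an illustration: the tiny corpus and the function names `train_bigram` and `next_word_distribution` are invented for this example, and real language models learn far richer statistics over tokens rather than counting adjacent words.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}
    return {word: c / total for word, c in counts[prev].items()}


# Hypothetical toy corpus, made up for this sketch.
corpus = [
    "my favorite color is blue",
    "my favorite color is green",
    "her favorite color is blue",
    "the table is wooden",
]

counts = train_bigram(corpus)
dist = next_word_distribution(counts, "is")
most_likely = max(dist, key=dist.get)
```

In this toy corpus, "blue" ends up as the most likely word after "is", while "table" never follows "is" and so gets no probability at all.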

Starting point is 00:10:32 It's a great paper. And there's a story I really like. Did you read Sherlock Holmes, by the way? Yeah, I've read a few Sherlock Holmes stories, yeah. Yeah, so this is a story where Sherlock Holmes uses this statistical information to help solve a case. So this is the story: somebody left a message with a lot of stick figures. So Sherlock Holmes is like, okay, he knows that in English the most common letter is E, so the most common stick figure must be E.

Starting point is 00:11:03 Right. And he starts from there, and that's how he solves the code. So in a way, it's simple language modeling, right? But instead of doing it at the word level, he does it at the character level. And a token is something in between, right? A token is not quite a word, but it's bigger than a character.
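The frequency trick in the Sherlock Holmes story can be sketched in a few lines. This is an illustration under simplifying assumptions, not a reconstruction of the story: a Caesar shift stands in for the stick-figure cipher, and the plaintext and helper names (`caesar`, `most_common_symbol`) are made up for the example.

```python
from collections import Counter


def caesar(text, shift):
    """Shift lowercase letters by `shift` positions; leave everything else alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def most_common_symbol(ciphertext):
    """Return the most frequent letter in the ciphertext."""
    letters = [ch for ch in ciphertext if ch.isalpha()]
    return Counter(letters).most_common(1)[0][0]


# Hypothetical message, chosen so that 'e' is clearly the most common letter.
plaintext = "meet me here at seven see the sentence"
ciphertext = caesar(plaintext, 3)

# Character-level "language model": assume the most common symbol stands for 'e'.
top = most_common_symbol(ciphertext)
guessed_shift = (ord(top) - ord("e")) % 26
decoded = caesar(ciphertext, -guessed_shift)
```

The only knowledge used to break the code is a single statistic about English, which is the sense in which this counts as a very simple character-level language model.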

Starting point is 00:11:22 So we use tokens because they help reduce the vocabulary. Characters give you the smallest vocabulary, right? The alphabet has, like, 26 characters, but with words you can have millions and millions. Whereas with tokens you can hit the sweet spot between the two. So let's say that we have a new word, like, how to say, "podcasting," right? Let's say it's a new word, but it can be divided into "podcast" and "-ing."
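As a rough sketch of how an unseen word like "podcasting" can be split into known pieces, here is a greedy longest-match tokenizer over a toy vocabulary. The vocabulary and the `tokenize` function are assumptions made for this example; production tokenizers (for instance byte-pair encoding) learn their vocabularies from data and split text differently.

```python
def tokenize(word, vocab):
    """Greedily split `word` into the longest pieces found in `vocab`.

    Falls back to a single character when no vocabulary entry matches.
    """
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit one character and move on.
            tokens.append(word[i])
            i += 1
    return tokens


# Hypothetical toy vocabulary, invented for this sketch.
vocab = {"podcast", "pod", "cast", "ing", "news", "letter"}

pieces = tokenize("podcasting", vocab)
other = tokenize("newsletter", vocab)
unknown = tokenize("xy", vocab)
```

The word "podcasting" is not in the vocabulary, yet it breaks into two pieces whose meanings are already known, which is the benefit described above.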

Starting point is 00:11:54 So people can say, okay, "podcast," we know the meaning. We know that "-ing" marks, like, a verb, a gerund, whatever it is. So we know the word "podcasting." So that's where tokens come in. But yeah, pre-training is basically encoding statistical information about language to help you predict what is most likely. I think "most likely" is a simplified way of putting it, because it's more like building distributions. Like, okay, the next token could be, like, 90% of the time it could be

Starting point is 00:12:25 a color, and 10% of the time it could be something else, right? So it's basically a distribution, and the language model would pick from it depending on your sampling strategy. Like, do you want it to always pick the most likely token, or do you want it to pick something more creative? So I think the sampling strategy is something extremely important. It can help you boost a model's performance in a huge way, and it's very, very underrated. Okay, awesome. So essentially, a model is just code with this whole set of weights, essentially the statistical model that has learned to predict what comes next after certain

Starting point is 00:13:01 words and phrases. Yeah. And then post-training, and fine-tuning specifically, is doing that same thing. So with pre-training, you get, like, GPT-5. Fine-tuning is someone taking GPT-5 and doing the same sort of thing, adjusting these weights a little bit for specific use cases, on data that they find necessary for their very specific use case. Is that a simple way to think about it? Yeah, think of weights like the parameters of a function, right?
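On the sampling strategy point raised just before this exchange, the sketch below contrasts always picking the most likely token with temperature-based sampling. The distribution, the temperature values, and the function names are all illustrative assumptions; real decoders usually work on logits and often combine temperature with top-k or top-p filtering.

```python
import math
import random


def greedy(probs):
    """Always pick the single most likely token."""
    return max(probs, key=probs.get)


def apply_temperature(probs, temperature):
    """Rescale a distribution: T < 1 sharpens it, T > 1 flattens it."""
    scaled = {
        tok: math.exp(math.log(p) / temperature)
        for tok, p in probs.items()
        if p > 0
    }
    total = sum(scaled.values())
    return {tok: v / total for tok, v in scaled.items()}


def sample(probs, temperature=1.0, rng=random):
    """Draw one token at random from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    tokens = list(adjusted)
    weights = list(adjusted.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution after "my favorite color is".
probs = {"blue": 0.6, "green": 0.3, "table": 0.1}

safe_choice = greedy(probs)
sharper = apply_temperature(probs, 0.5)
flatter = apply_temperature(probs, 2.0)
creative_choice = sample(probs, temperature=2.0, rng=random.Random(0))
```

Lower temperatures concentrate probability on the top token, while higher temperatures give unlikely tokens more of a chance, which is one way to trade reliability for variety.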

Starting point is 00:13:27 So let's say you have a function, maybe Lenny's height is, like, 1x plus something, or 2x plus something, and the 1 and the "plus something" are the weights, right? So you change them until you fit the correct data, which is like my height and your height, right? So you can think of the weights as just the parameters of a function. You adjust the weights so they fit the data, which is the training data.
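The "adjust the weights until the function fits the data" idea can be shown with the smallest possible model, a line y = w*x + b fitted by gradient descent. The data points, learning rate, and step count below are made up for illustration; a language model does the same kind of adjustment over billions of weights with a different loss.

```python
def fit_line(xs, ys, lr=0.05, steps=5000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each weight in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
```

Starting from weights of zero, the loop ends up very close to w = 2 and b = 1, the values that generated the data. Fine-tuning follows the same recipe, except that it starts from already trained weights and uses new data.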

Starting point is 00:13:54 Awesome. Okay. So we're talking about pre-training, post-training, fine-tuning. Is there anything else here that's important to share about just, like, what this is exactly, what people need to understand about these parts of training? So the vast majority of the time, we don't touch pre-trained models anymore. As users, we don't use them. Right. It's already done for us.

Starting point is 00:14:13 Yeah. So actually, it's a bit of a fun process: when my friends are training models, I try to play with their pre-trained models, and they're horrendous. They say things that are, like, oh my gosh, it's crazy. So it's very interesting to look at how much post-training can change the model's behavior. Yeah, and I think that's where a lot of people are spending energy nowadays. The frontier labs are focused on post-training, because pre-training has been used to increase the general capacity of a model, the capabilities of a model.

Starting point is 00:14:51 And it needs a lot of data and model size to increase the model's capabilities. And at some point, we have kind of maxed out on the Internet data, right? Text data, maybe, is maxed out. I think a lot of people are working with other data, like audio and video. And everyone's trying to think of what the new source of data is. But with post-training, it's more like, everyone can have very similar pre-training data, and post-training is where they make a big difference nowadays. This is a good segue to, you talked about supervised learning versus unsupervised learning.

Starting point is 00:15:24 I love we're getting into this, by the way. This is super interesting. So you're talking about labeled data. Basically supervised learning is AI learning on data that somebody has already labeled and told it. Here's correct versus incorrect. For example, this is spam versus not spam. This is a good short story. This is not a good short story.

Starting point is 00:15:41 We've had the CEOs of a lot of these companies that do this for labs, Mercor, Scale, Handshake, there's micro, there's a few others. So is that essentially what these companies are doing for labs, giving them labeled data, high-quality data to train on? It is in a way, but I think it's more like one part of a bigger equation. So there are a lot more different components than that. So that's why we're talking about reinforcement learning. I'm not sure if the CEOs you interviewed brought up that

Starting point is 00:16:10 term. So the idea is, let's say you have a model, you give the model a prompt, right, and it produces an output, right? You want to reinforce the model to produce outputs that are better, right? So now it comes down to, how do we know that the answer is good or bad, right? So usually people rely on signals. One way to tell good from bad is human feedback, right? Say we have two responses. You can say, okay, this one is better than the other. And we do that because, as humans, it's very hard for us to give a concrete score, but it's easier to do comparisons, right? Like, if you ask me, okay, give this

Starting point is 00:16:54 song a score, I'm not a musician, I don't know how hard it is, I don't know, like, out of 10, I'm going to say six, you know? And then if you ask me again a month from now, I've completely forgotten, and I say, okay, maybe now seven, or four, I don't know. But if you ask me, okay, here are two songs, which one would you prefer to play at the birthday party? I'd be like, okay, I prefer this song. So comparison is a lot easier. So you have human feedback, and then you use this human feedback to train a reward model. And then the reward model will help you: okay, the model produces this response, and the reward model can score it. Is this good or bad?
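One common way to turn "this one is better than that one" judgments into scores is a Bradley-Terry style model, sketched below. This is a toy stand-in for a reward model, not how any particular lab does it: it learns one number per song from invented comparison data, whereas a real reward model is a neural network that scores responses it has never seen.

```python
import math


def fit_scores(items, comparisons, lr=0.1, steps=2000):
    """Learn one score per item from (winner, loser) pairs.

    Maximizes the Bradley-Terry log-likelihood, where
    P(winner beats loser) = sigmoid(score[winner] - score[loser]).
    """
    scores = {item: 0.0 for item in items}
    for _ in range(steps):
        grads = {item: 0.0 for item in items}
        for winner, loser in comparisons:
            diff = scores[winner] - scores[loser]
            p_win = 1.0 / (1.0 + math.exp(-diff))
            # Push the winner up and the loser down, more so when the
            # current scores made this outcome look unlikely.
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        for item in items:
            scores[item] += lr * grads[item]
    return scores


# Hypothetical human preferences between three songs (winner listed first).
songs = ["song_a", "song_b", "song_c"]
comparisons = (
    [("song_a", "song_b")] * 3 + [("song_b", "song_a")]
    + [("song_b", "song_c")] * 3 + [("song_c", "song_b")]
    + [("song_a", "song_c")] * 3 + [("song_c", "song_a")]
)

scores = fit_scores(songs, comparisons)
ranking = sorted(songs, key=scores.get, reverse=True)
```

Nobody ever gave a song a score out of ten here. The scores fall out of the pairwise comparisons alone, which is why comparisons are a practical way to collect human feedback.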

Starting point is 00:17:31 And you're trying to bias the model toward producing better responses. Another way is that instead of using a human, you can use AI, right? Like, have AI look at the response and say whether it's good or bad, right? Or the thing that people are very big on nowadays is verifiable rewards, which is kind of natural. So basically, you give the model a math problem, and the model outputs a solution. So okay, the expected response should be, say, 482, and if the model doesn't produce 482, then it's wrong, right? And it's not a good response. So, yeah, a lot of the time people are using human labor, like human experts, to produce

Starting point is 00:18:10 expert questions and expected answers, and in a way to design systems that are verifiable, so that the models can be trained on them. Okay, I'm really glad you went there. This is essentially RLHF, reinforcement learning from human feedback, which is exactly what I wanted to also talk about, right? Yeah. So I think it's, like, general.
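A verifiable reward can be as simple as checking the model's final answer against the expected one. The sketch below assumes the answer is the last number in the output; that extraction rule, the function names, and the sample outputs are assumptions for illustration, and real verifiers are usually stricter about answer format.

```python
import re


def extract_final_number(text):
    """Return the last number mentioned in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


def verifiable_reward(model_output, expected_answer):
    """Return 1.0 when the final number equals the expected answer, else 0.0."""
    predicted = extract_final_number(model_output)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == float(expected_answer) else 0.0


# Hypothetical model outputs for a problem whose expected answer is 42.
correct = verifiable_reward("6 times 7 is 42, so the answer is 42.", "42")
wrong = verifiable_reward("I think the answer is 40", "42")
no_answer = verifiable_reward("I am not sure.", "42")
```

Because the check is automatic, no human or AI judge is needed at training time. The expensive human work shifts to writing good questions with known answers.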

Starting point is 00:18:31 It's, like, a way of learning. It's reinforcement learning, and whether it learns from human feedback or AI feedback or verifiable rewards, I think it's just a different way of collecting signals. Awesome. Yeah. We had a

Starting point is 00:18:46 co-founder of Anthropic on the podcast, and he talked about their version of RLHF, which is AI-driven reinforcement learning. I love the way you phrased it, where you basically want to help the model, you want to reinforce correct behavior and correct answers, and this is the method to do it, whether it's, say, an engineer seeing

Starting point is 00:19:02 an output from a model being like, no, here's how I would code it differently. And then it's training a different model that the original model works with to tell it, am I correct or not correct? Is that right? Yeah. I think that's one way of looking at it. And I think that space is so exciting nowadays because there are so many, like, domain

Starting point is 00:19:24 expert tasks that model developers want models to do well on, right? Let's say you're an accountant, right? Maybe you want to use a model for an accounting task. So they need a lot of accounting data, like examples from accountants. So they need to hire a lot of them to do it. Or they want it to solve physics problems, or, I don't know, legal questions and stuff, or engineering questions. Or somebody was telling me they want the model to do, like, coding

Starting point is 00:19:52 to solve scientific problems and not just, like, coding to build a product, which is a whole different realm of things. And also, like, using very specific tooling. Like, I'm not sure what apps you use, but maybe an editing app, or QuickBooks, or Google Excel. They need very specific, tool-specific expertise that you want the models to learn. So they need a lot of human experts in these areas to create data to train them.

Starting point is 00:20:19 And it's a massive thing, because everyone wants a lot of data, and frontier labs have, like, unlimited budgets. But I think there's a low-key interesting economics here. I'm not sure you've talked to your guests about it. I find it very interesting to think about, because it's very lopsided, right? Because there's only a very small number of frontier labs, right? And they want a lot of data.

Starting point is 00:20:44 And there's a massive number of startups or companies providing data. So you can see these companies, these startups doing data labeling, and maybe they have massive ARR. But you're also like, okay, so how many customers do you have? And it could be a very small number. I'm not sure, you're smiling. Yeah, we chatted about that.

Starting point is 00:21:05 Yeah, so I'm a little bit uneasy when a company is growing like crazy but is heavily dependent on, like, two or three customers. And at the same time, if I was one of these frontier labs, what would be the right economic thing for me to do, right? I'd want a lot of startups. I'd want to have a lot of providers so I can pick and choose. And then these providers can also compete with each other

Starting point is 00:21:30 to lower the price, and if they're so dependent on me, they would sell to me regardless. So I feel like, yeah, the whole economics is very interesting to me, and I'm curious to see how it plays out. What I'm hearing is you're bearish on the future of these data labeling companies because, as you said, they don't have a lot of leverage over pricing because they have so few customers,

Starting point is 00:21:51 and there's so many people getting into the space. So basically, even though these are some of the fastest growing companies in the world, you're feeling like there's a challenge up ahead. I'm actually somewhat bearish on it. But I'm curious, because I think things have a way of working out in ways that I don't expect. So I think maybe these companies, they have a lot of data, maybe they would be able to use that to get some insights that help them stay ahead of the curve. You know, so I don't know. A very fair answer.

Starting point is 00:22:24 Okay, while we're on this topic, I want to chat about evals, which is a very recurring topic on this podcast. This is the other piece of data content these companies share that AI labs really need. Can you just talk about what an eval is, the simplest way to understand it, and then how this helps models get smarter? So I think when people approach evals, there are two very different problems. One is as an app builder, right? Let's say I have an app that is, maybe, a chatbot.

Starting point is 00:22:54 It's very simple. It's the first thing that came to my mind. And I want to know if the chatbot is good or bad, right? So I need to come up with a way to evaluate the chatbot. The other thing is, I think of this as task-specific eval design. So let's say I'm a model developer and I want to make my model better at creative writing. Right?

Starting point is 00:23:14 And I'm like, okay, but how do I even measure creative writing? So I need someone who understands creative writing to think about what makes a story good, and then design the whole dataset and the criteria to evaluate creative writing. So, yeah, I think that's more like eval design. That is very interesting. Come up with criteria, come up with guidelines on how to do it. And then also train people on how to do it effectively.

Starting point is 00:23:47 So I guess, in that case, I think evals are really, really fun because they're extremely creative. I was looking at different evals that people have built. And I was like, wow. This is not dry at all. It's just super, super, super fun. We had a whole podcast on evals with Hamel and Shreya. And that's exactly what they talked about. It's actually really fun to create evals, for companies especially.

Starting point is 00:24:10 So let's dig into that one a little bit more. There's this kind of debate online, and I don't know how big of a deal this debate is, but it feels like people spend a lot of time thinking about this idea of, do we need evals for AI products? Some of the best companies say they don't really do evals, they just go on vibes. They're just like, is this working well? Can I feel it or not? What's your take on just the importance of building evals

Starting point is 00:24:33 and the skill of evals for AI apps, not the model companies? You don't have to be absolutely perfect at a thing to win. You just need to be good enough and be consistent about it. Okay, this is not the philosophy I follow, but I have worked with enough companies to see that play out.

Starting point is 00:24:53 So when I say why a company might not need evals, right, let's say you are an executive, right? And you want to have a new use case. So here's a use case you started out with. It's built, and it works well, right? The customers are somewhat happy. You don't have an exact metric for it. But traffic keeps increasing, people seem happy.

Starting point is 00:25:08 People keep buying stuff, right? And now here's an engineer coming in like, okay, we need evals for it. And so as an executive, you're like, okay, how much effort do we need to put into evals? And they're like, okay, maybe two engineers for some amount of time, and that could maybe improve it. And you're like, okay, so how much improvement can I expect to get from it? And the engineer would be like, oh, maybe you can improve it from like

Starting point is 00:25:28 80% to like 82% or 85%, right? And you're like, okay, but if I take those two engineers and have them launch a new feature, then it could give me so much more improvement, right? So I think that's one of the reasons. Sometimes people think of evals as, okay, this is good enough, don't touch it. If you spend a lot of energy on evals, it would only bring an incremental improvement, whereas you could spend that energy on another use case. And maybe you know that it's good enough because you vibe-checked it, right? So I do think that's maybe what the debate is about. I do think that a lot of times people just get things to the place

Starting point is 00:26:04 where it's like, okay, good enough, and people run with it. But of course, there's a lot of risk associated with that, because if you don't have a clear metric, you don't have good visibility into how the application or the model is performing. It might do something very dumb, or, I don't know, something crazy can happen. So, yeah, I do think evals are very, very important if you operate at scale

Starting point is 00:26:33 and where, like, failures can have catastrophic consequences. Then you do need to be very rigorous about what you put in front of the users, and understand the different failure modes, like what could go wrong. And also maybe in a space where the feature or the product is a competitive advantage, right? You want to be the best at it. So you want to have a very strong understanding of where you are, and where you are relative to the competitors. But if it's something that's more, like, low-key. Okay, it's like something is like,

Starting point is 00:27:01 okay, it's not the core, it just helps our users. Then maybe you don't need to be so obsessed or rigorous about it. It's like, okay, that's good enough for now. And if it fails, then it fails. Like, okay, I know it sounds terrifying, but, yeah. Yeah. I think it's all about the question of return on investment. I'm a big fan of evals. I love reading evals. And I understand why some people would choose not to focus on evals right away and choose to bring on new functionalities instead. Awesome. That is a really pragmatic answer.

Starting point is 00:27:34 What I'm hearing is e-vals are great, very important, especially if you're operating at scale, but pick your battles. You don't need to write e-vails for every little feature. Something that Hamlin Shreya shared is that people need just like, I don't know, five or seven e-vails for the most important elements of their product. Is that what you see or do you see a lot more in production that people build and need? I don't think of like just a fixed number on like the evolves like what was the going to evolve right as a goal of Eval is to guide the product development so so like you see

Starting point is 00:28:05 evals, and the reason I'm a big fan of evals is that they help you uncover opportunities where the product is not doing well. So sometimes we look at the evals and we realize, okay, it performed really poorly on this specific segment of users. And then we look into what's wrong with it. And it turns out we just don't have good messaging for it. So maybe we should just focus on that, and it can improve significantly. Yeah, so the number of evals really depends. We have seen products with hundreds of different metrics.

Starting point is 00:28:40 Oh wow. Like, going crazy. This is because that product is general, right? It has different types: one eval for, I don't know, verbosity, one eval for user-sensitive data, and another one for length. But okay, let's take a good example, a concrete example, like deep research.

Starting point is 00:29:00 So you have the application, you use a model to do deep research for you, right? You have a prompt. Let's say, okay, do comprehensive research on Lenny's podcast and show me a report on what kind of topics he's interested in, what kind of videos get the most views, or what topics he's missing that he should cover, right? You have this kind of prompt. Then how do you evaluate the result, right? I don't think there's one metric that would help. Maybe you

Starting point is 00:29:34 have, like, a hundred... I think somebody has a benchmark where they got a hundred experts to write a bunch of prompts and go through all the AI's answers, and it's extremely costly and slow, right? But you might have something else. For example, one way I was thinking about it, I was talking to a friend about it, is to ask: how do you produce the resulting summary? First, you need to gather information. And to gather information, you need to run a lot of search queries. You grab the search results, and then some of the search results you aggregate,

Starting point is 00:30:12 and then maybe you say, okay, I'm still missing this. You have to do another round, and another round, and then in the end you have a summary. So every step of the way, you need evaluation. You don't just need it at the end. So for the search queries, my first thought is, okay, now I've written five search queries. How good are these search queries? Are they similar to each other? Because if the five search queries are very similar, like, okay, Lenny podcast,

Starting point is 00:30:35 then Lenny podcast last month, Lenny podcast two months ago, right? It's not very exciting. But it's better if the queries, the keywords, are more diverse, right? And then I look at the results of the search queries. Say you enter a search query like "Lenny podcast data labeling", and it comes up with 10 pages, 10 results. And then you enter "Lenny podcast on", I don't know, "frontier labs", and get 10 results.

Starting point is 00:31:05 And I look at the different web pages: how much do they overlap? Are we doing well on breadth, like getting a lot of pages? But also, do we have depth? And are they relevant? Because we might come up with search queries that are completely irrelevant to the original prompt. So I feel like every aspect of it needs a way of evaluating. Right. So I don't think the question is, how many evals should I have?
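
[Editor's illustration] The two intermediate checks described above, whether the search queries are too similar to each other and how much the result pages overlap, could be sketched roughly as follows. This is only a sketch under stated assumptions: it uses word-level Jaccard similarity as a stand-in for whatever similarity measure a real system would use, and the function names, example queries, and URLs are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items divided by all distinct items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def query_diversity(queries: list[str]) -> float:
    """1 minus the mean pairwise word overlap; higher means more varied queries."""
    token_sets = [set(q.lower().split()) for q in queries]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def result_overlap(results_a: list[str], results_b: list[str]) -> float:
    """Share of URLs that two result lists have in common (Jaccard on URLs)."""
    return jaccard(set(results_a), set(results_b))


# Hypothetical query sets: near-duplicates versus varied topics.
similar = ["lenny podcast", "lenny podcast last month", "lenny podcast two months ago"]
diverse = ["lenny podcast data labeling", "frontier labs interviews", "product growth metrics"]

print("similar queries diversity:", round(query_diversity(similar), 2))
print("diverse queries diversity:", round(query_diversity(diverse), 2))

# Hypothetical result lists for two queries.
print("result overlap:", result_overlap(["url1", "url2", "url3"], ["url3", "url4"]))
```

A low diversity score or a high result overlap would suggest the extra queries are not adding breadth. Relevance to the original prompt would need a separate check that this sketch does not cover.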

Starting point is 00:31:28 But rather, how many evals do I need to get good coverage and high confidence in my application's performance, and also to help me understand where it is not performing well so that I can fix it? Awesome. And I'm hearing also, especially for the very core use case, the most common path people take in your product is where you want to focus. Yeah, so yeah. Okay, there's one more term I want to cover,

Starting point is 00:31:57 and then I want to go in a somewhat different direction. RAG, people see this term a lot, R-A-G, what does it mean? So RAG stands for retrieval-augmented generation. It's also not specific to gen AI. The idea is just that for a lot of questions, we need context to answer. I think it's from a paper in 2017. So someone was like,

Starting point is 00:32:21 So they realized that for a bunch of benchmarks, question-answering benchmarks, if we give the model information about the question, then the answer can be much, much better. So what they did was try to retrieve information from Wikipedia. For a question about a topic, you retrieve the relevant page, put it into the context, and the model answers much better. So I feel like it sounds like a no-brainer, right? I mean, obviously.
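
[Editor's illustration] The retrieve-then-answer idea described above could look roughly like the sketch below. It makes several assumptions: retrieval is a naive word-overlap ranking rather than a real search index or embedding model, the documents are toy sentences, and the prompt template is invented for illustration. The resulting prompt would be passed to whatever model is in use.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase words with simple punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Put the retrieved text into the context ahead of the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy stand-ins for Wikipedia pages.
docs = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]

prompt = build_prompt("What is the capital of France?", docs)
print(prompt)
```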

Starting point is 00:32:46 So I think that's what RAG is in the simplest sense: providing the model with relevant context so that it can answer the question. And that's where things get more interesting, because traditionally, when it started out, RAG was mostly text. So we talk a lot about how to prepare data so that the model can retrieve effectively. Let's say not everything is a Wikipedia page, right? A Wikipedia page is pretty self-contained, and you know everything in it is about one topic. But a lot of the time you have documents that are not like that, right? They're structured in weird ways. Let's say you have a document about Lenny's podcast, right? And

Starting point is 00:33:28 in the beginning, it says, from now on, "podcast" will refer to Lenny's podcast, right? So let's say somebody later asks, okay, tell me about Lenny, right? Because the rest of the document does not have the term "Lenny", you might not retrieve it. And the document is long enough that it's chunked into different parts. The second part doesn't have the word "Lenny", so you cannot retrieve it. So you have to find a way to process the data to make sure you can retrieve the information that's relevant to the query, even though it might not be immediately obvious that it's related.
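
[Editor's illustration] The chunking problem described above can be reproduced in a few lines. This is a minimal sketch with assumptions: the document text is made up, matching is a naive keyword check, and the "fix" shown (prepending a short document-level summary to every chunk) is just one simple way to carry context into each chunk.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def add_context(chunks: list[str], doc_summary: str) -> list[str]:
    """Prepend a short document-level summary to every chunk."""
    return [f"[{doc_summary}] {c}" for c in chunks]


def matches(query: str, chunks: list[str]) -> list[int]:
    """Indices of chunks that contain every query word (naive keyword match)."""
    terms = query.lower().split()
    return [i for i, c in enumerate(chunks) if all(t in c.lower() for t in terms)]


# Hypothetical document: the name appears only in the first sentence.
doc = ("From now on, podcast refers to Lenny's podcast. "
       "The podcast covers product, growth, and career topics every week.")

plain = chunk_words(doc, 8)
contextual = add_context(plain, "About Lenny's podcast")

print("plain chunks matching 'lenny':     ", matches("lenny", plain))
print("contextual chunks matching 'lenny':", matches("lenny", contextual))
```

Without the added context, only the first chunk matches the query. With it, the later chunks become retrievable as well.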

Starting point is 00:34:02 So people come up with things like contextual retrieval: giving each chunk of data extra relevant context, maybe a summary or metadata, so that it knows. I've also seen people use hypothetical questions, which is very interesting. For a given chunk of a document, you generate a bunch of questions that the chunk can help answer. So when I have a query, it's like, okay, does it match any of the hypothetical questions? Then it can fetch the chunk. So it's a very interesting approach. Okay, so maybe before I go to the next thing, I just want to say that data preparation for RAG is extremely important. And I would say that in a lot of the companies that

Starting point is 00:34:42 I have seen, the biggest performance gains in their RAG solutions come from better data preparation, not from agonizing over which vector database to use. Of course, with vector databases it's very important to care about things like latency, or whether you have very specific access patterns, like read-heavy or write-heavy. Of course that matters. But in terms of pure answer quality, right, I think data preparation wins hands down. When you say data preparation, what's an example to make that real and concrete for us to understand?

Starting point is 00:35:12 So one way, as I just mentioned, is that you chunk the data, so we think about how big each chunk should be, right? Think about the context you want to fill. It's a very simple example: say you want to retrieve a thousand words, right? If a data chunk is long, then it's more likely to contain more relevant information, so you can retrieve more. But if it's too long, say your budget is a thousand words and each chunk is a thousand words, you get to retrieve only one chunk, so it's not very useful. But if the chunks are short, then you can retrieve more pieces of relevant information.
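
[Editor's illustration] The arithmetic behind the chunk-size trade-off above is simple enough to sketch. The thousand-word budget comes from the example in the conversation; the synthetic 3,000-word document, the candidate chunk sizes, and the function names are assumptions made for illustration.

```python
def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def chunks_that_fit(budget_words: int, chunk_size: int) -> int:
    """How many whole chunks fit into a fixed retrieval budget."""
    return budget_words // chunk_size


# Synthetic 3,000-word document, only used to count chunks.
document = " ".join(f"word{i}" for i in range(3000))
budget = 1000  # the thousand-word retrieval budget from the example

for size in (1000, 250, 50):
    total = len(chunk_words(document, size))
    fit = chunks_that_fit(budget, size)
    print(f"chunk_size={size}: {total} chunks in the document, {fit} fit in the budget")
```

Larger chunks carry more surrounding context each but leave room for fewer distinct chunks. Smaller chunks allow a wider range of sources but each one carries less information.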

Starting point is 00:35:53 Also, you can retrieve a wider range of documents and chunks. But at the same time, if each chunk is too small, it contains less relevant information. So you have to have a very nice chunk design, like how big each chunk should be. You add contextual information like summaries, metadata, hypothetical questions. Somebody was telling me that a very big performance gain they got was from rewriting their data in a question-answering format. So they have a podcast, right? Instead of leaving it as is, they reframe it, rewrite it into, here's a question, here's the answer,

Starting point is 00:36:28 and produce a lot of them. You can use AI for that as well. So that's one example of data processing. Another example I see a lot is when you have AI use specific tools and their documentation, right? Documentation today is usually written for humans to read. And AI reading is different, because we humans have common sense.

Starting point is 00:36:52 And we kind of know what it is. Human experts have context that AI doesn't quite have. So somebody told me that a big change they made was this: let's say you have the documentation for a function in a library. And the documentation says, okay, the output of this one is, I don't know, some crazy term, maybe some temperature or something like that. It should be one, zero, or minus one. As a human expert, maybe you understand the scale, like what one means on this scale. But the AI really doesn't understand what that means. So,

Starting point is 00:37:27 they actually have another annotation layer for AI. It's like, okay, temperature equals one means this. It's not an absolute temperature. It's more like a value on the associated scale. So it's just doing all this data processing to make it easier for AI to retrieve the relevant information to answer the questions. This episode is brought to you by Persona, the verified identity platform helping organizations onboard users, fight fraud, and build trust. We talk a lot on this podcast about the amazing advances in AI, but this can be a double-edged sword. For every wow moment, there are fraudsters using the same tech to wreak havoc, laundering money, taking over employee identities, and impersonating

Starting point is 00:38:07 businesses. Persona helps combat these threats with automated user, business, and employee verification. Whether you're looking to catch candidate fraud, meet age restrictions, or keep your platform safe, persona helps you verify users in a way that's tailored to your specific needs. Best of all, Persona makes it easy to know who you're dealing with without adding friction for good users. This is why leading platforms like Etsy, LinkedIn, Square, and Lyft, trust Persona to secure their platform. Persona is also offering my listeners 500 free services per month for one full year. Just head to withPersona.com slash Lenny to get started. That's withPersona.com slash Lenny. Thanks again to Persona for sponsoring this episode. Awesome. Okay. So you've talked a bit about

Starting point is 00:38:53 how you work with companies on these sorts of things, on their AI strategies, on their AI products, how they build, which tools they build, all these things. I want to spend a little time here. Because a lot of companies are building AI products. A lot of companies are not having a good time building AI products. Let me ask a few questions along these lines of what you've learned working with companies that are doing this well. One is just, I guess, in terms of AI tool adoption and adoption in general within companies, there's all this talk recently of just like all this AI hype.

Starting point is 00:39:21 The data is actually showing most companies try it. It doesn't do a lot. They stop. And so there's all this just like maybe this isn't going anywhere. So in terms of just adoption of tools and AI within companies, what are you seeing there? For gen AI in companies, I think there are two types of gen AI tooling that I have seen. One is for internal productivity, right? Like coding tools, or chatbots for internal knowledge. A lot of big

Starting point is 00:39:48 enterprises have some kind of wrapper around a model, but with access to maybe some kind of RAG solution. I think what we talked about was text-based RAG. I haven't talked about agentic RAG or multimodal RAG yet, but yes, there's a whole very exciting area around that. Yeah, so basically it allows employees to access internal documents. Someone might ask, okay, I'm having a baby, what is the maternity or paternity policy, right? Or, I'm having this operation, would the health benefits cover that? Or, I want to refer my friend, what is the process for that? So a lot of

Starting point is 00:40:30 it is having an internal chatbot to help with internal operations. And another category is more customer-facing, or partner-facing. So a customer support chatbot is a big one. If you're a hotel chain, you might have a booking chatbot, which is somehow massive, there are a lot of booking chatbots. I do have this theory that a lot of the applications companies pursue are chosen because they can measure concrete outcomes. And I feel like with a booking or sales chatbot it is very clear, right? What is the conversion

Starting point is 00:41:04 rate right now with human operators, and what would the conversion rate be with a chatbot? So the outcomes are very clear, and it's easier for companies to buy into these solutions. So a lot of companies have that customer-facing chatbot. So yeah, that is another category of tool. And I think for customer-facing or external-facing tools, because people are driven to choose applications with clear outcomes, the question of adopting them is really based on whether they see the outcome or not. Of course, it's not perfect, because sometimes the outcome can be bad,

Starting point is 00:41:48 not because the application idea itself is bad, but just because the process of building it is not that great. Yeah, so it's tricky. For the internal adoption of tooling, like internal productivity, that's where it gets tricky. I would say, for a lot of companies, what's their thinking on AI strategy?

Starting point is 00:42:09 Like, I think AI strategies usually have two key aspects, right? One is use cases, and the second is talent. You might have great data for great use cases, but if you don't have the talent, you cannot do it. So a lot of the time in the beginning with gen AI, and still,

Starting point is 00:42:25 and I really admire a lot of companies for this, they say, okay, we need our employees to be very gen AI aware, very AI literate, right? So what they do is start adopting a bunch of tools for the team to use. They have upskilling workshops, they encourage learning. And it's a really, really good thing. And they're also willing to spend a lot of money on giving people ChatGPT subscriptions, Cursor subscriptions, Claude Code subscriptions

Starting point is 00:42:56 to get the employees to be more AI literate. And that's the thing. A lot of the executives at the company may say, okay, we spent a ton of money on these tools. But then, because you can see the usage, people don't seem to use them as much. And what is the issue? So, yeah, I think that is tricky.

Starting point is 00:43:19 What do you think is the issue? Is it just they're not, they're like, they don't know how to use them? Like, what do you think is the gap here? Do you think we'll get to a place of just like, wow, work is completely different because of AI for a lot of companies? The main thing is, again, it's really hard to measure productivity. So I talk to a lot of people about this. First of all, take coding as an example, right?

Starting point is 00:43:41 A lot of companies are now using coding agents or AI-assisted coding. And I was asking, do you think it helps with your productivity? And a lot of the time the answers are very hand-wavy. They say, okay, I feel like it's been better, right? Or, okay, because we have more PRs, we see more code. And then they immediately correct themselves: okay, but of course the number of lines of code is not a good metric for that. Right. So it's really, really tricky.

Starting point is 00:44:11 And here's something funny. I do ask people to ask their managers, because I usually work with people at the VP level, who have multiple teams under them. So I ask them, okay, can you ask your managers: would you rather give everyone on the team very expensive coding agent subscriptions, or get an extra headcount, right?

Starting point is 00:44:36 And almost everyone, the managers, would say headcount. But if you ask the VP level, or someone who manages a lot of teams, they would say they want the AI assistive tools. And the reason people give is, okay, because as a manager, right, you are still growing.

Starting point is 00:44:55 You're not at a level where you manage hundreds or thousands of people. So for you, having one extra headcount is big. So you want that not for productivity reasons, but because you just want to have more people working for you. Whereas as an executive, you have more business metrics that you care about. So you actually think about what actually drives productivity metrics for you. So yeah, it's tricky.

Starting point is 00:45:24 And I think the question of productivity, I'm not sure it's fundamentally subjective, but we just don't have a good way of measuring productivity improvement. Another thing is that it also varies widely. People do tell me that they notice different buckets of employees have different reactions to AI assistive tools. Again, I keep going back to coding, because it's big and it's easier to reason about somehow.

Starting point is 00:45:55 So I have different reports. One person told me that, among all his engineers, he thinks senior engineers get the most out of it, they become more productive. And that person is very interesting. He actually divided his team into three buckets, but he didn't tell them, obviously. It was like, okay, here are the currently best performing, the average performing, and the lowest performing. And then there's a randomized trial.

Starting point is 00:46:24 So they gave half of each group access to Cursor. And then he noticed over time something funny: the group that got the biggest performance boost, in his opinion, and he is very close to his team, was the senior engineers, the highest performing. So the highest performing engineers got the biggest boost out of it. And then the second group was the average performing. So his opinion is, okay, the highest performing engineers are also more proactive. They know how to solve problems.
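
[Editor's illustration] The trial described above, with three performance buckets and half of each bucket getting the tool, is a stratified random assignment. The actual mechanics were not shared in the conversation, so the sketch below is purely an assumption about how such a split could be made. The bucket names, engineer labels, and seed are hypothetical.

```python
import random


def stratified_assignment(
    buckets: dict[str, list[str]], seed: int = 0
) -> dict[str, dict[str, list[str]]]:
    """Randomly give the tool to half of each bucket; the rest are controls."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    assignment = {}
    for name, members in buckets.items():
        shuffled = members[:]  # copy so the input lists are left untouched
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        assignment[name] = {"tool": shuffled[:half], "control": shuffled[half:]}
    return assignment


# Hypothetical team split into three performance buckets.
team = {
    "highest": [f"eng_h{i}" for i in range(10)],
    "average": [f"eng_a{i}" for i in range(12)],
    "lowest": [f"eng_l{i}" for i in range(10)],
}

for bucket, groups in stratified_assignment(team, seed=42).items():
    print(bucket, "tool:", len(groups["tool"]), "control:", len(groups["control"]))
```

Randomizing within each bucket, rather than across the whole team, keeps the tool and control groups comparable at every performance level. With a team of 30 to 40 people the groups are small, so any measured difference would be noisy.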

Starting point is 00:46:58 So they use it to solve problems better. Whereas the people who are already the lowest performing, they don't care much about work. Right. So it's easier for them to just go on autopilot, get it to generate the code and just go with it. Or they just don't know how to use it. Another company, however, told me that actually senior engineers are the ones most resistant to using AI assistive tooling, because they are more opinionated and they have very high standards. They're like, okay, but AI-generated

Starting point is 00:47:29 code just sucks. So they're very, very resistant to using this. So I don't know, I haven't quite been able to reconcile the very different reports on that yet. This is so interesting. So just to make sure I'm hearing the story. So there's a company you work with that did a three-bucket test with their engineering team, where they created three sorts of groups, the highest-performing engineers, mid-performing engineers, lowest-performing engineers, and gave some of them access to, say, Cursor. Was it Cursor, or what did they give them access to? It was Cursor. I think they said it was Cursor. Okay, cool. And I didn't work with them, this is more like a friend's company. Okay, it's a friend's company.

Starting point is 00:48:09 So did they give like half of the higher performing engineers Cursor and half not, or how did they do the split there? Yeah, so they gave it to half of the entire company, but half of each bucket, yeah. And then they observed the difference in productivity. I see. Yeah. So how do they even do that? They're just like, okay, you get Cursor, you don't get Cursor? How do they do that? Yeah, I didn't get the mechanics of it. But I was like, respect for doing a randomized trial. That is so cool. Yeah. Okay. Wow. How large was this engineering team? Was it like hundreds of people? It's not that large. It's about maybe 30 to 40. Yeah. 30 to 40. Okay. Yeah. Wow. Okay. So they found that the high

Starting point is 00:48:47 highest performing engineers had the most benefit from using AI tools. And then behind them was the middle tier engineers and the worst performers. Yeah. But also not the same everywhere. Right, right, right. Right. Right. This other example we shared of just senior engineers in this one example are most resistant to changing the way they work, which I get.

Starting point is 00:49:11 I do feel like the most valuable people right now, other than ML researchers, and AI researchers like yourself are senior engineers because it feels like junior engineers are just like so much of this is now done by AI but an engineer that knows what they're doing that understands how things work at a large scale with AI tools

Starting point is 00:49:31 just basically has infinite junior engineers doing their bidding. It feels like an extremely valuable and powerful asset. Yeah, I definitely see that companies really appreciate engineers who

Starting point is 00:49:44 have a good understanding of the whole system and have good problem-solving skills, thinking holistically instead of locally. One company I have seen told me that the way they work is completely different now. They actually restructured the engineering org so that they get more senior engineers to be more into peer review.

Starting point is 00:50:05 Because they get them to write guidelines on what good engineering practices are, what the process should be like. So they write a lot of processes on how to work well. And then they have more junior engineers just produce code and submit PRs,

Starting point is 00:50:25 but senior engineers are more in the reviewing role. So I think they might be preparing for the future. Another company actually told me something very similar. They're kind of preparing for a future where they only need a very small group of very, very strong engineers to create processes and review code to get it into production, but get AI or junior engineers

Starting point is 00:50:47 to produce code. But then the question becomes, how does one become a very strong engineer? Right, that's right. That's right. I feel like, yeah. Yeah, so I don't know what the process is. We were thinking about it, like, yeah.

Starting point is 00:51:01 No one's thinking about it. It's just, it's a problem. We won't have any more in 10, 20 years. There will be no more engineers because no one's hiring junior engineers. Although I could make the case, junior engineers, people just getting into computer science right now are just native,

Starting point is 00:51:14 AI native. And in theory, you could argue they will become really good, really fast. If they're curious, and aren't just delegating learning and thinking to AI, but actually using it to learn how to code well and architect correctly, you could argue they will be the most successful engineers in the future. I do think what you mentioned, knowing how to architect, I group that under system thinking. I do think it's a very important skill, because I think AI can help automate a lot of disjoint skills. But knowing how to use these skills together to solve a problem is hard. So there's a webinar with Mehran Sahami, who was one of my favorite professors.

Starting point is 00:52:01 He was a chair of the curriculum at the CS department at Stanford. So he spent a lot of time thinking about CS education, right? Like, what should students learn nowadays in the era of AI coding? And the other person is Andrew Ng, who is, of course, a legend in the AI space. And I remember Professor Sahami said something very interesting. He said a lot of people think that CS is about coding, but it's not.

Starting point is 00:52:24 Coding is just a means to an end. CS is about system thinking, using coding to solve actual problems. And problem solving will never go away, because as AI can automate more stuff, the problems just get bigger. But the process of understanding what causes the issue and how to design a step-by-step solution to it will always be there.

Starting point is 00:52:48 So as an example, I actually have a lot of issues with AI when it comes to debugging. I'm not sure if you use AI a lot for coding, but something I've noticed and also heard from my friends is that it is pretty good when you have a very clear, well-defined task, maybe write documentation, fix a specific feature, or build an app from scratch, right? Like, it doesn't have to interact with a large existing code base. But if you ask for something a little bit more complicated,

Starting point is 00:53:16 maybe something that requires interacting with a lot of components and stuff, it's usually not that good. For example, I was using AI to deploy an application, and I was testing out a new hosting service I was not familiar with. One thing AI does give me is the confidence to try new tools. Before AI, with a new tool I would read the documentation from the beginning, but now I'm like, okay, just try it out and learn.

Starting point is 00:53:42 So I was testing this new hosting service, and it kept hitting a bug that was very, very annoying. And I was like, okay, I asked Claude Code to fix it. And it kept changing things: maybe change the environment variable, fix the code, maybe change from this function to that function, maybe change the language, maybe it doesn't process JavaScript well, I don't know, whatever. And it didn't work. And I was like, okay, that's it. I'm going to read the documentation myself and see what's wrong. And it turns out I was just on another tier. The feature that I wanted is not available in this tier, right?

Starting point is 00:54:19 So I feel like the issue was that Claude was just trying to focus on fixing things in one component, whereas the issue came from a different component. So I think you need to understand how different components work together and where the source of the issue might come from. You need to have a holistic view of it. And it made me think, okay, how do we teach AI system thinking like that, right?

Starting point is 00:54:41 I think you could have human experts write, like, a scaffold. It's just like, okay, for this kind of problem, look into this, look into that, and so on. So I think that could be one way. But that also made me think, how do we teach humans system thinking?

Starting point is 00:54:58 Yeah. So, yeah, I think it's a very interesting skill. I do think it's very important. That's exactly the same insight Bret Taylor shared on the podcast. He's the co-founder of Sierra. He created Google Maps. He was CEO of Salesforce, Quip, a few other things. And I asked him just like, should people learn to code? And his point is exactly what you said, which is that taking computer science classes is not about learning Java and Python. It's learning how systems work and how code operates and how software works broadly, not just here's like a function to do a thing.

Starting point is 00:55:31 One thing that I wanted to help people understand: you wrote this book called AI Engineering, which is essentially helping people understand this new genre of engineer. And you have this really simple way of thinking about the difference between an ML engineer and an AI engineer, which has a really good corollary to product managers now of just like an AI product manager versus a non-AI product manager. The way you describe it, and fill in what I'm missing, is just ML engineers build models themselves. AI engineers use existing models to build products. Anything you want to add there? One thing I really dislike about writing books is that you have to define things like this.

Starting point is 00:56:11 And I think no definition will be perfect, because there will always be edge cases. But yeah, in general, I think it's about AI as a service, models as a service, where somebody builds the models for you and the base model performance is pretty strong. So it enables people to just say, okay, now I want to integrate AI into my product. I don't need to learn what gradient descent is, even though knowing that would really help. But yeah, it makes the entry barrier really low for people who want to use AI to build products.

Starting point is 00:56:42 And at the same time, AI capabilities are so strong that it also increases the possibilities, like the types of applications that AI can be used for. So I think, yeah, the entry barrier is super low and the demand for AI applications is a lot bigger. So it feels very, very exciting. It opens up a whole new world of possibilities. Yeah, it's like, now you don't have to

Starting point is 00:57:06 You don't even have to spend time building this AI brain. Now you can just use it to do stuff. Such an unlock. Okay, maybe just a final question. You get to see a lot of what's working, what's not working, where things are heading. I'm curious, if you had to think about the next two or three years, just where things are heading. What do you think? How do you think building products will be different?

Starting point is 00:57:30 How do you think companies working will be different, if you had to think of maybe the biggest change we can expect to see in the next few years in terms of how companies work? I think a lot of organizations don't move that fast, right? But at the same time, they also move faster than I expected. Because, again, I think I'm biased: I don't work with dinosaur companies that don't care. I think a lot of executives who come to me are very forward-looking. So maybe for me, I'm very biased.

Starting point is 00:58:01 The organizations I work with move fast. So yeah, I think one big change I see is in organizational structure. Before, right, we had a lot of disjoint teams. We had a very clear engineering team, product team. But then there's the question of who should write evals, right? Who should own the metrics? And it turns out that eval is not a separate problem, it's a system problem, right?

Starting point is 00:58:31 Because you need to look into different components and how they interact with each other, and you need to understand user behaviors, because you need to know what users care about so that you can write evals that reflect what users care about. So on one hand, you're looking into different components and architectures, placing guardrails and stuff. That's engineering, but understanding users is product, right? So because of a lot of these things, eval is extremely important.

Starting point is 00:58:56 So it kind of brings the product team and the engineering team, even the marketing team, like user acquisition, very close to each other. So yeah, in some ways, people are restructuring so there's more communication between previously very distinct functions. Another thing is, I also see teams, of course, think about what can be automated in the next few years and what cannot be automated. And I see that people are already shedding, and actually it's a little bit scary to think about, but people have told me.

Starting point is 00:59:29 It's like, okay, this is between you and me, but we're cutting some of these functions, right? Like a lot of things that were previously outsourced, for example. Traditionally, a business outsources what is not core to them and can be more systemized. So with that, you can actually use AI to automate a lot of it. And there's also the separation: people think more about what is the value of junior engineers versus senior engineers, and how to restructure the engineering org for that.

Starting point is 00:59:59 So yeah, I do see different things there. But one thing I see across organizations is that people are moving pieces around and thinking about use cases, whether they're going to spin out new use cases and who would lead the new effort. And yeah, that is one big change. Another thing, in terms of AI, and I'm not sure how true this is, but I guess I'm also in the camp of thinking that it has merit: it's the camp of, okay, with base models, we have probably not quite maxed out, but we are

Starting point is 01:00:38 unlikely to see really, really strong, crazily strong models. So you remember when we had GPT, right? And then GPT-2, which was a big step up, like an order of magnitude better than GPT. And then GPT-3, which was much, much bigger. And GPT-4, much, much, much bigger. And of course, now there's GPT-5, but is GPT-5 that scale of step jump compared to the previous one? I think it's debatable, right? So I think we're at the point where the base model performance improvement is not going to be as mind-blowing as it was in the last three years. So I think a lot of the improvements we're going to see are in the post-training phase, in the application

Starting point is 01:01:24 building phase. And yeah, so I think that's where we will see a lot of improvement. I'm also very interested in multimodality. So we've seen a lot of text-based use cases, but I think there are a lot of audio and video use cases. That is very, very exciting. And I think audio is not quite as solved as well. I say that because I do work with a couple of voice startups. And when you start to think about voice, it's an entirely different beast.

Starting point is 01:01:57 So let's say you have a chatbot, right? We go from a text chatbot to a voice chatbot. The concerns are completely different. Because now with a voice chatbot, right, we need to think about latency. Because there are multiple steps: first voice to text, then text question to text answer, and then text answer to voice answer. So we have multiple hops. And latency becomes very important.
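As a rough illustration of why the hops matter, here is a minimal sketch that adds up a latency budget for the cascaded pipeline described above (voice to text, text to text, text to voice). The hop names and millisecond figures are made-up placeholders for illustration, not measurements from any real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    """One stage of a cascaded voice pipeline (hypothetical names and numbers)."""
    name: str
    latency_ms: float


def total_latency_ms(hops: List[Hop]) -> float:
    """Hops run one after another, so their latencies add up."""
    return sum(hop.latency_ms for hop in hops)


def slowest_hop(hops: List[Hop]) -> Hop:
    """The hop to look at first when the total is too high."""
    return max(hops, key=lambda hop: hop.latency_ms)


if __name__ == "__main__":
    # Placeholder figures, chosen only to show the arithmetic.
    pipeline = [
        Hop("voice_to_text", 300.0),
        Hop("text_to_text", 700.0),
        Hop("text_to_voice", 250.0),
    ]
    print(total_latency_ms(pipeline))
    print(slowest_hop(pipeline).name)
```

Because the stages are sequential, shaving time off any single hop helps, but the total is what the person on the other end of the call actually feels.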

Starting point is 01:02:20 And there's the question of what does it take to sound natural. So for example, people think about how, when humans talk to each other, if I'm speaking and you try to interrupt me and say, um, Chip, that's right, I would pause

Starting point is 01:02:36 and try to hear you out, right? But sometimes you just say some word to acknowledge, like, mm-hmm, mm-hmm, and then I shouldn't stop, I should just continue. So there's the question of, for an interruption, should I stop or not?

Starting point is 01:02:51 It's a big part of what's perceived as natural conversation. And there are also regulations, right? Because a lot of the time people want to build AI chatbots, voice chatbots, that sound like humans, to try to trick users into thinking they're talking to humans. But also, right, there may be potential regulation saying, okay, you have to disclose to users whether what they are talking to is a human or an AI.

Starting point is 01:03:17 So I think that's just a whole space. I think it's not quite as solved as you think, but it's also not quite an AI foundation model problem, right? Because human interruption detection is actually a classic machine learning problem. With a different framing, you can build a classifier for that. Or the question of latency is actually a massive engineering challenge, not an AI challenge. Of course, it can be an AI challenge because people are trying to build voice-to-voice models.
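To make the framing concrete, here is a deliberately tiny, rule-based baseline for the "should I stop or not?" decision, operating on the transcribed text of what the user said. It is a stand-in for illustration only: the word list and function name are assumptions, and a real system would more likely train a classifier on audio and timing features rather than match keywords.

```python
# Hypothetical list of acknowledgement words ("backchannels"); not exhaustive.
BACKCHANNELS = {"mm-hmm", "mhm", "uh-huh", "yeah", "right", "okay", "ok"}


def should_stop_speaking(user_utterance: str) -> bool:
    """Return True if the bot should pause and listen, False if it should continue.

    Rule: if every word the user said is an acknowledgement, treat it as
    a backchannel and keep talking; anything else counts as an interruption.
    """
    words = [w.strip(".,!?").lower() for w in user_utterance.split()]
    words = [w for w in words if w]
    if not words:
        # Silence or noise only: nothing to react to.
        return False
    return not all(w in BACKCHANNELS for w in words)


if __name__ == "__main__":
    print(should_stop_speaking("mm-hmm, mm-hmm"))
    print(should_stop_speaking("Um, Chip, that's right"))
```

The point of the sketch is the problem framing: the input is a short user utterance and the output is a binary label, which is the shape of a standard classification task.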

Starting point is 01:03:47 So instead of having to first transcribe the voice from me into text, and then get a model to generate a text answer, and get another model to turn the text into speech, you can just go voice to voice directly. So that is something people are working on, but it's very hard. Yeah. So even audio, I think of it,

Starting point is 01:04:06 it's easier than video, right? Because with video you have both image and voice. And it's already pretty hard. So I think there are a lot of challenges in that space. That was an awesome list of things. Let me mirror back real quick. So what you're predicting in the next few years, things that will change in the way we work.

Starting point is 01:04:23 And these actually resonate with so many conversations I've had on this podcast. So this is just kind of doubling down on where things are heading. One is the blurring of lines between different functions, instead of just design, engineering. Everyone's going to be doing a lot of different things now. Two is more work being automated with agents and all these AI tools and, in theory, productivity going up. Third is shifting from pre-training models to post-training, fine-tuning and things like that because, to your point, models maybe are slowing down

Starting point is 01:04:56 in how smart they're getting. Although, I'll point folks to the chat with the co-founder of Anthropic. He made a really good point here. He's like, we're really bad at understanding what exponentials feel like. We're in the middle of that. And also, models are being released more often. So the difference between them, we may not notice because they're just happening more often, versus GPT-3 came out like a year, I don't know, after GPT-2. So maybe true, maybe not. And then the fourth point you made is this idea of multimodal, investing in multimodal experiences. I cannot wait for ChatGPT voice mode to get better at interruption, like exactly what you're saying. I'm just talking to it. And so it makes a little sound. It's like,

Starting point is 01:05:32 okay. And then you have to, and then it's like... I don't have a better voice assistant at home yet. I think I have been testing out a bunch. Almost like, I keep hoping. Oh my God, that would be the one. And then I don't know how many of them I just had to give away because they're not that good. I think it's coming. I hear it's coming. Anthropic's working on something, I don't know if it's launched or not yet. Yeah. I'm sorry, I want to go back to what you mentioned, what your guest from Anthropic mentioned about the performance improvement. I think there's a big change.

Starting point is 01:06:03 I think there's a difference between the model's base capability, so I'm talking about the pre-trained model, right, versus the perceived performance. So let's say, um, are you familiar with the term test-time compute? I don't think so. Yeah. So the idea is, okay, you have a fixed amount of compute, right? So you're going to spend a lot of compute on pre-training the model. Pre-training.

Starting point is 01:06:31 And then you spend some compute on fine-tuning. And the ratio of pre-training to post-training compute is very different, even across different labs. And then you also have to spend compute on inference. When you have trained and fine-tuned the model, you want to serve it to users. So a user might type a question or a prompt, and the model has to do inference, and that requires compute. And so people have this discussion of, should I spend more compute on pre-training

Starting point is 01:07:00 or fine-tuning or inference, right? Spending more compute on inference is what people call test-time compute: a strategy of allocating more compute resources to inference, which can bring better performance. And how does it do that? Let's say you have a math question, right? And maybe instead of just generating one answer, you generate four different answers and

Starting point is 01:07:27 say, okay, whichever is the best according to some standard. Or, okay, you have four answers and maybe three of them say 482 and one of them says 20. Okay, three of them are in agreement. So the answer should be 482. Right. So you just generate a bunch of answers. Or another thing is reasoning, thinking: you just generate more thinking tokens,

Starting point is 01:07:49 spend more time thinking before showing the final answer. It requires more compute, but it gives you better performance. So, yeah, from the user's perspective, right, when the model spends more time exploring different potential answers, thinking longer, it can give you much better final answers. But the base model itself does not change.
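The "generate several answers and go with the one most of them agree on" idea can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `sample` callable that stands in for one call to a model; here it is replaced with canned answers matching the 482 versus 20 example from the conversation.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    counts = Counter(answers)
    return counts.most_common(1)[0][0]


def answer_with_test_time_compute(sample: Callable[[], str], n: int = 4) -> str:
    """Spend n model calls instead of one, then keep the majority answer.

    `sample` is a hypothetical stand-in for a single model call.
    """
    answers = [sample() for _ in range(n)]
    return majority_vote(answers)


if __name__ == "__main__":
    # Canned outputs standing in for four sampled model answers.
    canned = iter(["482", "482", "20", "482"])
    print(answer_with_test_time_compute(lambda: next(canned), n=4))
```

Note that nothing about the model changes between one call and four; only the amount of inference compute spent per question goes up, which is the distinction being drawn above.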

Starting point is 01:08:15 Does it make sense? Yes, that does. Absolutely. Yeah. That is a good corollary to Ben Mann's point. Yeah. Chip, we covered a lot of ground.

Starting point is 01:08:25 I've gone through everything I was hoping to learn and more. Before we get to a very exciting lightning round, is there anything else that you wanted to share, anything else you want to leave listeners with? So I do work with a few companies that do this thing where they want employees to come up with ideas. So there's a big debate on what is a better way for AI strategy, right?

Starting point is 01:08:43 Should it be top-down or bottom-up? Right. Should executives come up with one or two killer use cases and everyone allocates resources to that, or should you let engineers and PMs and smart people come up with ideas? And it can be a mixture of both. So some companies were like, okay, we hired a bunch of smart people, let's see what they come up with. And they organize hackathons or internal challenges to get people to build products. And one thing that I noticed is that a lot of people just don't know

Starting point is 01:09:14 what to build. And it shocked me. I feel like we are in some kind of idea crisis, right? Now, we have all these really cool tools to help you do everything from scratch. They can help you design, they can help you write code, they can help you build a website. So in theory, we should see a lot more. But at the same time, people are somehow stuck, like they don't know what to build. And I think maybe a lot of it has to do with societal expectations. Because we have gone into this phase of specialization, where people are very highly specialized and people are supposed to focus on doing one thing really well, instead of

Starting point is 01:09:53 having a big picture. And when you don't have a big-picture view, it's hard to come up with ideas of what to build. So when I work with these companies on hackathons, we do work out how to come up with good ideas. And usually what we think of is, okay, one tip is to look at the last week, right? For a week, just pay attention to what you do and what frustrates you. And when something frustrates you, think: is there anything we can do? Can it be done a different way?

Starting point is 01:10:22 So it's not frustrating. And people can swap notes across teams. And if you see common frustrations, maybe that's something you can think about, to build something around that. So yeah, I feel like you just notice how we work,

Starting point is 01:10:35 constantly ask questions, like how can it be better? And then just build something to address the frustrations. I think it's a good way to learn and adopt AI. I think people have felt exactly what you're describing every time they open up one of these vibe coding tools,

Starting point is 01:10:51 where you could just describe anything you want. I'm like, I don't know, what do I want? And I love this very tactical piece of advice, just like what frustrates you, just pay attention to where you're frustrated. For example, I just built a very cool little vibe-coded app. I was working on a newsletter post inside Google Docs. And I pasted all these images into the Google Doc from screenshots and stuff.

Starting point is 01:11:12 And then I forgot, oh, yeah, you can't take images out of Google Docs. It's like this Hotel California experience where you can paste stuff into it, but it's very hard to get images back out. So I just went to all the vibe coding tools and just built an app where I can give it a Google Doc URL and it lets me download all the images automatically. And it worked amazingly well. And I made it really cute and I'll link to it in the show notes. Oh, I'm very bullish on using AI to just create micro tools. It's just something that makes your life a bit easier. 100%. I feel like that's one of the main ways people are using these tools, just like

Starting point is 01:11:44 a little niche problem they have. With that, we've reached our very exciting lightning round. I've got five questions for you. Are you ready? Yeah, always. I don't know. It depends on what the questions are. They're very consistent across every guest.

Starting point is 01:12:01 So I imagine you've heard them before. First question, what are two or three books that you find yourself recommending most to other people? I'm really terrified of book recommendations because I feel like what books people should read really depends on what they want and where they are in life and I always want to get you. But there are several books that I do think really changed the way I see the world. So one is The Selfish Gene. It's like to understand. It actually changed.

Starting point is 01:12:29 It actually helped me with a question like whether I want to have kids or not. Because it's understanding more of, yeah, a lot of how we operate is a function of our genes. And genes want you to do one thing, which is to procreate. So yes, in a little way. But the book also proposed another thing. It's like, so everyone wants to live forever, right? And maybe it's not consciously, but subconsciously.

Starting point is 01:12:55 We do want that. And there are two ways. One is by genes. Genes just want to continue forever. And the other is ideas. I think there's something called a meme. It's being able to have some ideas out there that last for a long time. That's the other way to live on.

Starting point is 01:13:11 I know it's a little bit abstract, but it was very interesting. The other book I really, really like is a book from Singapore's previous, I think he's known as the father of Singapore, Lee Kuan Yew. I'm not sure what the title is, but he was the one who led Singapore

Starting point is 01:13:30 from, he changed Singapore from a third-world country to a first-world country within 25 years. And I have never seen any country leader that spent so much effort on putting down his thoughts on how to build a country like that. Yeah, and he talks a lot about public policy,

Starting point is 01:13:49 like how to create policies that encourage people to do the right things, things that are good for the nation. And he also talks about foreign affairs, foreign policies, like the relations of the country with others. So it's a really good book to think about, for me, system thinking.

Starting point is 01:14:05 But it's a different kind of system, which is a country, which a lot of us don't get a chance to ever experiment with in our life. So it's good to learn about that. What was the name of that second book? It's called From Third World to First. I think we have it somewhere here.

Starting point is 01:14:19 Yeah. There it is. Show and tell. That's awesome. I definitely want to read that. That's a really good tip. I've heard a lot about just the impact he's had, and I've seen all these videos on Twitter

Starting point is 01:14:28 of just his really wise insights into how to build a thriving society. And clearly it worked. How does he have time to write such a thick book? It's like insane. That is. Claude, please summarize. I'm just joking. By the way, The Selfish Gene,

Starting point is 01:14:44 also absolutely love that book. That is such a good choice. It's such an under-the-radar kind of book that really changed the way I see the world as well. So really good pick. Okay, next question. Do you have a favorite recent movie or TV show you really enjoyed? So I watch a lot of movies and TV shows as research because I'm working on my first novel and I recently sold it. So I'm interested in what makes it work. It's a drama. It's not science fiction or anything that tech people usually read. So I know it's very out of left field. So it's almost like watching TV to see what kind of stories become popular, trying to understand the tropes and stuff like that. So I'm not sure

Starting point is 01:15:26 if the audience are like... Well, what's one that taught you something about writing? I think, um, Yanxi Palace. It's a Chinese TV show. Cool. Okay, haven't heard that one on the podcast before. Okay, cool. Next question. Do you have a life motto that you often think about, come back to when you're dealing with something hard, whether it's in work or in life? This sounds very nihilist. I think it's that in the end,

Starting point is 01:15:54 nothing really matters. I usually think that in the grand scheme of things, in a billion years, no one will be there. I think, okay, someone will argue with me about that. So my theory is that in a billion years, none of us will exist. So whatever messy things, crazy things we do, or how badly

Starting point is 01:16:09 we do them, no one will be there to remember it. And I think in a way, it sounds scary, but it's very liberating because it allows me to say, okay, let's just try things out, right? Like, why does it matter? And then there's a story. We have a family member who passed away recently. And I was talking to my dad because I couldn't be home for that. I was asking my dad, like, okay, is there anything I can do to give that person some comfort,

Starting point is 01:16:42 is there anything I can get that person? And my dad was just like, what can he possibly want at this moment? It made me realize that at the end of life, there's nothing material that can bring you joy. No money, no product,

Starting point is 01:16:56 nothing. And in a way, it got me thinking about what I really care about at the end of the day. So when I think about it, it's like, okay, maybe I fail, maybe I don't get that contract, maybe I did things badly in the past. At the end of life, I don't think that actually really matters. So in a way, it's kind of liberating.

Starting point is 01:17:15 I know you said it might be nihilistic. This is what Steve Jobs shared too, in one of his most famous speeches: we will all die someday, so don't take things so seriously. And it is freeing, absolutely. It just makes you appreciate every moment, every day you have, just like, yeah, let's just do something hard and scary. Okay, final question.

Starting point is 01:17:33 You talked about how you're writing a novel. Most people in tech have never written something creative or fiction. What's one thing you learned in the process about how to write better stories, better fiction? A lot of time when we read, we get tripped up by some small things. So I think I wanted to do creative writing because I just want to become a better writer. And it felt like maybe trying a different audience could help me become better at anticipating what this different type of audience would want to hear and what they care about. So it's a way for me to get better. So I think writing, or even any

Starting point is 01:18:09 kind of content creation, is about predicting the user's reactions, right? The next token. Just kidding. Yeah. So when you do a podcast, it's like, okay, what kind of things would the users find engaging, right? And I find it's a little bit like how a lot of companies launch a product. You have a narrative coming out. So okay, how do we position this product in a way that users want, right?

Starting point is 01:18:33 So I have done technical writing for a while and I felt like I have had some experience trying to predict what engineers would want to hear, or care about. But I don't have experience with this completely different type of audience. So that's why I want to do creative writing, writing a story. And that's why I was doing a lot of research. I mean, I call it research, but I actually enjoy it a lot, watching a lot of dramas.

Starting point is 01:18:57 I just see what people are like. So one thing that I think about a lot is the emotional journey. It was from an editor, right? So when we write something, we care about how users would feel across the story. We want something in the beginning, right? We need to have a hook so that people continue reading.

Starting point is 01:19:18 But we also don't want too much drama because people will get too tired, right? Because it's emotionally exhausting, because it's like you're being emotionally manipulated a lot of the time. So we care about the emotional journey. Maybe we have some climax and then something more chill. Another thing that I did realize is that for technical writing, you entirely focus on the content, like the argument. It's very impersonal, right? For example, when people learn about compilers, it doesn't

Starting point is 01:19:49 matter if they like the person telling them about compilers or not, right? Because it's just objective. But for a novel, people care about character likeability. So in the first version of my story, I made the character very, very logical, very rational, someone who just does everything very rationally. And then the feedback I got, I had a very good friend read it. And he's an amazing person. He's a great person. And he was like, Chip, I'll be honest.

Starting point is 01:20:17 I hate that person. So it doesn't matter what the story is. It's just like the person is so unlikable. He does a bunch of crudity. So in the second version, I made the character more likable. And how you make a character more likable is that you put in some vulnerability. Like, maybe the person has setbacks, something we can relate to. So in a lot of ways, it's very interesting.

Starting point is 01:20:38 A lot of it is about understanding the emotional bit, like how the users feel, not just about the story, but also about the characters. That is so interesting. Wow. I learned a lot more there than I thought. That was awesome. Really good example. Chip, two final questions.

Starting point is 01:20:57 Where can folks find you online if they want to reach out and maybe work with you or maybe even just share the stuff that you offer if folks want to reach out? And then how can listeners be useful to you? I'm on social media, LinkedIn, Twitter. I don't post a lot, but I keep telling myself that I should do more because I kind of like the conversation with readers. So I'm actually about to start a Substack. So I have a placeholder for a Substack right now,

Starting point is 01:21:26 and I'm thinking of doing it more for system thinking because I think it's a very interesting skill. And I'm also thinking of doing a YouTube channel on book reviews. So basically books that help you think better. So I think the first book I'm going to review is more like this book because it's like my favorite book growing up and I have kept on reading it. So yeah, so how can you be helpful?

Starting point is 01:21:47 Send me books that you like, books that have changed the way you think or changed the way you do anything. I would appreciate it. Amazing. I'm excited to read that book. Chip, thank you so much for being here. Thank you so much, Lenny, for having me. Bye, everyone.

Starting point is 01:22:10 Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.
