The Pragmatic Engineer - AI Engineering with Chip Huyen

Episode Date: February 5, 2025

Supported by Our Partners
• Swarmia: The engineering intelligence platform for modern software organizations.
• Graphite: The AI developer productivity platform.
• Vanta: Automate compliance and simplify security with Vanta.

On today’s episode of The Pragmatic Engineer, I’m joined by Chip Huyen, a computer scientist, author of the freshly published O’Reilly book AI Engineering, and an expert in applied machine learning. Chip has worked as a researcher at Netflix, was a core developer at NVIDIA (building NeMo, NVIDIA’s GenAI framework), and co-founded Claypot AI. She also taught Machine Learning at Stanford University.

In this conversation, we dive into the evolving field of AI Engineering and explore key insights from Chip’s book, including:
• How AI Engineering differs from Machine Learning Engineering
• Why fine-tuning is usually not a tactic you’ll want (or need) to use
• The spectrum of solutions to customer support problems, some not even involving AI!
• The challenges of LLM evals (evaluations)
• Why project-based learning is valuable, but even better when paired with structured learning
• Exciting potential use cases for AI in education and entertainment
• And more!

Timestamps
(00:00) Intro
(01:31) A quick overview of AI Engineering
(05:00) How Chip ensured her book stays current amidst the rapid advancements in AI
(09:50) A definition of AI Engineering and how it differs from Machine Learning Engineering
(16:30) Simple first steps in building AI applications
(22:53) An explanation of BM25 (retrieval system)
(23:43) The problems associated with fine-tuning
(27:55) Simple customer support solutions for rolling out AI thoughtfully
(33:44) Chip’s thoughts on staying focused on the problem
(35:19) The challenge in evaluating AI systems
(38:18) Use cases in evaluating AI
(41:24) The importance of prioritizing users’ needs and experience
(46:24) Common mistakes made with Gen AI
(52:12) A case for systematic problem solving
(53:13) Project-based learning vs. structured learning
(58:32) Why AI is not the end of engineering
(1:03:11) How AI is helping education and the future use cases we might see
(1:07:13) Rapid fire round

The Pragmatic Engineer deepdives relevant for this episode:
• Applied AI Software Engineering: RAG https://newsletter.pragmaticengineer.com/p/rag
• How do AI software engineering agents work? https://newsletter.pragmaticengineer.com/p/ai-coding-agents
• AI Tooling for Software Engineers in 2024: Reality Check https://newsletter.pragmaticengineer.com/p/ai-tooling-2024
• IDEs with GenAI features that Software Engineers love https://newsletter.pragmaticengineer.com/p/ide-that-software-engineers-love

See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com. Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Transcript
Starting point is 00:00:00 How would you define AI engineer or AI engineering? Yeah, so before, when you wanted to build machine learning applications, you needed to build your own models. So that means you needed your own data and you needed expertise in how to train and babysit a model. However, nowadays, if you want to build an application leveraging machine learning or AI, you can just send a direct API call and get access to this wonderful capability. So that really, really lowers the entry barrier for people.
Starting point is 00:00:24 Like, you don't need data anymore. You don't need a fancy AI degree anymore. It's a shift of focus from less machine learning to more engineering and more product. Chip Huyen is a computer scientist, writer, and author of the book AI Engineering. This book is currently the most-read title on the O'Reilly platform. Previously, Chip was a researcher at Netflix,
Starting point is 00:00:44 a core developer of NeMo, NVIDIA's GenAI framework, worked at Snorkel AI, and founded and sold an AI startup called Claypot AI. She taught machine learning systems design at Stanford, and her current book is her second one on ML and AI engineering. It's safe to say she's one of the most-read ML engineering and AI engineering experts in the world. In our conversation today, we cover what AI engineering is and why it feels a lot more full-stack than ML engineering did. What are the typical steps to build an AI application, from choosing a model, through using RAG, all the way to fine-tuning?
Starting point is 00:01:17 What are practical ways for software engineers to get started building AI applications? And a lot more on this very timely topic. If you enjoy the show, please subscribe to the podcast on any platform and on YouTube. Thank you. This greatly helps the show get to even more listeners and viewers. Chip, welcome to the podcast. Hey, hi, I'm Chip. I'm very excited to be here. I've been following your Substack for a while, so I was really looking forward to the chat.
Starting point is 00:01:44 So first of all, I really wanted to congratulate you on this book. I've started to read the book, so I've not read the whole thing. I started with some chapters and went deeper in others. And what I found is, when I looked at the table of contents, I was like, well, you know, this looks okay, in the sense that it spans a broad range, because it goes from how do you understand foundation models, how do you evaluate them, what is prompt engineering, and you go into things like RAG, fine-tuning, and dataset engineering.
Starting point is 00:02:13 But then in each of those sections, you know, initially there's an introduction, but then it starts to go deeper. So for example, for evaluation methodology, here I'm just looking at the table of contents, but I started to read this. It's like, well, we know it's important to evaluate AI models. We know it's hard to do. But then you go into things like AI as a judge, or ranking models with comparative evaluation and the challenges of it. And that's where, in some parts, I had to slow down.
Starting point is 00:02:42 I had to look things up. So it does go really deep in a lot of these sections, which I found very refreshing: it's got a mix of breadth, but it also goes deep. So this is definitely not a fast read for me, but it's one of those things where I'm just going to keep coming back to it. Thank you. It was not a fast write either. It took quite a while and a lot of references. I think in the book I cited over a thousand references.
Starting point is 00:03:11 So I actually read even more papers. I've gone into a lot of codebases. I have a tracking list of about a thousand repos with at least 800 stars, GitHub stars. So in the end it was those codebases and a lot of blog posts, and other books from the '70s, '80s, and '90s, to understand AI. I also published approximately 100 reference links that I found really, really useful in the process of writing the book. So if you just want to look at those references on your own, it's on my GitHub. Yeah. And I was surprised by some of the really original research. Like, it's not just,
Starting point is 00:03:51 oh, I wrote about these papers, or here's what I read. But as you mentioned, the 1,000 repos: you actually have a section about how GitHub repositories changed over time, the ones that were about infrastructure, AI, application level, and other things. And you actually have more than 900 repos mapped out. I never saw anything like that. And clearly that was you, you know, slicing and dicing and doing your own kind of research.
Starting point is 00:04:19 Yeah, I feel like I have this thing. Like, I do a lot of manual labor. I do think I get a lot of value out of doing things non-optimally. I feel like there's a lot of focus on, yeah, what is the quickest way to do it? What is the finest way to do it? But sometimes, if you're willing to put effort into things that a lot of people are not willing to, I feel like you can get some kind of insight that other people don't get. This episode is brought to you by Swarmia, the engineering intelligence platform
Starting point is 00:04:44 for modern software organizations. Swarmia gives everyone in your organization the visibility and tools they need to get better at getting better. Engineering leaders use Swarmia to balance the investment between different types of work, stay on top of cross-team initiatives, and automate the creation of cost capitalization reports. Engineering managers and team leads get access to a powerful combination of research-backed engineering metrics and developer experience surveys to identify and eliminate process bottlenecks. Software engineers speed up their daily workflows with Swarmia's two-way Slack notifications,
Starting point is 00:05:16 working agreements, and team-focused insights. You can learn more about how some of the world's best software organizations, including Miro, Docker, and Webflow, use Swarmia to build better software faster at swarmia.com slash pragmatic. That is S-W-A-R-M-I-A dot com slash pragmatic. This episode is brought to you by Graphite, the developer productivity platform that helps developers create, review, and merge smaller code changes, stay unblocked, and ship faster. Code review is a huge time sink for engineering teams. Most developers spend about a day per week or more reviewing code or blocked waiting for a review.
Starting point is 00:05:53 It doesn't have to be this way. Graphite brings stacked pull requests, the workflow at the heart of the best-in-class internal code review tools at companies like Meta and Google, to every software company on GitHub. Graphite also leverages high-signal, codebase-aware AI to give developers immediate, actionable feedback on their pull requests, allowing teams to cut down on review cycles. Tens of thousands of developers at top companies like Asana, Ramp, Tecton, and Vercel rely on Graphite every day. Start stacking with Graphite today for free and reduce your time to merge
Starting point is 00:06:25 from days to hours. Get started at gt.gathe.com slash pragmatic. That is G for Graphite, T for technology. So one thing that is a little interesting about this book: it is about AI engineering, and this field moves so quickly, you know, even in just a week. We've now had a new model come out, for example DeepSeek, that people have been talking about for a few weeks. How were you able to write a book about such a fast-moving industry so that by the time it's released, which was clearly a few months after you finished it, it will still be relevant? That is a great question. So when I started writing the book, I was thinking the same.
Starting point is 00:07:03 I was like, is now the right time to write it? Because so many things are still changing. But then, when ChatGPT came out, a lot of people, and I too, had this existential crisis. I was in a group chat and I was like, oh no, what does it mean for us as engineers? I feel like there are two things that I usually identify with: one is being an engineer and the other is being a writer. And guess what, the two use cases that AI is really good at are writing code and writing, right? So I was like, oh shoot, what does it mean? So I started to interview a lot of people.
Starting point is 00:07:36 I started reading so much. And I talked to a ton of people. And I started making a lot of notes. And in the process, what I realized is that for a lot of those things that seem new, a lot of the fundamentals have been there for a while. So, for example, language modeling is not a new task. Claude Shannon introduced that back in the 1950s. Or, like, at some point we'll talk about RAG, right?
Starting point is 00:08:03 RAG is actually not new. It's based on retrieval. RAG as in retrieval-augmented generation, right? Yeah. And retrieval is a very old technology. It's already powering a lot of use cases on the internet, like search or recommender systems, and vector databases have been around for a while, and vector search has so many cool algorithms already.
Starting point is 00:08:25 So I thought, okay, first, I see that not everything is new. And the second is that I try to focus on asking the question, like, okay, sometimes there's a problem with this, and there's a solution to this problem. And I ask whether this is due to fundamental limitations of AI or just due to the temporary capabilities of AI. And I try to see, okay, if it's due to the current capabilities, how fast could that be changing, right?
Starting point is 00:08:59 So, for example, in the early days, a lot of people shared a lot of prompting tips. Like, for example, you try to bribe the models, right? Like, hey, if you answer this correctly, I'm going to give you $200. And we talked about prompt robustness: how robust is a model to prompt perturbations? And then I was reading about it, and I found that, actually, models are getting more and more robust to prompts.
Starting point is 00:09:27 For example, we saw that GPT-3.5, compared to GPT-3, is already so much more robust. That means that small changes to the prompt lead to a lot less variation in model performance. So this kind of thing, I felt like, hmm, it's probably not going to stick around. So those kinds of tips are not going to be very relevant. So yeah. So already, at the height of when people were saying there might be a job as a prompt engineer,
Starting point is 00:09:57 you already saw this is likely trending down. Do I understand it correctly? So yeah. Writing, I think, is a lot like making a bet. When you write about a topic, you try to bet on whether it's going to stay relevant in the future. So I've been looking at the progress and trying to see what things are going to be like one or two years from now.
Starting point is 00:10:22 So another example is context length. At one point it was like, okay, we want long context length. But then I kept seeing people moving really, really fast, right? From, what, 8,000 context length to 128,000 in a few months. It's super, super fast. So it was like, okay, maybe the question is less about context length and more about context efficiency: can a model use the context efficiently, really, really well?
Starting point is 00:10:47 So those are the kinds of bets, and I do think that there were certain changes during the process of writing the book that made me feel more confident in the bets I made. Or another thing, multimodality. When I wrote about multimodal back in 2023, people told me, you're too early. Everyone is still working on language now. We're not there yet. But I was just like, yeah, it's just inevitable, right?
Starting point is 00:11:16 I think we learned how to work with language. But I do want to do a lot of stuff with more than just language. And nowadays it's just everywhere: almost all models nowadays are multimodal. So the title of the book is AI Engineering. And we also now have this term, AI engineer. It's spreading like wildfire. How would you define AI
Starting point is 00:11:42 engineer or AI engineering, because I feel it's a little bit of a loaded term these days. It is. I feel like a lot of terms nowadays are loaded. Like, you're not allowed to use agent anymore. You're not allowed to use a lot of things anymore. So I was agonizing over the title for the book. First, I knew that we needed a different term from machine learning engineering. And the reason is that, while, with foundation
Starting point is 00:12:12 models, a lot of the fundamentals or systematic approaches are still the same as in machine learning engineering, there are also a lot of new things. So for example, one thing is that before, when you wanted to build a machine learning application, you had to build your own models. Just to be clear, you're comparing it with machine learning, right? So what machine learning engineers did versus what AI engineers are doing. Yeah. I'm just trying to explain why we need a different term from machine learning engineers to describe what we are doing today. So, yeah, before, when you wanted to build machine learning applications, you needed to build your own models.
Starting point is 00:12:50 So that means you needed your own data and you needed expertise in how to train and babysit a model. However, nowadays, if you want to build an application leveraging machine learning or AI, you can just send a direct API call and get access to this wonderful capability. So that really, really lowers the entry barrier for people. Like, you don't need data anymore. You don't need a fancy AI degree anymore. A second thing is that before, right, you needed distribution, because you deployed
Starting point is 00:13:19 the application as part of an existing application. So, for example, if you build a recommender system, you need an e-commerce website, or some kind of website, so that you can have something to recommend, right? So it's deployed as part of an existing application. Fraud detection is part of maybe a banking application or some kind of payment app. However, now you can just put it out there, as with a lot of applications. You don't need an existing distribution channel, although having a distribution channel is really, really useful.
Starting point is 00:13:52 So I think another very big thing is that there is a shift of focus from less machine learning to more engineering and more product. So before, right, if you were a machine learning engineer, you started from data. You had to gather data, maybe have human annotations, and then train a model. And once the model was good, you deployed that into your product. But nowadays, you actually start with a demo, right? So we have a cool idea. Let's just try it out and see if it works. So you start with a product. And after you see, okay, it works pretty well, I want to make it
Starting point is 00:14:29 better, right? So you start gathering more data, maybe as more prompt examples, or, in very rare cases (I don't recommend most people do it in the early days), for fine-tuning. But it's very rare. Basically, you start gathering more data, maybe for evaluation. It's extremely important: having good evaluations becomes way, way more important with AI engineering. So okay, you've got data, and then after that, maybe you have been sending a lot of API calls to OpenAI, Anthropic, or Google, and it's like, okay, now it's too expensive, now I need my own model. So you start hosting your own model, using some open source alternative or a fine-tuned model.
Starting point is 00:15:08 So, yeah, before, with machine learning engineering, you went from data to model to product, and now with AI engineering, you go from product to data to model. So it places a lot more focus on product and data, which would be a competitive advantage when everyone shares kind of similar base AI capabilities. So I do think it needs a different term, just to separate it from machine learning engineering. And I didn't know what term to use. And then I was like, okay, let's just ask the people.
Starting point is 00:15:36 So I surveyed a bunch of people who were doing this, building applications on top of foundation models. And almost everyone was like, AI engineering. And I was like, okay, let's go with AI engineering. So do I understand correctly that, you know, the biggest difference is that machine learning engineers did a lot more groundwork, getting the data, building the model, whereas with AI engineering,
Starting point is 00:16:00 you have a lot of that already, at least initially. You can start with APIs or something. So there's more engineering. You kind of hack things together. You put it together. And then over time, as things become more serious or your product gets bigger, you do a lot more: you might host your own model, and you might one day even build your own model, if it gets there.
Starting point is 00:16:21 But it's just a lot later. So a lot of ML engineering comes further down the path, if it's big enough, if it works, et cetera. Whereas with ML engineering, it was the other way around. You had to put in all this effort and then see if it even works? So, first, I feel like every company may have a different definition of the role. Even at the same company, right, people with the same title and the same role can do very different things.
Starting point is 00:16:47 So there's never a clear-cut definition. The second thing is that I don't think the question is machine learning or AI engineering. In the vast majority of GenAI systems I have seen, there is a very strong traditional machine learning or classifier component. So imagine you're building a customer support chatbot, which, whenever I ask at a conference, I see a lot of raised hands for. It's a very, very classic GenAI application. So, yeah, you get a request from a customer, and maybe you have
Starting point is 00:17:23 several different potential solutions for it, right? Maybe if it's an easy query, you might send it to a cheap model. Or if it's a harder query, you might send it to a more expensive model. But if it's something very sensitive, like, hey, why did you charge me twice for the bill last month? Then you might want to send it to a human operator. So you might have something like a router or an intent classifier to choose where you send the query, which is a traditional, classical machine learning model that you can build. Or, after you get a response from an AI model, you might think, okay, does this contain PII?
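The routing idea described here can be sketched in a few lines of Python. This is a minimal illustration, not the approach from the book: the route names and keyword lists are hypothetical, and the keyword rules merely stand in for the trained intent classifier that the conversation mentions.

```python
import re

# Hypothetical route names: cheap model, expensive model, human operator.
CHEAP_MODEL = "cheap_model"
EXPENSIVE_MODEL = "expensive_model"
HUMAN_OPERATOR = "human_operator"

# Illustrative keyword sets standing in for a trained intent classifier.
SENSITIVE_TERMS = {"charge", "charged", "refund", "bill", "billing", "payment"}
HARD_TERMS = {"error", "integrate", "configure", "crash", "migration"}


def route(query: str) -> str:
    """Pick a destination for a customer support query."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    # Sensitive topics (for example, billing disputes) go to a person.
    if tokens & SENSITIVE_TERMS:
        return HUMAN_OPERATOR
    # Queries that look technical go to the more capable model.
    if tokens & HARD_TERMS:
        return EXPENSIVE_MODEL
    # Everything else is treated as an easy query.
    return CHEAP_MODEL
```

In practice the keyword rules would be replaced by a classifier trained on labeled queries; the surrounding control flow can stay the same.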
Starting point is 00:17:58 Because I don't want to send responses that contain private information back to users. So PII detection or toxicity detection can be a classifier. Or in RAG systems nowadays, we talk about how a lot of them use retriever systems, which I think are also in the realm of classical machine learning that you can build yourself. So what are the most common techniques used when building AI applications, things that, you know, as a software engineer who's going into building AI applications, I should know about, and later I can go deeper into? So there's an assumption here that you have tried a lot of solutions and now
Starting point is 00:18:44 you try GenAI and you think GenAI is a solution for you. And you're asking, how should I progress from there, right? Is that correct? Yeah. Yeah. It's just, what are some common approaches, you know, I think things like RAG, fine-tuning, or other things that are just good to know about. I should probably learn more about those topics.
Starting point is 00:19:04 Yeah. So I think those techniques are useful. And what I usually recommend, well, not recommend, but a common pattern I have seen, is a certain development path. So initially, the first thing I would say is to try to understand what makes a good response: what a good response is and what a bad response is. So, what you want the model to do. And it's not always intuitive, right?
Starting point is 00:19:30 So, for example, LinkedIn has a great example. They built job fit assessments for candidates. And they found out that the majority of their time was spent just understanding what candidates needed from the model. So initially, they focused on correctness. But then they realized that candidates found it not helpful. Say a candidate asks the model, am I a good fit for this job? And the AI responds with, you're a terrible fit. It was like, okay, what am I going to do with this information, you know?
Starting point is 00:20:03 So you need to try to understand, okay, they want more of an understanding of what the gaps are and how they can fill the gaps, or to get suggestions for other roles that are a better fit for them right now. So have this clear picture, this clear understanding, and then build a guideline. Like, okay, given this request, answer like this: be helpful, show them the gaps, or show them the roles. And have a very clear guideline for the model in the prompt. So you try those prompts. And then you look at the output, and then maybe you can try to add more examples. And then just go through, like,
Starting point is 00:20:41 until you get really good responses. And try to evaluate: maybe create a set of queries and expected responses, using both automated metrics like AI as a judge and also human evaluations to measure the progress. So, okay, you have done prompting. You have added more examples to the prompt. And maybe you will start to make it more complex. So you can give the model more and more context so it can answer questions better, right? So maybe when users ask a question, you can have the model pull out all the documents or all the job listings
Starting point is 00:21:17 related to this question, like information about the company, or get the candidate's resume. Right. So you build a system where you augment the context with the documents. So that's the RAG pattern. I do think that RAG is a very, very powerful pattern. It's nothing really, really fancy. And something interesting about RAG is that a lot of people equate RAG with vector search.
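To make the RAG pattern concrete, here is a minimal Python sketch of augmenting the context with retrieved documents. The function names, the prompt wording, and the `retrieve` and `generate` callables are all assumptions made for illustration; any retriever and any model client could be plugged in.

```python
from typing import Callable, List


def build_augmented_prompt(question: str, documents: List[str]) -> str:
    """Place retrieved documents in the prompt ahead of the question."""
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],  # hypothetical retriever
    generate: Callable[[str], str],        # hypothetical model call
    top_k: int = 3,
) -> str:
    """Retrieve, augment the prompt, then generate."""
    documents = retrieve(question)[:top_k]
    return generate(build_augmented_prompt(question, documents))
```

Nothing here requires a vector database: `retrieve` could be a keyword search, a SQL query, or anything else that returns relevant text.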
Starting point is 00:21:45 Right. I see a lot of people go, oh, I want to use RAG, so which vector database should I use? Oh, so people jump straight to vector search. Well, yeah, because, I mean, you do have the chunks, you know, and the embeddings can be stored as vectors, right? So as engineers, we're like, I need a vector search database.
Starting point is 00:22:02 Yeah, people love databases. Yeah, very interesting. But I feel like the first solution is probably not to jump straight to embedding-based retrieval, because you have to use... Oh, interesting.
Starting point is 00:22:14 Yeah, you need to build an embedding model. The quality is really highly dependent on the quality of the embeddings. If you have bad embeddings, you obviously retrieve bad things as well. Also, vector databases can be quite expensive to run. And they add latency. Also, another thing is, like,
Starting point is 00:22:29 embeddings can also obscure certain keywords. For example, I'm searching for a specific error code, right? Through embeddings, you don't really get the exact error code anymore. So those are some of the challenges with vector databases and vector search. So I think the common approach is to maybe just start with something as simple as keyword retrieval.
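Keyword retrieval of the kind mentioned here can be sketched as follows. This is an illustrative Python example under simple assumptions: the stopword list, the tokenizer, and the overlap-count ranking are all choices made for the sketch, and the documents are hypothetical. Note that the tokenizer keeps an error code such as E-4012 as a single exact token, which is the property the speaker says embeddings can lose.

```python
import re
from typing import Dict, List

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "what", "does", "do", "how", "to", "of", "for"}


def tokenize(text: str) -> List[str]:
    """Lowercase and split, keeping codes like 'e-4012' as one token."""
    return re.findall(r"[a-z0-9_-]+", text.lower())


def extract_keywords(query: str) -> List[str]:
    """Drop stopwords from the user query."""
    return [t for t in tokenize(query) if t not in STOPWORDS]


def keyword_search(query: str, documents: Dict[str, str]) -> List[str]:
    """Return ids of documents sharing keywords with the query, best first."""
    keywords = set(extract_keywords(query))
    scored = []
    for doc_id, text in documents.items():
        overlap = len(keywords & set(tokenize(text)))
        if overlap:
            scored.append((overlap, doc_id))
    # Sort by overlap (descending), then by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

Exact matching is both the strength and the weakness of this approach: it finds the precise error code, but it will not match "mean" to "means" or a synonym.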
Starting point is 00:22:52 Like, you just extract all the keywords from the user query and find all the documents with those keywords. And then you say, okay, maybe the documents are too long, and now I can't fit them into the context, right? That's when you start with chunking. Okay, now how do I chunk these documents so that they can fit into the context length? And now chunking has other problems, right? So, for example, maybe the document is about
Starting point is 00:23:19 company X, right? But it says, from now on, company X is referred to as the Company. So in the rest of the document, you don't have X appear anywhere. So if you search for X, you don't get the chunks below. So now you might think, okay, now I need to extract the keywords from the documents and add metadata to every chunk. Or add the title of the document.
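The chunking problem and the metadata fix described here can be illustrated with a short Python sketch. It is an assumption-laden example: chunks are split by word count for simplicity, and the header format ("Title:" and "Keywords:" lines) is invented for the illustration rather than taken from the episode.

```python
from typing import List


def chunk_words(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


def chunk_with_metadata(
    title: str, text: str, keywords: List[str], chunk_size: int = 50
) -> List[str]:
    """Prepend document-level metadata to every chunk before indexing."""
    header = f"Title: {title}\nKeywords: {', '.join(keywords)}\n"
    return [header + chunk for chunk in chunk_words(text, chunk_size)]
```

With a hypothetical contract that says "From now on Company X is referred to as the Company", a later chunk no longer mentions X in its body, but the prepended header still lets a search for "Company X" reach it.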
Starting point is 00:23:45 Or some people started adding a summary to it. Or Anthropic has a very, very good article called Contextual Retrieval. It's like, you ask ChatGPT to generate the key information, the metadata, about each document and prepend that to each chunk to help you retrieve the right chunk. So I do think that data preparation actually gives a really huge performance boost. I have seen it giving way, way better performance boosts than focusing on
Starting point is 00:24:15 which vector databases I should use, you know? So, I'm not saying that they're not useful. It's just that in the beginning, you probably want to try something simple, with the biggest performance gains. And then you start moving up the complexity level. Also, on retrieval, someone told me something very interesting, a little bit of a hot take. He was just like, I'm not going to take any retrieval system seriously
Starting point is 00:24:42 if they don't benchmark against BM25. So BM25 is pretty old school, like 20-plus years old now. It's a lot simpler: term-based retrieval, not embedding-based. And it's a really, really hard-to-beat retrieval system. And I think a lot of the time we use it. If you want something more complex, you can combine term-based retrieval, the simpler solution, with vector databases. So you have the semantic match on the embedding side,
Starting point is 00:25:10 but you also have the exact keyword match on the term-based side. So hybrid search is very common. So, okay, we talked about prompt engineering, adding more examples, RAG, right? And I think after that, maybe once you have maxed out a lot of those things, which usually takes people a while, you might consider fine-tuning. But I usually have a lot of reservations about fine-tuning, because fine-tuning has a whole host of new problems that you need to deal with. Yeah, with fine-tuning, first, you have this model you have fine-tuned, and now you need to think about how to host it.
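Since BM25 comes up as the baseline to benchmark against, a compact Python sketch of the scoring function may help. This is one common variant (Okapi BM25 with a smoothed, non-negative IDF); the parameter defaults k1 = 1.5 and b = 0.75 are conventional choices rather than values from the episode, and the tokenizer is deliberately simple.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal in-memory BM25 index over a list of documents."""

    def __init__(self, documents: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1 = k1
        self.b = b
        self.term_counts = [Counter(tokenize(doc)) for doc in documents]
        self.doc_lengths = [sum(c.values()) for c in self.term_counts]
        n = len(documents)
        self.avg_length = sum(self.doc_lengths) / n if n else 0.0
        # Document frequency: number of documents containing each term.
        doc_freq = Counter()
        for counts in self.term_counts:
            doc_freq.update(counts.keys())
        self.idf = {
            term: math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            for term, df in doc_freq.items()
        }

    def score(self, query: str, doc_index: int) -> float:
        if self.avg_length == 0:
            return 0.0
        counts = self.term_counts[doc_index]
        # Longer-than-average documents are penalized via b.
        norm = 1.0 - self.b + self.b * self.doc_lengths[doc_index] / self.avg_length
        total = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            if tf:
                total += self.idf[term] * tf * (self.k1 + 1.0) / (tf + self.k1 * norm)
        return total

    def rank(self, query: str) -> List[int]:
        """Document indices ordered from best to worst match."""
        scores = [self.score(query, i) for i in range(len(self.term_counts))]
        return sorted(range(len(scores)), key=lambda i: (-scores[i], i))
```

A hybrid setup would compute a ranking like this alongside an embedding-based ranking and merge the two; how to merge them is a separate design choice not covered in this sketch.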
Starting point is 00:25:49 And a lot of models are big. They have billions of parameters, right? They are not easy to self-host. I actually read this part in the book. You actually go into the details of the problems with, you know, the memory size, and you also cover the alternatives where you might not need as much memory, but they also bring other tradeoffs. So it's tradeoffs within tradeoffs within tradeoffs, right?
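A rough back-of-the-envelope calculation shows why models with billions of parameters are hard to self-host. The Python sketch below covers only the memory needed to hold the weights; it ignores activations, the KV cache, and any training state, and the 7-billion-parameter model is a hypothetical example rather than a model named in the episode.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Hypothetical 7-billion-parameter model at different numeric precisions.
PARAMS = 7_000_000_000
fp32_gb = weight_memory_gb(PARAMS, 4)    # 32-bit floats: 28 GB
fp16_gb = weight_memory_gb(PARAMS, 2)    # 16-bit floats: 14 GB
int4_gb = weight_memory_gb(PARAMS, 0.5)  # 4-bit quantization: 3.5 GB
```

Lower precision is one of the alternatives that needs less memory, and it is also an instance of the tradeoff mentioned above, since reducing precision can affect output quality.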
Starting point is 00:26:10 Like, you start solving one problem, but you're going to get a bunch of others, and you'll need to decide, you know, is it worth my time, effort, resources? Yeah. Yeah. So I definitely think that when you fine-tune a model, you kind of own the model. And now the question is, how do you maintain it? We have this whole world of very smart people developing new models, and the capabilities just increase rapidly. So when that happens, how long can the fine-tuned model outperform those new models that are being put out? So you might spend a lot of energy
Starting point is 00:26:46 and effort on a fine-tuned model. And then just a week later, maybe some, I don't know, random Chinese company you've never heard of releases an extremely fast and capable model, right? So yeah, it's quite challenging. So I think of it as the last resort, not the first line of defense. Yeah, so what I've heard is basically, if I got it correctly, take a structured approach. Like, you know, start with prompting, start simple, start with getting responses
Starting point is 00:27:14 that make sense, then add more data. You can do this with, you know, RAG. You can do it with chunking, keyword extraction. Data preparation really, really makes a big difference, which a lot of people don't think about. And then you can go to more advanced things. There's a whole host of things that you could do. But my understanding is you're probably saying, like,
Starting point is 00:27:34 you'll probably get there over time. But initially, these things will keep you busy. And you'll probably be able to build a pretty good system just with the basics and a little bit of engineering and, most importantly, understanding the problem that you're trying to solve, as opposed to, you know, building whatever shiny technology or approach. Yeah. I also think the approach can be different for, like,
Starting point is 00:27:58 individual developers versus enterprises, if you look at the whole organization. So one thing I have seen is that, especially with early-day technologies, enabling new use cases usually brings more returns than incremental improvement over existing use cases. So instead of investing a lot of energy in eking out a little bit of performance with fancier, more complex techniques, maybe use the same stack you already have and open up new
Starting point is 00:28:29 applications. So yeah, that's why I feel like a lot of companies will take a while to get to the fine-tuning phase. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your governance, risk, and compliance program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Vanta can help you start or scale your security program by connecting with auditors and experts to conduct your audit and set up your security program quickly.
Starting point is 00:29:02 Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. With Vanta, they centralize security workflows, complete questionnaires up to five times faster, and proactively manage vendor risk. Join over 9,000 global companies to manage risk and prove security in real time. For a limited time, my listeners get $1,000 off Vanta at vanta.com slash pragmatic.
Starting point is 00:29:34 That is V-A-N-T-A dot com slash pragmatic for $1,000 off. So let's say that at my company, we decide to build an AI solution. And let's take the example that it's going to be customer service automation. What are typical approaches that I should know about? And you do cover some of these things in your book as well, like some of the more common steps that I'll need to take. So, customer support solutions. I would say the first thing I would look into is,
Starting point is 00:30:04 what are the bottlenecks for the solution right now? So, for example, another startup I worked with had this challenge: they had a lot of customer support requests and they didn't know how to answer them all. And the solution was very interesting. It was just, okay, let's try to drive a lot of the questions into a common channel, like a public Discord.
Starting point is 00:30:23 So that means all the users can also help answer the questions. And also, in the future, if someone has a question, they can just refer to a previous discussion instead of asking again. So they try to make a lot of the discussions and customer support requests public. Another solution that was pretty popular in, like, 2018, 2019, is that
Starting point is 00:30:46 you do routing to the right department. So the challenge then was that they realized the bottleneck was in triaging. Like, when we get a request, we don't know which department to send it to. So a bunch of startups then were just like, okay, let's try to build a system to predict: should this go to the finance department? Should it go to the technical support department? You know, just this kind of routing already reduced the burden a lot.
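The triage idea can be sketched as a small text classifier. The snippet below is a deliberately simple stand-in that scores a request against hand-picked keyword sets; the departments, keywords, and the `route` function are hypothetical, and a real system would more likely train a classifier on labeled tickets.

```python
DEPARTMENT_KEYWORDS = {  # hypothetical departments and keywords
    "finance": {"invoice", "refund", "billing", "charge", "payment"},
    "technical_support": {"error", "crash", "login", "bug", "install"},
}


def route(request, default="general"):
    """Return the department whose keywords overlap most with the request."""
    # Strip the most common sentence punctuation before splitting into words.
    words = set(request.lower().replace("?", " ").replace(".", " ").split())
    best_dept, best_hits = default, 0
    for dept, keywords in DEPARTMENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_dept, best_hits = dept, hits
    return best_dept


print(route("I was charged twice, can I get a refund?"))
print(route("The app shows an error after login."))
```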
Starting point is 00:31:15 And if you think, okay, we need Gen AI to do it, I really recommend the framework that Microsoft introduced. They call it crawl, walk, run: going from lower-stakes to higher-stakes deployment. So for the customer support chatbot, right? Maybe initially you can have a human in the loop. For example, for every request, instead of a human agent writing the response from scratch, you can have AI suggest a few options. And the human can choose one, or can just choose one as a starting point
Starting point is 00:31:55 and then make a quick edit and send it. So once you see that, okay, maybe the acceptance rate is getting really high, like for this category of queries the acceptance rate is 90%, right? Then you may feel more confident and roll it out. So maybe you roll it out to a smaller set of users, or you can even roll it out to internal use cases. So you give it more automation, but reduce the scope of the deployment.
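The human-in-the-loop stage can be instrumented by tracking how often agents accept an AI suggestion per query category. A minimal sketch, assuming a hypothetical log of (category, accepted) pairs, the 90% figure mentioned above as the threshold, and an arbitrary minimum sample size:

```python
from collections import defaultdict


def categories_ready_for_automation(log, threshold=0.9, min_samples=20):
    """Return categories whose suggestion acceptance rate meets the threshold.

    `log` is an iterable of (category, accepted) pairs, where `accepted`
    is True when the human agent sent the AI suggestion (possibly lightly edited).
    """
    totals = defaultdict(int)
    accepted_counts = defaultdict(int)
    for category, accepted in log:
        totals[category] += 1
        accepted_counts[category] += int(accepted)

    ready = []
    for category, total in totals.items():
        # Require enough samples so a lucky streak does not trigger a rollout.
        if total >= min_samples and accepted_counts[category] / total >= threshold:
            ready.append(category)
    return sorted(ready)


# Made-up data: 19 of 20 password-reset suggestions accepted, 10 of 20 refund ones.
log = (
    [("password_reset", True)] * 19 + [("password_reset", False)]
    + [("refund", True)] * 10 + [("refund", False)] * 10
)
print(categories_ready_for_automation(log))
```

Categories that clear the bar are candidates for more automation on a limited scope; the rest stay in suggestion mode.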
Starting point is 00:32:26 And then after you're really happy with it, you can roll it out to more users. So, yeah, that's how I would go about something like a customer support chatbot. Nice. Because what I was kind of expecting you might say is, oh, you know, just build this AI framework, deploy it, see how it goes. I feel that's what a lot of companies are doing, by the way. A lot of them are like, oh, Gen AI, let's get this model from ChatGPT or Anthropic, and let's put it out there. But I really like how what you explained is not really specific to Gen AI. You went through: look at the business problem, look at the problems you have, look at the options, which include not just Gen AI but more traditional machine learning, as you say, a classifier. And then you look at whether these tools help your problem, and you make sure that it actually solves your problem when you roll it out. Don't just blindly roll it out. I mean, all of this, to me, is not really new, is it? You could have said the same thing two or three years ago, before Gen AI, except we would
Starting point is 00:33:30 have not had those Gen AI tools to play with. Yeah. So actually, before our chat, I looked up one of your talks, and you have this talk about things that haven't changed, things that feel very similar about engineering. And I definitely think dealing with a new technology is one of the things that never changes. I feel like every time there's a new technology that comes out, I can hear the collective sigh of senior engineers everywhere saying, not everything is a nail. People just try to get the technology to work for everything. So yeah, I do think a very common challenge I see is that people just jump straight into it. They just want to use
Starting point is 00:34:10 Gen AI when they don't need Gen AI. So I think there are two different headlines. One headline is, I used Gen AI, right? The other headline is, I solved the problem. If you want to focus on the first headline, then yeah, use Gen AI. But if you want to solve the problem, then you need to understand what the problem is. What are the challenges there? What are the roadblocks? And remove the roadblocks using the simplest solution, not the fanciest one. Yeah, I feel that there's a really strong fear of missing out across most tech companies. Everyone knows this is such a transformative technology. It gives us so many
Starting point is 00:34:47 new capabilities that it will be important. I think everyone knows that their company will be using it. But now there's a fear of missing out: oh, what if my team doesn't build it? What if someone else gets ahead of me? And so at a lot of companies, many teams are all building it, and they're just, you know, holding a hammer and looking for nails, even though they might not need it at the time. I mean, I'm not sure this is necessarily a bad thing because, you know, people at least get experience with it, and they will need to learn about it. But it's a very interesting time because it's rare to see. Usually we see a new backend framework come out, or something that's limited to
Starting point is 00:35:25 a domain and people jump on it. But this is the first time I've seen the whole industry jumping on it, and everyone is trying to use it and put it in, whether it works or not. I definitely agree with you on this FOMO thing. I do think that everyone jumping on it is actually a pretty good thing. I feel like the energy is incredible. I have never before seen so many smart people focusing on the same problem.
Starting point is 00:35:52 It's incredible, and the progress is amazing. I do think, however, that there is an irony: the more we want to not miss out on things, the more things we will miss. Because I feel like if we try to keep up with the news, if we try to jump from one piece of news to another, we will always stay at the surface level. We never really go deep into anything.
Starting point is 00:36:15 So I actually don't really read the news. I find it a little bit distracting. I feel like my approach is, okay, pick a problem that you care about, and then only care about things that help you solve this problem. So if there's some news coming out, I'm just like, does this help me?
Starting point is 00:36:34 Solve this problem? If it doesn't, I kind of wait, right? Because I feel like if something is important, it will still be important two weeks from now, or a month from now. I don't drop everything to go, okay, let's just go and understand what it is. So I feel like I try to stay a bit calmer. Yeah.
Starting point is 00:36:58 Yeah. So when you're building an AI system, one of the things that you will come across is that you need to evaluate the output: how well does it work, does it solve your problem? Why is it difficult to evaluate AI systems? And what are common ways to do that? So evaluation, I think, is a billion-dollar question, or even a trillion-dollar one, given how much people are investing in AI right now. Yeah, no, it needs to go big.
Starting point is 00:37:25 And if you go big, better go really big. So I think this is challenging because the smarter AI becomes, the harder it is for us humans to evaluate it. Before, if AI was incoherent, you could pretty easily tell if the response was bad. It's like, okay, it doesn't sound good, it's a bad response. But nowadays, it's pretty coherent, right? For example, if you ask it to generate the summary of a book, and the summary sounds convincing, you actually don't know if it's a good summary or not, and you might have to read the entire book yourself just to evaluate whether it's a good summary.
Starting point is 00:38:04 Sorry. Or like with math. A lot of times, I personally use AI to ask a lot of questions because I don't know the answer. And because I don't know the answer, I don't know if the answer is correct, right? So, for example, a lot of people can tell if a solution to a first-grade math question is correct. But very few people can tell if a proof with fancy equations is correct. So I remember when o1 came out, Terence Tao, he's an amazing mathematician, I think one of the best mathematicians of our time, actually took the time to evaluate o1.
Starting point is 00:38:45 And he said the experience of using o1 is similar to advising a, okay, like a mediocre, but not completely incompetent, graduate student. But this makes me think: if we really need the brightest minds of today to evaluate AI, then we're soon going to run out of really, really smart people to evaluate AI. So I thought, what could be the next step forward? Before, a lot of the time, we used humans as the gold standard for AI performance. It's like, okay, humans start by writing out, here is how you should respond to this
Starting point is 00:39:27 and here's how you should do it. And the AI should try to copy humans. But now, for many, many tasks, AI can,
Starting point is 00:39:35 yeah, outperform humans by a lot. So I thought about several approaches
Starting point is 00:39:42 to deal with it. And I think that's why I split the chapter. Initially I had one chapter on evaluation, but the more I wrote about it,
Starting point is 00:39:51 the more I realized there's so much, and sure enough I ended up with two pretty long chapters on evaluation. The first chapter is on general methodology. And the second chapter is about how to use different techniques to evaluate the AI system.
Starting point is 00:40:02 So for the methodology, one is functional correctness. It evaluates the output of an application based on how well it performs its task. So if you say, hey, use AI to save energy, you can see how much energy is actually saved. Or, hey, use AI to play this video game, you can see how high a score it can actually get. Or a very common use case for this is coding. I do think it's not a coincidence that coding is one of the most popular use cases, because we actually know how to evaluate generated code. We might not know how to evaluate other generated stuff, right?
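Functional correctness for generated code can be sketched as running a candidate against test cases and reporting the fraction that pass. The example below assumes the generated code arrives as a string defining a function named `add`; the candidate string, function name, and test cases are made up, and `exec` on untrusted model output should only ever run inside a sandbox.

```python
def functional_correctness(code, func_name, test_cases):
    """Return the fraction of test cases the generated code passes.

    A candidate that fails to compile or run scores 0.0.
    """
    namespace = {}
    try:
        exec(code, namespace)  # sandbox this in any real setting
        func = namespace[func_name]
    except Exception:
        return 0.0

    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed test
    return passed / len(test_cases)


generated = "def add(a, b):\n    return a + b\n"
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(functional_correctness(generated, "add", tests))
```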
Starting point is 00:40:42 But we know how to evaluate generated code because we've been testing code for a long time. So with code, you can use functional correctness to evaluate: does this code compile? Does it run? Does it generate the expected outputs? Does it do what we wanted it to do? So that's one approach. The second approach is using AI to evaluate other AI. We've been using AI to automate a lot of applications, so can we also use AI to automate evaluations? And it's actually doing pretty well. I think even back in, like,
Starting point is 00:41:20 2023, LangChain had this report. They said that the majority of applications they saw already had some sort of AI as a judge, or LLM as a judge, and I think it's growing. And I do think it's getting pretty cost-effective and useful. But of course I see a lot of challenges around using AI as a judge that we can go into later. Another approach that is very interesting is comparative evaluation. The reason is that, as humans, it might be hard for us to give an absolute score on something. But if we are given two versions of something, we can tell, oh, we like this one better. There have been a lot of studies showing that even for tasks where AI is outperforming,
Starting point is 00:42:08 doing things at a level that human experts can't really reach, we can still detect the differences. So I think this has been guiding a lot, not just evaluations, but also model development. Sounds like there's just no simple answer, right? Like you kind of need to go through all these options and figure out, in your case, which makes sense for costs, for what you can do, can you have a human in the loop? So there's no real silver bullet, no one thing that you can just use. Yeah, I don't think there's a simple solution.
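Comparative evaluation can be summarized as a win rate over pairwise preferences. A minimal sketch, assuming a hypothetical list of (model_a, model_b, winner) judgments collected from human raters; the model names are placeholders:

```python
from collections import defaultdict


def win_rates(comparisons):
    """Compute each model's win rate from (model_a, model_b, winner) tuples."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for model_a, model_b, winner in comparisons:
        games[model_a] += 1
        games[model_b] += 1
        wins[winner] += 1
    # Models that never won still appear, with a win rate of 0.0.
    return {model: wins[model] / games[model] for model in games}


comparisons = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "model_y"),
    ("model_y", "model_z", "model_y"),
]
rates = win_rates(comparisons)
print(rates)
```

Raw win rates ignore how strong each opponent was; rating schemes built on pairwise results try to account for that, but this sketch does not.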
Starting point is 00:42:46 So that's one reason I'm a little bit skeptical about evaluation tooling, because a lot of the challenges with evaluation are not because we don't know how to evaluate, but because it requires discipline and hard work, and a lot of that is something tools can't really automate. So for example, one thing about evaluation: we need to evaluate an application based on what users want. That means we need to go and talk to users. We need to look at their interactions.
Starting point is 00:43:24 Because we want to evaluate what matters, right? We have to measure what matters. I have several examples of how counterintuitive it can be: we think we're measuring the right thing, but users actually care about other things. For example, I have a friend who is building a pretty big application that summarizes meetings. And initially they were like, okay, we'll try to measure correctness. Like, does the summary cover the content of this meeting?
Starting point is 00:43:58 They also thought about, hey, does the model follow the format? Because they thought users wanted shorter summaries, and they agonized over, do we want three-sentence summaries or five-sentence summaries? And they tried to measure those. But eventually what they found out was that users don't really care about the whole content of the meeting. People only want to know, what are the action items for me? What do I have to do after this, right?
Starting point is 00:44:23 So they actually started changing what they measure. They don't focus on overall correctness anymore. I mean, the summary still shouldn't make things up, right? But they focus on not missing action items specific to the person asking for the summary. Yeah. Or another example: we talked about a customer support chatbot, so let's go back to that example. So, a pretty big tax firm.
Starting point is 00:44:52 So they built a chatbot. You know, tax software, you can pretty much tell which company it is. So they launched a chatbot to help people with tax preparation. And the response was very lukewarm. They were measuring how the users used it, and people just didn't really seem to use it. And they were like, why is that?
Starting point is 00:45:14 Is it because it's not good? Is it because it hallucinates? What is the challenge there? So they tried to measure all these kinds of metrics, right? But in the end, they found out that users didn't use it because they hate typing. People just don't really like typing. And also because, if you're faced with a domain that you don't really know,
Starting point is 00:45:33 like, I use the software because I don't know a lot about tax, I don't really know what questions to ask. Yeah. Oh, so they didn't know what to type. They didn't understand the domain. You know, they went to the tax software because they wanted it to take care of their taxes. Yeah. So I think they started trying to
Starting point is 00:45:51 understand more about what kinds of questions people would ask, and suggest those at the beginning. So it basically guides users. It's kind of education: here's a question you should ask, and here's the answer, and keep going. So I think a lot of it is just understanding your domain, the problem domain: go talk to users, look at the data. I do still think that looking at data is very, very important. I think Greg Brockman has a great quote about it. He said that manual data inspection is one of the activities with the highest ratio of value to prestige. That means people don't think highly of manual data inspection or data wrangling.
Starting point is 00:46:39 It's like, let's give it to some interns to do, and let me work on something fancy, like algorithms and stuff. But actually, it's extremely high value. Because by looking at data, you detect patterns. You understand how users use your product. So a very good practice that I highly recommend to teams is: don't forget human evaluations. So you can use AI as a judge.
Starting point is 00:47:02 But AI as a judge has a lot of challenges, because the quality of the judge depends on the underlying model and the prompt, and it's also non-deterministic. So things can change over time. But if you have some human evaluations, very consistent, with very clear guidelines: every day, go in there, look at maybe 50 samples of actual interactions. Or if you have more resources, go as high as 500 or 1,000, right? So that you can get some kind of picture of
Starting point is 00:47:33 how the users are using your product. Is there any change in behavior based on current events? Maybe because of something like an administration change, people have a lot more questions about that topic, for example. Or it helps you correlate with all the automated metrics. For example, if the AI judge's scores somehow start diverging from the human judges' scores, maybe this is something you need to investigate. Yeah.
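The check on AI-judge drift can be sketched as comparing judge scores with human scores on the same daily sample. The example assumes both are binary good/bad labels and uses an arbitrary 0.8 agreement threshold; the data, the threshold, and the function names are illustrative assumptions.

```python
def agreement_rate(human_labels, judge_labels):
    """Fraction of samples where the AI judge agrees with the human rater."""
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(h == j for h, j in zip(human_labels, judge_labels))
    return matches / len(human_labels)


def needs_investigation(daily_agreement, threshold=0.8):
    """Return the days on which agreement dropped below the threshold."""
    return [day for day, rate in daily_agreement.items() if rate < threshold]


# Made-up daily samples (1 = good response, 0 = bad response).
human = {"mon": [1, 1, 0, 1, 0], "tue": [1, 0, 0, 1, 1]}
judge = {"mon": [1, 1, 0, 1, 0], "tue": [1, 1, 1, 1, 0]}

daily = {day: agreement_rate(human[day], judge[day]) for day in human}
print(daily)
print(needs_investigation(daily))
```

A drop in agreement does not say which side is wrong; it only flags a day whose samples deserve a closer manual look.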
Starting point is 00:48:00 So I guess, as you said, you can't skip hard work if you want to get good results. And you can't really pull humans out of the loop fully, at least initially. What are some common mistakes you've seen when teams are building AI applications? I'm sure you've seen a lot. Yeah, but I don't want to say it in a way like, oh, everyone is an idiot. So, yeah, I think we touched on several. One of the common mistakes is using Gen AI when you don't need Gen AI.
Starting point is 00:48:35 So, for example, there was a startup that came to me with a pitch. It was like, oh, we're going to use Gen AI to help people optimize electricity usage. So people can chat with the AI, like, hey, I live here, and here are the activities I do during the day that are very energy intensive, maybe charging my car or doing laundry or something. And the AI is going to tell you, hey, you should do this activity at this time and that one at that time so that you can minimize your electricity bill. And they were like, oh, our results show that it can save you, on average, 30% of your electricity bill. And it was like, free money.
Starting point is 00:49:13 Why would anyone not want that? And so I asked them, what is the cost saving if I just manually schedule the most intensive activities during off-peak hours? Like, okay, just charge the car at, you know, 10 p.m. or something. They were like, we haven't done that yet, but we're going to try it and let you know. And they never got back to me. And they abandoned the idea later. So I feel like a lot of those optimization problems can be solved greedily, maybe even with a spreadsheet.
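The greedy baseline described above is small enough to sketch directly: put the most energy-hungry activities into the cheapest hours. The hourly prices, the activities, and the one-activity-per-hour simplification below are all made-up assumptions.

```python
def greedy_schedule(activities_kwh, price_per_hour):
    """Assign the most energy-intensive activities to the cheapest hours.

    Simplification: each activity takes one hour and each hour holds one activity.
    Returns (schedule, total_cost).
    """
    by_energy = sorted(activities_kwh.items(), key=lambda kv: kv[1], reverse=True)
    cheapest_hours = sorted(price_per_hour.items(), key=lambda kv: kv[1])

    schedule, total_cost = {}, 0.0
    # Pair the hungriest activity with the cheapest hour, and so on down the list.
    for (activity, kwh), (hour, price) in zip(by_energy, cheapest_hours):
        schedule[activity] = hour
        total_cost += kwh * price
    return schedule, total_cost


activities = {"charge_car": 10.0, "laundry": 2.0, "dishwasher": 1.5}  # kWh
prices = {18: 0.40, 22: 0.15, 23: 0.12, 2: 0.10}  # hour of day -> $ per kWh

schedule, cost = greedy_schedule(activities, prices)
print(schedule, round(cost, 2))
```

A baseline like this gives a number to compare any Gen AI scheduler against before building it.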
Starting point is 00:49:48 Without Gen AI. Yeah, without Gen AI. On the other end of the spectrum, I see a lot of companies giving up on Gen AI because they think that Gen AI is not good for their problem, because they have tried it and it didn't work. And a lot of the time I get surprised. I'm like, wait a second, I just talked to another company
Starting point is 00:50:06 that used it for a similar use case and it worked really well. And when we look into it, it's usually because of a bad product. Like, they don't prompt well, they don't understand the users, they don't even know what to evaluate. So for example, I was working with this company that basically extracts resume information.
Starting point is 00:50:31 So they get a person's resume, and they try to map out where the person worked before and create a summary of that person's life. And they have two steps. First, from the resume, they try to extract all the text. And then they extract the organizations from the extracted text. By the way, the resume is a PDF, not pure
Starting point is 00:51:00 text, right? And they were like, okay, it works terribly. They got the organizations wrong about 50% of the time. And then I asked them, where in the process does this fail? Is it in going from the PDF to extracted text, or from the extracted text to the organizations? And they were like, oh, we don't know. We didn't check that. And I was like, if you can't pinpoint it, if you can't localize where it fails, then how can you fix it? So a lot of the time it seems like common sense, but somehow, I don't know, this is something that has always puzzled me a little. Another mistake is just starting too complex: for example, jumping straight to vector databases or fine-tuning. Or another common one nowadays is, when you see a fancy
Starting point is 00:51:48 agent framework, you're just like, let's use this framework, you know, let's just try it. And I think abstractions are really, really cool. I'm very grateful for many abstractions that make my life easier. But I do think that abstractions should encode best practices and should be heavily tested. And I think we're still in the phase where we're still learning the best practices. And also, a lot of abstractions can introduce unnecessary, very, very painful bugs. So I was going through the codebases of a lot of those frameworks, and I found out something interesting:
Starting point is 00:52:23 A lot of those frameworks have some default prompts to help you get started, right? Because it makes it very easy for you to begin. But then some of those prompts have typos, and they just get changed. Somebody submits a quick PR to fix the typos, but it's not part of the release notes or anything. So if you're using this framework with one of the default prompts, and then suddenly the performance of your application changes, you actually don't quite know
Starting point is 00:52:53 why it's changing, because the prompt was changing under your feet. So yeah, those are very interesting. Those are just, like, patterns. But it's interesting because what you mentioned, if I'm, you know, collecting these: it's using this technology when you don't really need it; giving up on it without, you know, just for common-sense reasons, when you could have just fixed some easy things;
Starting point is 00:53:20 using a new framework when it's just not really high quality, or, you know, it doesn't really encode the best practices. This all sounds like stuff where we could just replace Gen AI with, you know, a new technology or a new stack, and we'd probably hear similar things, right? It's typically these things, because it's changing all the time, there are no best practices, no one really knows how to use it. You know, whoever tells you they're the expert, they still just have a maximum of a year of experience with it. It's not really new, is it? Yeah, I definitely agree with you.
Starting point is 00:54:02 I do think that even though technologies change over time, systematic thinking, systematic approaches to problems, usually don't change. Like, if you want to solve a problem, you first start by breaking down the problem, seeing where the challenges are, and going through different solutions. It seems like common sense, but I think a lot of the time a lot of us get FOMO, and FOMO gets in the way. It's like, okay, we know that's the right thing to do, but I also feel like I just need to check this thing out first, you know? And you keep doing that
Starting point is 00:54:36 like three times a day, so the day is gone, and you just never really get time to sit down and think really deeply about what it is that you're trying to do. Yeah, so I guess we're going to see a lot of the kind of mistakes that happen with new technology. Plus, if some of the listeners have adopted new technology before, you can probably use some of those approaches. I mean, you know, just localize it for Gen AI and see if you can avoid some of those. Yeah. Speaking of new technology, for someone who is learning Gen AI, a software engineer who wants to get into AI engineering, what would your recommendation be to learn? You know, things do change so fast. You did mention
Starting point is 00:55:15 the importance of fundamentals. What would you focus on? So, I have a lot of thoughts on learning because I like learning a lot, and I think over time I have just observed some patterns. And by the way, the way I learn might not be the same as the way you learn. People have different learning styles. But in general, I think of learning as having two different approaches. One is project-based learning and the other is structured learning. Project-based learning is like, okay, you choose a project and you work on it, really go and try to solve every problem in that project, right, and finish it.
Starting point is 00:55:58 But structured learning is more like when you take a course or you read a book. It's like somebody else laid out, here are the things that you want to do. And I think there's quite a bit of a debate on this. Somebody told me recently, a friend, a very good friend, he said, oh, he thinks the problem nowadays with people who want to do AI engineering is that they spend too much time learning and not enough time doing. And he was just like, forget all the courses, forget all the books, just pick a project and work on it. And I do think project-based learning is very valuable, very valuable.
Starting point is 00:56:31 But think of it like this: here is the set of skills and knowledge I need to have to become really good at something. Project-based learning can help you hit a lot of these points, but it doesn't always help you hit all the points, and you can sometimes get confused. So sometimes you still need to complement it with structured learning. And another thing about project-based learning is that a lot of people usually do it by following some tutorials.
Starting point is 00:56:55 Like, here, someone has this tutorial on how to do this. And I think tutorials are really cool, and I personally do them a lot. But I also notice that it's very easy to just mindlessly click from one cell to another, just run a cell, run another, and not really stop to ask: why is this being done this way? Like, why is this library being imported?
Starting point is 00:57:18 Like, why is this code written this way? Why is the batch size 16 instead of, like, 64? Like, why is it like, okay, export, the measure of export. So it's very easy to not stop. There's no mechanism to force you to stop. You just want to run to the end, see what the output is, and make some changes by best guess. Here's something funny.
Starting point is 00:57:38 Like, I was looking at this open source project, and I wanted to do some market research and see who is using this framework. So it was a framework called Ibis. And I knew that if you need to use this framework, you need to do import ibis. So I went on GitHub and I searched for all the repos that have the line import ibis.
Starting point is 00:58:02 And then I found a lot of repos. Then I went into them. A repo has an import ibis, but then the code base does not have ibis anywhere else. It's not used at all. I was like, what is happening? So I realized that a lot of those repos copied from a tutorial, and that tutorial used import ibis.
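The check described here, finding an import that is never used anywhere else in the code, can be sketched in a few lines. This is a minimal illustration and not the tool used in the anecdote: the helper name `unused_imports` is hypothetical, it only looks at one Python source string at a time, and it ignores cases such as `__all__` re-exports or names referenced only inside strings.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return names bound by import statements that are never referenced.

    Hypothetical helper for illustration; a real linter handles many more cases.
    """
    tree = ast.parse(source)
    imported = set()  # names bound by import statements
    used = set()      # every bare name referenced anywhere in the module

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # "ibis.table(...)" contains a Name node with id "ibis"
            used.add(node.id)

    return sorted(imported - used)


example = """
import ibis
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
print(df)
"""

print(unused_imports(example))  # ['ibis']
```

Running a check like this over a file copied from a tutorial is one way to force the "why is this library being imported?" question that is easy to skip.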
Starting point is 00:58:20 And then, and that was a mistake. And then, like, everyone else copied it. So maybe the original developer imported ibis and then deleted the code because they didn't use it anymore. And then everyone just copied it. It's the same thing. So I feel like that is something a little bit dangerous. Like, tutorial-based learning is great, but I do think it's very important to be able to
Starting point is 00:58:40 stop and ask questions, and sometimes structured learning can help you ask the right questions. Like, think things through. So, yeah, if you are starting, I would recommend maybe a mixture. Like, yes, choose some project you want to work on. It doesn't have to be a big, fancy project. Just try to pick one. And then at the same time, complement it with some structured learning.
Starting point is 00:59:08 Like, pick whatever, maybe a book, or do a course with a friend, or read papers. I think reading papers is a bit interesting because reading papers is a skill. It can be quite time-consuming and you need to know what you want to get out of it. But yeah, so start a project and complement it with structured learning. At the same time, there's an exercise that I found very, very useful, at least for me initially: for a week, I try to observe what I do, make a note of what I do, and try to think of what percentage of that
Starting point is 00:59:44 can be automated by AI. Like, what could be done by AI? Yeah. And then I try to use AI to do those things. And this just gave me a lot of ideas for use cases. Like, just think about what matters to me, and it would be an application
Starting point is 00:59:58 that addresses that problem. That's just great already. Yeah, I think it's an unconventional way, but it's a good way to look at it. Because in the end, I think it might also help you get ahead of, you know, this dread
Starting point is 01:00:11 of, like, what would AI do for me? Because you realize what happens when you automate things, which actually leads to my next question. There's a lot of fearmongering around: oh, AI will mean the end of software engineering because AI is very good at coding. It's a lot better at it than at a lot of other areas. What is your take on this?
Starting point is 01:00:29 As AI gets better, will it actually end software engineering, or will it change it, or is it actually not going to change much? I think it goes back to the question of what software engineering is. So maybe I can give an analogy. Maybe it can help explain it better. So, writing, right?
Starting point is 01:00:48 So we tend to confuse the most salient activity of something with the job itself. So first of all, writing. In the past, writing meant the physical act of putting words onto paper. Yeah, yeah, on paper or... And back then, right, people thought of writing as that. People actually took pride in their calligraphy, like, oh, you have beautiful handwriting, you must be smart.
Starting point is 01:01:15 You must be intelligent, right? But then we had computers. And now writing doesn't refer to that act anymore. Writing refers to the process of arranging ideas into a readable format. And I think it's the same thing with coding. So now people think of software engineering as the physical act of putting code into, I don't know, VS Code or Vim or whatever software that you use.
Starting point is 01:01:38 But that's not what software engineering is. Engineering is about solving problems. Like, here are the problems. How do I come up with executable programs to solve these problems? Coding itself is just the physical act of it. And I do think that, yes, maybe AI can help you automate coding. But I don't think it's going to fully automate problem solving, because you still need to know what the problem is.
Starting point is 01:02:03 And only you can understand what problems you're facing. Well, and also, you know, AI has this problem: coding, or software engineering really, is like, yes, you need to solve problems. But what I don't think we say is you need to do that very precisely. The reason the job of software engineer or programmer exists is because it is very hard to be specific, to speak the computer's language. Because, you know, if you move that if statement somewhere else or if you change a variable, suddenly, you know, the program crashes because now you have a stack overflow exception,
Starting point is 01:02:33 which you, of course, understand if you're a software engineer. But if this is just a business user who says, you know, if you resize the window, I want the button to move over, it's easy to say, but then as a software engineer, you know the edge cases, you know the environment, you know what you need to worry about, you need to worry about system events, etc. And then you'd write code for all of those things. And I'm sure we'll get to a point where the AI will be able to generate some of that, but it might not. And you will still at some point need someone who understands, you know, that code and can figure out where the gap is. Because English as a language is not as precise as a programming language.
Starting point is 01:03:14 You know, programming languages were invented to be very precise and unambiguous and very easy. You know, you can go from assembly code to a programming language because it's a one-to-one mapping, but going from English to a programming language, that's very fuzzy, right? Yeah. Yeah, definitely. So I think that profession will not go away. Like, for users, it might work for some kind of more obvious use cases, where you say something and you get something that's roughly it, and
Starting point is 01:03:41 you try multiple times and it generates something else and you're happy. But for, you know, a business or a professional use case, you will need those people who will be able to guarantee that you get exactly what you want. Yeah, I'm actually really excited that AI can automate part of coding, because it actually enables software engineers to build much more complex software. So I go back to the analogy of writing. Before, when people had to do the manual copying of words onto paper, all the books back then were very small. Like, I think 5,000 words or 10,000 words was considered big because
Starting point is 01:04:17 it took a long time for people to copy things. But now we have books of, like, 100,000 words, right? And I think it just makes things a lot easier. I do think it's the same with software engineering. If you don't have to manually write code, and you can quickly turn ideas into snippets, executable programs, I do think it enables much, much more complex software. Yeah, and maybe one software engineer will be able to kind of command a lot more software, debug or maintain a lot more complex systems by themselves.
Starting point is 01:04:47 Because right now, you know, there's a reason that, for a million lines of code, usually there are several engineers. It's rare to have just one engineer, if a company wrote that code; I'm not talking about dependencies. So that will be interesting. And so what other use cases are you excited about that AI could bring, outside of just coding? Let's see. I think I'm excited about education. So I do think that AI can help people learn what they want to learn a lot faster. So one thing I realize is that nowadays, if you know the questions, finding the answer is actually quite easy. Like, you can ask AI and it can at least give you a lot of references for you to go and read more about it.
Starting point is 01:05:43 But then what's still hard is how to come up with the right questions. And I think education needs to focus on getting students to create the habit of asking questions and understanding. So I do think it helps learning become a lot more efficient. Like, people can learn a lot of things. Yeah, and I do believe that if we can learn better and faster, then we can actually do more things. So I'm very excited about what that would look like. What other use cases am I excited about?
Starting point is 01:06:27 I'm excited about entertainment. Yeah, I think we always think of entertainment and education as separate things. But I don't see why we can't have games that help us learn things more. Like, we could have some strategy games, like teaching girls about negotiation. That could be really fun. Yeah. Or something like more intellectually stimulating content.
Starting point is 01:07:00 You know, like movies or shows don't have to be mindless. I know other people watch that because it helps them escape. But I also like genres that make me think a little, you know, or understand more about different fields. So I do think that AI can assist us in creating content that is both entertaining and intellectually stimulating. Yeah.
Starting point is 01:07:29 So I think it could be a lot of fun. For example, nowadays we have a lot of medium adaptations. So you have a book, you can convert it into a movie, and sometimes you have a movie and you convert it into a game. Or you have papers and convert them into a podcast. So now if we have some content, AI can help us convert it, adapt it to different mediums. That could be very, very exciting. Yeah, I think it's like
Starting point is 01:08:01 there are a lot of small problems that I'm also interested in. I haven't touched on any of the enterprise use cases. I feel like that's where most of the money still is. I do think enterprise or company organization structure is going to change. So first, what does that mean? If you think about a lot of organizations, what is the job of middle management? It's, first, aggregating information from their reports and transmitting it up to the executives.
Starting point is 01:08:40 And then the other way, transmitting information, like directions, from executives to lower layers. But information aggregation is actually something AI can do really, really well. So I do think companies can be a lot more efficient. So let's close with some rapid questions. So I'll just shoot some questions and you tell me what pops into your mind. What programming language did you use most when you built AI applications or did ML engineering? And why?
Starting point is 01:09:13 Python and JavaScript. You do JavaScript as well. How come? Oh, yeah. Definitely. I think it's a huge part of building products. I have to build them more quickly. So I think that's very, very handy.
Starting point is 01:09:26 I'm not very good. Like, I think I've always been scared of JavaScript, but I'm very grateful AI actually helps me get started a lot more easily nowadays. And which one is your favorite LLM right now and why? Oof, I don't really have a favorite. I use them for different things. So I still use ChatGPT out of habit because I have a bunch of prompts that I use.
Starting point is 01:09:51 I already have little things, like prebuilt prompts for ChatGPT, that I'm still using. I use Claude sometimes for creative writing because I think it's sometimes less cliché. I'm reading about DeepSeek R1, and who's not reading about R1, so I'm just trying it out. I don't think I have a favorite. I think I used LLaVA, which is the vision version of Llama,
Starting point is 01:10:19 before for some kind of interesting use case, like from a screenshot to code, just trying to test it out. Having fun with it. But yeah, I'm not emotionally attached to any of them. Yeah, and what's a neat AI tool that you've used and that you like? So I have something that I built that really helps me do research. So one thing is, when I see a link, like a paper, there are a lot of links and I try to get a quick overview. So I realized when I read a
Starting point is 01:10:51 link, I go through the same process: I read the abstract, I look up the authors and their other work, ask questions, and check the citations. So I have this little tool that just goes and scans. So I have a link, and it just gives me all the information on it, yeah.
Starting point is 01:11:11 Oh, nice. So you've got a tool to scratch your own itch. Yeah. Yeah. So I think that's just the beauty of AI now. You can build, and it takes a very small amount of time. Like, you just build a tool just for you. Like before, right, it would take me weeks.
Starting point is 01:11:29 But now I can just do all of that. Like, I don't know. This is something to be excited about. I agree with it. Yeah. What are one or two books that you've read and would recommend? Oh, I recommend a lot of books. But I feel like recommending books can feel like forcing people to do what you enjoy.
Starting point is 01:11:51 So I like books that help me get a new perspective or give me insight into some topics I don't know a lot about. So I really like the book, first of all, Complex Adaptive Systems. It's a very interesting book about systems thinking, like how to design social dynamics so as to get people to work toward the goals that you want. It's a very interesting book that forces you to think about systems. I like the book The Selfish Gene because you understand more about free will, and it makes you question a lot of stuff. But there's the idea that you can live on either through genes or through ideas, like there are two ways that you can live
Starting point is 01:12:39 on. It's like, yeah, so genes get to live on through offspring and reproduction. The other thing is, if you have ideas, the ideas can also replicate, like the idea of memes. Genes and memes. I like the book Antifragile. So yeah, I think the author is a very interesting character. I just read several of his books. I really like them. Yeah, I like that book. Yeah, I think there are a lot of books that I like.
Starting point is 01:13:13 No, thank you for the recommendations. So thank you for being on the podcast. I mean, AI engineering is such a new field, and it was great to hear from you, someone who has clearly gone very broad and also very deep and has been in this field even before it was called AI engineering. So thank you for this. Thank you so much for letting me ramble on on the show. Yeah, I really appreciate it. And I think one thing I really enjoy
Starting point is 01:13:42 from writing or talking is that I get feedback. When somebody is like, oh, I agree with you, I'm less interested. I mean, it's great to hear, but it can also be a little bit not good for the ego. But I also really like a little bit of pushback, like, okay, you didn't think about this, you forgot this, or you didn't take this one into account. So I would love to get that feedback. So if there is anything that you feel I missed out on, do let me know. I really appreciate it. Thank you to Chip for this conversation about AI engineering.
Starting point is 01:14:10 To get in touch with Chip, including to give feedback on her book, you can find her contact details on her website, linked in the show notes below. I've also found her book, AI Engineering, to be a broad and deep overview of this important field. It's a book that focuses on the fundamentals that are unlikely to change. So if you want to learn these, it's a good book to have. In The Pragmatic Engineer, we've previously done several AI and ML-related deep dives. Check them out, also linked in the show notes below.
Starting point is 01:14:36 If you enjoy this podcast, please do subscribe on your favorite podcast platform and on YouTube. Thanks, and see you in the next one.
