The Pragmatic Engineer - AI Engineering with Chip Huyen
Episode Date: February 5, 2025

Supported by Our Partners
• Swarmia: The engineering intelligence platform for modern software organizations.
• Graphite: The AI developer productivity platform.
• Vanta: Automate compliance and simplify security with Vanta.

On today’s episode of The Pragmatic Engineer, I’m joined by Chip Huyen, a computer scientist, author of the freshly published O’Reilly book AI Engineering, and an expert in applied machine learning. Chip has worked as a researcher at Netflix, was a core developer at NVIDIA (building NeMo, NVIDIA’s GenAI framework), and co-founded Claypot AI. She also taught Machine Learning at Stanford University.

In this conversation, we dive into the evolving field of AI Engineering and explore key insights from Chip’s book, including:
• How AI Engineering differs from Machine Learning Engineering
• Why fine-tuning is usually not a tactic you’ll want (or need) to use
• The spectrum of solutions to customer support problems, some not even involving AI!
• The challenges of LLM evals (evaluations)
• Why project-based learning is valuable, but even better when paired with structured learning
• Exciting potential use cases for AI in education and entertainment
• And more!

Timestamps
(00:00) Intro
(01:31) A quick overview of AI Engineering
(05:00) How Chip ensured her book stays current amidst the rapid advancements in AI
(09:50) A definition of AI Engineering and how it differs from Machine Learning Engineering
(16:30) Simple first steps in building AI applications
(22:53) An explanation of BM25 (retrieval system)
(23:43) The problems associated with fine-tuning
(27:55) Simple customer support solutions for rolling out AI thoughtfully
(33:44) Chip’s thoughts on staying focused on the problem
(35:19) The challenge in evaluating AI systems
(38:18) Use cases in evaluating AI
(41:24) The importance of prioritizing users’ needs and experience
(46:24) Common mistakes made with Gen AI
(52:12) A case for systematic problem solving
(53:13) Project-based learning vs. structured learning
(58:32) Why AI is not the end of engineering
(1:03:11) How AI is helping education and the future use cases we might see
(1:07:13) Rapid fire round

The Pragmatic Engineer deepdives relevant for this episode:
• Applied AI Software Engineering: RAG https://newsletter.pragmaticengineer.com/p/rag
• How do AI software engineering agents work? https://newsletter.pragmaticengineer.com/p/ai-coding-agents
• AI Tooling for Software Engineers in 2024: Reality Check https://newsletter.pragmaticengineer.com/p/ai-tooling-2024
• IDEs with GenAI features that Software Engineers love https://newsletter.pragmaticengineer.com/p/ide-that-software-engineers-love

See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com. Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe
Transcript
How would you define AI engineer or AI engineering?
Yeah, so before, when you wanted to build a machine
learning application, you needed to build your own models.
So that means you needed your own data and you needed expertise
in how to train and babysit a model.
However, nowadays, if you want to build an application leveraging machine learning or AI,
you can just send a direct API call and access this wonderful capability.
So that really, really lowers the entry barrier for people.
You don't need data anymore.
You don't need a fancy AI degree anymore.
It's a shift of focus
from less machine learning to more engineering and more product.
Chip Huyen is a computer scientist, writer, and author of the book
AI Engineering.
This book is currently the most read title on the O'Reilly platform.
Previously, Chip was a researcher at Netflix,
a core developer of NeMo,
NVIDIA's generative AI framework,
and co-founded and sold an AI startup called Claypot AI.
She taught machine learning system design at Stanford,
and her current book is her second one on ML and AI engineering.
It's safe to say she's one of the most read ML engineering and AI engineering experts in the world.
In our conversation today, we cover what is AI engineering and why does it feel a lot more full-stack than ML engineering did?
What are typical steps to build an AI application from choosing a model through using RAG all the way to fine-tuning?
What are practical ways for software engineers to get started building AI applications?
And a lot more on this very timely topic.
If you enjoy the show, please subscribe to the podcast on any platform and on YouTube.
Thank you. This greatly helps the show get to even more listeners and viewers.
Chip, welcome to the podcast.
Hey, hi, I'm Chip. I'm very excited to be here.
I've been following your Substack for a while,
so I was really looking forward to this chat.
So first of all, I really wanted to congratulate you on this book.
I've started to read the book.
I've not read the whole thing yet.
I started with some chapters and went deeper in others.
When I looked at the table of contents, I thought, okay, this looks like
it'll span a broad range, because it goes from how you understand foundation models,
to how you evaluate them, to what prompt engineering is, and then into things like
RAG, fine-tuning, and dataset engineering.
But each of those sections starts with an
introduction and then goes deeper.
For example, for evaluation methodology (I'm just looking at the table of contents here),
I start to read it, and
it's like, well, we know it's important to evaluate AI models.
We know it's hard to do.
But then you go into things like AI as a judge, or ranking models with comparative evaluation and its challenges.
And that's where, in some parts, I had to slow down.
I had to look things up.
So it goes really deep into a lot of these sections, which I found very refreshing: it has breadth, but it also has depth.
So this is definitely not a fast read for me, but it's one of those books I'm going to keep coming back to.
Thank you.
It was not a fast write either.
It took quite a while, and a lot of references.
In the book, I cited over a thousand references.
And I actually read even more papers.
I went through a lot of codebases.
I tracked about a thousand repos, ones with at least 800
GitHub stars. So I went through those codebases, a lot of blog posts, and older books from the 70s,
80s, and 90s to understand AI. I also published approximately
100 reference links that I found really, really useful in the process of writing the
book. So if you just want to look at those references on their own, they're on my GitHub.
Yeah. And I was surprised by some of the really original research. It's not just,
oh, here are the papers I wrote about, or here's what I read.
As you mentioned with the 1,000 repos, you actually have a section about
how GitHub repositories changed over time: the ones about
infrastructure, AI, the application level, and other things.
And you actually have more than 900 repos mapped out.
I never saw anything like that.
And clearly that was you slicing and dicing and doing your own kind
of research.
Yeah, I feel like I have this thing.
I do a lot of manual labor.
I get a lot of value out of doing things non-optimally.
I feel like there's a lot of focus on: what is the quickest way to do it?
What is the fastest way to do it?
But sometimes, if you're willing to put effort into things that a lot of people
are not willing to, you can get insights that other people don't get.
This episode is brought to you by Swarmia, the engineering intelligence platform
for modern software organizations.
Swarmia gives everyone in your organization the visibility and tools they need to get better
at getting better.
Engineering leaders use Swarmia to balance the investment between different types of work,
stay on top of cross-team initiatives, and automate the creation of cost capitalization reports.
Engineering managers and team leads get access to a powerful combination of research-backed engineering metrics
and developer experience surveys to identify and eliminate process bottlenecks.
Software engineers speed up their daily workflows with Swarmia's two-way Slack notifications,
working agreements, and team-focused insights.
You can learn more about how some of the
world's best software organizations, including Miro, Docker, and Webflow, use Swarmia to
build better software faster at swarmia.com slash pragmatic. That is S-W-A-R-M-I-A dot com
slash pragmatic. This episode is brought to you by Graphite, the developer productivity
platform that helps developers create, review, and merge smaller code changes, stay unblocked,
and ship faster. Code review is a huge time sink for engineering teams. Most developers spend
about a day per week or more reviewing code or blocked waiting for a review.
It doesn't have to be this way.
Graphite brings stacked pull requests,
the workflow at the heart of the best-in-class internal code review tools at companies like Meta and Google,
to every software company on GitHub.
Graphite also leverages high-signal, codebase-aware AI to give developers immediate, actionable feedback on their pull requests,
allowing teams to cut down on review cycles.
Tens of thousands of developers at top companies like Asana, Ramp, Tecton, and Vercel
rely on Graphite every day. Start stacking with Graphite today for free and reduce your time to merge
from days to hours. Get started at gt.gathe.com slash pragmatic. That is G for Graphite, T for technology.
So one thing that is a little interesting about this book: it is about AI engineering, and
this field moves so quickly. Just in the past week, a new model came out,
DeepSeek, that people are talking about. How were you able to
write a book about such a fast-moving industry so that by the time it's
released, which was clearly a few months after you finished it, it will still be relevant?
That is a great question.
When I started writing the book, I was thinking the same.
I was like, is now the right time to write it, because so many things are still changing?
But then, when ChatGPT came out, like a lot of people, I had this
existential crisis.
I was in a group chat and I was like, oh no, what does this mean for us engineers?
There are two things I usually identify with: one is being an engineer and the other is being a writer.
And guess what use cases AI is really good at: writing code and writing, right?
So I was like, oh, shoot, what does it mean?
So I started to interview a lot of people.
I started reading so much.
I talked to a ton of people.
And I started making a lot of notes.
And in the process, what I realized is that for a lot of the things that seem new,
the fundamentals have been there for a while.
For example, language modeling is not a new task.
Claude Shannon introduced it back in the 1950s.
Or take RAG, which we'll talk about, right?
RAG is actually not new.
It's based on retrieval.
The R in RAG is retrieval: retrieval-augmented generation, right?
Yeah.
And retrieval is a very old technology.
It's already powering a lot of use cases on the internet,
like search or recommender systems, and vector databases have been around for a while, and vector
search already has so many cool algorithms.
So, first, I saw a lot of things that I already knew.
And second, I tried to focus on asking the
question: okay, here's a problem, and here's a solution to
this problem.
Is this problem due to fundamental limitations of
AI, or just due to the current capabilities of AI?
And if it's due to
the current capabilities, how fast could that change, right?
For example, in the early days, a lot of people shared a lot of
prompt tips.
For example, you'd try to bribe models, right?
Like, hey, if you answer this correctly, I'm going to give you $200.
And we talked about prompt robustness:
how robust is a model to prompt perturbations?
And then I was reading about it, and I found that models are actually getting more and more
robust to prompts.
For example, we saw it going from GPT-3 to GPT-3.5.
It's already so much more robust.
That means small changes to the prompt actually lead to a lot less variation
in model performance.
So with this kind of thing, I felt like, hmm, it's probably not going to stick around.
So those kinds of tips are not going to be very relevant.
So yeah.
So already, at the height of people saying there might be a job called prompt engineer,
you saw this was likely trending down.
Do I understand it correctly?
So yeah.
I think writing a book is like making a bet.
When you write about a topic, you're betting on whether it's going to stay relevant in the future.
So I've been looking at progress and trying
to see what things are going to look like one or two years from now.
Another example is context length.
At one point, it was like, okay, we want long context length.
But then I kept seeing it grow really, really fast, right?
From, what, 8,000 tokens of context to 128,000 in a few months.
It's super, super fast.
So I thought, okay, maybe the question is less about context length
and more about context efficiency:
can a model use its context really, really well?
So those are the kinds of bets, and there were certain
changes during the process of writing the book that made me feel more
confident in the bets I made.
Another one is multimodality. When I wrote about
multimodality back in 2023, people told me, you're too early.
Everyone is still working on language models.
We're not there yet.
But it just seemed inevitable, right?
We learned how to work with language,
but I do want to do a lot of stuff with more than just language.
And nowadays it's everywhere.
Almost all models nowadays are multimodal.
So the title of the book is AI Engineering.
And we also now have this term, AI engineer.
It's spreading like wildfire.
How would you define AI
engineer or AI engineering? Because I feel it's a little bit of a loaded term these days.
It is.
I feel like a lot of terms nowadays are loaded.
You're not allowed to say "agent" anymore.
You're not allowed to use a lot of terms anymore.
So I was agonizing over the title for the book.
First, I knew that we needed a term different from machine learning engineering.
And the reason is that with foundation
models, a lot of the fundamentals and systematic approaches are still the same as in
machine learning engineering, but there are a lot of new things. For example, before, when
you wanted to build a machine learning application, you had to build your own models.
Just to be clear, you're comparing it with machine learning, right? So what machine learning engineers did versus
what AI engineers are doing?
Yeah. I'm just trying to explain why we need a term different
from machine learning engineering to describe what we are doing today.
So before, when you wanted to build a machine learning application, you needed to build
your own models.
So that means you needed your own data and you needed expertise in how to
train and babysit a model.
However, nowadays, if you want to build an application leveraging machine learning or AI,
you can just send a direct API call and access this wonderful capability.
So that really, really lowers the entry barrier for people.
You don't need data anymore.
You don't need a fancy AI degree anymore.
A second thing is that before, you needed distribution, because you deployed the
application as part of an existing application.
For example, if you built a recommender system, you needed an e-commerce website,
or some kind of website, so that you could make recommendations, right?
So it was deployed as part of an existing application.
Fraud detection was part of maybe a banking application or some kind of
payment app. However, now you can just put it out
as a standalone application. You don't need an existing distribution channel,
but having a distribution channel is still really, really useful.
And I think another very big thing is that it's a shift of focus
from less machine learning to more engineering and more product.
So before, if you were a machine learning engineer, you started from data. You had to gather data,
maybe get human annotations, and then train a model. And once the model was good,
you deployed it into your product. But nowadays, you actually start with a demo,
right? You have a cool idea, so let's just try it out and see if it works. So you start
with a product. And after you say, okay, it works pretty well, you want to make it
better, right? So you start gathering more data, maybe more
prompt examples, or, in very, very rare cases (and I don't recommend most people do it in the early days),
fine-tuning data. But basically, you start thinking more about
data, maybe for evaluation. Having good evaluations becomes way, way
more important with AI engineering. So you've got data, and after that, maybe you've
been sending a lot of API calls to OpenAI, Anthropic, or Google, and now
it's too expensive, and you need your own model. So you start hosting
your own model, using some open source alternative or a fine-tuned model.
So with machine learning engineering,
you went from data to model to product, and now with AI engineering, you go from product
to data to model.
So it places a lot more focus on product and data, which will be the competitive advantage
when everyone shares similar base AI capabilities.
So that's why I think we need a different term, to separate it from machine learning engineering.
And I didn't know what term to use.
So I was like, okay, let's just ask people.
I surveyed a bunch of people
who were building applications on top of foundation models.
And almost all of them said "AI engineering."
So I was like, okay, let's go with AI engineering.
So do I understand correctly that
the biggest difference is that machine learning engineers
did a lot more groundwork, getting the data and
building the model, whereas with AI engineering,
at least initially,
you can start with APIs or something similar?
So there's more engineering.
You kind of hack things together.
You put it together.
And then over time, as things become more serious or your product gets bigger,
you might host a model yourself.
You might one day even build your own model, if it's warranted.
But that comes a lot later.
So a lot of the ML engineering comes down the path, if the product is big enough, if it works, et cetera.
Whereas with ML engineering, it was the other way around.
You had to put in all this effort
and then see if it even worked?
First, I feel like every company may have a different definition of the role.
Even within the same company,
people with the same title and the same role can do very different things.
So there's never a clear-cut definition.
Second, I don't think the question is machine learning versus engineering.
In the vast majority of generative AI systems I have seen,
there are very strong traditional machine learning or classifier components.
So imagine you're building a customer support chatbot, which, whenever
I ask at a conference, a lot of people raise their hands for.
It's a very, very classic generative AI application.
You get a request from a customer, and maybe you have
several different potential solutions for it, right?
If it's an easy query, you might send it to a cheap model.
If it's a harder query, you might send it to a more expensive model.
But if it's something very sensitive, like, hey, why did you charge me twice for the bill last month?
Then you might want to send it to a human operator.
So you might have a router, or an intent classifier,
to choose where to send it, which is a traditional, classical machine learning model that you can build.
Or after you get a response from an AI model, you might check: does this contain PII?
Because I don't want to send responses that contain private information back to users.
So that PII detection, or toxicity detection, can be a classifier.
Or in RAG systems nowadays, a lot of the work is done by retrievers,
which I think are also in the realm of classical machine learning that you can build yourself.
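The router and PII check Chip describes can be sketched as ordinary code around the model call. In this minimal sketch, the keyword lists stand in for a trained intent classifier and the regexes stand in for a real PII classifier; `route`, `contains_pii`, `safe_response`, and every term list and threshold are hypothetical, not taken from the episode or any library.

```python
import re

# Hypothetical keyword lists standing in for a trained intent classifier.
SENSITIVE_TERMS = frozenset({"charge", "charged", "refund", "bill", "billing"})
HARD_TERMS = frozenset({"why", "explain", "compare", "troubleshoot"})


def route(query: str) -> str:
    """Pick a destination: 'human', 'expensive_model', or 'cheap_model'."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    if words & SENSITIVE_TERMS:  # billing disputes and similar go to a person
        return "human"
    if words & HARD_TERMS or len(words) > 30:  # long or open-ended questions
        return "expensive_model"
    return "cheap_model"


# Illustrative, far-from-exhaustive regexes standing in for a PII classifier.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def contains_pii(text: str) -> bool:
    return any(pattern.search(text) for pattern in PII_PATTERNS.values())


def safe_response(text: str, fallback: str = "Sorry, I can't share that.") -> str:
    """Replace a model response with a fallback if it looks like it leaks PII."""
    return fallback if contains_pii(text) else text


print(route("Why did you charge me twice for the bill last month?"))  # human
```

In a real system, each of these rule-based pieces would be swapped for a small trained classifier, which is Chip's point: the generative model sits inside a pipeline of classical ML components.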
So what are the most common techniques used when building
AI applications? Things that a software engineer who's going into building
AI applications should know about and can later go deeper into.
So there's an assumption here that you have tried a lot of solutions, and now
you're trying GenAI, and you think GenAI is the solution for you.
And the question is how to progress from there, right?
Is that correct?
Yeah.
Yeah.
What are some common approaches? I'm thinking of things like RAG, fine-tuning,
or other things that are just good to know about.
I should probably learn more about those topics.
Yeah.
So I think those techniques are useful.
I wouldn't say I recommend a particular development path, but a common pattern
I have seen is a certain development path.
Initially, the first thing I would do is try to understand
what a good response is and what a bad response is.
So, what you want the model to generate.
And it's not always intuitive, right?
LinkedIn has a great example.
They built a job fit assessment for candidates.
And they found out that the majority of their time was spent just understanding what candidates needed from the model.
Initially, they focused on correctness.
But then they realized candidates didn't find it helpful.
Say a candidate asks the model, am I a good fit for this job?
And the AI responds, you're a terrible fit.
It's like, okay, what am I supposed to do with this information, you know?
So you need to try to understand what they actually want: here, they want
to understand what the gaps are and how they can fill them, or
get suggestions for other roles that are a better fit for them right now.
So you get this clear understanding, and then you build a guideline.
Like, okay, when given this kind of question, answer like this: be helpful, show them the gaps,
or show them other roles. You give very clear guidelines to the model
in the prompt. So you try those prompts, and then you look at the output,
and then maybe you try adding more examples. And you keep iterating until you
get really good responses. You try to evaluate: maybe create a set of queries
and expected responses, and use both automated metrics like AI as a judge and
human evaluations to measure progress.
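The loop described above (write a guideline, add examples, score outputs on a fixed query set) can be sketched in a few lines. This is a minimal sketch under stated assumptions: `GUIDELINE`, `build_prompt`, and `evaluate` are hypothetical names, and `model` and `judge` are placeholders for whatever LLM API, AI judge, or human labels you actually use.

```python
from typing import Callable

# Hypothetical guideline inspired by the LinkedIn lesson: explain gaps, don't just judge.
GUIDELINE = (
    "You assess how well a candidate fits a job. Do not just answer 'good fit' or "
    "'bad fit'. List the gaps between the candidate and the role, and suggest how "
    "to close them or which roles fit better right now."
)


def build_prompt(guideline: str, examples: list[tuple[str, str]], query: str) -> str:
    """Guideline first, then few-shot Q/A examples, then the new query."""
    parts = [guideline, ""]
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}\n")
    parts.append(f"Q: {query}\nA:")
    return "\n".join(parts)


def evaluate(model: Callable[[str], str],
             judge: Callable[[str, str], bool],
             queries: list[str],
             guideline: str,
             examples: list[tuple[str, str]]) -> float:
    """Return the fraction of queries whose response the judge accepts.

    `model` wraps an LLM call; `judge` may be an AI judge, a lookup of human
    labels, or a simple rule. Both are placeholders in this sketch.
    """
    if not queries:
        return 0.0
    passed = sum(judge(q, model(build_prompt(guideline, examples, q))) for q in queries)
    return passed / len(queries)
```

Keeping the query set fixed while changing only the guideline or the examples is what makes the score comparable between prompt versions.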
So, okay, you've done prompting.
You've added more examples to the prompt.
And maybe you start to make it more complex.
You can give the model more context so it can answer questions better, right?
So when users ask a question, you can have the system pull out all the documents or all the job listings
related to this question, information about the company, or the candidate's resume.
Right.
So you build a system where you augment the context
with documents.
That's the RAG pattern.
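The augmentation step of RAG is mostly prompt assembly once something has retrieved the documents. Here is a minimal sketch, assuming the retrieval has already happened; the function name `augment_prompt`, the `max_docs` cutoff, and the instruction wording are all hypothetical.

```python
def augment_prompt(query: str, documents: list[str], max_docs: int = 3) -> str:
    """Place the top retrieved documents into the prompt ahead of the question."""
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents[:max_docs], start=1)
    )
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


print(augment_prompt("Which jobs need Python?", ["Job A requires Go.", "Job B requires Python."]))
```

Numbering the documents makes it easy to ask the model to cite which one it used.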
So I do think RAG is a very, very powerful pattern.
It's nothing really fancy.
And something interesting about RAG is that a lot of people equate RAG with vector search.
Right.
I see a lot of people say, oh, I want to use RAG, so which vector database should I use?
So people jump straight to vector search.
Well, yeah, because, I mean, you do have the chunks, and the embeddings can be
stored as vectors, right?
So as engineers,
we're like,
I need a vector search database.
Yeah,
people love databases.
Yeah, very interesting.
But I feel like the first solution
is probably not to
jump straight to embedding-based
retrieval, because you have to...
Oh, interesting.
Yeah, you need to build an embedding model.
The quality is really highly dependent
on the quality of the embeddings.
If you have bad embeddings, you'll obviously retrieve bad results as well.
Also, vector databases can be quite expensive
to run.
And they add latency.
Another thing is that
embeddings can also obscure certain keywords.
For example, say I'm searching for a specific error code, right?
Through embeddings, you don't really get the exact error code anymore.
So there are certain challenges with vector databases
and vector search.
So I think a common approach is
to just start with something simple, like keyword retrieval:
extract all the keywords from the user query
and find all the documents that contain those keywords.
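Keyword retrieval as described can be as small as a stopword filter plus a count of matching terms. This minimal sketch uses a hypothetical stopword list and names (`extract_keywords`, `keyword_retrieve`); note how it matches an exact error code, which is the case Chip says embeddings can blur.

```python
import re

# Hypothetical stopword list; a real one would be larger and language-specific.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "do", "does", "i", "my",
             "to", "of", "for", "in", "on", "with", "and", "or", "mean", "means"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def extract_keywords(query: str) -> set[str]:
    return {t for t in tokenize(query) if t not in STOPWORDS}


def keyword_retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return up to k documents, ranked by how many query keywords they contain."""
    keywords = extract_keywords(query)
    scored = []
    for doc in docs:
        hits = len(keywords & set(tokenize(doc)))
        if hits:
            scored.append((hits, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [doc for _, doc in scored[:k]]


docs = ["Error E1042 means the disk is full.", "Disk quotas are set per user."]
print(keyword_retrieve("What does error E1042 mean?", docs))
```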
Then you say, okay, maybe the documents are too long,
and now I can't fit them into the context, right?
That's when you start chunking.
Okay, now how do I chunk these documents so they fit into the context length?
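A basic fixed-size chunker with overlap is a common starting point. This minimal sketch counts whitespace-separated words as a stand-in for tokens (an assumption; a production system would count with the model's own tokenizer), and `chunk_words`, `max_words`, and `overlap` are hypothetical names and defaults.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # last window reached the end
            break
    return chunks


print(chunk_words(" ".join(str(i) for i in range(10)), max_words=4, overlap=1))
```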
And chunking has its own problems, right?
For example, maybe the document is about
company X, right?
But it says that from now on, company X will be referred to as "the company."
So in the rest of the document, X doesn't appear anywhere.
So if you search for X, you don't get the chunks further down.
So now you might want to extract keywords from the documents
and add metadata to every chunk.
Or add the title of the document.
Some people have started adding a summary to it.
Anthropic has a very, very good article called "Contextual Retrieval."
They use an LLM to generate key
information and metadata about each document and prepend that to each chunk to
help you retrieve the right chunk.
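The fix for the "company X" problem is to attach document-level context to each chunk before indexing it. In this minimal sketch, `contextualize_chunks` is a hypothetical name and `describe(title, chunk)` is a placeholder: in the approach Chip mentions, an LLM writes that context, but any function works here.

```python
from typing import Callable


def contextualize_chunks(title: str, chunks: list[str],
                         describe: Callable[[str, str], str]) -> list[str]:
    """Prepend document-level context to each chunk before it is indexed.

    `describe(title, chunk)` is a placeholder for an LLM call that explains
    where the chunk sits in the document.
    """
    return [
        f"Document: {title}\nContext: {describe(title, chunk)}\n\n{chunk}"
        for chunk in chunks
    ]


enriched = contextualize_chunks(
    "Acme Corp annual report",
    ["The company expanded into Europe."],
    lambda title, chunk: f"Excerpt from the {title}.",
)
print(enriched[0])
```

After enrichment, a keyword or embedding search for "Acme" can hit chunks whose body only says "the company."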
So I do think data preparation can actually give a really huge performance
boost.
I have seen it give way, way bigger performance boosts than focusing on
which vector database to use, you know?
I'm not saying vector databases aren't useful.
It's just that in the beginning, you probably want
to try something simple with the biggest performance gains,
and then start moving up the complexity ladder.
Also, on retrieval, someone told me something very interesting,
a little bit of a hot take.
He said, I'm not going to take any retrieval system seriously
if it doesn't benchmark against BM25.
BM25 is pretty old school, 20-plus years old now.
It's a lot simpler: term-based retrieval, not embedding-based.
And it's really, really hard to beat as a retrieval system.
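To show how simple term-based retrieval is, here is a from-scratch sketch of Okapi BM25. The `k1=1.5`, `b=0.75` defaults and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)` are common choices, not something specified in the episode; the tokenizer is deliberately naive.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_lens = [len(t) for t in self.doc_tokens]
        self.avgdl = sum(self.doc_lens) / len(docs) if docs else 0.0
        self.tf = [Counter(t) for t in self.doc_tokens]  # term frequency per doc
        df = Counter()  # number of documents containing each term
        for toks in self.doc_tokens:
            df.update(set(toks))
        n = len(docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        tf, dl = self.tf[i], self.doc_lens[i]
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Saturating term frequency, normalized by document length.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / denom
        return s

    def top_k(self, query: str, k: int = 3) -> list[int]:
        """Indices of the k best-scoring documents with a positive score."""
        scored = sorted(((self.score(query, i), i) for i in range(len(self.tf))),
                        key=lambda pair: pair[0], reverse=True)
        return [i for s, i in scored[:k] if s > 0]


bm = BM25(["the cat sat on the mat", "dogs chase cats", "the stock market fell today"])
print(bm.top_k("cat on a mat"))
```

Note that "cats" does not match "cat" here; real BM25 deployments usually add stemming, which is one of the cheap tweaks that makes it such a strong baseline.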
And I think a lot of times we do use it.
If you want something more complex, you can combine term-based retrieval,
the simpler solution, with vector databases.
So you get the semantic match on the embedding side,
and the exact keyword match on the term-based side.
So hybrid search is very common.
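One common way to merge the term-based and embedding-based result lists is reciprocal rank fusion (RRF); the episode does not say which fusion method to use, so treat this as one option. The sketch assumes each retriever returns a ranked list of document IDs; `k=60` is the constant usually used with RRF, and the IDs in the example are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; each list adds 1 / (k + rank) per doc."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


term_based = ["d1", "d2", "d3"]  # e.g. from BM25
embedding = ["d3", "d1", "d4"]   # e.g. from a vector index
print(reciprocal_rank_fusion([term_based, embedding]))
```

RRF only uses ranks, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.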
So, okay, we've talked about prompt engineering, adding more examples, and RAG, right?
After maxing out a lot of those things, which usually takes people a while,
maybe then you might consider fine-tuning.
But I usually have a lot of reservations about fine-tuning,
because fine-tuning brings a whole host of new problems that you need to deal with.
With fine-tuning, first, now that you've fine-tuned this model, you need to think about how to host it.
And a lot of models are big.
They're billions of parameters, right?
They're not easy to serve.
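Why "billions of parameters" makes serving hard comes down to simple arithmetic: memory for the weights is roughly parameter count times bytes per parameter. This back-of-envelope sketch is not taken from the book; the byte sizes per data type are standard, and the 16 bytes per parameter for full fine-tuning is a widely used rule of thumb for Adam in mixed precision, excluding activations.

```python
# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Decimal GB needed just to hold the weights (no KV cache, no activations)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def full_finetune_memory_gb(num_params: float, bytes_per_param: float = 16.0) -> float:
    """Rule of thumb for full fine-tuning with Adam in mixed precision:
    fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
    + two fp32 Adam moments (8) = 16 bytes per parameter, before activations."""
    return num_params * bytes_per_param / 1e9


for dtype in ("fp16", "int8", "int4"):
    print(f"7B model, {dtype}: {weight_memory_gb(7e9, dtype):.1f} GB of weights")
print(f"7B model, full fine-tune: ~{full_finetune_memory_gb(7e9):.0f} GB")
```

Quantizing to int8 or int4 shrinks the serving footprint, but, as the host notes next, each of these alternatives brings its own tradeoffs.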
I actually read this part in the book.
You go into the details of the problems with memory size, and
you also cover the alternatives where you might not need as much memory,
but they bring other tradeoffs.
So it's tradeoffs within tradeoffs within tradeoffs, right?
You start to solve one problem, but you get a bunch of others,
and you need to decide: is it worth it?
My time, effort, resources.
Yeah. Yeah. I definitely think that when you fine-tune a model, you kind of own the
model. And then the question is, how do you maintain it?
We have this whole world of very smart people developing new models, and models are
improving rapidly. So when that happens, how long can your fine-tuned model
outperform the new models being put out? You might spend a lot of energy
and effort on a fine-tuned model, and then just a week later, some
random Chinese company you've never heard of
releases an extremely fast and excellent model, right?
So yeah, it's quite challenging.
So I think fine-tuning is the last resort, not the first line of defense.
Yeah, but what, what I've heard is basically like, if I got it correctly, like do a
structured approach.
Like, you know, start with prompting, start simple, start with getting responses
that make sense, then add more data.
You can do this with, you know, rag.
You can do it with chunking, keyword extraction.
Data preparation really, really makes a big difference,
which a lot of people don't think about.
And then you can go to more advanced things.
There's a whole host of things that you could do.
But my understanding is you're probably saying, like,
you'll probably get there over time.
But initially, these things will keep you busy.
And you'll probably be able to build a pretty good system just with the basics
and a little bit of engineering and most importantly,
understanding the problem that you're trying to solve as opposed to, you know,
building whatever shiny technology or approach.
Yeah.
I also think the approach should be different for
individual developers versus enterprises,
like if you look at the whole organization.
One thing I have seen is that, especially in the early days of a technology,
enabling new use cases usually brings more returns than incremental
improvement over existing use cases. So maybe, instead of investing a lot of energy into
eking out a little bit of performance with fancier
complexity, use the same stack you already have to open up new
applications. So yeah, that's why I feel like a lot of companies will take a while to
get to the fine-tuning phase.
Trust isn't just earned, it's demanded. Whether you're a
startup founder navigating your first audit or a seasoned security professional scaling your
governance, risk, and compliance program, proving your commitment to security has never been more
critical or more complex.
That's where Vanta comes in.
Vanta can help you start or scale your security program by connecting with auditors and
experts to conduct your audit and set up your security program quickly.
Plus, with automation and AI throughout the platform, Vanta gives you time back so you
can focus on building your company.
Businesses use Vanta to establish trust by automating compliance
needs across over 35 frameworks like SOC 2 and ISO 27001.
With Vanta, they centralize security workflows, complete questionnaires up to five times faster,
and proactively manage vendor risk.
Join over 9,000 global companies to manage risk and prove security in real time.
For a limited time, my listeners get $1,000 off Vanta at vanta.com slash pragmatic.
That is V-A-N-T-A dot com slash pragmatic for $1,000 off.
So let's say that at my company, we decide to build an AI solution.
And let's take the example that it's going to be customer service automation.
What are typical approaches that I should know about?
And you do cover some of these things in your book as well,
like some of the more common steps that I'll need to take.
So, customer support solutions.
I would say the first thing I would look into is just:
what are the bottlenecks right now?
For example, another startup I worked with
actually had this challenge:
they had a lot of customer support requests
and they didn't know what to answer.
And the solution is very interesting.
It's just, okay, let's try to drive a lot of the questions
into a common channel, like a public Discord.
That means all the users can also help answer the questions.
And also, in the future,
if someone has a question, right?
They can just refer to previous discussions
instead of asking me.
So they tried to make a lot of the discussions
and customer support requests public.
Another solution that was pretty popular in 2018, 2019 is
routing to the right department.
The challenge there was that they realized the bottleneck was in triaging.
When someone gets a request, they don't know which department to send
it to.
So a bunch of startups then said, okay, let's build a system
to predict: should this go to the finance department?
Should it go to the technical support department? You know, just that routing
already reduced the burden a lot.
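Routing tickets to a department is a classic text-classification problem that doesn't need Gen AI. A minimal sketch, assuming a hypothetical set of labeled past tickets and two made-up department names, using multinomial naive Bayes with add-one smoothing (one simple classifier among many that would work here):

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesRouter:
    """Multinomial naive Bayes over word counts, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.word_counts: dict = defaultdict(Counter)  # department -> word counts
        self.doc_counts: Counter = Counter()           # department -> number of tickets
        self.vocab: set = set()

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayesRouter":
        for text, department in examples:
            tokens = tokenize(text)
            self.doc_counts[department] += 1
            self.word_counts[department].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for department, n_docs in self.doc_counts.items():
            counts = self.word_counts[department]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log(
                    (counts[tok] + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best, best_score = department, score
        return best


# Hypothetical historical tickets labeled with the department that handled them.
history = [
    ("I was charged twice on my invoice", "finance"),
    ("refund my payment please", "finance"),
    ("the app crashes when I log in", "technical_support"),
    ("error message after the update", "technical_support"),
]
router = NaiveBayesRouter().fit(history)
print(router.predict("why was my invoice charged twice"))
print(router.predict("the app shows an error"))
```

With real ticket volumes you would train on far more history and measure routing accuracy on held-out tickets before trusting it.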
And if you think, okay, we need Gen AI to do it,
then I really recommend the framework that Microsoft introduced.
They call it crawl, walk, run: going from lower-stakes to higher-stakes deployment.
So say you're building a customer support chatbot, right?
Initially, you can have a human in the loop.
For example, for every request, instead of a human agent
writing the response from scratch, you can have the AI suggest a few options.
And the human can choose one, or choose one as a starting point,
make a quick edit, and send it.
Once you see that, okay, maybe the acceptance rate is getting really high,
say, for this category of queries the
acceptance rate is 90%, right?
Then you may feel more confident.
So you roll it out.
Maybe you roll it out to a smaller set of users, or even to internal use cases first.
So you give it more automation, but reduce the scope of the deployment.
And then, after you're really happy with it, you can roll it out to more users.
So yeah, that's how I would go about something like a customer support chatbot.
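The acceptance-rate gate in this crawl-walk-run progression can be tracked per query category. A minimal sketch: the 90% threshold comes from the example above, while the minimum sample size of 100, the class names, and the category names are assumptions for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CategoryStats:
    shown: int = 0     # AI suggestions shown to a human agent
    accepted: int = 0  # suggestions the agent picked and sent (possibly lightly edited)

    @property
    def acceptance_rate(self) -> float:
        return self.accepted / self.shown if self.shown else 0.0


class RolloutTracker:
    """Tracks human-in-the-loop acceptance per query category (hypothetical helper)."""

    def __init__(self, threshold: float = 0.9, min_samples: int = 100):
        self.threshold = threshold
        self.min_samples = min_samples
        self.stats: dict[str, CategoryStats] = defaultdict(CategoryStats)

    def record(self, category: str, accepted: bool) -> None:
        s = self.stats[category]
        s.shown += 1
        s.accepted += int(accepted)

    def ready_to_automate(self, category: str) -> bool:
        # Require enough evidence AND a high enough acceptance rate.
        s = self.stats[category]
        return s.shown >= self.min_samples and s.acceptance_rate >= self.threshold


tracker = RolloutTracker()
for i in range(100):
    tracker.record("password_reset", accepted=i < 95)  # 95% accepted
    tracker.record("refund_dispute", accepted=i < 60)  # 60% accepted

print(tracker.ready_to_automate("password_reset"))  # True
print(tracker.ready_to_automate("refund_dispute"))  # False
```

A category that passes the gate is a candidate for the next, still limited stage (a smaller user group or internal users), not for full automation everywhere.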
Nice, because what I was kind of expecting you might say is, oh, you know, just build this AI framework, deploy it, see it. I feel that's what a lot of companies are doing, by the way. A lot of them are like, oh, Gen AI! Let's get, you know, this model from ChatGPT or Anthropic. Let's try to, you know, put it there. Let's put it out there. But I really like what you're saying, because what you explained is not really specific to Gen AI. You went through: look at the business problem, look at the problems
you have, look at the options, which include not just Gen AI but more traditional machine learning,
like, as you say, a classifier. And then you look at whether these tools help your problem, and then
you make sure that it actually solves your problem when you roll it out. Don't just blindly
roll it out. All of this sounds to me like it's not really new, is it? It's not.
You could have said the same thing two or three years ago, before Gen AI, except we would
not have had those Gen AI tools to play with.
Yeah. So, actually, before our chat,
I looked up one of your talks, and you have this talk about things that haven't
changed, things that feel very similar about engineering. And definitely,
dealing with a new technology is one of the things that never changes. Every time
a new technology comes out, I can hear the collective sigh of senior
engineers everywhere saying: not everything is a nail. People just try to get the
technology to work for everything. So yeah, a very common
challenge I see is that people just jump straight into it. They want to use
Gen AI when they don't need Gen AI. I think there are two different
headlines. One headline is "I use Gen AI," right? The other is the
headline "I solved the problem." If you want to focus on the first headline, then yeah,
use Gen AI. But if you want to solve the problem, then you need to understand what
the problem is. What are the challenges? What are the roadblocks? And
remove the roadblocks using the simplest solutions, not the fanciest
ones.
Yeah, I feel that there's a really strong fear of missing out across most
tech companies. Everyone knows this is such a transformative technology. It gives us so many
new capabilities that it will be important. I think everyone knows that their company will be
using it. But now there's a fear of missing out: oh, what if my team doesn't build
it? What if someone else gets ahead of me? And so at a lot of companies, many teams are all
building it, and they're just, you know, using a hammer looking for nails,
even though they might not need it at the time. I'm not sure this is necessarily a bad thing,
because, you know, people at least get experience with it, and they will
need to learn about it. But it's a very interesting time, because this is rare.
Usually we see a new backend framework come out, or something that's limited to a
domain, and people jump on it. But this is the first time I've seen the whole
industry jumping on it, with everyone trying to
put it in, whether it works or not.
I definitely agree with you on this FOMO thing.
I do think everyone jumping on it
is actually a pretty good thing.
The energy is incredible.
I have never before seen so many smart people focusing on the same problem.
It's incredible, and the progress is amazing.
I do think, however, there's an irony:
the more we try not to miss out on things,
the more things we will miss.
Because if we try to keep up with the news,
jumping from one piece of news to another,
we will always stay at the surface level.
We never really go deep into anything.
So I actually don't read the news much.
I find it a little bit distracting.
My approach is:
okay, pick a problem that you care about,
and then only care about things
that help you solve this problem.
So if some news comes out,
I just ask: does this help me
solve this problem?
If it doesn't, I kind of wait, right?
Because if something is important,
it will still be important two weeks from now, or a month from now.
I don't drop everything and say,
okay, let's just go and understand what it is.
So I try to stay a bit calmer.
Yeah.
Yeah.
So when you're building an AI system, one of the things that you will come across
is you need to evaluate the output, how well it works, does it solve your problem?
Why is it difficult to evaluate AI systems?
And what are common ways to do that?
So evaluation, I think, is a billion-dollar question, or even a trillion-dollar one,
given how much people are investing in AI right now.
Yeah, now it needs to go big.
And if it goes big, it'd better go really big.
So I think this is challenging because the smarter AI becomes, the harder
it is for us humans to evaluate it. Before, if AI was incoherent,
you could pretty much tell the response was bad: okay, it doesn't sound good,
it's a bad response. But nowadays it's pretty coherent, right? For
example, if you ask it to generate a summary of a book,
and the summary sounds convincing, you actually don't know whether it's a good summary or not,
and you might have to read the entire book yourself just to evaluate whether it's a good summary.
Or math: a lot of times, I personally use AI to ask questions because I don't know the answer.
And because I don't know the answer, I don't know if the answer is correct, right?
For example, a lot of people can tell if a solution to a first-grade math question is correct,
but very few people can tell if a fancy proof is correct.
I remember when o1 came out, Terence Tao,
he's an amazing mathematician,
I think one of the best mathematicians of our time,
actually took the time to evaluate o1.
And he said the experience of using o1 is similar to advising a mediocre,
but not completely incompetent, grad student.
But this makes me think: if we really need the brightest minds today to evaluate AI,
then we're soon going to run out of really, really smart people to evaluate AI.
So what could be the next step forward?
Before, a lot of the time we used humans as the gold standard for AI performance.
Humans would write out: here is how you should respond to this,
and here's how you should do it.
And the AI would try to copy humans.
But now, for many, many tasks, AI already outperforms humans, by a lot.
So I thought about several approaches to deal with it.
And I think that's why I split the chapter. Initially I had one chapter on evaluation,
but the more I wrote about it, the more there was to say, so I ended up with two pretty long chapters on evaluation.
The first chapter is on general methodology.
And the second chapter is about how to use different techniques to evaluate an AI system.
So for the methodology, one is functional correctness.
It evaluates the output of an application based on how well it performs the task.
So if you say, hey, use AI to save energy, you can see how much energy is actually saved.
Or, hey, use AI to play this video game: you can see how high a score you actually get.
A very common use case for this is coding.
I think it's not a coincidence that coding is the most popular
use case, because we actually know how to evaluate generated code.
We might not know how to evaluate generated text and stuff, right?
But we know how to evaluate generated code, because we've been testing code for a long
time.
So with code, you can use functional correctness to evaluate: does this code compile?
Does it run?
Does it generate the expected outputs? That's what we want it to do. So that's one approach.
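To make the coding case concrete, here is a minimal sketch of functional-correctness checking for generated code: does it compile, does it run, does it produce the expected outputs? The function name, the test cases, and the result format are assumptions for illustration. Note that exec() runs arbitrary code, so anything like this belongs in a sandbox (separate process or container, with time limits) in real use:

```python
def functional_correctness(source: str, func_name: str, test_cases) -> dict:
    """Check whether generated code compiles, runs, and produces expected outputs.

    test_cases: list of (args_tuple, expected_output) pairs.
    WARNING: exec() runs arbitrary code; only use on trusted input or in a sandbox.
    """
    result = {"compiles": False, "runs": False, "passed": 0, "total": len(test_cases)}
    try:
        code = compile(source, "<generated>", "exec")
    except (SyntaxError, ValueError):
        return result  # does not even compile
    result["compiles"] = True
    namespace: dict = {}
    try:
        exec(code, namespace)  # define the generated function(s)
    except Exception:
        return result  # crashed while loading
    func = namespace.get(func_name)
    if not callable(func):
        return result  # expected function was never defined
    result["runs"] = True
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                result["passed"] += 1
        except Exception:
            pass  # a crash on one test case counts as a failure
    return result


tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b"

for name, src in [("good", good), ("buggy", buggy), ("broken", broken)]:
    print(name, functional_correctness(src, "add", tests))
```

The same pattern generalizes: the test cases encode what you actually want the code to do, which is why coding lends itself to this kind of evaluation.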
The second approach is using AI to evaluate other AI. We've been using AI to automate
a lot of things, so can we also use AI to automate evaluations? And it's actually
doing pretty well. Even back in
2023, LangChain had a report saying that the majority of applications they saw already
had some sort of AI-as-a-judge, or LLM-as-a-judge, and I think it's growing.
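At its simplest, an AI judge is a prompt with a rubric plus code that parses the judge's verdict. A minimal sketch: the rubric prompt, the 1 to 5 scale, and the `call_model` function are hypothetical stand-ins (no specific model API is implied), and a stub model is used so the example runs offline:

```python
import re
from typing import Callable, Optional

# Hypothetical rubric prompt; a real judge prompt should be tuned and
# validated against human ratings for your own use case.
JUDGE_PROMPT = """You are grading an answer from a customer support assistant.

Question: {question}
Answer: {answer}

On a scale of 1 to 5, how well does the answer resolve the question?
Respond with exactly one line in the form: Score: <1-5>"""


def judge(question: str, answer: str, call_model: Callable[[str], str]) -> Optional[int]:
    """Ask a judge model for a 1-5 score; return None if the reply can't be parsed.

    `call_model` is whatever function sends a prompt to your model of choice
    and returns its text reply (an assumption, not a specific library API).
    """
    reply = call_model(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


# A stub standing in for a real model call, so the sketch runs offline.
def fake_model(prompt: str) -> str:
    return "Score: 4"


print(judge("How do I reset my password?", "Use 'Forgot password' on the login page.", fake_model))
```

Returning None for unparseable replies, rather than guessing a score, makes judge failures visible instead of silently skewing the averages.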
And I think it's getting pretty cost-effective and useful. But of course,
there are a lot of challenges around using AI as a judge that we can go into later. Another
approach that is very interesting is comparative evaluation. The reason is that, as
humans, it might be hard for us to give an absolute score to something. But if we're given
two versions of something, we can tell: oh, I like this one better. There have been a lot
of studies showing that even for tasks where AI is performing
at the level of human experts, where we can't really do the task ourselves, we can still
detect the differences. So I think this has been
guiding not just evaluation, but also model development.
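Pairwise judgments ("this one is better than that one") can be aggregated into a ranking. One common way to do that is Elo ratings; the sketch below turns a list of (winner, loser) outcomes into scores. The model names are made up, and the K-factor of 32 and starting rating of 1000 are conventional assumptions:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_ratings(comparisons, k: float = 32.0, initial: float = 1000.0) -> dict:
    """comparisons: iterable of (winner, loser) pairs from human or AI judges."""
    ratings: dict = {}
    for winner, loser in comparisons:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        # The winner gains more when the win was unexpected.
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings


matches = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
for name, rating in sorted(elo_ratings(matches).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Elo updates sequentially, so the result depends on the order of comparisons; fitting a Bradley-Terry model over all comparisons at once is a common order-independent alternative.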
Sounds like there's just no simple answer, right?
You kind of need to go through all these options and figure out which makes sense in your case:
for cost, for what you can do. Can you have a human in the loop?
So there's no real silver bullet, no one thing that you can just use.
Yeah, I don't think there's a simple solution.
That's one reason I'm a little bit skeptical about evaluation tooling,
because a lot of the challenges with evaluation are not because we don't know how to evaluate,
but because it requires discipline and hard work,
and a lot of that tools can't really automate.
For example, with evaluation,
we need to evaluate an application based on what users want.
That means we need to go and talk to users.
We need to look at their interactions.
Because we want to evaluate what matters, right?
We have to measure what matters.
I have several examples of how counterintuitive it can be: we think we're measuring one thing,
but users actually care about other things.
For example, I have a friend who's building a pretty big application,
basically to summarize meetings.
Initially, they were like, okay, let's try to measure correctness:
does the summary cover the content of this meeting?
They also thought about: does the model follow the format?
Because they thought users wanted shorter summaries, and they agonized over
whether they wanted three-sentence summaries or five-sentence summaries,
and they tried to measure that.
But eventually, what they found out
is that users don't really care about the whole content of the meeting.
People only want: what is the action item for me?
What do I have to do after this, right?
So they changed what they measure; they don't focus on correctness anymore.
I mean, the model still shouldn't make things up, right?
But they focus on not missing the action items specific to the person asking
for the summary.
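A metric aligned with what those users wanted might be recall of each person's own action items, rather than coverage of the whole meeting. A minimal sketch, assuming a hypothetical reference list of action items per attendee; case-insensitive substring matching is a deliberately crude stand-in for human review or an AI judge per item:

```python
def action_item_recall(summary: str, action_items_by_person: dict, person: str) -> float:
    """Fraction of a person's reference action items that appear in their summary."""
    items = action_items_by_person.get(person, [])
    if not items:
        return 1.0  # nothing to miss
    text = summary.lower()
    found = sum(1 for item in items if item.lower() in text)
    return found / len(items)


# Hypothetical reference action items for one meeting.
reference = {
    "alice": ["send the Q3 report", "book the offsite venue"],
    "bob": ["review the pricing doc"],
}
summary_for_alice = "Your action items: send the Q3 report to finance by Friday."
print(action_item_recall(summary_for_alice, reference, "alice"))  # 0.5
```

The score deliberately ignores summary length and format, which is what the users in this story turned out not to care about.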
Or another example: some people using chatbots for,
well, we talked about a customer support chatbot,
so let's go back to that example.
A pretty big tax firm
built a chatbot.
You know, tax software, you can pretty much tell which company it is.
They launched a chatbot
to help people with tax preparation.
And the response was very lukewarm.
They were measuring how users used it,
and people just didn't really seem to use it.
And they were like, why is that?
Is it because it hallucinates?
What is the challenge there?
So they tried to measure all these kinds of metrics, right?
But in the end, they found out that
users didn't use it because they hate typing.
People just don't really like typing.
And also, when you face a domain
that you don't really know,
like, I use the software because I don't know
a lot about taxes,
I don't really know what questions to ask.
Yeah.
Oh, so they didn't know what to type.
They didn't understand the domain.
You know, they went to the tax thing
because they wanted to take care of their own taxes.
Yeah. So I think they started trying to
understand more about what kinds of questions people would ask, and suggested those at the
beginning. It basically guides users, so it's kind of education: here's a question
you should ask, and here's the answer, keep going. So I think a lot of that
is just understanding your domain, the problem domain: go talk to users,
look at the data. I do still think that looking at data is very, very important. I think
Greg Brockman has a great quote about it. He thinks manual data inspection is one
of the activities with the highest ratio of value to prestige. That means
people don't think highly of manual data inspection or data labeling:
let's give it to some interns to do,
let me work on something fancy, like algorithms and stuff.
But actually, it's extremely high value.
Because by looking at data, you detect patterns.
You understand how users use your product.
So a very good practice I highly recommend to teams
is: don't forget human evaluation.
You use AI as a judge,
but AI as a judge has
a lot of challenges, because the quality of the judge depends on the underlying model
and the prompt, and it's also non-deterministic.
So things can change over time.
But if you have some human evaluation, very consistent, with very clear guidelines:
every day, go in there and look at maybe 50 samples of actual interactions.
Or, if you have more resources, go as high as 500 or 1,000, right?
So that you can get some kind of picture of how
users are using your product.
Are they changing behavior based on current events?
Maybe because of reasons like an administration change,
people have a lot more questions about that topic, for example.
Or you can correlate it with all the automated metrics.
For example, if the AI judge scores somehow start diverging from the human scores,
maybe that's something you need to investigate.
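The daily human-review routine, and the check that the AI judge still agrees with humans, can be sketched as follows. The sample size of 50 comes from the text; seeding the sample by date, the 1 to 5 score scale, mean absolute gap as the agreement measure, and the 0.5 threshold are all assumptions:

```python
import random
from datetime import date
from statistics import mean


def daily_sample(interactions: list, day: date, k: int = 50) -> list:
    """Pick k interactions for human review, reproducibly for a given day."""
    rng = random.Random(day.isoformat())  # same day -> same sample
    if len(interactions) <= k:
        return list(interactions)
    return rng.sample(interactions, k)


def judge_human_gap(judge_scores: list, human_scores: list) -> float:
    """Mean absolute difference between AI-judge and human scores on the same items."""
    if len(judge_scores) != len(human_scores) or not judge_scores:
        raise ValueError("need two equal-length, non-empty score lists")
    return mean(abs(j - h) for j, h in zip(judge_scores, human_scores))


def needs_investigation(judge_scores: list, human_scores: list, max_gap: float = 0.5) -> bool:
    """Flag the day if the judge drifts too far from the human reviewers."""
    return judge_human_gap(judge_scores, human_scores) > max_gap


todays = daily_sample(list(range(10_000)), date(2025, 2, 5))
print(len(todays))  # 50
print(needs_investigation([5, 4, 4, 3], [5, 4, 3, 3]))  # False
print(needs_investigation([5, 5, 5, 5], [3, 2, 3, 2]))  # True
```

Seeding by date makes a day's sample reproducible, so two reviewers (or a later audit) look at the same interactions.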
Yeah.
So I guess, as you said, you can't
skip hard work if you want to get good results.
And you can't really pull humans out of the loop fully, at least initially.
What are some common mistakes you've seen when teams are building AI applications?
I'm sure you've seen a lot.
Yeah, but I don't want to say it in a way that sounds like, oh, everyone is an idiot.
So, one, I think we touched on several.
One of the common mistakes is using Gen AI when you don't need Gen AI.
For example, there was a startup that came to me with a pitch.
It was like, oh, we're going to use Gen AI to help people optimize electricity usage.
So people can tell the chatbot: hey, here's where I live, and here are the activities I do during the day that are very energy-intensive, maybe charging your car or doing laundry or something.
And the AI is going to tell you: hey, you should do this activity at this time and that one at that time,
so that you can minimize the electricity bill.
And they were like, oh, our research shows that this can save you, on average,
30% of your electricity bill.
And it was like free money.
Why would anyone not want that?
So I asked them: what is the cost saving if I just manually
schedule the most intensive activities during off-peak hours?
Like, okay, just charge the car at, you know, 10 p.m. or something.
They were like, we haven't done that yet, but we're going to try it and let you know.
And they never got back to me.
And they abandoned the idea later.
So I feel like a lot of those optimization problems can be solved greedily, maybe even with a spreadsheet.
Without Gen AI.
Yeah, without Gen AI.
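As a rough illustration of that no-Gen-AI baseline: if each energy-heavy task fits in a one-hour slot and only one task runs per hour (both simplifying assumptions), greedily putting the biggest loads in the cheapest hours minimizes cost, since pairing the largest loads with the lowest prices gives the smallest total. The hourly prices and task sizes below are made up:

```python
def greedy_schedule(hourly_price: list, tasks: list) -> dict:
    """Assign each (name, kwh) task to its own hour, biggest loads to cheapest hours.

    Assumes every task fits in one hour and at most one task runs per hour;
    under those assumptions this greedy pairing minimizes total cost.
    """
    if len(tasks) > len(hourly_price):
        raise ValueError("more tasks than available hours")
    cheapest_hours = sorted(range(len(hourly_price)), key=lambda h: hourly_price[h])
    biggest_first = sorted(tasks, key=lambda t: t[1], reverse=True)
    return {name: hour for (name, _kwh), hour in zip(biggest_first, cheapest_hours)}


def schedule_cost(plan: dict, tasks: list, hourly_price: list) -> float:
    kwh = dict(tasks)
    return sum(kwh[name] * hourly_price[hour] for name, hour in plan.items())


# Made-up tariff in $/kWh for hours 0-23: cheap overnight, expensive late afternoon.
prices = [0.12] * 7 + [0.20] * 9 + [0.30] * 5 + [0.20] + [0.12] * 2
tasks = [("charge_car", 7.0), ("laundry", 2.0), ("dishwasher", 1.5)]

plan = greedy_schedule(prices, tasks)
print(plan)
print(f"cost: ${schedule_cost(plan, tasks, prices):.2f}")
```

Comparing this baseline's cost against the AI-recommended schedule is exactly the question that was never answered in the pitch above.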
On the other end of the spectrum, I see a lot of companies giving up on Gen AI
because they think that Gen AI is not good for that problem,
because they have tried it and it doesn't work.
And a lot of the time I'm surprised.
It's like, wait a second, I just talked to another company
that uses it for a similar use case,
and it worked really well.
And when we look into it, it's usually because of a bad product.
They don't prompt well,
they don't understand the users,
they don't even know how to evaluate well.
For example, I was working with this company
that basically extracts information from resumes.
So they get a resume, and they try to map out where the person
worked before and create a summary of that person's life.
And they have two steps.
First, from resumes, they try to extract all the text.
And then, from the extracted text, they extract the organizations.
By the way, the resume is a PDF, not pure
text, right? And they said it worked terribly: they got the organizations
wrong about 50% of the time. And then I was
asking them: where in the process does this fail? Is it from the PDF to the extracted
text, or from the extracted text to organization extraction? And they were like, oh, we don't know.
We didn't check that. It's like, if you can't pinpoint it, if you can't localize
where it fails, then how can you fix it? A lot of the time, this
seems like common sense, but somehow, I don't know, this is something that has always puzzled me a little.
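Localizing the failure means scoring each stage on its own. A minimal sketch, assuming a hypothetical labeled set where each example has the source document, a reference transcription of its text, and the reference organizations; the stage functions are toy stand-ins for whatever the real pipeline uses:

```python
def localize_failures(examples, extract_text, extract_orgs) -> dict:
    """Count failures end to end and per stage of a two-step pipeline.

    Each example is a dict with keys "doc", "gold_text", and "gold_orgs".
    - text_extraction: a gold organization is missing from the extracted text
    - org_extraction: step 2 is wrong even when given the *gold* text
    """
    counts = {"end_to_end": 0, "text_extraction": 0, "org_extraction": 0}
    for ex in examples:
        gold = set(ex["gold_orgs"])
        text = extract_text(ex["doc"])
        if set(extract_orgs(text)) != gold:
            counts["end_to_end"] += 1
        if any(org not in text for org in gold):
            counts["text_extraction"] += 1
        if set(extract_orgs(ex["gold_text"])) != gold:
            counts["org_extraction"] += 1
    return counts


# Toy stand-ins: "##" marks where a flaky PDF extractor truncates the text.
KNOWN_ORGS = ["Acme Corp", "Globex", "Initech", "Hooli"]


def fake_extract_text(doc: str) -> str:
    return doc.split("##")[0]


def fake_extract_orgs(text: str) -> list:
    return [org for org in KNOWN_ORGS if org in text]


examples = [
    {"doc": "Worked at Acme Corp and Globex",
     "gold_text": "Worked at Acme Corp and Globex",
     "gold_orgs": ["Acme Corp", "Globex"]},
    {"doc": "Worked at Initech## and Hooli",
     "gold_text": "Worked at Initech and Hooli",
     "gold_orgs": ["Initech", "Hooli"]},
]
print(localize_failures(examples, fake_extract_text, fake_extract_orgs))
# {'end_to_end': 1, 'text_extraction': 1, 'org_extraction': 0}
```

In this toy data every error traces back to text extraction, so that is where to look first; feeding the gold text to step 2 is what separates the two stages.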
Another is just starting too complex: for example, jumping straight to vector
databases, or fine-tuning. Another common one nowadays is when you see a fancy
agent framework and you're like, let's use this framework, you know, let's just try it. I think
eventually abstractions are really, really cool.
I'm very grateful for many abstractions that make my life easier.
But I do think abstractions should encode best practices and should be heavily tested.
And I think we're still in the phase where
we're still learning the best practices.
And also, a lot of abstractions can introduce unnecessary, very, very painful bugs.
When I was going through the codebases of a lot of those frameworks, I found something interesting.
A lot of those frameworks have some default prompts to help you get started, right?
Because it makes it very easy for you to begin.
But then every single one of those prompts I looked at had some typos.
And they just get changed.
Somebody submits a quick PR to fix the typos,
but it's not part of the release notes or anything.
So if you're using this framework with one of the default prompts, and then suddenly the performance
of your application changes, you actually don't quite know
why it's changing,
because the prompt was changing under your feet.
So yeah, those are very interesting.
Those are just patterns.
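One defensive habit against prompts changing under your feet is to keep your own copy of any prompt you depend on, or at least fingerprint it and fail loudly when it changes after an upgrade. A minimal sketch; the prompt strings are made up and don't come from any particular framework:

```python
import hashlib


def prompt_fingerprint(prompt: str) -> str:
    """Stable SHA-256 fingerprint of a prompt's exact text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def assert_prompt_unchanged(prompt: str, pinned_fingerprint: str) -> None:
    """Raise if the prompt text differs from the version you evaluated."""
    actual = prompt_fingerprint(prompt)
    if actual != pinned_fingerprint:
        raise RuntimeError(
            f"prompt changed (pinned {pinned_fingerprint[:12]}, now {actual[:12]}); "
            "re-run your evals before shipping"
        )


# Hypothetical default prompt as it looked when you ran your evals (typo included).
evaluated_prompt = "You are a helpful asistant. Answer concisely."
PINNED = prompt_fingerprint(evaluated_prompt)

# After a framework upgrade, someone fixed the typo upstream.
upgraded_prompt = "You are a helpful assistant. Answer concisely."

assert_prompt_unchanged(evaluated_prompt, PINNED)  # passes silently
try:
    assert_prompt_unchanged(upgraded_prompt, PINNED)
except RuntimeError as err:
    print(err)
```

Even a one-character fix changes the fingerprint, which is the point: any change to a prompt is a reason to re-evaluate, not something to discover from production metrics.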
But it's interesting, because what you mentioned, if I'm, you know,
collecting these: using this technology when you don't really need it;
giving up on it when, just for common-sense reasons, you could have
fixed some easy things;
using a new framework when it's just not really high quality, or doesn't
really have the best practices. This all sounds like stuff where we could just replace Gen AI
with, you know, a new technology or a new stack, and we'd probably hear similar
things, right? It's typically these things, because it's changing all the time,
there are no best practices, no one really knows how to use it. You know, whoever tells you
they're the expert, they still have, you know, a year of experience with it at most.
It's not really new, is it?
Yeah, I definitely agree with you.
I do think that even though technologies change over time,
systematic thinking,
systematic approaches to problems, usually don't change.
If you want to solve problems, you first start by breaking down the problem,
seeing where the challenges are, and going through different solutions.
When you do that, it seems like common sense, but I think a lot of the time a lot of us get
FOMO. I think FOMO gets in the way, and it's like, okay, we know what the right thing to do is,
but I also feel like I just need to check this thing out first, you know, and we keep doing that
three times a day, so the day is gone, and you never really get time to sit down
and think really deeply about what it is you're trying to do.
Yeah. So I guess we're going to see a lot of the kinds of mistakes that happen with any new
technology. Plus, if some of the listeners have adopted new technologies before,
you can probably use some of those approaches, you know, just localized for Gen AI, and
see if you can avoid some of those. Yeah. Speaking of new technology:
for someone who is learning Gen AI, a software engineer who wants to get into AI engineering,
what would your recommendation be for learning? You know, things do change so fast. You did mention
the importance of fundamentals. What would you focus on?
So, I have a lot of thoughts on learning, because I like learning a lot, and over time
I've observed some patterns. By the way, the way I learn might not
be the same as the way you learn. People have different learning styles. But in general, I think
learning has two different approaches. One is project-based learning,
and the other is structured learning.
Project-based learning is: okay, you choose a project
and you work on it, go and try to solve every problem in that project, right, and finish it.
Structured learning is more like when you take a course or read a book.
Somebody else has laid out: here are the things that you want to do.
And I think there's quite a bit of debate on this. Somebody told me recently, a very good friend,
he said: oh, I think a problem nowadays with people who want to get into AI
engineering is that they spend too much time learning and not enough time doing.
And he said, just forget the courses, forget all the books, just pick a project
and work on it.
And I do think project-based learning is very, very valuable.
But if you think of it as, here is the set of skills and knowledge
I need to become really good at something,
project-based learning can help you hit a lot of these points,
but it doesn't always help you hit all the points,
and sometimes you can get confused.
So sometimes you still need to complement it with structured learning.
And another thing about project-based learning
is that a lot of people usually do it by following tutorials:
oh, here, someone has this pretty cool tutorial on how to do this.
And I think tutorials are really cool.
I personally use them a lot.
But I also notice that it's very easy to just mindlessly
click from one cell to another, run one cell, run another,
and not really stop to ask:
why is this done this way?
Why is this library being imported?
Why is this code written this way?
Why is the batch size 16 instead of 64?
Like, why is it, okay, export, the measure of export?
So it's very easy to not stop.
There's no mechanism to force you to stop.
You just want to run to the end, see what the output is, and make some changes
based on your best guesses.
Here's something funny.
I was looking at this open source project,
and I wanted
to do market research and see who is using this framework.
The framework was Ibis.
And I knew that if you want to use this framework, you need to do
import ibis.
So I went on GitHub and searched for all the repos that have the line import ibis.
And I found a lot of repos.
Then I went into them.
A repo has import ibis, but then the codebase does not have ibis
anywhere else.
It's not used at all.
I was like, what is happening?
So I realized that a lot of those repos copied from a tutorial,
and that tutorial had import ibis,
which was a mistake,
and then everyone else copied it.
So maybe the original developer imported ibis and then deleted the code because
they didn't use it anymore.
And then everyone copied it.
It's just the same thing.
So I feel like that is something a little bit dangerous.
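The Ibis story is easy to turn into a check on your own code: parse the file and flag imports whose names are never used. A minimal sketch with Python's standard ast module; it won't catch names used only via strings, eval, or __all__, and linters such as pyflakes already report "imported but unused" more thoroughly:

```python
import ast


def unused_imports(source: str) -> list:
    """Return names bound by import statements that are never referenced by name."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a as x" binds "x".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    # Attribute access like pd.DataFrame still contains a Name node for "pd".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


copied_from_tutorial = """
import ibis
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})
print(df.x.sum())
"""
print(unused_imports(copied_from_tutorial))  # ['ibis']
```

The source is only parsed, never executed, so neither ibis nor pandas needs to be installed for the check to run.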
Like tutorial-based learning is great, but I do think it's very important to be able to
stop and ask question.
and sometimes structure learning can help you, like, ask the right questions.
Like, think things through.
So, yeah, so I think like for you were starting, I would recommend maybe a mixture.
Like, yes, choose some project you want to work on.
It doesn't have to be like big, fancy project.
Like, just try to, like, pick one.
And then at the same time, complement it with, like, some structure learning.
Like, pick whatever, like, maybe a book or doing a course with a friend.
read paper. I think read paper is a bit, a bit interesting because read paper, reading paper,
is a skill. It can be quite time-consuming and you need to know what you want to get out of it.
But yeah, so like start a project, complement with like structure learning. At the same time,
there's an exercise that I felt very, very useful, at least for me initially, is that like for a week,
I try to observe like what I do, like try to make note of like what I do and try to make note of what I do.
and try to think of like
what percentage of that
can be automated by AI?
Like what could be done by AI?
Yeah.
And then I try to use AI to do those.
And this just gave me a lot of ideas
on the use cases.
Like just think about it what matters to me
and it would be an application
that concerns the problem.
It's just great already.
Yeah, that's
an unconventional way,
but I think it's a good way to look at it.
Because in the end,
I think it might also help you get ahead of
this dread
of what AI will do to you,
because you realize what happens when you automate things,
which actually leads to my next question.
There's a lot of fearmongering around:
oh, AI will mean the end of software engineering,
because AI is very good at coding,
a lot better than at a lot of other areas.
What is your take?
As AI gets better,
will it actually end software engineering, will it change it,
or will it not change it much?
I think it goes back to the question of
what software engineering is.
Maybe an analogy
can help explain it better.
Take writing.
We tend to confuse the most salient activity of something with the job itself.
So first of all, writing.
In the past, writing meant the physical act of putting words onto paper.
Yeah, yeah, on paper or...
And back then,
people thought of writing as that.
People actually took pride in calligraphy: oh, you have beautiful handwriting,
you must be smart,
you must be intelligent, right?
But then we had computers.
And now writing doesn't refer to that act anymore.
Writing refers to the process of arranging ideas into a readable format.
And I think the same thing applies to coding.
Right now, people think of software engineering
as the physical act of putting code into, I don't
know, VS Code or Vim or whatever software you use.
But that's not what software engineering is.
Engineering is about solving problems.
Here's the problem:
how do I come up with executable programs to solve it?
Coding itself is just the physical act.
And I do think, yes, maybe AI can help you automate coding,
but I don't think it's going to fully automate problem solving, because you
still need to know what the problem is.
And only you can understand what problems you're facing.
Well, and also, with coding,
or software engineering, really: yes, you need to solve problems.
But what I don't think we say enough is that you need to do that very precisely.
The reason the job of software engineer or programmer exists is that it is very hard to be
specific, to speak the computer's language.
Because, you know, if you move that if statement somewhere else or if you change a variable,
suddenly the program crashes because now you have a stack overflow exception,
which you, of course, understand if you're a software engineer. But a business user
just says: if you resize the window, I want the button
to move over. It's easy to say, but as a software engineer, you know the edge cases, you know
the environment, you know what you need to worry about, like system events,
etc. And then you write code for all of those things. And I'm sure we'll get to a point where
AI will be able to generate some of that, but it might not. And you will still at some point need
someone who understands that code and can figure out where the gap is.
Because English as a language is not as precise as a programming language.
Programming languages were invented to be very precise and unambiguous.
You can go from assembly code to a programming language because it's a one-to-one
mapping, but from English to a programming language, that's very fuzzy, right?
Yeah.
Yeah, definitely.
So I think that profession will not go away.
For some users, it might work for
more obvious use cases: you say something and you get roughly what you wanted, and
you try multiple times and it generates something else, and you're happy. But for a business
or professional use case, you will need people who are able to guarantee that you get
exactly what you want.
Yeah, I'm actually really excited that AI can automate part of
coding, because it actually enables software engineers to build much more complex software.
So to go back to the analogy of writing: before, when people had
to do the manual copying of words onto paper, all the books back then
were very small.
I think 5,000 or 10,000 words was considered big, because
it took a long time for people to copy things.
But now we have books of 100,000 words, right?
And I think it just makes things a lot easier.
I think the same thing applies to software engineering.
If you don't have to manually write code, you can quickly turn ideas into
snippets and executable programs.
I do think it enables much, much more complex software.
Yeah, and maybe one software engineer will be able to command a lot more software, and debug or maintain a much more complex system by themselves.
Because right now, there's a reason that for a million lines of code, there are usually several engineers.
It's rare to have just one engineer.
And I mean code the company owns; I'm not talking about dependencies.
So that will be interesting.
And what other use cases are you excited about that AI could bring, outside of just coding?
Let's see. I think I'm excited about education. I do think AI can help people learn a lot faster.
One thing I realize is that nowadays, if you know the questions, finding the answers is actually quite easy.
You can ask AI and it usually gives you a pretty good answer, or at least it can give you a lot of references to go and read more about it.
But what's still hard is how to come up with the right questions.
And I think education needs to focus on helping students build the habit of asking questions and understanding. So I do think AI can help learning become a lot more efficient.
People can learn a lot of things.
And I do believe that if we can learn better and faster,
then we can actually do more things.
So I'm very excited about what that would look like.
What other use cases am I excited about?
I'm excited about entertainment.
I think we always think of entertainment and education as separate things.
But I don't see why we can't have games that help us learn more.
Like strategy games teaching girls about negotiation.
That could be really fun.
Yeah.
Or more intellectually stimulating content.
Movies or shows
don't have to be mindless.
I know other people watch those because they help them escape.
But I also like genres that make me think a little, or help me
understand more about different fields.
So I do think AI can assist us in creating content that is both entertaining
and intellectually stimulating.
Yeah.
So I think it could be a lot of fun.
For something simple: nowadays we have a lot of media adaptations.
You have a book and you convert it into a movie, and sometimes you have a movie and you
convert it into a game.
Or you have papers and convert them into a podcast.
So now if we have some content, AI can help us adapt it to different media.
That could be very, very exciting.
Yeah, there are a lot of small problems that I'm interested in.
I haven't touched on enterprise yet.
I feel like that's where most of the money still is.
I do think enterprise or company organizational structure
is going to change.
So what does that mean?
If you think about a lot of organizations, what is the job of middle management?
First, it's aggregating information from their reports and transmitting it up to the executives.
And the other way around, it's transmitting information, like directions, from executives to the lower layers.
But information aggregation is something AI can do really, really well.
So I do think companies can become a lot more efficient.
So let's close with some rapid questions.
I'll just shoot some questions and you tell me what pops into your mind.
What programming language did you use most when you built AI applications or did ML engineering?
And why?
Python and JavaScript.
You do JavaScript as well?
How come?
Oh, yeah.
Definitely.
I think it's a huge part of building products.
I have to build them quickly.
So I think that's very, very handy.
I'm not very good at it.
I've always been scared of JavaScript, but I'm very grateful that AI actually helps me get started a lot more easily
nowadays.
And which LLM is your favorite right now, and why?
Oof, I don't really have a favorite.
I use them for different things.
I still use ChatGPT out of habit because I have a bunch of prompts that I use.
I've already built little things, like prompts for ChatGPT, that I'm still using.
I use Claude sometimes for creative writing,
because I think it's sometimes less cliché.
I'm reading about DeepSeek R1, and who's not reading about R1,
so I'm just trying it out.
I don't think I have a favorite.
I did something with Llama, like LLaVA,
which is the vision version of Llama,
before, for some kind of interesting use case,
like going from a screenshot to code, just trying it out.
Having fun with it.
But yeah, I'm not emotionally attached to any of them.
Yeah. And what's a neat AI tool that you've used and that you like?
Um, so I have something that I built that's really helped me do research.
So one thing is, when I see a link, like to papers,
there are a lot of links, and I try to get a quick overview. I realized that when I read a
link, I go through the same process: I read the abstract, I look up
the authors to see their related work,
ask questions,
and check the citations.
So I have a little tool
that just goes and scans it.
I give it a link
and it gives me all the information.
Oh, nice.
So you've got a tool to scratch your own itch.
Yeah.
Yeah.
I think that's just the beauty of AI now.
It takes a very small amount of time.
You just build a tool just for you.
Before, right, it would take me weeks.
But now I can just do all of that.
This is something to be excited about.
I agree with that.
Yeah.
What are one or two books that you've read and would recommend?
Oh, I recommend a lot of books.
But I feel like recommending books is like forcing people to do what you enjoy.
So I like books that help me get a new perspective or give me
insight into topics I don't know a lot about. So I really like the book
Complex Adaptive Systems. It's a very interesting book about systems thinking:
how to design social dynamics so that
you get people to work toward the goals that you want. It's a very interesting book;
it forces you to think about systems. I like the book The Selfish Gene because, you know,
it makes you understand more about free will and question things a little. It's the idea that
you can live on either through genes or through ideas: there are two ways that you can live
on. Genes live on through offspring and reproduction;
the other way is if you have ideas, and ideas can also replicate. Like the idea of
memes: genes and memes. I like the book Antifragile.
I think the author is a very interesting character.
I've read several of his books.
I really like them.
Yeah, I like that book.
Yeah, I think there are a lot of books that I like.
No, thank you for the recommendations.
And thank you for being on the podcast.
I mean, AI engineering is such a new field,
and it was great to hear from you,
who has clearly gone very broad and also very deep,
and has been in this field since before it was called AI
engineering. So thank you for this. Thank you so much for letting me ramble on the show.
Yeah, I really appreciate it. One thing I really enjoy
about writing or talking is that I get feedback. I'm less
interested when somebody says, oh, I agree with you. I mean, it's great to hear, but it can also be a little bit
not good for the ego. I really like a little bit of pushback, like, okay,
you didn't think about this, you forgot this, or you didn't take this into account.
So I would love to get that feedback.
If there's anything that you felt I missed out on, do let me know.
So I really appreciate it.
Thank you to Chip for this conversation about AI engineering.
To get in touch with Chip, including to give feedback on her book,
you can find her contact details on her website, linked in the show notes below.
I've also found her book, AI Engineering, to be a broad and deep overview of this important
field.
It's a book that focuses on the fundamentals that are unlikely to change.
So if you want to learn these, it's a good book to have.
In The Pragmatic Engineer, we've previously done several AI and ML-related deep dives.
Check them out, also linked in the show notes below.
If you enjoy this podcast, please do subscribe on your favorite podcast platform and on YouTube.
Thanks, and see you in the next one.
