Software Misadventures - Become a LLM-ready Engineer | Maxime Beauchemin (Airflow, Preset)

Episode Date: May 14, 2024

If you’ve worked on data problems, you probably have heard of Airflow and Superset, two powerful tools that have cemented their place in the data ecosystem. Building successful open-source software is no easy feat, and even fewer engineers have done this back to back. In Part 1 of this conversation, we chat about how to adapt to the LLM age as engineers.

Segments:
(00:01:59) The Rise and Fall of the Data Engineer
(00:11:13) The Importance of Executive Skill in the Era of AI
(00:13:53) Developing the first reflex to use AI
(00:17:47) What are LLMs good at?
(00:25:33) Text to SQL
(00:28:19) Promptimize
(00:32:16) Using tools like LangChain
(00:35:02) Writing better prompts

Show Notes:
- Max on LinkedIn: https://www.linkedin.com/in/maximebeauchemin/
- Rise of the Data Engineer: https://medium.com/free-code-camp/the-rise-of-the-data-engineer-91be18f1e603
- Downfall of the Data Engineer: https://maximebeauchemin.medium.com/the-downfall-of-the-data-engineer-5bfb701e5d6b
- Promptimize: https://github.com/preset-io/promptimize

Stay in touch: 👋 Make Ronak’s day by leaving us a review and let us know who we should talk to next! hello@softwaremisadventures.com

Transcript
Starting point is 00:00:00 Being a SQL monkey is probably not going to cut it anymore when AI is a better SQL monkey than we are. The thing it's lacking is the executive skill and the memory, the long-term memory and the business context that are, for now, private from the LLM and need to be squeezed into a context window for it to make sense and be useful. It's been kind of a learning journey because at first I was just like trying things and it just like doesn't work. And I was like, man, this LLM thing, it's all hype, like shit doesn't work. And then it took me a while to realize,
Starting point is 00:00:34 I was like, okay, I'm actually just really bad at prompting. It's kind of like Googling back in the days, right? Like if you don't use the right keywords, the result is not super great. So for you, right, you're doing like eight to ten prompts every day. Did you see that gradual improvement in terms of results for yourself? And how do I get better at this? Like, I do want to get better. On LangChain, I think it's really interesting, because when I found it I had the same thing, like, I don't
Starting point is 00:01:04 understand why this exists. Not because of anything, I just didn't understand the problem space. Then I got familiar with the problem space and I was like, oh yeah, this is everything I need, this is super great. But then I started to try to use it, and I was like, oh, it does kind of what I want it to do, but not exactly. And then I cannot use the methods that are here exactly in the way I want to use them. Welcome to the Software Misadventures podcast. We are your hosts, Ronak and Guang. As engineers, we are interested in not just the technologies, but the people and
Starting point is 00:01:45 the stories behind them. So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors to chat about their path, lessons they've learned, and of course, the misadventures along the way. Welcome to the show, Max. Super excited to have you here. Well, excited to be on the show too, and excited to catch up with the episodes you have so far, so I'll make sure to catch up on those. Thank you. Excellent. Okay, so just getting right into it: at the beginning of 2017 you wrote this post called The Rise of the Data Engineer, which both helped define the role as well as bring more attention to it. I was a data engineer when that came out. I was like, oh my gosh, this is what it's all about.
Starting point is 00:02:30 But then at the end of the same year, you wrote a sequel to this called The Downfall of the Data Engineer, which summed up pretty much all my struggles as a data engineer at the time. So I guess what led you to write a sequel? Well, it's like what led me to write the original too, so let me try to get back into the context of the time. When I left Facebook to join Airbnb, that was in 2014, internally they were still calling themselves, I think, ETL people, like ETL engineer and business intelligence engineer. And coming out of Facebook, I think we had started calling the team the data engineering team. And I came out of Facebook after two years thinking differently about my role and about the industry and who I wanted to be and what I wanted to do.
Starting point is 00:03:22 And I wanted to make a chasm with the past, like just basically: I don't want to use these GUI tools anymore, I want to do, you know, pipelines as code, move away from the GUIs, bring some of the concepts of software engineering into data processing, data engineering, serving people with data in general. And I think I wanted to take a strong stance for that internally at Airbnb, even saying like, oh, if we do job postings externally to go and hire people, we should, you know, put up a job posting saying data engineer. But then people are like, what is a data engineer? You know, what does that mean? So I think I decided to write the blog post. I think I had read maybe, and we should dig out that post,
Starting point is 00:04:09 but I think there was a similar post called The Rise of the Data Scientist coming out of someone internally at Facebook. I see, I see, I see. I didn't know that. So yeah, which was similarly kind of declaring like, hey, there's a new role. It's disruptive.
Starting point is 00:04:23 It's fun. It's, you know, new and exciting. So I wanted to do something similar for data engineering, and that's where it came out of. And then, you know, I think personally I was struggling with the role, to where I wanted to go even further than, you know, being a data engineer, and be more of a software engineer and a tool builder. And then I was like, oh, here's what all the problems and the challenges around the role are, and maybe this is what we're going to need to break through to make this kind of fun and/or successful, or
Starting point is 00:04:57 you know, this is the reason why I don't want to be a data engineer anymore. Maybe, you know, a mix of the two. And I think, you know, we've done the return on that post, the one called The Downfall of the Data Engineer. And it's interesting to revisit it year after year with practitioners, to see, is that still an issue, yes or no? Because there's probably like five or six things in there that are like, ah, this is why it
Starting point is 00:05:22 sucks to try to be influential in that role or to be successful in that role. Yeah. What was the reception like, especially, I guess, after the downfall one? Were people like, oh my gosh, yes, let's solve these problems, or what was that like? Yes, it's interesting. You know, you write a blog post, it's as if you walk up to a microphone in an empty room and you say things, and then some people might mention it in a podcast five years later, you know. So the reception was not like, oh, there were hundreds of people at my door the next day ringing the doorbell trying to get interviews. So, no. I think, I mean, people reacted. There's, you know, if you think
Starting point is 00:06:02 about the people that read and review this stuff. So usually when you blog, I think this one was mine, it was not on behalf of a company. Some blog posts I've written before were under, say, the Airbnb or Lyft or Preset umbrella, and those get reviewed, they get a little bit more attention and review, kind of a peer review. But on the peer review front, I think people agreed generally and I think it resonated overall. So over the years I've heard people saying, hey, I read the post and it really resonated, similar to what you said. And then there's been a handful
Starting point is 00:06:34 of times where we've done either podcasts, or again we did an article with Monte Carlo where we did the return on the downfall, like, is it still an issue? Have we moved forward? And for me, it's never clear, like, oh, is it just my experience? How generalizable is this? Is it the same at all organizations? Anecdotally, maybe I talk to 30 data engineers a year about these struggles here and there. But it's hard to say, like, oh, is that universal or is that, you know, limited to my experience? I remember there was a joke about like, oh yeah, you know, data science is data engineering until you have the data.
Starting point is 00:07:16 So I know you got your start in, like, BI analytics and things like that. Have you thought about, at some point, just, you know, giving up and going to do data science, which has way more, right, like, coverage and sort of support from leadership and things like that? Yeah, and I think it was called the sexiest job in America for like five years, you know, so I was like, ah, that sounds kind of good, I could work on that. No, but actually not really. And I don't know, I guess the big difference was AI and ML, right? That was a really exciting thing and that was the draw for a lot of people, and in retrospect I think it was and is. But with generative AI now, I think a lot of, you know, some of the skills learned in that era are, I think, useful and
Starting point is 00:08:07 transferable. But yeah, I think the draw for me was more software engineering in general than data science. I don't really know... I think, you know, maybe it's the potential impact. It seemed difficult to have a huge impact as a data scientist. And then there's always the data science thing where people wanted to go into it to solve problems using ML and AI, and then they were just kind of data analysts
Starting point is 00:08:38 that live in San Francisco and wanted to call themselves data scientists because that's what companies need. I mean, that's an old joke. What is a data scientist? It's a data analyst living in San Francisco. That's really nice. Yeah, I've heard that one before.
Starting point is 00:08:53 But yeah, I mean, clearly, I think I'd say at Airbnb, we had a data science team of like 100 people at some point. And I think a lot of them were doing, you know, a lot of what data analysts would have done or what analyst engineers are doing today. Either to support the kind of stuff they wanted to do in data science because like 80% of the work is the wrangling and the preparing of data as it's kind of well known or because it's what the company needed. And at some point, there's a limited maybe number of problems
Starting point is 00:09:25 you can apply ML to. And if people want to work on like creating models and doing that kind of stuff, it seemed like there was a lot more impact to be added in terms of like doing very basic data science and applying it at scale. So that's more like data science engineering or data science infrastructure type stuff,
Starting point is 00:09:46 which is a different skill set. I think over time, almost every engineer, or even data scientist at this point... I've seen people move one layer down the stack. It's like, I've done enough of that, let me now build infrastructure for it, platformatize it, if that is even a word, to make it easier for others to just plug and play, for example. On the skill set thing, I always thought about, okay, yeah, if you want to be a data scientist,
Starting point is 00:10:14 you should pick up the skill sets of the stack below you, so data engineering skills, and then if you want to be a good data engineer, you need to pick up the stack below you, so infrastructure, data infrastructure type skills. What do you think of that? Yeah, a few things on that. The first thing is, to see people expand or move lower down the stack over their career is a pretty natural progression, a natural draw, where as you solve the problem at a certain
Starting point is 00:10:45 layer, you want to go meta, like you want to generalize and say, oh, I want to solve the problem that creates the problem on the other layer, like I want to get deeper, solve it at a deeper level. I think it's a natural progression. You know, I think expanding and widening your skills in general is a natural progression too. Is it better to go down the stack or up the stack? I think there are different kinds of biases there. If you want to be closer to users and use cases in the business, you can evolve in that direction. If you want to get closer to the meta problem and how things are done, doing things in a more reproducible way, that's a normal draw too. But I think overall, if
Starting point is 00:11:31 you think about just how people's skills evolve, like do you get deeper into a vertical or do you get wider, then I would say all of the paths are valid as long as you gain surface, right? Like you want to expand your surface either left or right, you know, or up the stack, down the stack, deeper in certain areas. So you want to be either very, very specialized, very deep in an area, or wider. I think that's a really interesting question.
Starting point is 00:12:00 Overall, I think my stance on that is it's better to go wide than go deep, especially in the era of AI, you know. I think what we're going to see with these LLMs, and some of the skills getting commoditized, is that it's better to be a generalist, because then you have a bunch of little agents you could use eventually, right? It's as if you have an army of very smart interns, you know, with some context,
Starting point is 00:12:36 but not a lot of good executive skill. At least that's the way working with LLMs feels today. So it's good to have good executive skills and coordination-type skills, and then you can be wider and get help from different AIs to help you coordinate and build things.
Starting point is 00:12:58 And there's always the question of, is an AI going to solve this for me? Or if the AI is good at it, maybe I don't need to learn it. So speaking of GenAI, do you see kind of a parallel to, you know, back in the days with data science, where that was getting a lot of the coverage and data engineering was sort of powering it, versus today with GenAI? What do you reckon would be the equivalent of data engineering?
Starting point is 00:13:29 The effect of having this new tech on the role, you mean? Well, so I think data science was, you know, an important thing that was transformative. What we're dealing with here, though, is something that's changing everything and everyone and every role and every skill. So I think this is fundamentally different from anything we've seen before, right? I guess it can only be
Starting point is 00:13:54 compared to the internet or something, in terms of the level of disruption and how it's going to affect everyone's lives. And it's one of these things, it's hard to see at what pace or what it's going to look like on the other side and how fast we're going to get there. But I think for me, one advice I give everyone
Starting point is 00:14:16 is you should develop a first reflex to try to do it with AI or have AI do it for you. The same way that we all developed, you know, first reflexes of, like, let me Google that, around, like, 2000 to 2005. Or maybe as we got our first iPhones, our first smartphones, we're like, oh, we're having, you know, a debate about something.
Starting point is 00:14:38 Let me look that up, right? Like having that first reflex. I think we need to develop that very, very quickly with AI. So like, don't try to, you know, do it on your own, try to do it with AI first, and if it sucks at it, then do it on your own. What does that mean technically, or tactically? Would that be just like using ChatGPT to try to solve the problem first, or trying to come up with a prompt, or how?
Starting point is 00:14:58 Yeah, I mean, I think if you look at your daily workflows, and I don't know what's on your to-do list for today beyond, say, this podcast, but you can look at it like, okay, I've got some technical tasks, some things I'm trying to do. Before I even get started, I might try to ask my assistant, and that's probably ChatGPT or Claude, and say, I'm just going to write down what I'm thinking about doing for that and see if I can get some assistance. And then depending on whether it seems like this thing's going to be able to
Starting point is 00:15:37 help you or not, you can, you know, paste the right code snippets or input documentation or things you're trying to write, whether you're trying to, you know, write an email or a message or a PR or design, you know, a data model or something like that. To write down your thoughts and work with your assistant, getting that feedback loop without disturbing anyone, is so glorious. And then you can figure out where it can and cannot help. But that first reflex, for most tasks, I think you should try to do it with assistance. That's what I do. If you were to look at my... and I definitely would
Starting point is 00:16:14 not pull up my ChatGPT live on a podcast, it's a mix of everything, and be cautious with privacy because, you know, it does overflow too. But, you know, for me, even for founder advice or legal input or everything, the vast array of things that, say, a founder does at a startup, I definitely have developed first reflexes for most tasks to ask, you know, ChatGPT and see how it can help. And it's good at things you would not originally think it might be good at, right? That's a good analogy, because you wouldn't also just pull up your Google history to be like, haha. But what I was going to say, the point around that is, the statistics of how many times a day and for what kind of task I use this stuff: I would say it's now like five to 12 prompts a day
Starting point is 00:17:06 or sessions, and across the variety of what it means to be a founder, you know, the kind of tasks that a founder might do. Even like yesterday I had my immigration interview. I'm going to be an American citizen. So I'm Canadian originally, I've been on a green card, so I went and did the interview. But I didn't know, apparently there's a hundred questions they might ask you and all that stuff. And I did audio sessions with ChatGPT on a drive to the Bay Area this week, and I was doing a role play with it, it was asking me
Starting point is 00:17:38 questions, I practiced. By the time I got to the interview, I'd practiced the interview many times and reviewed, you know, what the three branches of government are and who's the current Secretary of State and all this stuff. All the stuff they were likely to ask me that was tricky, I had reviewed and role-played with GPT over audio in the car,
Starting point is 00:17:58 which is like a random use case, right? That is pretty cool. I was going to say, even for ideating on, like, oh, I feel like I want to start a new open source project around, say, data access policy, here's some ideas that I have, and just having a conversation around it. Instead of writing into the void, you're kind of, you know,
Starting point is 00:18:19 talking with someone smart that has infinite time and attention for you, until you run out of GPT-4 requests for the day. But it's surprising how good it is at just being a brainstorming-friend kind of deal, and it keeps the emotions out of it. Not a friend, a useful assistant. Think about that. What is the most unusual thing you've asked ChatGPT, if you remember? Considering, for example, the use case you just mentioned on brainstorming interview practice for American citizenship, I would not have thought of that.
Starting point is 00:18:55 That is really cool. Yeah, that's really good, especially over voice. Yeah, so when I say unusual, I don't mean it in a bad way, just something which you didn't expect it to be good at, but you were like, oh, this is really good at this thing too. I think the stuff I've been most amazed with is writing really intricate blog posts on the edge of discovery, like what I think is kind of new. Like, let's say, ideating and brainstorming around, say, the creation of a new project. I think it's extremely good at marketing and product marketing, like messaging and positioning for startup founders. It's something you might not think about if you're not a founder, but saying like, hey, we're coming up on this new
Starting point is 00:19:40 product launch, you know, or we're thinking about a new product that we want to launch, and here's how we want to position it and here's what we think it should do. It's an extremely good product marketer. But I was going to say, one thing I worked on recently, we could take the tangent eventually, is just thinking about semantic layers, you know, in the BI world.
Starting point is 00:20:07 And then think about the intricacies of what exists and what the world needs. And at some point, we did a hackathon project around what the ideal semantic layer might look like, you know, and its properties. And then just going back and forth. Some of it is like the rubber duck effect, just having someone to talk to that bounces back ideas. So there's a lot of value in just having someone who listens carefully and spits back words that are related, you know. But even, yeah, like, can you give me some related ideas, or I'm thinking of this thing, what do you think? And it's been an extremely good partner to work on these things at, call it,
Starting point is 00:20:49 the edge of innovation and discovery. So some of the aspects you mentioned before were like, hey, start with ChatGPT first, similar to how you would go, well, let's try to Google that first. In a way, you're saying it increases your productivity. Anything you're trying to do, it might already give you some aspect of the solution, so you can do more as an engineer, for example.
Starting point is 00:21:13 Now, putting yourself in the founder's seat, how do you think about your team size and hiring at that point? Because now you're saying, what would have taken me X amount of time to do, now, with this copilot of sorts, I can do a little more efficiently, and so can your team. So have you thought about this in terms of team size? Yeah, definitely. I think as a founder, you always think about throughput and productivity, and then how do we do more overall, and how do we do more with what we have. I think recently, over the past year and a half,
Starting point is 00:21:48 we're a lot more resource constrained than we were before. Before, there was just no ceiling: you want to raise infinite money, take it; you want an infinite valuation, take it. I think now we've been really pushed to think about efficiency in general. And I think it's always really hard to objectively measure throughput
Starting point is 00:22:06 in software development, right? And so it's always hard to do estimates. You could count lines of code, you can count PRs, you can count features, you can, I don't know, look at customer satisfaction.
Starting point is 00:22:18 But I think we're all a lot more productive than we used to be. One thing that's for sure is telling everyone in the company to build that first reflex. Like, first, everyone should have, you know, we'll pick up the bill for your ChatGPT or Claude, or get the best AI you can or the one that you work best with, get Copilot, get all the tools, right? If you need to produce an image, just get Midjourney. That stuff is so cheap for what it does, it's just a no-brainer. So, enabling people with it. Now, in
Starting point is 00:22:52 terms of the socioeconomic, you know, changes over time, I mean, it's going to have major impact, we just don't know exactly how, right? People are looking at layoffs.fyi, and I'm like, how many of these layoffs are related to AI or won't be replenished? Maybe it's just a normal dip and markets go up and down, but the swing back with AI might be very different this time around. I think that's fundamentally true. In general, does the printing press lead to less text being
Starting point is 00:23:28 written or read? No. Are there fewer journalists because of the printing press, or fewer writers? No, there are more. But this is different, though. As a founder, I can tell you, I think it's good to get the pulse on the microcosm:
Starting point is 00:23:44 if you get the take on how founders think about their companies individually, then maybe in aggregate that gives you a sense of what's going to happen at a more meta, economic layer. But I would say, currently, you know, a startup is always incentivized to grow as much as possible, and now it's advised to be efficient. But as, you know, I double my revenue, I probably want to double my expenses too, because we want to grow as fast as possible. So there's clearly that.
Starting point is 00:24:22 But, yeah, I mean, I think we're going to start seeing very, very small companies getting acquired. We're going to see the less-than-10-people unicorn becoming probably more of a thing in the future too. So fewer people can accomplish as much in a lot of cases. I saw a tweet the other day, I forget who it was from, but it's like, how many people does it take to build, let's say, a $100 million company or a billion dollar company, for example? And
Starting point is 00:24:48 that number keeps going down with the advances we're seeing with LLMs, for example, and it might eventually come down to maybe one person per company, and that is still valued at this higher number, for example.
Starting point is 00:25:14 And as I said, it's unclear what it's going to look like during the transition, how fast the transition's going to go, and where we're going to land. So on the topic of LLMs, and there are a bunch of other things you can also talk about, you have this open source project, Promptimize. Can you tell us more about that?
Starting point is 00:25:33 Yeah, so, I mean, that was, I think, like a year ago or so. We were building text-to-SQL features inside Superset as a differentiator for Preset. So for context, if people are not super familiar with what I do: I started Apache Superset after I started Apache Airflow, and I've been really dedicated to Apache Superset, and then started a company, a commercial open source company, where we offer Superset as a service, essentially, right? And Superset is an open source competitor to the Tableaus and Lookers of the world.
Starting point is 00:26:10 So a BI tool, and it's fully open source. It's amazing, it works super well. There's no reason why people should pay for vendors, you know, and then if you want a hosted solution around it... well, if you haven't checked it out, you can check it out. Just go to Apache Superset and you can check out, you know, what it does, what it is, and you can play with it, you can get set up quickly, use it, try it. And then Preset is just a cloud service around it with some bells and whistles and
Starting point is 00:26:37 some improvements, some of which, and I won't go into the exact pitch, just in the context of what we're talking about: we built an AI assistant within Preset to augment Superset. And that's a differentiator, because we need to make money and have a commercial offering as well on top of the cloud service. So we were working on text-to-SQL, and it's a tough problem. And it's really hard, because it's really deceptively easy to work with these LLMs at first. You work with it, you're like, hey, here's a few table schemas, can you write SQL that does this?
Starting point is 00:27:11 Like, oh my god, this thing is good at SQL, which has deep implications for the data engineering world that we haven't talked about. But, you know, being a SQL monkey is probably not going to cut it anymore when AI is a better SQL monkey than we are. The thing it's lacking is the executive skill and the memory, the long-term memory and the business context that are, for now, private from the LLM and need to be squeezed into a context window for it to make sense and be useful. So we started working on this problem saying, oh my God, this thing is so good at writing SQL if you provide it the right context.
Starting point is 00:27:50 So we started looking at, you know, vector databases to store your data models, and just in general working on some of the challenges we hit early on, like working with different SQL dialects, making sure,
Starting point is 00:28:05 yeah, that it is able to generate the right dialect. It gets a little confused around that. And then providing just overall the right context as to what you're trying to do and what the models it can use are.
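To picture what squeezing the right context into the window can look like, here is a minimal, hypothetical sketch: a schema block and a dialect hint prepended to the question before asking a model for SQL. The table names, the dialect rule, and the model choice are illustrative assumptions, not Preset's actual implementation.

```python
# Hypothetical sketch: provide schema context and a dialect hint, then ask for SQL.
# Table names, rules, and model choice are invented for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SCHEMA_CONTEXT = """
Table orders(order_id INT, customer_id INT, order_ts TIMESTAMP, amount NUMERIC)
Table customers(customer_id INT, country STRING, signup_ts TIMESTAMP)
"""

DIALECT_HINT = (
    "Generate BigQuery SQL only. Use DATE_TRUNC(order_ts, MONTH) for monthly grains, "
    "capitalize reserved words, and do not reference columns that are not in the schema."
)

def text_to_sql(question: str) -> str:
    """Return a SQL candidate for a natural-language question, given the schema context."""
    response = client.chat.completions.create(
        model="gpt-4",  # or gpt-3.5-turbo; comparing the two is exactly the kind of question below
        messages=[
            {"role": "system", "content": f"You are a SQL assistant.\n{SCHEMA_CONTEXT}\n{DIALECT_HINT}"},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

print(text_to_sql("Monthly revenue by country for 2023"))
```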
Starting point is 00:28:18 And we started working on that. What we realized is, you know, you can use a GPT-3.5 Turbo, a GPT-3.5, a GPT-4, and you can bold something in your prompt that says, you know, make sure to capitalize the reserved words, or if it's BigQuery, do this, right? So you can start really changing your prompt, and then it changes the outcome really intricately. And then what we're
Starting point is 00:28:44 trying to solve is the big fuzzy problem where people might ask anything, and your data schema might look like anything. So how do we measure the quality of our prompt, or even something as simple as should we use 3.5 Turbo or 4 Turbo or 4, right? And how much better is it performing? So early on we found this decent, good dataset around text-to-SQL, it's called the Spider dataset. It's out of,
Starting point is 00:29:13 I forgot if it's MIT or... sorry, I don't want to misquote, I'm not going to say, anybody can research it. There's the Spider dataset, it's a list of prompts, simple schemas, and then the good answers for them. And there's a bit of a context where people are like, oh, you know, different teams working on this problem did 82% or 87% with ChatGPT on this test set. So it's a published test set. And then there was no way at the time
Starting point is 00:29:40 to just write kind of unit tests, or a framework for someone to take unit tests and measure the outcome. And so Promptimize, the idea behind it was like, oh, let me write a little toolkit where you can write your prompt cases, which I liken to test cases if you're familiar with those, take some of the ideas from unit testing frameworks and apply them to prompt engineering and prompt testing, so that we could say, okay, take these 2,000 tests and run them against GPT-3.5,
Starting point is 00:30:18 or run them against GPT-4 Turbo, and compare the output: the percentage of success where one succeeds over the other, what it's good at, what it's bad at, how much it costs, how long it takes, like the average, the p90 of how long it takes for the prompt to come back. So I wanted to apply the scientific method and just rigor to prompt engineering, and that's, you know, Promptimize, a little toolkit to allow you to do that with some amount of structure.
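For a sense of what prompt cases as unit tests might look like, here's a small sketch of the pattern described above: a suite of prompts, each paired with an assertion, run against two models and scored for success rate and latency. The names (`PromptCase`, `run_suite`) are invented for illustration and are not the actual Promptimize API.

```python
# Illustrative sketch of prompt cases as unit tests; not the real Promptimize API.
import time
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class PromptCase:
    prompt: str                      # what we send to the model
    check: Callable[[str], bool]     # assertion on the model's answer

def run_suite(cases: Iterable[PromptCase], models: Iterable[str], complete) -> None:
    """Run every case against every model; report success rate and average latency.

    `complete(model, prompt)` is whatever function calls your LLM provider.
    """
    for model in models:
        successes, latencies = 0, []
        for case in cases:
            start = time.time()
            answer = complete(model, case.prompt)
            latencies.append(time.time() - start)
            successes += case.check(answer)
        print(f"{model}: {successes}/{len(latencies)} passed, "
              f"avg latency {sum(latencies) / len(latencies):.2f}s")

# Example cases in the Spider-dataset spirit: question in, SQL-ish assertion out.
cases = [
    PromptCase("Write SQL counting rows in table users.",
               lambda sql: "count" in sql.lower() and "users" in sql.lower()),
    PromptCase("Write SQL returning the 10 most recent orders.",
               lambda sql: "order by" in sql.lower() and "limit 10" in sql.lower()),
]

# run_suite(cases, ["gpt-3.5-turbo", "gpt-4"], complete=my_llm_call)  # plug in your own caller
```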
Starting point is 00:30:57 It's quite cool. And I saw that you guys also have LangChain support. And for me, also with LangChain, when I first started looking at it, I guess this was last year, I was like, why do you need a library to do this? Don't I just write these texts and then it just sort of works? And then, as I started trying to write better prompts and, you know, do more use cases, I was like, oh my gosh, yeah, it's such a mess without these libraries. And I think it's the exact same thing with Promptimize, right? Where once things get to kind of the production level, where it's actually dollars on the line, you actually want the same engineering best practices that we've developed, right,
Starting point is 00:31:29 to actually have that transferred over, instead of just kind of throwing your hands in the air. Yes. It's like trying to have some empirical measurement in a very fuzzy, unknown world. Right. Because you're working on your prompts, and you can literally add a hint in there that says, please don't do this. Or, you know, you look at your 10% of failures, say, on text-to-SQL generation, and you might
Starting point is 00:31:43 realize, oh, all the failures are related to trying to run that stuff on Snowflake, because it's not good at speaking the Snowflake dialect. So then you might add a thing that says, oh wait, but if you're using Snowflake, and specifically a date function to change the date grain of a thing, here's some function definitions that you can use, right? Or be cautious around this. But then, by doing this, you get whack-a-mole:
Starting point is 00:32:13 you might have made the BigQuery support worse, right? So then it's really hard to know this, so you need empirical... you know, you need more rigor around that. And that was the general idea with Promptimize. On LangChain, I
Starting point is 00:32:36 think it's really interesting, because when I found it I had the same thing: I didn't really understand why this exists. Not because of the toolkit, I just didn't understand the problem space. Then I got familiar with the problem space. I was like, oh yeah, this is everything I need, this is super great. But then I started to try to use it. And no disrespect or anything for the toolkit, I think it's just something that matured very quickly.
Starting point is 00:33:00 But then I started using it, and I was like, oh, it does kind of what I want it to do, but not exactly. And then I cannot use the methods that are here exactly in the way I want to use them. So then you kind of fall off. For me, I was like, it's harder to try to bend this toolkit into submission
Starting point is 00:33:21 than the value I get from it, you know, in some ways, right? So it has a lot of convenience methods to, say, break text into chunks with some amount of overlap, do this and that. So some things are really useful, but then, say, it didn't have support for the particular vector database we wanted to use at the time, or not the kind of support that we needed. So then you get like 80% of the way, but then you have to monkey patch some stuff to make it work.
Starting point is 00:33:49 So then you're like, that's just a little bit of Python that does some text processing, we can write that with AI in like five minutes, easier. Interesting. So is that what you guys internally do, just kind of having your own set of utils and stuff to help? I think we do use some of it. LangChain is a weird... it's a toolkit, you know, so you can kind of think of it as a bunch of utility tools around AI and ML.
Starting point is 00:34:21 And I think over time we agreed to using just specific portions of the toolkit. It's like, oh, we use the hammer and the screwdriver, but we don't use anything that saws or cuts, you know. So we picked some parts of it that I think stuck around, and some things are like, okay, we'll just do our own thing, because it's harder to bend this tool into doing what we need to do than it is to just do it on our own for some use cases. And so, internally, some of what I'm trying to work on is a lot of summarization, and then trying to do kind of style transfer for text. And I remember it's been kind of a learning journey, because at first I was just trying things and it just
Starting point is 00:35:01 doesn't work. And I was like, man, this LLM thing, it's all hype, like shit doesn't work. And then it took me a while to realize, okay, I'm actually just really bad at prompting. It's kind of like Googling back in the days, right? If you don't use the right keywords, the result is not super great. So, and I guess for you, right,
Starting point is 00:35:24 you're doing like eight to ten prompts every day. Did you see that gradual improvement in terms of results for yourself? And how do I get better at this? Like, I do want to get better. Yeah, I mean, I think you have to approach it a little bit more like a fuzzy... you know, like a human, maybe. Maybe it's like you approach someone you don't know very much, but you know they're, maybe, a fresh graduate, you know they're smart and they have accumulated a lot of knowledge in different areas, right? But then you don't know how to
Starting point is 00:36:00 work with them, and you don't know how good they might be at different things. So I don't think the answer is to over-engineer your prompts either. It's just like, what do I need to tell it for it to help me, you know? And in some cases, I think I've gotten more sloppy with the way I interact with GPT in general, in some areas, right? Like in some areas, I'll just open a session and, if I'm doing some coding, I might have just an error message or, you know, a problem in CI, and I'll just copy-paste a big chunk of text and throw it in, see what it's going to say, it might have some good pointers, you know. I think fundamentally the first thing is, oh, well, what context does it need to help me?
Starting point is 00:36:46 And what context does it have from, you know, learning from the entire internet? So you have to say, okay, it doesn't know anything about things that are specific to my business or my use case. So what's not generalizable? What's it going to need? And then, you know, you can certainly try more things, like, what if I tell you this, can you help me more? And so it's progressive disclosure until you prove or disprove whether it's going to be able to help you or not. But yeah, in terms of, you know, text-to-SQL and Promptimize, I think what I realized is that a lot of the use cases for AI are not as empirical or as measurable as the one we have.
Starting point is 00:37:30 In some ways, we're blessed with text-to-SQL, because if I ask you, can you write this query on this database, it's pretty much... I mean, it's not always, you know, 100% a Boolean on whether it succeeded or not. Sometimes it might, I don't know, re-alias columns in a weird way, or give you more than what you asked for, but it's useful, right? So it's not a pure Boolean on correct or not correct, but at least we have something where generally we could say, this is a good answer, this is a bad answer. If you say, can you please summarize this text in a paragraph, it's harder to evaluate whether it succeeded or not.
Starting point is 00:38:12 Or if you have a CS-type, customer-success-type question, you're writing a CS bot, which is a huge family of use cases, right? People want to automate support. So I can simulate in Promptimize a chat session where someone, you know, puts in some information, I need help with this and that, but it's harder to read the answer and give it a score. So then you can use an AI to do that, but then you're kind of like, yeah, I don't know what you're doing there, it could be garbage in, garbage out. But you have to trust the underlying system beyond a point. Well, it's like a circular thing: if you get the AI to evaluate the answer of the AI, then you need
Starting point is 00:38:55 to, you know, evaluate the answer of the answer. Yeah, you have to make sure it helps you. But I mean, you can. And I talk with people that use Promptimize in more fuzzy use cases that are less like this Boolean, the AI succeeded, yes or no. For instance, I think the examples that are really interesting, in the Promptimize examples that I wrote when I originally wrote the project, were around writing Python functions. So I can actually ask the AI to write a Python function, take the Python function, and then run unit tests on it, make sure it actually works. It's like, write a function that tells you if a number is prime or not; then it generates the code, then you actually put it in an interpreter and test it. So that's an empirical use case. But yeah, when you get to less empirical, less true-or-false use cases,
Starting point is 00:39:49 it gets more subjective and hard to evaluate. But this is pretty cool. This is more like test-driven development, right? You specify what you want, you describe the test, and then you evaluate whether the code you got back is actually doing what you asked it to do.
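As a rough sketch of that generate-then-test loop (an illustration of the idea, not code from the Promptimize repo): ask the model for an is_prime function, load the returned code, and assert on known values.

```python
# Sketch of evaluating generated code empirically: run real assertions against it.
# `ask_llm` stands in for whatever client call returns the model's code as a plain string
# (this assumes the model returns runnable Python with no markdown fences).

def evaluate_generated_is_prime(ask_llm) -> bool:
    code = ask_llm("Write a Python function is_prime(n) that returns True if n is prime.")
    namespace = {}
    exec(code, namespace)          # sandboxing concerns aside, load the generated function
    is_prime = namespace["is_prime"]
    expected = {1: False, 2: True, 3: True, 4: False, 17: True, 18: False}
    return all(is_prime(n) == want for n, want in expected.items())
```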
Starting point is 00:40:04 Yeah, the blog post was very much, like originally when I wrote the thing, it was bring the TDD and, you know, the rigor and what we've learned in software engineering and tests, you know, unit tests, test-driven development, to prompt engineering. Well, the project is super cool
Starting point is 00:40:26 and we'll definitely link it in our show notes. We recommend people check it out. Not just the project, but also Preset, Superset, and Airflow. Hey, thank you so much for listening to the show. You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com. You can also write to us at hello@softwaremisadventures.com. We would love to hear from you. Until next time, take care.
