Software Misadventures - Become a LLM-ready Engineer | Maxime Beauchemin (Airflow, Preset)
Episode Date: May 14, 2024

If you’ve worked on data problems, you probably have heard of Airflow and Superset, two powerful tools that have cemented their place in the data ecosystem. Building successful open-source software is no easy feat, and even fewer engineers have done this back to back. In Part 1 of this conversation, we chat about how to adapt to the LLM age as engineers.

Segments:
(00:01:59) The Rise and Fall of the Data Engineer
(00:11:13) The Importance of Executive Skill in the Era of AI
(00:13:53) Developing the first reflex to use AI
(00:17:47) What are LLMs good at?
(00:25:33) Text to SQL
(00:28:19) Promptimize
(00:32:16) Using tools like LangChain
(00:35:02) Writing better prompts

Show Notes:
- Max on LinkedIn: https://www.linkedin.com/in/maximebeauchemin/
- Rise of the Data Engineer: https://medium.com/free-code-camp/the-rise-of-the-data-engineer-91be18f1e603
- Downfall of the Data Engineer: https://maximebeauchemin.medium.com/the-downfall-of-the-data-engineer-5bfb701e5d6b
- Promptimize: https://github.com/preset-io/promptimize

Stay in touch:
👋 Make Ronak’s day by leaving us a review and let us know who we should talk to next! hello@softwaremisadventures.com
Transcript
Being a SQL monkey is probably not going to cut it anymore when AI is a better SQL monkey than we are.
The thing it's lacking is the executive skill and the memory, the long-term memory and the business context
that are, for now, private from the LLM and need to be squeezed into a context window for it to make sense and be useful.
It's been kind of a learning journey, because at first I was just trying things
and it just doesn't work.
And I was like, man, this LLM thing,
it's all hype, like, shit doesn't work.
And then it took me a while to realize,
like, okay, I'm actually just really bad at prompting.
It's kind of like Googling back in the days, right?
Like if you don't do the right keywords,
the result is not like super great.
So for you, right, you're doing like eight
to ten prompts every day. Did you see that gradual improvement in terms of results for
yourself? And how do I get better at this? Like, I do want to get better. On LangChain,
I think it's really interesting, because when I found it, I had the same thing, like, I didn't
understand why this exists, because I just didn't understand the problem space. Then I got familiar
with the problem space, and I was like, oh yeah, this is, like, everything I need, this is super great.
But then I started to try to use it, and then I was like, oh, it does kind of what I
wanted it to do, but not exactly.
And then I cannot use the methods that are here exactly in the way I want to use them.
Welcome to the Software Misadventures podcast.
We are your hosts, Ronak and Guang.
As engineers, we are interested in not just the technologies, but the people and
the stories behind them. So on this show, we try to scratch our own itch by sitting down with
engineers, founders, and investors to chat about their path, lessons they've learned, and of course,
the misadventures along the way. Welcome to the show, Max. Super excited to have you here.
Well, excited to be on the show too, and excited to catch up with
the episodes you have so far, so I'll make sure to catch up on that. Thank you. Excellent. Okay,
so just getting right into it: at the beginning of 2017, you wrote this post called The Rise of the
Data Engineer, which both helped define the role as well as bring more attention to it. So I was a data engineer when that came out.
I was like, oh my gosh, this is what it's all about.
But then at the end of the same year, you write a sequel to this called The Downfall of the Data Engineer,
which summed up pretty much all my struggles also as a data engineer at the time.
So I guess what led you to write a sequel?
Well, it's like, what led me to write the original too? So, trying to get back in the
context at the time: when I left Facebook to join Airbnb, so that was in 2014,
internally they were still calling themselves, I think, ETL people, like ETL engineer and business intelligence engineer.
And I was like, coming out of Facebook, I think we had started calling the team the data engineering team.
And for me, I came out of Facebook after two years just thinking differently about my role and about the industry and who I wanted to be and what I wanted to do.
And I wanted to make a chasm with the past. Like, just basically,
I don't want to use these GUI tools anymore. I want to do, you know, pipelines as code,
you know, just move away from the GUIs, bring some of the concepts of software
engineering into data processing, data engineering, serving people with data in general. And I think I wanted to take
a strong stance for that internally at Airbnb, even saying, oh, if we do job
postings externally to go and hire people, we should, you know, put a job posting saying data
engineer. But then people are like, what is a data engineer? You know, what does that mean?
So I think I decided to write the blog post. I think I had read maybe, and we should dig out that post,
but I think there was a similar post called
The Rise of the Data Scientist
coming out of someone internally at Facebook.
I see, I see, I see.
I didn't know that.
So yeah, which was similarly kind of declaring like,
hey, there's a new role.
It's disruptive.
It's fun.
It's new and exciting. So I wanted to do something similar for data engineering. So that's where
it came out of. And then, you know, I think personally I was struggling with the role too,
where I wanted to go even further than, you know, being a data engineer, and be more of a software
engineer and a tool builder. And then I was like, oh, here's what all the
problems and the challenges around the role are, and maybe this is what we're
going to need to break through to make this kind of fun and/or successful. Or,
you know, this is the reason why I don't want to be a data engineer anymore. Maybe
it's a mix of the two.
And I think, like, you know, we've done the return on that post.
So it's called The Downfall of the Data Engineer,
and it's interesting to revisit it, you know, year after year, with practitioners,
to see, like, is that still an issue?
Yes or no.
Because there's probably like five or six things in there that are like, ah, this is why it
sucks to try to be influential in that role or to be successful in that role. Yeah. What was the reception like
after, especially, I guess, after the downfall one? Were people like, oh my gosh,
yes, let's solve these problems? Or what was that like? Yes. It's interesting. You know,
you write a blog post, and it's as if you walk up to a microphone in an empty room and you
say things, and then some
people might mention it in a podcast five years later, you know. So the reception was not
like, oh, there were hundreds of people at my door the next day ringing the doorbell
trying to get interviews. So, no. I think, I mean, people reacted. You know, if you think
about the people that read and review this stuff: usually when you blog, I think this one was not on behalf of a company. Some blog posts I've
written before were under, say, the Airbnb or Lyft umbrella, or Preset's, and it
gets reviewed, it gets a little bit more attention and review,
kind of a peer review. But on the peer review front, I think
people agreed generally,
i think it resonated overall so over the years i've heard people saying like hey i read the
post and really resonated with like similar to what you said and then there's been a handful
of times where we've done either podcasts or again we did an article with monte carlo where we did
the return on the downfall um like is still an issue? Have we moved forward?
And for me, it's never clear, like, oh, is it my experience?
How generalizable is this?
Is it the same at all organizations?
Anecdotally, maybe I talk to 30 data engineers a year about these struggles here and there.
But it's hard to say, like, oh, is that universal or is that you know limited to my experience i remember there was a
joke about like oh yeah you know data science is data engineering until you have the data like i
so i know you got your start in like uh like bi analytics and things like that like have you
thought about just at some point just you know giving up just go do data science which has like way more right like coverage and sort of support
from leadership and things like that yeah and i think it was called like the sexiest job in
america for like five years you know so i was like ah that's like that sounds that sounds kind of good
I could work on that. No, but actually, not really. And I don't know, I guess the big difference was
AI and ML, right? Like, that was a really exciting thing, and that was the draw for a lot of people.
And in retrospect, I think it was and is. But with generative AI now, I think a lot of,
you know, some of the skills learned in that era are, I think, useful and
transferable. But yeah, I think the draw for me was more software engineering in
general than data science. I don't really know. I think, you know, maybe it's the
potential impact. It seemed difficult to have a huge impact
as a data scientist.
And then there's always the thing with data science:
people wanted to go into it
to solve problems using ML and AI.
And then they were just kind of data analysts
that live in San Francisco
and wanted to call themselves data scientists
because that's what companies need.
I mean, that's an old joke.
What is a data scientist?
It's a data analyst living in San Francisco.
That's really nice.
Yeah, I've heard that one before.
But yeah, I mean, clearly, I think I'd say at Airbnb,
we had a data science team of like 100 people at some point.
And I think a lot of them were doing, you know,
a lot of what data analysts would have done
or what analytics engineers are doing today. Either to support the kind of stuff they wanted
to do in data science because like 80% of the work is the wrangling and the preparing
of data as it's kind of well known or because it's what the company needed. And at some
point, there's maybe a limited number of problems
you can apply ML to.
And if people want to work on like creating models
and doing that kind of stuff,
it seemed like there was a lot more impact to be had
in terms of like doing very basic data science
and applying it at scale.
So that's more like data science engineering
or data science infrastructure type stuff,
which is a different skill set.
I think over time, almost every engineer, or even data scientist at this point, I've seen
people move one level down the stack.
It's like, I've done enough of that.
Let me now build infrastructure for it, platformatize it, if that is even a word, to make it easier for others
to just plug and play, for example.
On the skill set thing, I always thought about,
okay, yeah, if you want to be a data scientist,
you should pick up the skill sets of the stack below you,
so data engineering skills, and then if you wanna be
a good data engineer, you need to pick up the stack
below you, so infrastructure, data-infrastructure-like skills. What do you think of that?
Yeah, a few things on that. The first thing is, to see
people expand or move lower down the stack over their career is
a pretty natural progression, a natural draw, where as you
solve the problem at a certain
layer, you want to go meta. You want to, you know, generalize and say, oh, I want to
solve the problem that creates the problem at the other layer. Like, I want to get deeper, solve
it at a deeper level. I think it's a natural progression. You know, I think expanding
and widening your skills in general is a natural progression too.
Is it better to go down the stack or up the stack?
I think there are different kinds of biases there.
If you want to be closer to users and use cases in the business, you can evolve in that direction.
If you want to get closer to the meta problem and how things are done, doing things in a more reproducible way, that's a normal draw too. But I think overall, if
you think about just how people's skills evolve, like, do you get deeper into a vertical or do you get wider,
I would say all of the paths are valid, as long as you gain surface, right?
Like, you want to expand your surface either left or right,
you know, or up the stack, down the stack,
deeper in certain areas.
So you want to be either very, very specialized,
very deep in an area, or wider.
I think that's a really interesting question.
Overall, I think my stance on that is it's better to go wide than to go deep, especially
in the era of AI, you know. I think what we're going to see with these LLMs, and some
of the skills getting commoditized, is it's better to be a generalist,
because then you have a bunch of little agents
you could use eventually, right?
And it's as if you have an army of very smart interns,
you know, with some context,
but not a lot of good executive skill.
At least that's the way working with LLMs feels today.
So it's good to have good executive skills
and coordination-type skills,
and then you can be wider
and get help from different AIs
to help you coordinate and build things.
And there's always the question of,
is an AI going to solve this for me?
Or, if the AI is good at it, maybe I don't
need to learn it. So speaking of Gen AI, do you see kind of a parallel
with, you know, back in the days, data science getting a lot
of the coverage while data engineering was sort of
powering it, versus today with Gen AI? What do you reckon would be the equivalent
of data engineering?
The effect of having this new tech
on the role, you mean?
Well, so I think data science was,
you know, an important thing that's transformative.
What we're dealing with here, though,
is something that's changing, you know, everything and
everyone and every role and every skill. So I think this is fundamentally
different from anything we've seen before, right? I guess it can only be
compared to the internet or something, in terms of the
level of the disruption and how it's going to affect everyone's life. And it's
one of these things: it's hard to see at what pace, or what it's going to look like on the other side,
and how fast we're going to get there.
But I think,
for me, one piece of
advice I give everyone
is you should
develop a first reflex to try
to do it with AI
or have AI do it for you.
So the same way that we all develop, you know, first reflexes on,
like, let me Google that around, like, 2000 to 2005.
Or maybe as we got our first iPhones or our first smartphones,
we're like, oh, we're having, you know, a debate about something.
Let me look that up, right?
Like having that first reflex.
I think we need to develop that very, very quickly with AI.
So like, don't try to, you know, do it on your own,
try to do it with AI first, and if it sucks at it,
then do it on your own.
What does that mean technically, or, like,
tactically? Would that be just using ChatGPT to
try to solve the problem first,
or trying to come up with a prompt, or how?
Yeah, I mean, I think if you look at
your daily workflows, and I don't know what's on your to-do list for today
beyond, say, this podcast, but you can look at it like, okay, I've got some technical tasks,
some things I'm trying to do. Before I even get started, I might try to ask my assistant, and that's probably ChatGPT or Claude,
and say, I'm just going to write down what I'm thinking about doing for that and see if
I can get any or some assistance. And then, depending on whether it seems like this thing's going to be able to
help you or not, you can, you know, paste the right code snippets or input documentation or things
you're trying to write, whether you're trying to, you know, write an email or a message or a PR, or design, you know, a data model or
something like that. To write down your thoughts and work with your assistant on getting
that feedback loop, without disturbing anyone, is so glorious. And then you can figure out where
it can and cannot help. But that first reflex, for most tasks,
I think you should try to do it with
assistance. That's what I do. If you were to look at my... and I definitely would
not pull up my ChatGPT history live on a podcast, it's a mix of everything, and be cautious with privacy,
because, like, you know, it does overflow too. But, you know, for me, even for, like, founder
advice or legal input or everything, the vast array of things that, say, a founder does at a
startup, I definitely have developed a first reflex for most tasks to ask, you know, ChatGPT
and see how it can help. And it's good at things you would not originally think it might be
good at.
good at right that's a good analogy because like you wouldn't also just pull up your google history
to be like haha but what i was going to say the point around that is like the statistics of like
how many times a day and the for for what kind of task i use this stuff is like, I would say that it's like now it's like five to 12 prompts a day
or like sessions and across the variety of what it means to be a founder,
you know, the kind of tasks that a founder might do.
Even like yesterday I had my immigration interview.
I'm going to be an American citizen.
So I'm Canadian originally.
I've been on a green card.
So I went and did the interview. But, like, I didn't know, apparently there's a hundred questions they might ask you and all that stuff. And I did, like, audio sessions with ChatGPT
on a drive to the Bay Area this week, and I was doing a role play with it. It was asking me
questions, I practiced. By the time I got to the interview, I'd practiced the interview many times
and reviewed, you know, what the three branches of government
are, and who's the
current
Secretary of State, and all this stuff.
All this stuff that they were likely to ask me, that was tricky,
I had reviewed and role-played with GPT
over audio in the car,
which is like a random
use case, right?
That is pretty cool.
I was going to say,
even for ideating on,
like, oh, I feel like I want to start a new open source project around, like, data
access policy, here's some ideas that I have, and just having a conversation
around it. You know, instead of writing in the void, you're kind of, you know,
talking with someone smart that has infinite time and attention for you, until you run out of GPT-4 requests for the day.
But it's surprising how good it is at just being that brainstorm-
friend kind of deal, and keeping the emotions out of it.
Not a friend, a useful assistant.
Think about that.
What is the most unusual thing you've asked
ChatGPT, if you remember? Considering, for example, the use case you just mentioned on
brainstorming interview practice for American citizenship, I would not have thought of that.
That is really cool. Yeah, that's really good. Yeah, especially over voice. So when I say unusual,
I don't mean in a bad way, but just something which you didn't expect it to be good at, but you were like, oh, this is really good at this thing too.
I think the stuff I've been most amazed with is writing really intricate blog posts on the edge of discovery.
Like what I think is kind of new.
Like, let's say, ideating and brainstorming around, say, the creation of a
new project. I think it's extremely good at, like, marketing and product marketing, like,
messaging and positioning, for startup founders. It's something you might not think
about if you're not a founder, but saying, like, hey, we're coming up on this
new product launch, you know, or we're thinking about a new product that we want to launch, you know,
and here's how we want to position it
and here's what we think it should do.
It's an extremely good product marketer.
But I was going to say one thing I worked on recently,
we could take the tangent eventually,
is just thinking about semantic layers,
you know, in the BI world.
And then think about the intricacies of what exists and what the world needs.
And at some point, we did a hackathon project around what the ideal semantic layer might look like, you know, and its properties.
And then just going back and forth.
Some of it is like the rubber duck effect, just having someone to talk to that just, like, bounces back ideas.
So there's a lot of value in just having someone who listens carefully and spits back words that
are related, you know. But even, yeah, like, can you, you know, give me some related ideas,
or, I'm thinking of this thing, you know, what do you think? And it's been an extremely good partner
to work on these things at, call it,
the edge of innovation and discovery.
So some of the aspects you mentioned before
were, like, hey, start with ChatGPT first,
similar to how you would go with,
well, let's try to Google that first.
In a way, you're saying it increases your productivity.
Anything you're trying to do, it might already give you some aspect of the solution,
so you can do more as an engineer, for example.
Now, putting yourself in the founder's seat,
how do you think about your team size and hiring at that point?
Because now you're saying, what would have taken me X amount of time to do?
Now, with this co-pilot of sorts, I can do a little more efficiently, and so can your team.
So have you thought about this in terms of team size?
Yeah, definitely.
I think as a founder, you always think about throughput and productivity, and then how do we do more overall, and then how we do more with what we have.
I think recently over the past year and a half
we're a lot more resource constrained than we were before.
Like, before, there was just no ceiling:
you want to raise infinite money, take it.
You want an infinite valuation, take it.
I think now we've been really pushed
to think about efficiency in general.
I think it's always really hard
to objectively measure throughput
in software development, right?
And so it's always hard to do estimates.
It's always hard.
You could count lines of code,
you can count PRs,
you can count features,
you can, I don't know,
look at customer satisfaction.
But I think we're all a lot more productive
than we used to be.
One thing is for sure: telling everyone at
the company to build that first reflex. Like, you know, well, first, everyone should have, like, we'll pay
for your, you know, we'll pick up the bill for your ChatGPT or Claude, or, you know, get the best AI you
can, or the one that you work best with. Get Copilot, get all the tools, right? If you need
to produce an image, just get Midjourney. Like, that
stuff is so cheap for what it does, it's just a no-brainer. So enabling people with it. Now, in
terms of the socioeconomic, you know, changes over time, it's really, I mean, it's going to
have major impact. We just don't know exactly how, right? People are looking at layoffs on layoffs.fyi.
I'm like, how many of these layoffs are related to AI or won't be replenished?
Maybe it's just a normal dip and markets go up and down,
but the swing back with AI might be very different this time around.
I think that's fundamentally true.
In general, does the printing press lead to
less text being
written or read? No.
Are there fewer journalists
because of the printing press,
or fewer writers?
No, there's more.
But this is different, though.
As a founder, I can tell you, I think it's good
to get the pulse on the microcosm:
if you get the take on how founders think about their
companies individually, then maybe in aggregate that gives you a sense of what's going to happen
at a more meta, economic layer. But I would say, I think currently, yeah, I think it's,
you know, a startup is always incentivized to grow as much as possible,
and now it's advised to be efficient.
But as, you know, I double my revenue, I probably want to double my expenses too,
because we want to grow as fast as possible.
So there's clearly that.
But, yeah, I mean, I think we're going to start seeing
like very, very small companies getting acquired.
We're going to see the less than 10 people unicorn
becoming probably more of a thing in the future too.
So fewer people can accomplish as much in a lot of cases.
I saw a tweet the other day,
I forget who it was from, but it's like,
how many people does it take to build, let's say, a $100 million company or a billion dollar company, for example? And
that number keeps going down with advances, with what we're seeing with LLMs, for example,
and it might eventually come down to maybe a one-person company, and that is still valued
at this higher number, for example.
Yeah, I mean, it's probably been the trend overall,
just productivity going up,
but there's a big kind of step change happening,
and then just a lot of things are going to be different
on the other side.
And as I said, it's unclear what it's going to look like
during the transition, how fast the transition's going to go,
and where we're going to land.
So on the topic of LLMs,
and there are a bunch of other things
you can also talk about,
you have this open source project, Promptimize.
Can you tell us more about that?
Yeah, so, I mean, that was, I think,
like a year ago or so.
We were building text-to-SQL features
inside Superset as a differentiator for Preset.
So for context, for people not super familiar with what I do:
I started Apache Superset after I started Apache Airflow, and I've been really dedicated to Apache Superset,
and then started a company, a commercial open source company, where we offer Superset as a service, essentially, right? And Superset is an open source competitor
to the Tableaus and Lookers of the world.
So a BI tool, and it's fully open source.
It's amazing, it works super well.
There's no reason why people should pay for vendors,
you know, and if you want a hosted solution around it,
well, if you haven't checked it out,
just go to Apache Superset and you can check out, you know, what it does, what it is, and you can play with it. You can get set up quickly,
use it, try it. And then Preset is just a cloud service around it with some bells and whistles and
some improvements, some of which, and I won't go into, like, the exact pitch, just
in the context of what we're talking about: we built an AI assistant within Preset to augment Superset.
And that's a differentiator because we need to make money
and have a commercial offering as well on top of the cloud service.
So we were working on text-to-SQL, and it's a tough problem.
And it's really hard to work with what's really deceptively easy
to work with, these LLMs. You work with it, like, here's a few table schemas, can you write SQL that does this?
And, like, oh my god, this thing is good at SQL, which has deep implications for the data engineering
world that we haven't talked about. But, you know, being a SQL monkey is probably not going to
cut it anymore when AI is a better SQL monkey than we are.
The thing it's lacking is the executive skill and the memory, the long-term memory and the
business context that are, for now, private from the LLM and need to be squeezed into
a context window for it to make sense and be useful.
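To make that last point concrete, here is a minimal sketch of the kind of prompt assembly being described: squeeze a schema snippet and a dialect hint into the context window and ask for SQL. This is an illustrative sketch, not Preset's actual implementation; the table definitions, the helper name, and the model name are assumptions.

```python
# Illustrative sketch only (not Preset's implementation): assemble schema context
# and a dialect hint into the prompt, then ask the model for SQL.
from openai import OpenAI  # assumes the openai>=1.x Python client

client = OpenAI()

# Hypothetical schema snippet, e.g. pulled from a metadata store or vector DB.
SCHEMA_CONTEXT = """
Table orders(order_id BIGINT, customer_id BIGINT, order_ts TIMESTAMP, amount NUMERIC)
Table customers(customer_id BIGINT, country VARCHAR, signup_ts TIMESTAMP)
"""

def text_to_sql(question: str, dialect: str = "BigQuery") -> str:
    """Return a single SQL statement in the requested dialect."""
    prompt = (
        f"You write {dialect} SQL. Use only the tables below.\n"
        f"{SCHEMA_CONTEXT}\n"
        f"Question: {question}\n"
        "Return only the SQL, no explanation."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

# text_to_sql("Monthly revenue by country for 2023", dialect="Snowflake")
```

The interesting part is everything that goes into SCHEMA_CONTEXT: that is exactly the private business context Max describes having to squeeze into the window.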
So we started working on this problem saying, oh my God, this thing is so good at writing SQL
if you provide it the right context.
So we started looking at, you know,
vector databases to store your data models,
and just in general working on some of the challenges
we hit early on,
like working with different SQL dialects,
making sure, you know,
that, yeah,
it is able to generate the right dialect.
It gets a little confused around that.
And then providing, just overall, the
right context as to what you're trying to do
and what the models it can use are.
And as we started working on that,
what we realized is,
you know,
you can use a GPT-3.5 Turbo, a GPT-3.5, a GPT-4, and you
can bold something in your prompt that says like, do not, you know, make sure to capitalize, you
know, the reserved words, or if it's BigQuery, do this, right? So, you can start, like, just really
changing your prompt and then it changes the outcome really intricately. And then what we're
trying to solve is the big fuzzy problem
of people might ask anything,
and your data schema might look like anything.
So how do we measure the quality of our prompt
or the quality of whether just even something as simple
as should we use 3.5 turbo or 4 turbo or 4, right?
And how much better is it performing? So early on, we found this
decent or good dataset around text-to-SQL. It's called the Spider dataset. It's out of, I forgot if it's, like, MIT, sorry, I don't want to misquote, so I'm not going to say. Anybody, you
can research it: there's a Spider dataset that's a list of prompts, simple schemas, and then the good
answers for them.
And there's a bit of a context where people are like,
oh, you know, different teams working on this problem
did 82%, or we did, like, 87%, with ChatGPT on this test set.
So it's a published test set.
And then there was no way at the time
to just write kind of unit tests,
or a framework for someone to take
unit tests and measure the outcome. And so Promptimize, the idea behind it, was like,
oh, let me write a little toolkit where you can write your prompt cases, which is what I like to call test
cases for prompts. If you're familiar with unit testing frameworks, take some of those ideas
and apply them to prompt engineering and prompt testing,
so that we could say, okay, take these 2,000 tests
and run them against GPT-3.5,
or run them against GPT-4 Turbo,
and compare the output, like the percentage of success,
where one succeeds over
the other, what it's good at, what it's bad at, how much it costs, how long it takes, like the average,
the p90 of how long it takes for the prompt to come back. So I wanted to apply the scientific
method, and just rigor, to prompt engineering. And that's, you know, Promptimize: a little toolkit to allow you
to do that with some amount of structure.
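To illustrate the idea being described, here is a hypothetical sketch of what unit-test-style prompt cases might look like. It deliberately does not use Promptimize's actual API; PromptCase, evaluate, and run_suite are made-up names for the concept: a prompt plus a scoring function, run as a suite so two models or two prompt variants can be compared by success rate.

```python
# Hypothetical sketch of the idea (deliberately NOT Promptimize's actual API):
# a prompt case is like a unit test -- a prompt plus a function that scores the response.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PromptCase:                     # made-up name for illustration
    prompt: str
    evaluate: Callable[[str], bool]   # True if the model's response passes

def run_suite(cases: List[PromptCase], generate: Callable[[str], str]) -> float:
    """Run every case through `generate` (whatever calls your LLM) and return the pass rate."""
    passed = sum(1 for case in cases if case.evaluate(generate(case.prompt)))
    return passed / len(cases)

cases = [
    PromptCase("Write SQL that counts rows in table t",
               lambda r: "count(" in r.lower()),
    PromptCase("Write BigQuery SQL that truncates order_ts to month",
               lambda r: "timestamp_trunc" in r.lower() or "date_trunc" in r.lower()),
]

# pass_rate_35 = run_suite(cases, generate=call_gpt35)  # hypothetical callables
# pass_rate_4  = run_suite(cases, generate=call_gpt4)   # compare success rates; track cost and latency separately
```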
It's quite cool. And I saw that you guys also
have, like, LangChain support. And for me also, for LangChain, when I first started
looking at it, I guess this was like last year, I was like, why do you need a library to do this?
Don't I just, like, write these texts and then it just sort of works? And then, I think, as
I started trying to write better prompts and, you know,
do more use cases, I was like, oh my gosh, yeah, it's such a mess without sort of these
libraries. And I think it's the exact same thing with Promptimize, right, where
once things get to kind of the production level, where it's actually dollars on the line,
you actually want, like, the same engineering, sort of, the best
practices that we developed, right,
to actually have that transferred over, instead of just kind of putting your hands in the air.
Yes.
It's like trying to have some empirical measurement in a very fuzzy,
unknown world.
Right.
And then,
because, like, you're
working on your prompts and you can add, like,
literally a hint in there, like a plea: but please don't do this. Or, you know, you look at your
10% of failures, say, on text-to-SQL generation, and you might realize, like, oh, all the failures
are related to trying to run that stuff on Snowflake, because it's not good at speaking
the Snowflake dialect. So then you might add a thing that says,
oh wait, but if you're using Snowflake,
and specifically date functions
to change the date grain of a thing,
here's some function definitions that you can use, right?
Or, like, be cautious around this.
But then, by doing this,
you might have, like, whack-a-mole there,
where, you know,
you might have made some of the BigQuery support worse, right?
So then it's really hard to know this, so you need empirical, you know,
you need more rigor around that. And that was, like, the general idea with Promptimize.
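A small sketch of why that whack-a-mole is hard to see without measurement: if each test case is tagged with its dialect, a per-dialect pass rate makes a Snowflake fix that regresses BigQuery visible. Again, this is a hypothetical structure, not Promptimize's API; the cases and generate callables are assumptions.

```python
# Hypothetical sketch (not Promptimize's API): tag each case with a dialect and report
# pass rates per dialect, so a prompt tweak that helps Snowflake but hurts BigQuery shows up.
from collections import defaultdict

def pass_rate_by_dialect(cases, generate):
    """cases: iterable of (dialect, prompt, check_fn); generate: prompt -> SQL string."""
    totals, passes = defaultdict(int), defaultdict(int)
    for dialect, prompt, check in cases:
        totals[dialect] += 1
        if check(generate(prompt)):
            passes[dialect] += 1
    return {d: passes[d] / totals[d] for d in totals}

# before = pass_rate_by_dialect(cases, generate_v1)  # e.g. {"snowflake": 0.70, "bigquery": 0.90}
# after  = pass_rate_by_dialect(cases, generate_v2)  # e.g. {"snowflake": 0.85, "bigquery": 0.80} <- regression
```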
On LangChain, I think it's really interesting, because when I found it, I had the same thing. I didn't
really understand why this exists, because I just didn't understand the problem space.
Then I got familiar with the problem space.
I was like, oh, yeah, this is like everything I need.
This is super great.
But then I started to try to use it.
And no disrespect or anything for the toolkit.
I think it's just something that matured very quickly.
But then I started using it.
I was like, oh, it does kind of what I want it to do,
but not exactly.
And then I cannot use the methods that are here exactly
in the way I want to use them.
So then you kind of fall off.
For me, I was like,
it's harder to try to bend this toolkit into submission
than the value I get from it, you know, in some ways, right?
So it has a lot of convenience methods to say, break text into chunks with some
amount of overlap, do this and that.
So some things are really useful, but then, say, it didn't have support for the
particular vector database we wanted to use at the time, or not the kind of support that
we needed.
So then you went like 80% of the way, but then you have to monkey patch
some stuff to make it work.
And at that point you're like, that's just a little bit of Python
that does some text processing.
We can write that with AI in, like, five minutes, easier.
Interesting.
So is that what you guys internally do, just kind of having your own
sort of set of, like, utils and stuff to help with this?
I think we do use some of it. LangChain is a weird... it's a toolkit, you know,
so you can kind of think of it, you know, as a bunch of utility tools around AI and ML.
And I think over time, we agreed to using just specific portions of the
toolkit. It's like, oh, we use the hammer and the screwdriver, but we don't use anything that
saws or cuts, you know. So we picked some parts of it that, I think, stuck around, and some things
are like, okay, we'll just do our own thing, because it's harder to bend this tool into doing
what we need to do than it is to just do it on our own, for some use cases.
And so, like, internally, some of what I'm trying to work on is a lot of
summarization, and then trying to do kind of, like, style transfer for text. And I remember
it's been kind of a learning journey, because at first I was just trying things and it just
doesn't work. And I was like, man, this LLM thing, it's all hype.
Like, shit doesn't work.
And then it took me a while to realize, like, okay,
I'm actually just really bad at prompting.
It's kind of like Googling back in the days, right?
Like, if you don't sort of do the right keywords,
the result is not, like, super great.
So, I guess, for you, right,
you know, you're doing like eight
to ten prompts every day. Did you see that gradual improvement in terms of results for
yourself? And, like, how do I get better at this? Like, I do want to get better. Yeah, I mean, I
think it's just, you know, you have to approach it a little bit more like a fuzzy,
you know, like a human, maybe. Maybe it's like, oh, you approach someone you don't know very much, that you
know, maybe they're a graduate that, you know, you know they're smart and they
have accumulated a lot of knowledge in different areas, right? But then you don't know how to
work with them, and you don't know how good they might be at different things. So I don't think the answer is to over-engineer your prompts either. So it's just, like, what do I need
to tell it for it to help me, you know? And then in some cases, I think I've gotten more sloppy
with the way I interact with GPT too, in general, in some areas, right? Like, in some areas, you're like,
I'll take something, I'll just open a session, and if I'm doing some coding, I might have just an error
message or, you know, a problem in CI, I'll just copy-paste a big thing of text and
just throw it in and see what it's going to say. It might have some good pointers, you know. I think
fundamentally, the first thing is, like, oh, well, what context does it need to help me?
And what context does it have from, you know, learning from the entire internet?
So you have to say, okay, it doesn't know anything about things that are specific to my business or my use case.
So what's not generalizable?
What's it going to need?
And then, you know, you can certainly try
more things, like, what if I tell you this, can you help me more? And so it's progressive disclosure
until you prove or disprove whether it's going to be able to help you
or not. But yeah, in terms of, you know, text-to-SQL and Promptimize, I think what I realized is, like, a lot of the use cases for AI are not as empirical or as measurable as the one we have.
In some ways, we're blessed with text-to-SQL, because if I ask you, can you write this query on this database,
it's pretty much, I mean, it's not always, you know, 100% a Boolean on whether it succeeded or not. Sometimes
it might, I don't know, re-alias columns in a weird way, or give you more than what you
asked for, but it's useful, right? So sometimes it's not a pure Boolean on, like, correct,
not correct, but at least we have something where, generally, we could say this is a good answer,
this is a bad answer.
If you say, can you please summarize this text in a paragraph, it's harder to evaluate
whether it succeeded or not.
Or if you have a CS-type, customer-success-type question, where you're writing a CS answer, which
is a huge family of use cases, right?
People want to automate support. So if I can simulate, in Promptimize,
a chat session where someone, you know, puts in some information, I need help with this and that,
it's harder to read the answer and give it a score. You could then use an AI to do that,
but then you're kind of like, yeah, I don't know, like, what are you doing there,
that could probably be garbage in, garbage out. But you have to trust the underlying system beyond a point. Well, it's like
a circular thing: if you get the AI to evaluate the answer of the AI, then you need
to, you know, evaluate the answer of the answer. Yeah, you have to make sure it helps you. But, I mean,
but you could, I think you can. And I talk with people that use Promptimize in more fuzzy use cases that are less, like, this Boolean, the AI succeeded, yes or no.
For instance, I think the examples that are really interesting, in the Promptimize examples that I wrote when I originally wrote the project, were, like, writing some Python functions. So I can actually ask the AI
to write a Python function, take the Python function, and then run unit tests on it,
make sure it actually works. It's like, write a function that, you know, tells you if
it's a prime number or not. Then it generates the code, then you actually put it in an interpreter
and test it, so that's an empirical use case. But yeah, when you get to less empirical,
like, true or false use cases,
it gets more subjective and hard to evaluate.
But this is pretty cool.
This is more like test-driven development, right?
You specify what you want,
you describe the test,
and then you evaluate whether the code you got back
is actually doing what you asked it to do.
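A minimal sketch of that loop: run the code the model wrote in an interpreter and score it with real assertions. The function name is_prime expected from the model, and the llm() call in the comment, are assumptions for illustration.

```python
# Minimal sketch of that loop: exec the code the model wrote, then grade it with real checks.
def passes_prime_tests(generated_code: str) -> bool:
    """Run the model's code in a scratch namespace and check the is_prime(n) function it should define."""
    namespace = {}
    try:
        exec(generated_code, namespace)        # run the generated code
        is_prime = namespace["is_prime"]       # assumes the prompt asked for a function named is_prime
        expected = {1: False, 2: True, 9: False, 17: True, 25: False}
        return all(is_prime(n) == want for n, want in expected.items())
    except Exception:
        return False                           # code that crashes counts as a failure

# llm() below is a hypothetical call to your model:
# pass_rate = sum(passes_prime_tests(llm("Write is_prime(n) in Python")) for _ in range(20)) / 20
```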
Yeah, the blog post was very much,
like originally when I wrote the thing,
it was like bring the TDD and like, you know,
rigor and what we've learned in software engineering
and tests, you know, unit tests,
test-driven development, to prompt engineering.
Well, the project is super cool
and we'll definitely link it in our show notes.
We recommend people check it out.
Not just the project, but also Preset, Superset, and Airflow.
Hey, thank you so much for listening to the show.
You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com.
You can also write to us at hello at softwaremisadventures.com.
We would love to hear from you.
Until next time, take care.