Software Misadventures - LLMs are like your weird, over-confident intern | Simon Willison (Datasette)
Episode Date: September 10, 2024Known for co-creating Django and Datasette, as well as his thoughtful writing on LLMs, Simon Willison joins the show to chat about blogging as an accountability mechanism, how to build intuition with ...LLMs, building a startup with his partner on their honeymoon, and more. Segments: (00:00:00) The weird intern (00:01:50) The early days of LLMs (00:04:59) Blogging as an accountability mechanism (00:09:24) The low-pressure approach to blogging (00:11:47) GitHub issues as a system of records (00:16:15) Temporal documentation and design docs (00:18:19) GitHub issues for team collaboration (00:21:53) Copy-paste as an API (00:26:54) Observable notebooks (00:28:50) pip install LLM (00:32:26) The evolution of using LLMs daily (00:34:47) Building intuition with LLMs (00:43:24) Democratizing access to automation (00:47:45) Alternative interfaces for language models (00:53:39) Is prompt engineering really engineering? (00:58:39) The frustrations of working with LLMs (01:01:59) Structured data extraction with LLMs (01:06:08) How Simon would go about building a LLM app (01:09:49) LLMs making developers more ambitious (01:13:32) Typical workflow with LLMs (01:19:58) Vibes-based evaluation (01:23:25) Staying up-to-date with LLMs (01:27:49) The impact of LLMs on new programmers (01:29:37) The rise of 'Goop' and the future of software development (01:40:20) Being an independent developer (01:42:26) Staying focused and accountable (01:47:30) Building a startup with your partner on the honeymoon (01:51:30) The responsibility of AI practitioners (01:53:07) The hidden dangers of prompt injection (01:53:44) “Artificial intelligence” is really “imitation intelligence” Show Notes: Simon’s blog: https://simonwillison.net/ Natalie’s post on them building a startup together: https://blog.natbat.net/post/61658401806/lanyrd-from-idea-to-exit Simon’s talk from DjangoCon: https://www.youtube.com/watch?v=GLkRK2rJGB0 Simon on twitter: https://x.com/simonw Datasette: https://github.com/simonw/datasette Stay in touch: 👋 Make Ronak’s day by leaving us a review and let us know who we should talk to next! hello@softwaremisadventures.com Music: Vlad Gluschenko — Forest License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en
Transcript
Discussion (0)
I call it my weird intern.
I'll say to my wife Natalie sometimes,
hey, so I got my weird intern to do this.
And that works, right?
It's a good mental model for these things as well
because it's like having an intern
who has read all of the documentation
and memorized the documentation
for every programming language
and is a wild conspiracy theorist
and sometimes comes up with absurd ideas
and they're massively overconfident.
It's the intern that always believes that they're right.
But it's an intern who you can,
I hate to say it, you can kind of bully them.
You can be like, do it again, do that again.
No, that's wrong, no, that's wrong.
And you don't have to feel guilty about it, which is great.
Or one of my favorite prompts,
one of my favorite prompts is you just say, do better.
And it works. It's the craziest thing. It'll write some code and you just say do better and it works it's the craziest thing
it'll write some code you say do better and it goes oh i'm sorry i should and then it will
churn out better code which is so stupid that that's how this technology works oh yeah it's
kind of fun welcome to the software misadventures podcast. We are your hosts, Ronak and Guan.
As engineers, we are interested in not just the technologies, but the people and the stories behind them.
So on this show, we try to scratch our own edge by sitting down with engineers, founders, and investors
to chat about their path, lessons they have learned, and of course, the misadventures along the way.
Simon, so you've been building tools for doing data analysis in the past few years,
but you also started playing with LLMs before it was cool.
I think you started writing about GPD-3 like two years ago,
and I'm sure you had different expectations after you started playing with it.
Has there been any big surprises in the last two years?
Big surprises in the last two years?
Hasn't been a big surprise, right?
The last two years have been completely wild.
Yeah, so I started playing with GPT-2 back in 2020, which was the very early precursor.
And it wasn't, it was, there was clearly something there.
I tried to use it to generate New York Times headlines for current affairs based on the style of headlines from different decades.
So I like fed in like New York Times 1950s, 1960s, 1970s.
And I mean, I didn't really get anywhere with it. I
sort of abandoned that project, but it felt like there was something interesting, but certainly
not sort of life shattering. And then GPT-3 became available. And I started really playing with that
sort of 2021, 2022. And that thing was extraordinary because it was this weird situation where the only
way you could use it was either through the OpenAI, through their API, or through their weird little playground interface.
And so nobody was using it, right?
Like, I actually, like, I put up a tutorial.
Here's how to use this thing because nobody was experimenting with it.
And because nobody else was using it, there was very little, there wasn't much information about what it could do.
Like, you sort of poke around with it. It was also, GPT-3 was a completion model. So you didn't
get to like chat with it. You'd have to give it a sentence and then put a colon at the end and have
it complete the sentence. So you'd discover things like, one of the things that really
clicked for me early on was the JQ programming language for manipulating JSON. I discovered that
GPT-3 could write that so i could
say hey here is a json document the jq program for extracting an array of names from this list
of objects is colon and it would spit out working code and that was a bit of a revelation you know
because i could never remember the syntax for jq um and so i was poking around with it and it was
increasingly clear that there were all sorts of things it could do that you wouldn't have expected something less to be able to do.
But it never felt really like an AI. It wasn't like you were conversing with something.
It was just this this sort of very weird tool that could complete things if you prompted it in the right way.
And then ChatGPT came along and what that was November 2022, right? Yes, it was November 2022. And all they did, all they did is
they slapped the chat interface on top of their existing model effectively. Like they tweaked it
and trained it a little bit more. But you know, chat GPT was an experimental prototype and a bunch
of people inside of OpenAI thought it was a bad idea. They're like, hey, this is a waste of time.
GPT-4 is coming. We should just hold off until then. They didn't expect it to take off at all.
And it was, I think it's the fastest growing consumer application in the history of the world.
It is.
Which for a very obscure, weird thing is sort of astonishing, you know.
But it was fun because this sort of this rocket just took off.
The entire world swiveled and started paying attention to this field.
And then because you've got millions of people experimenting with trying things
out, that's when we really started figuring out what it could do and what
things it was good at and all of that.
And so, yeah, I've been documenting and exploring that for the past couple of years.
I also had an advantage in that I've got a blog and most people just
don't bother blogging anymore.
They might like tweet or post on LinkedIn,
but very few people are writing sort of long form content about what they're learning.
But because I was doing that bad at AI,
I very quickly sort of established myself
as a person that you go to and talk about this stuff with,
which is great because then you get all of these people
who are figuring things out, talking to you directly,
and you can learn much faster.
Is like having a blog sort of like an accountability mechanism to have you yourself like go out and
then find these sort of things that are maybe not working super well so maybe back like you know gpt2
back in the days just as like a new source of you know inspiration to write more posts so that in
the process enormously interesting yes and that's actually that's the thing I've been doing this year is I very quietly
started a streak. I'm trying to, like inspired by Duolingo and actually Tom Scott on YouTube
did this 10 year streak of making a video once a week, which I found incredibly inspiring
because wow, like what a thing to manage to do. And so since January 1st, I've been trying to post something on my blog every single day.
And I've done that.
And it means that I do have that little extra incentive to make sure I find something interesting.
So that's been helping.
My blog, it's been an accountability mechanism for me for wider work for a few years now.
Because I'm now sort of independent.
I don't have an employer.
And so I started doing this thing I call week notes,
where once every two or three weeks,
I post a blog entry just saying,
here are the things that I've worked on
in the past couple of weeks.
And that means that when I'm thinking about what to do,
occasionally I'll think, you know what,
I haven't done anything I can write about yet,
so I should really invest in one of my open source projects
or do something so i can actually i've
got something to show for it and yeah i i love that i think writing is thinking and it's such
a great way of of forcing you to structure your thinking you know it's if if the best way to learn
something is to try and explain to somebody else so if you've got a blog and even like my shortest
little um like link blog things where it's like a link and two sentences of text,
I always try and put something in there that's valuable
that partly it's like to prove that I read the thing
that I'm linking to.
But also it's like,
if you read the summary in my blog and read the article,
do you get something slightly extra
from my perspective on it?
And it might just be saying,
maybe I'll link it back to something else and say, the Claude Trump caching they came out with a few days ago. And when I
wrote about it, I linked back to Google Gemini, which has a similar feature. And I could compare
how Google Gemini pricing works and how Claude pricing works. And that's a little bit of extra
perspective that you won't get from Anthropic. They're not going to write about Google Gemini
in their announcement of the feature. So it's that kind of thing.
It's having, it's forcing you to engage with the material just a tiny bit more thoughtfully
so that you can try and say something interesting about it as well as linking to it.
So when it comes to blogging, I think you had this tweet at one point, which was something
like blogging is like planting a beautiful cactus.
The best time to do it is 18 years ago. but the second best time to do it is today.
I think, especially when it comes to LLMs today, when generating content has become
way easier, not necessarily good content, but there is just way more out there.
How do you think about adding enough quality to the content where someone would
actually read the post? The other part is also having an accountability mechanism to
just do interesting thing is one perspective on writing blog posts. And I'm also curious
to hear like, what are some of the other things that keep you going? Because after a point
this, it takes a lot of work to write blogs.
Well, that's the secret of blogging is that it takes a lot of work to write blogs well that's the secret of blogging is that it
takes a lot of work at first but i've been blogging for 22 years and you just get faster you know if
you write every day you get faster at writing most days i will spend 10 to 15 minutes on my blog and
that's it you know it's like two links maybe a quote it's a very very quick process to turn things around i actually have a
second blog my um my til blog today i learned where the idea there is um it's the it's it should
really be part of my main blog it's partly to play with different technology that i'm running it as a
separate site but the idea there is anytime you learn anything new it's worth putting it out there
and saying hey these are the things i learned it's all it is it's worth putting it out there and saying, hey, these are the things I learned. It's all it is. It's my personal notes, but very slightly cleaned up so that I can publish
them. And actually, as a result of this habitually, when I'm writing personal notes, I sort of write
them well enough that I could copy and paste them into a public document, which is a good habit to
be in anyway. But part of the reason I do the TILs is that it's the most low pressure form of writing that there is.
Because with a regular blog, you feel like when you write something, you have to say something new.
You've got to add something new to the world.
With a TIL blog, no, you don't.
The barrier for writing on TIL is did you learn it today or recently?
And if it's like how to do a for loop in bash that still counts that's fine like
i'm publishing it i honestly it's mainly for me it's sort of my public notes i can go back and
find if somebody googles for how do i write a for loop in bash and they land on that document that's
great for them it's also um i feel like i mean i've had like i've got like 25 years of software
engineering experience i feel like it's important to outwardly demonstrate
that when you've got 25 years of experience,
it's still worth celebrating learning for loops in Bash.
You shouldn't get into that.
There's that pattern people get into
where they don't want to admit
that they only just learned how to do something.
It's sort of a shame that I didn't know
how to do for loops in Bash.
I like using my reputation to do for loops and bash. Doesn't, you know, that, that I like using my sort of my,
my reputation to broadcast out that no, be proud of that. Right.
You figured out for loops and bash. Fantastic.
There's a million other things that still to learn about everything
involving computers, right? It's,
it's no biggie that you didn't know that already.
So one thing that I struggle with always is I want to do this a whole lot more of.
I have a blog which has four entries right now and maybe 10 in my notes, which I've never gotten to polish and publish.
And it always goes back to, well, OK, today I have maybe, let's say, an hour I can either spend on writing this up, cleaning it up, or, you know, I could just spend the time doing some work.
So I struggle with that balance and I'm curious how
you think about it. I've got a great trick for that. So the thing that the way that I work,
all of my work that I do, software work and a lot of my other stuff as well is in GitHub issues,
right? It's free, it's got, you can have private issues, public issues and so forth. So every
single one of my projects has a very active GitHub issues like setup and I've got dozens of private repositories. I've got one just
called To Do's that I use for personal stuff. And the thing I love
about GitHub, basically the idea is that anytime I'm doing any project at all, I
open an issue and I stick in a sentence at the top saying, do this thing, and then
most of the work that we do as software engineers, it turns out is research, right?
Like you have to gather so much information
to solve a problem.
You have to be like, okay, where am I gonna do it?
I'm gonna do this file here needs modifying,
the tests for it live over here.
I need to use this library.
Here's some example code I found on Stack Overflow
that solves this problem.
I asked Claude a few questions and got these answers.
And so I will very quickly pepper in like two or three
or four reply comments to my issue with the research that I've done. And then I'll do the implementation.
And it means that firstly, programmers often talk about how damaging it is to be interrupted,
right? There's this idea that you carefully build up the context of everything that you need for
your problem. And then somebody taps you in the shoulder and asks you a question, and it all comes
tumbling down. It takes you half an hour to get back into it yeah the fix
for that is to have very detailed notes right if you have written down everything as you were going
along i can be distracted come back read the last three issue comments and have everything back in
place again and that's amazing for productivity but it also means that I'm maintaining over 250 active open source projects at the moment.
And a lot of them are very small.
They're like little command line tools or plugins for my projects or whatever.
But they're all maintained in as much as if somebody reports a bug and I see their issue report in amongst all of my notifications, I will fix that bug and I'll ship a new release. And the only way to maintain
250 projects is to treat every single one of them like you're going to forget every detail of it.
Like every project has to be as if it was somebody else's project that you occasionally drop into
and maintain. The way to do that is with issues, right? Every project I have, every single design
decision I ever made is in an issue comment somewhere in that repository.
So I can search through them, I can use git blame and say, okay, why did I add this code?
It was in this commit, this commit is linked to this issue, this issue tells me what I
was thinking at the time, what options I explored, all of that kind of thing.
And so this is an enormous productivity boost.
It feels like writing all of these notes should slow you down.
It's the opposite. It speeds you up.
It means when you want to publish something,
you've already written the rough outline of anything that you want to publish.
Most of my TILs are copied and pasted from my GitHub issue notes,
and then I'll clean up the wording a little bit
and maybe add some formatting, and that's it. It's done.
So that's been enormously...
I gave a talk about this at JagerCon a few years ago
about increasing your productivity on personal projects
through documentation and unit tests, right?
The two things that people would expect would slow you down.
Turns out if you put the right habits in place,
having comprehensive documentation
means you can work so much faster, right?
I can drop back into a project I haven't touched in a year, read the documentation as if I didn't know what the project was,
and then start working on it. That's fantastic. And the same thing with unit tests, right? If
you've got tests, you can iterate so much faster because you get over that fear of accidentally
breaking something. You make a change. Normally, you'd have to manually test every single feature of the software to make sure it didn't break. If your tests are
doing that work for you, you can drop in, make a five-line change, add a new test,
run the test suite, and then publish it to PyPI or ship a release of it.
It just works, you know?
That's super interesting. Do you also have design docs?
I'm curious about like you know
having all these projects being able to kind of drop in if it's something that's not like
so give blame super useful maybe just search through the issues that sounds super cool
um yeah like what about like do you also write like design docs or like yes but the issues are
design docs yeah absolutely so the issues are the design documentation effectively. And the only that because the problem with design documentation and all documentation has to be
kept up to date, right? If it falls out of sync with the code, then the big problem people lose
trust in it, right? Like I've worked at companies where we've had internal documentation and nobody
uses it because they know that it's not being actively maintained. And so the way I see it, there are sort of two key forms of documentation.
There's the documentation that has to be up to date, which tells you how the thing works or how to use the thing.
So if you're writing software libraries, it's the documentation that tells you which functions to call.
If you've got a web API, it's the one that tells you what the API endpoints are.
Command line tools, these are the options and what they do.
That I keep in my repository with the code. So there's always a docs folder.
It's got a bunch of markdown files in it. Anytime I update the code, I update the associated documentation. And if I'm collaborating with people, that's part of the pull request design
process, the code review process. If you submit a pull request and it doesn't update the documentation,
I'll either put in a note saying you need to update the documentation, or sometimes I will
update the docs as part of that pull request. The idea being that the moment you land it on
the main branch, it's got the test, it's got the implementation and the documentation all in a
single commit. Because then when you use git blame, the commit shows you the documentation
change as well. But the other form of documentation
is, I've been calling it temporal documentation, it's documentation that was true at a certain
point, but isn't guaranteed to still be true today. And that's where issues shine, right?
If I read an issue, and it says 2017, January the 5th, a bunch of stuff, I know that that's
not promising to be up to date. So it's still useful because I can say,
okay, well, in January of 2016, this was true, but it's not sort of ruining my trust in my docs
because I look at it, I'm like, hey, is this still true anymore? I don't know. And yet, so
very occasionally, I will write design documentation that says, if you are a maintainer
of this code, you should look here and here and here, and this is how it works. But I often don't do that.
I sort of leave it to the issues.
The idea being that if you can spelunk through the code
with git blame and the issues,
you can get that same information.
You might have to put a bit more work in.
See, I don't think any of my projects
have significant design docs
like current architectural documentation right now.
And I might start adding it. Also, a lot of them are, you know, if it's a software library, significant design docs at the like like current architectural documentation right now and i might
start adding it um also a lot of them are you know if it's a software library the design
documentation and the api documentation are kind of the same thing you know the the the the design
is sort of presented through how the the api is built yeah i was actually thinking this is
a practice that would be useful for even teams at companies where what ends up happening, at least in my experience, is something new will come
up that you need to implement.
Someone will go do some research, try out a few things.
So you see the code changes in the PRs, maybe the approach changes in the Google Docs, mostly
at least at my workplace.
But they don't always end up linked together, and one usually gets out of sync.
But I think this idea of using issues to do that,
where you can do that in the repository itself,
and the issue may link to the Google Doc that you have,
which is easier for collaborating and commenting on,
I think that would go a long way.
So it's something I'm going to try, actually.
And you know what?
I've got issue threads that are over 100 comments long,
and they're all me.
It's just me talking to myself.
I just realized that issues are a blog, right?
An issue thread is basically a one-off blog
for the story of this change, the story of this feature.
One of the reasons I love issues so much,
I used to write really long commit messages,
like I'd do six paragraphs in a commit message
explaining what I was doing.
I've stopped doing that now.
What I do instead is if there's stuff that should be in documentation,
I put it in the documentation and then include that in the commit.
So it doesn't go in the commit message.
It goes in the actual code.
And secondly, it's every commit always links to an issue thread.
Because the great thing about an issue thread is I can add comments to it a year after the commit.
Right, you're running git blame, you see a commit,
you click through to the issue thread, and there might be a comment saying,
12 months later it turned out this was a terrible idea for these reasons.
Also, issues accept screenshots, so I can put screenshots of the feature. So if I'm doing
CSS stuff, I always include screenshots of before and after. You can do animated GIFs or videos in
issues, so I'll sometimes do a little GIF demo of the thing.
Issues can link to each other.
You can embed code in them.
They're a really rich canvas for all sorts of aspects of document.
You can't put an image in a commit message, right?
But you can put a screenshot in an issue.
So yeah, I'm definitely a GitHubithub issues power user this is super helpful
thanks for sharing the tricks we'll actually link your talk in our show notes as well so that people
can find it easily cool uh by the way one thing about the blog so i was looking at your blog and
bunch of educational posts where people can learn about how to do various things and you want to get
into some of those but i also saw that you have some of these posts linked on Substack.
So I was curious, how do you use one versus the other?
That is a cutting trick that I came up with.
So I have a Substack newsletter,
which I put out once every two or three weeks.
And all it is, is the content on my blog
since the last newsletter.
With maybe a sentence at the top,
with like maybe I'll add a tiny bit of text at the top, but it's, it's purely,
it's basically I'm using Substack as a free mechanism to let people subscribe
to my blog via email because I didn't want to pay to send emails and build all
of that kind of stuff. And Substack, it's great for that.
I've got like over 6,000 subscribers now on Substack. Um, and it's,
it takes me about two minutes per newsletter to send it out.
So it turns out Substack, they don't have an API, but you can copy and paste stuff into your
Substack edit panel. And so I built myself a little tool, which it's actually an observable
notebook. But what it does is it pulls all of the content from my blog, reformats it into HTML rich text, and then gives me a big copy button that I can click, which puts all of that on my clipboard.
And so I go to this notebook, I click copy, I switch to Substack, I hit paste, I set the title of the newsletter, and I pick a preview image, and that's it.
I'm done. Like literally two minutes to send that newsletter out,
because it's using copy and paste, copy paste as an API,
which it turns out is a really powerful trick.
There's loads of stuff you can do with software
that thinks it doesn't have an API, and you're like,
yeah, but I can paste stuff into you, so.
Yeah, and that's been great.
My only regret with the newsletter
was I should have started doing it years ago,
because I've been doing it for about
just a year and a half maybe and it's it's brilliant you know it's it's it's a really
great way of getting things out there to people who live in their email clients so there's an
argument of uh using either systems like substats or medium or having your own personal blog
and the argument that i've heard to keep your own personal blog is that these platforms
may or may not exist in the future which has happened uh for many of these platforms is that
the reason why you still have the personal blog and substack is mostly just an email distribution
service of sorts that's one of the reasons yeah i mean one of the reasons i chose substack is you
can export your subscriber list so if substack ever say hey we're shutting down next week I can pull out a CSV file
with all of the email addresses in and I can move to something else that's really
important to me because yeah vendors absolutely come and go my I've owned my
domain name for again 20 or 20 odd years it built and you know it builds up SEO
credibility and stuff over time but But also it's just having,
there's something sort of wholesome about having a little corner of the internet that's just for you.
Like that, that's something I genuinely, I really enjoy. It feels a little bit subversive as well
in this day and age with all of these giant walled platforms and things. Yeah, no, I'm,
I've got a domain name and i'm running a web advocate website when
i'm yeah i so there's a and it's just fun you know as a software engineer it used to be like
10 15 years ago everyone's intro to soft to web development was building your own blog system
i don't think people do that anymore and that's really sad because it's such a good project you
get to learn databases and html and url design and all of
these and seo and all of these different skills um and yeah i mean my my blog itself is running
it's a django application because i helped create django 20 odd years ago so i want to have
something in my life that's like a django app that i'm building on and it's all open source
like the code is on github it's is on GitHub. Over the past six months,
I've started updating it a lot more, just making little tiny tweaks to it. I changed the default
typeface that I'm using for headings a couple of weeks ago. And I started doing more things
with images. And it's just really nice. It's nice being able to dive in and try out something new
completely in that space. I run it on Heroku behind Cloudflare.
The great thing about Cloudflare is if I get a giant spike of traffic,
like if I'm linked off the Hack and Use homepage,
my tiny little cheap Heroku instance doesn't even notice
because Cloudflare absorbs all of the traffic.
And that's great.
I bet that helps.
That's a nice way to do it.
I think Elon Musk maybe linked one of your posts in a tweet at one point.
So I'm assuming this would have helped.
Yeah, I got 1.4 million hits on a page from that one.
And yeah, without Cloudflare, I would have instantly melted.
Well, by the way, I think that also resulted in your first ever TV appearance, right?
That's right.
It was last year.
It was when Microsoft Bing added their chat thing.
And it was February. It was February last year. It turns out it was GPT-4, which hadn't been released yet.
So Microsoft Bing was our first glimpse of GPT-4. And they hadn't quite figured out the personality.
It was some early prompt engineering and it went completely wrong and it started threatening people
and it tried to break up.
Kevin Roos from the New York Times,
it tried to break up his marriage.
It was just joyfully bizarre.
And yeah, so I, there were all these people,
these posts on Reddit from people going,
yeah, it just told me that it wanted to have me arrested
and all of this kind of stuff.
So I put up a blog post where I just collected together a bunch of examples of this and yeah and elon musk tweeted a link to it
and it was on the hack news home page and yeah i got interviewed on live live tv news out of chicago
talking about trying to get trying to reassure people that this thing wasn't going to steal the
nuclear code even though it said it was this This is no Terminator. Yeah, that was deeply entertaining.
Oh, I can imagine.
By the way, coming back to LLMs, and before we actually go there,
you mentioned this tool that you use to move your blocks from your site to Substack.
Is this tool something open source that people can use, or this is something that-
Yes.
Well, it's an observable notebook.
And yeah, I think it's an observable notebook um and yeah the i think i think it's
probably linked to from about page we can stick it in the show notes it's a little observable
right now it only works against my blog so it's useful for me to create my own newsletter but
because the observable notebooks is a it's a platform where you can create basically write
a sort of interactive document using JavaScript,
and the code is visible in it. So you can dig through and see exactly how it's working.
It's quite complicated because it actually pulls the content from a dataset instance.
Like Dataset is my major open source project, which lets you build a JSON API on top of a
SQLite database. But my blog is running a Postgres database in Heroku.
So there's actually a whole chain of things
that make this work where I've got code I wrote
that does a backup of my blog from Heroku into JSON.
And then it loads that into a SQLite database
and then it publishes the SQLite database with dataset,
which gives it a JSON API.
And then my notebook can use fetch calls in JavaScript
to run SQL queries against the JSON API
to pull in the content, to assemble it into Markdown,
which it renders as HTML, which I copy out.
It's beautiful.
Like it's this giant convoluted chain of things
that somehow works really well.
It's basically sounds like Unix,
but in a notebook where you pipe the input.
Oh, totally.
And I'm big into the...
The Unix philosophy is big in a lot of the work that I do.
I love tools that just...
You can pipe tools together.
I said I've got 250 projects.
That's because they're all little tiny things that you can then plug together to pipe things
from one to another.
So talking about the Unix philosophy, I want to talk about the
LLM CLI tool that you developed. And I actually recently came across it while researching for
the episode. And I found it to be super cool because a lot of times I don't want to leave
the terminal that I'm on. I just want to ask the question there. So having something like this,
where you can choose the model you use, or sometimes even run it just locally when you don't have Wi-Fi, for example, that sounds super cool.
Can you tell us a little more about this tool?
Yeah, so this was, I built this last year.
I started this project last year.
The idea was originally OpenAI were effectively the only interesting game in town for quite a while.
You know, with GPT-4, they had such a lead.
Today, that's not true at all.
There are a bunch of amazing, like, competing models
that I'm often using instead of OpenAI.
But yeah, so my initial idea was,
you can, the OpenAI API is kind of cool.
I like hanging out in the terminal.
It would be great if I had a way of
not just running prompts from the terminal,
but also piping data to and from the model,
because the Unix piping idea is always like,
you get some content, you pipe it into another thing,
which transforms it, you pipe it back out again.
That's all language models are, right?
They're a function where you can give it some stuff,
it does something, and it gives you back,
you give it input, it gives you output.
And so the original idea was to build a little API client
for OpenAI where you
could basically say, LLM space, double quotes, how do you do a for loop in bash? And then you hit
enter, and then it spits out the answer on your terminal. But you can also pipe things to it. So
you can say, cat, hello, world.py, pipe LLM, and then give it an extra prompt saying, explain what this code does, or rewrite this in
C or whatever. And I noticed that nobody had the PyPI, the Python package index, nobody had
reserved the LLM name on there yet. And I'm like, oh, I've got to have this. So I grabbed this
beautiful three-letter, so pip install LLM is how you get it. And that was fun. And then a few months
later, I wanted to start playing with all of these new models,
the Lama models that run on your laptop and so forth.
And I'd already built plugin systems for other software that I've developed.
So I thought, okay, what if I had plugins so you could install a new plugin for this command line tool,
and now it can talk to Anthropics Claude, or it can talk to Google Gemini,
or it can run Lama on your computer directly. And so I built that. And now there's over 200 different models that this one command
line tool can run if you install the right plugins for it. Other people have written plugins. The
great thing about plugins is it's a way of building an open source project where you don't have to
review people's code to add features to your thing. Like I can wake up one morning and
my software can do a new thing because somebody else released a plugin for it. It's amazing,
right? It's the best form of open source contribution as well. Because if you write a
plugin for my project, you're not asking any time of me to sort of review your code and interact
with you. You're just putting it out there into the world. yeah so you can llm install um all of these
different plugins for all of these different models and the other big feature of llm is that
everything it does is logged to a sqlite database so anytime you prompt it the prompt is logged and
the response is logged and it records which model was used so you can actually use this for model
research where you can run the same prompt against five different models and now you've logged all of
the responses and you can go and compare them later on i've got like 3 000 prompt response
pairs that i've recorded just in my own local database from tinkering out around with this thing
which is i'll be honest i don't go back and use it to compare the models as much as i want to
but the data's there you know i've hoarded the data at some point i can and it's also just useful
to be able to say okay okay, show me the logs
of my conversations, search those logs,
export the log of this conversation
and publish it somewhere.
So yeah, that's been super, super fun.
It does, it's very distracting because it means
that whenever a big new model comes out,
I lose half the day to spinning up a new LLM plugin
so that I can try it out.
But it's been really, and it's super useful.
Like I use it several times a week.
Most of my personal usage is still through the web interfaces.
I use claude.ai and I use chatGPT daily,
like every single day for the last year and a half, basically.
But yeah, having it on the command line as well,
it gives you all of these other options.
It's also just really fun for hacking things together.
Like you could write a bash script
that implements a retrieval augmented generation
against something like by scraping a webpage
with a curl command and then piping that to LLM
and running it against Lama and all of that kind of stuff.
It's really, really fun.
It's like some of the guests that we've talked to,
there's this sort of idea of,
oh, you need to kind of build up this,
what do you call it?
This habit of like going to the LLM first
before you like try anything else.
I was curious, yeah,
like what has it been like in this one and a half years for you
for like kind of using it daily?
Did you, when you started,
do you have to kind of force yourself to be like,
hey, you know, I need to,
even though I know the answer, I'm going to still still go to the lm just to see how you how did that like evolve
over time that's interesting i don't think i've not been turning to lms for things i already know
the answer to but you know i'm as a software engineer every single day i run things that i
don't know and it might be a for loop in bash,
it might be something a lot more interesting and complicated than that. So I don't, I mean,
the problem with LLMs is they're actually really difficult to use, which is very unintuitive.
Everyone assumes that they're easy because it's a chatbot. You type things, it says things back to you. But to use them effectively, you need to build this really really really deep model of what they can and can't do
like i would never ask an llm to count all of the the instances of something in a paragraph because
i know they can't count which is totally non-obvious right it's a super sophisticated
computer system how can it not count computers are great at counting that's like what they've
been doing since we've invented them um i know that it can't look think i know that if i i've got a question where if i if i have a like a sort of if a friend of mine could
read read a wikipedia page and then answer my question then i know that the llm will be able
to answer that question but if it's the kind of thing which the wikipedia page probably isn't
going to cover it's less likely that the llm will be able to answer it but it's difficult because
you really do just have to put the time in like you've got to spend um a friend of mine says it's less likely that the LLM will be able to answer it. But it's difficult because you really do just have to put the time in.
Like you've got to spend, a friend of mine says it's 10 hours
is the minimum you have to spend with a GPT-4 model
before it really starts to click what these things even are
and how to use them.
And I think to develop that level of expertise
where I can look at a prompt and I can, 90% of the time,
I will predict correctly if it's going to work or not. Like I look at somebody's prompt and say, yeah, of the time, I will predict correctly if it's going to
work or not. Like I look at somebody's prompt and say, yeah, you're asking it to count things.
That's not going to work. Or you're asking it about like quantum physics, but it's the kind
of basic question that an undergrad like student in quantum physics would answer straight away.
It'll definitely get that one right. Right. But having that intuition is, it takes a long,
long time to build up and it's not transferable.
Like I love teaching people to use this stuff, but I can't just dump my intuition into their head.
I can't be like, boom, here you go. Now you'll be able to use these things effectively.
Like one of the lessons I think people need to learn as quickly as possible is you've got to run prompts where it gets the answer wrong in a really confident way like that earlier do that the better because otherwise you you can
go into the decided that it is this sort of like science fiction AI that knows
everything and so when I'm evaluating new model I always start with an ego
prompt I ask it about myself I say provide a career outline for Simon
Willison and because I've been blogging for like 20 odd years it knows a lot
about me right there's a lot of stuff that ends up in the training data but
it's still it makes it like they often say that I'm the CTO of github I have
never worked forget or that's I had one tell me the other day that I'd been to
Oh a university that I hadn't been to like those kinds of mistakes so
generally if you if you know somebody who's sort of internet
famous right they're not like a a celebrity but they've been around on the internet for long
enough that there's stuff about them in the training data asking questions about them very
quickly exposes that these things are not knowledgeable that they're spitting out
statistically likely text from their training data and that's so important like that it's it's crazy to me how to get the best results out of these things you need to have expertise in what
they can do so experience using them you need to have a bit of expertise in how they work right you
don't need to understand the matrix multiplication and the key value pair and all of that kind of
stuff but you do have to understand that they come from training data they're doing next token
prediction you need to have that sort of basic level and you have to be a subject expert
in what you're doing with them right like as a soft an experienced software engineer i can do
amazing software engineering with an lm because i've got that expertise in what kind of questions
to ask i can spot when it makes mistakes very quickly i know how to test the things it's giving
me i like occasionally I'll ask it legal
questions. Like I'll paste it in terms of service and say, Hey, is there anything in here that looks
a bit dodgy? I know for a fact that that's a terrible idea because I have no legal knowledge.
Right. So I'm sort of like play acting with it and nodding along, but I would never make a life
altering decision based on legal advice for an LLM that I got, because I'm not a lawyer.
If I was a lawyer, I'd use them all the time because I'd be able to fall back on my actual
expertise to sort of like make sure that I'm using them responsibly.
By the way, I can attest to that one part where if you search for internet famous people,
these things will, or LLMs, will very confidently tell you stuff which
is not true. I've experienced that in doing research for a lot of our episodes, including
this one, where I saw this like, oh, and what I've started doing now is I would actually tell it,
give me the source information. And at least, I mean, I use chat, GVT more than anything else.
And I would actually say, give me the source for this information.
And surprisingly, when it comes to fetching information from certain podcast transcripts,
it's decent at doing that.
But it's horrible at attribution because either transcripts are faulty or it just doesn't
know who said what.
And the other thing is it'll actually start sourcing things,
which look like links, but they're not clickable.
If you search for that exact string.
That's my favorite bug.
Yeah.
I wrote something about this last year.
Because before ChatGPT had browsing mode, it would do that all the time.
It was amazing.
It would just hallucinate these URLs.
And one thing that you could do that's really fun is you could give it a URL and say, summarize this article. And even though it couldn't access the web back URL, it's a 404 page, and then you paste
that in and it would confidently write a story as if it was a wired story about
that happening. Like just utterly, like Claude now, because Anthropix, Claude
can't access the web, they do at least have a little inline hint that shows up
and says, by the way, I can't access the web. But yeah, it's, that's, that was a
great one because people got so confused by that one.
People who were absolutely convinced that ChatGPT could summarize webpages
because they'd seen it do it dozens of times.
And you're thinking, wow, you've probably spent the last two months
consuming summaries of webpages that were entirely made up.
And you do not want to admit to yourself that you've got two months of crap.
It's fascinating, right? There's so many traps in all of this stuff. And the interesting thing, I think perhaps you mentioned it in one of your talks that
LLM interface is kind of interesting because it's just a simplified
interface where you just get dropped into this chat box and you
kind of have to discover the capabilities as well as limitations of the system
like you can't find things out that it's capable of doing it's it's like taking a brand new computer
user and dumping them in a linux machine with the linux prompt and say there you go figure it out
right like it's it's a joke it's an absolute joke that we've got this incredibly sophisticated
software and we've given it command line interface and launched it to 100 million people what what were we thinking yeah one of
the things i'm most excited about is alternative interfaces to these systems which we're beginning
to see some really interesting stuff starting to crop up there um but i mean the chat interface it
is really powerful and useful but it's such a bad way to onboard people and they've they've nodded
like at least now these systems they'll at least give you a few ideas they'll be like why not try
to get it to cheat on your homework or whatever but come on you know we could we could do so much
better oh so i gave uh i said i introduced chat gpd to my mom she doesn't speak english but
recently uh she wanted to send a message to someone and she was asking me to help her
format things a little bit. Format meaning like she had a rough draft and she's like, help me
improve it. And I was like, you know what would be amazing at it? Chat GPT. So I just gave her
the phone and I said, just speak into it. Forget about typing. And she's like, but what do I say?
So just looking at that microphone prong, for for example like she had no idea what she could
do but once she got started oh boy she won't charge a video on her phone now uh for all sorts
of things she wants to do honestly people who don't who don't speak english who have english
as a second language this stuff is incredible right absolutely amazing and that's something
like i feel like that's something that people often like there are lots of people who are very
cynical about this technology and there are a lot of reasons to to
there are a lot of like reasons to be concerned about it i feel like taking like we live in a
society where if you have really good spoken and written english it puts you so it's such an
advantage like you've got a problem which with, like, the streetlight outside your house is broken, and you need to write a letter to the council to get it fixed.
That used to be a significant barrier.
It's not anymore.
ChatGPT, if you get it to write a formal letter to the council complaining about broken streetlight, flawless.
Absolutely flawless.
And you can prompt it in any language, and I'm so excited about that i feel like the the and it's it also interesting
it sort of breaks aspects of society as well because we've been using written english skills
as a filter for so many different things like if you want to get into university you have to write
one of those like like a formal letter and all of that kind of stuff which used to just it used to
keep people out now it doesn't anymore which i I think is thrilling. But at the same time, if you've got institutions
that are designed around the idea that you can evaluate everyone and filter them based on,
on written essays, and now you can't, we've got to redesign those institutions. That's going to
take a while. What does that even look like? It's so disruptive to society in all of these
different ways. I think this is like a nice plug that I saw on another podcast.
You mentioned that, you know, the thing that I want to spend my life doing is helping people make the most use of these computers.
And we want people to be able to automate their lives.
Right. This is what computers are for. Right. Computers are supposed to automate tedious things in our lives. Right. And if you are a programmer, you can do that. Right. This is what computers are for, right? Computers are supposed to automate tedious things in our lives, right?
And if you are a programmer, you can do that, right?
If you've got a software engineering degree, there are so many problems in life that you can automate away.
The vast majority of people can't do that, right?
They didn't spend two years getting a software engineering degree, which means that they frequently end up having to spend all day copying and pasting things i actually um last year i i was at an event where i encountered i heard from a fire chief
right the guy who runs the a fire station who had just spent the last day and a half copy and paste
copying and pasting names and phone numbers from one crm system into another crm system
because it needed to be done and i'm'm like, this is, how are we taking people
with like jobs of that much importance
and leaving them so that they have to do
this kind of manual copying and pasting
because computers are really, really frustrating to use.
And there's no easy way to do that.
If that guy had a computer science degree,
he could have automated the export from the CRM system
to the other CRM system and saved a day and a half of work and that's the thing which it feels and like there's this idea of end user
programming for years we've been wanting to solve it so that users can actually like program
computers without spending six months learning how to do it like apple like hypercard and apple
script and microsoft excel is probably the best version of this, right? So
many people are programmers every day using Excel and they don't think of themselves as programmers,
but honestly, if you can use Excel, if you can spin up formulas and stuff, that's programming,
that's software, you are building software and automating things. I feel like language models
could be the key to unlocking this. Like we're just beginning to see little hits for ChatGPT Code Interpreter and Claude Artifacts
are two of the most exciting things in the AI space.
And I continually hear from people who,
firstly, people who really are using these tools
on a daily basis who've never programmed before,
but now they can do stuff.
And the other thing that's exciting is
I talk to people who tried to learn to program in the past
and they didn't get over that initial six months of misery where you forget a semicolon and you get an obscure error message and you get stuck for two hours.
And a lot of people give up. They're like they assume they're not smart enough to learn to program.
And that's not the case. It's that they were nobody warned them how tedious and frustrating it was.
They weren't patient enough to get over that miserable initial learning curve.
Those people, a lot of them are learning to program now because if you get that semicolon
error and paste it into ChatGPT, it tells you the fix.
So it's like having a teaching assistant on hand 24 hours a day who you can call over
and they go, yeah, you put the semicolon there.
Amazing, right?
Absolutely amazing.
I was talking to somebody just the other
day who had who's a very experienced professional in their own field and they've spent the last two
months programming and really enjoying it having tried and failed to learn a dozen times because
they've got this new assistant that can help them that's amazing right that's as a professional
programmer there's a little tiny aspect where you're like, OK, does this mean that our jobs are going to dry up?
I don't think the jobs dry up.
I think more companies start commissioning custom software because the cost of developing custom software goes down, which I think increases the demand for engineers who know what they're doing.
But I'm not an economist.
Maybe this is the death knell for six-figure programmer salaries, and we're going to end up working for peanuts.
I don't know.
I guess we'll just find out.
So there's a lot to unpack there,
and I want to take a couple of directions.
But before we go forward,
there's one thing you mentioned
when you were talking about the LLM
being the interface for talking to these models.
So I wanted to read one of your tweets
where you're asking a question on Twitter or X. What are the LLM driven products that people use
which don't have this chat interface? I'm sure you would have gotten fascinating answers,
but I'm actually curious, what are some tools that you use which don't have this
chat interface on top, but are built with LLMs?
That's a really good question. The most obvious one, GitHub Copilot was the first mainstream non-chat based.
And actually, GitHub Copilot predates ChatGPT.
Like that was a thing before ChatGPT came along.
And that interface, the sort of gray text which you get to approve,
seems so simple and obvious now.
They iterated on that a lot.
Like the team that built GitHub Copilot, they were the first to sort of figure out how you do LLM integration into IDEs.
They put a heck of a lot of research and work into that.
They came up with something.
It's one of those things where a lot of the really obvious ideas weren't obvious at all until somebody did the work to get there.
So GitHub Copilot is my favorite example. I'll be honest, on a day-to-day basis,
I'm not using anything that's not chat-driven that I can think of, but I do use the alternative
inputs a lot. I use the voice mode on ChatGPT, and I've been playing with the Google Gemini one
a lot. I can go on a walk with my dog with AirPods in
and I can write code walking my dog
because I get ChatGPT to do it over the audio thing.
It's amazing, right?
So I use that.
Images.
I love image inputs.
I feel like image inputs are actually still quite new.
GPT-4 Vision was announced in November last year.
So we've only had, and these days,
all of the models have amazing image inputs,
but that's still like not that,
it's still quite a new capability.
So I will drop in screenshots of like a rough mock-up
of a thing and get it to do HTML and CSS.
I'll drop in screenshots of error messages,
all of that sort of stuff.
The coolest demo still that I've seen of alternative UI
is the TL draw guys.
The TL draw team did this thing called make it real,
where you've got this browser based vector editing software.
So you can draw boxes and lines and add text.
And they added a feature where you can then select a mockup
and click make it real.
And it sends a screenshot of that to GPT-4 gets back um like
tailwind HTML CSS JavaScript and it pops in a working version of the thing and you can literally
like you can draw a you can draw a calculator like a Fahrenheit to centigrade Celsius calculator
just draw the boxes put c f and a calculate button and you don't even tell it what's supposed to
happen you say make it real and, oh, I bet clicking that button
should calculate that from Fahrenheit into Celsius and update the two boxes.
It's extraordinary, absolutely extraordinary.
And that feels like there's so much more to be explored around that,
this idea of, okay, so we've got an interface that lets you draw something
and we can pipe that through an LLM and turn it into working code,
that kind of stuff.
In my own work, I've been experimenting with the...
So Dataset is software for data analysis.
It loads... You have a SQLite database full of data,
and it gives you a UI for exploring it,
a JSON API for running queries against it, that kind of thing.
So I've got one plug-in for it, which is a ask a question in English,
and have... That one's using claude haiku at
the moment have that turn it into a sql query and then it'll run that sql query and then a lot of
people who build those systems give you the answer straight away so you'll say how many records were
in california and it'll say 230 records were in california that i think is a bad idea because in
my experience with it it gets the right answer
four out of five times. But what in five times? It'll like do it where state equals CA, but in
the data, it was where state equals California. So it gets zero results, right? And that's a
disaster, right? You've just given somebody the wrong answer to their question. So instead,
I'm redirecting them to the SQL query page. So you at least see
this. So if you're SQL literate, you can look at that and go, oh, it's search for CA, not California.
I'll fix that. If you're not SQL literate, it's not great. I'm trying to figure out, okay, do I do
a human explanation of the query? Should I show like a join diagram? What are the other things
that I can do to try and make this more obvious? I like the idea of showing you're working with these systems. But yeah, so that's one of my
experiments is ask a question that gets into the SQL query, you get shown the SQL query,
all of that kind of stuff. There's so many more things like that, that we can be experimenting
with. So that approach is fascinating. I think in this case, what you're the way you are at least
building this application is assisting people right at the place where they
would ask the system a question so like what typically happens when i am interacting with
sql databases is so i use db where for example to connect with some of our internal my sql tables
i used to be really good at sql a few years back i haven't written sql in at least the last four
years i can write simple queries but when it comes to doing
things beyond joins, where you need like a bunch of unions
and join the other things and so on and so forth,
I'm like, I can do that, but I'm lazy.
So I would go to something like chat GPT,
give it a simple prompt and say, give me the answer.
And then I run it.
So I like the way you described what you're building,
because in this case, in the same prompt,
a user can say, I want to do this.
It's like you see, it's kind of a debug log of sorts.
You see what it generates.
Exactly.
You click it right there.
Also, so I use language models for SQL queries
just all the time because they're so good at SQL.
Like they're really, really good
at sort of advanced SQL queries,
all of that kind of thing. The problem is
you have to copy and paste the schema in first.
You've got to give it the schema so that it knows what to do.
And I'll do that,
but again, when I'm building it myself,
invisible to the user is I'm
sending the schema. I can actually...
I've also started experimenting with sending example
rows. The thing where the state
column might be CN, it might be California,
send three example rows, and the language model cotton's on.
It's like, oh, okay, I should search for Florida because I know that it's full state names in this column.
So, yeah, tricks like that are super important.
I feel like generally if you're a developer working with these models, it's all about the context, right?
What matters is it's all about the prompt.
And the most interesting thing about the prompt is that you can slap in a full copy of the SQL schema, five examples of queries that have run in the
past, those kinds of things. That gets really interesting. I'm a big fan of the term prompt
engineering, which is a term that a lot of people make fun of. A lot of people are like,
come on, it's chatting to a chatbot. How is that engineering? But I feel like those people are missing the craft of this thing.
Like, forget about chatbots.
For me, prompt engineering is about figuring out, yeah, okay, for a SQL thing, we need to send the full schema.
And we need to send these three examples and these three responses.
We need to prompt it in this specific way.
That's engineering.
It is engineering.
It's complicated.
It's very, the hardest part of prompt
engineering is evaluating it's figuring out okay of these two prompts which one is better i still
don't have a great way of doing that myself like that to me is the the people who are doing the
most sophisticated development on top of llms are all about evals they've got really sophisticated
ways of evaluating their prompts i aspire aspire to get to that point.
Like I'm still trying to figure out the best way to do that.
Yeah, reading your post was really helpful.
Like I love how you include,
it's like, hey, this is my first prompt.
This was like the code that you split it out,
you spit it out.
And then it's like, this is what's like how I changed it.
Right, like I feel like that modification process
is exactly the most important. That's super important. was like how i changed it right like i feel like that modification process um is uh exactly um the
most important that's super important like as an end user of an llm it's all about the follow-up
prompts like a lot of people who are disappointed in lms will stick in a single prompt say write me
code that does this and it'll spit out a bunch of code and they'll look and go well that was crap
and sure it was crap so now you tell it you say refactor that to not to
write some tests about or this doesn't work but you paste in the error message
and that's all of the work the substantive work that I do with these
things ends up being like 20 or like actually to be honest often I'll get
there with two or three follow-ups but sometimes you you you you go you go
longer than that so I will always try I love sharing my prompts because these
things are so hard to use.
I feel like it's beneficial
to show people what you did.
And so I'll very frequently,
I'll share chat GPT transcripts.
I built my own tools
for sharing Claude's transcripts
because they don't have a good,
like full transcript sharing thing.
My LLM tool makes it easy
to pipe out the logs into Markdown format.
I paste those into a GitHub gist and then share that.
A little habit I've got is that when I'm sharing these things,
I like to put them in private gists because GitHub private gists aren't
indexed by search engines, but you can link to them.
So it's a way of avoiding polluting the internet with giant mounds of LLM
generated text, but still letting people like giving people links. They can go and see it. That's just a little habit that I've got.
By the way, are there any prompt engineering resources that you've found to be useful?
One, just one, the Claude documentation. Anthropic are the only team who have really invested in good
documentation on how to prompt their models
there are i mean there are a million sources of like millions of people on twitter will tweet
like crazy prompting tricks like threaten your grandmother and all that kind of stuff
uh and and honestly some of those some of those are good tips the problem is filtering through
them so if you want to read something which is reliable and
like i trust the anthropic prompting guide a lot that's not to say there aren't other good prompting
guides out there but that's the one that if you want one resource that's the one i send people to
and you were describing a data set and using llms to power some of the features so for folks who
haven't been paying attention i want to say there's been a theme of SQLite
in a bunch of things that you do with your blog,
with LLM, the CLI tool, with Dataset as well.
And you've built a lot of data analysis tools
and worked on them over the last few years.
How are you thinking about this integration?
Because at least when I first just learned about LLMs and I thought, well, having them answer random questions is cool.
But I want them to do things on either my data or the context that I provide.
And this idea of context was bizarre.
At least it didn't make sense to me very initially.
I thought you always had to just fine tune things on top. So, and I was discussing some
of the ideas with my wife and she was like, well, you're not thinking about, she works on some of
the LLM stuff. So she was like, you're not thinking about it in an LLM first way. That's just not how
you build applications on top. A lot of it is just prompting to build stuff on top. So I'm curious
when you're thinking about
building some of the features in Dataset,
how do you go about building these features?
And is that different from doing traditional software engineering
where you rely more heavily on prompts than APIs, for example?
Yeah, I mean, this is, like, as a software engineer,
LLMs are incredibly frustrating because they are non-deterministic, right?
You give them, you tell them to do something,
and there is no guarantee that if you'd say the same thing twice,
you'll get the same answer back.
Even if you fix the seed and turn the temperature down,
you still might get slight differences.
Unit testing, how do you unit test something
which has a random number generator almost built into what it spits
out really frustrating and difficult um i it's it's working with the computer that sometimes
just straight up says no right it might refuse to do a thing that's really difficult because
the sort of larger theme of my work is around data journalism this idea of um like helping
journalists analyze data and find stories in it data Dataset was originally designed for data journalists.
It turns out it's applicable way outside of that field as well.
But that's always been the sort of framing that I hold for this.
And a challenge that journalists have is that if you're a journalist,
some of the source material you work with is nasty, right?
It's police reports about violent incidents.
It's fascist message boards, all of this kind of stuff.
Right now, if you've got an LLM that's helping process these things and you like ask it to summarize the themes from this fascist notice board,
it's going to say no, right? A lot of the LLMs will just straight up refuse to process that,
which as a journalist kind of makes them not, it doesn't make them useless, but it greatly limits
how useful they can be in all sorts of different things. Like if you analyze 10,000 documents and it 9,999 of them, it does analyze and one of them it rejects. Maybe there was something important
in the one that it rejected. Like this is very frustrating. But yeah, so working with things
that sometimes say no is really confusing. It means that you have to, you always have to keep
the human in the loop. Like I feel like anytime you have an LLM
doing something for you, and then the result of that is used for something, and there's at no
point could anyone spot if something had gone wrong, that's going to almost certainly lead you
into difficulties. But then there are things they are good at. My favorite application of LLMs
in journalism, and I'm getting the impression this is one of the most important business
applications generally, is this idea of structured data extraction so you've got a document that's
just typed up or even handwritten and you need to pull out who are the people and what dates are
there and like that that and what are the job titles they are so good at this so good at this
and that's like data entry is one of the most frustrating aspects of anything
involving computers data analysis like journalists often need to do data entry on thousands of
documents but they can't do that they haven't got the the person power to to go ahead and actually
do all of that work giving them access to an llm that can do that data entry and with data entry
if the llm gets it 95, that's probably what you'd have
got if you got a room full of interns doing the same data entry. The accuracy is not perfect,
which is unfortunate, but a lot of these things not being completely perfect is still incredibly
valuable. So the AI features I built for Dataset, there's actually three. There's the one I talked about, the ask a question, get back a SQL query.
There's one called Dataset Extract.
And the idea there is that you can define a SQL-like table.
So you can say, I want a table with restaurant name, restaurant address, number of Michelin stars.
And then you paste in the copy of an article that talks about new restaurants and Michelin stars, and it will populate the database for you. That works so well. That's like absolutely fantastically
effective.
I have a question on that. So in this case, a user is just sharing two things. One is,
here's a document that talks about Michelin star restaurants and kind of prompting the
system to say do X. But that's what's happening behind the scenes is a little more than that.
So what are the, maybe I'm using the word wrong,
but system prompts that you then add to what the user provides
to make it do the right thing?
I'm going to have to look.
I will look that up right now because I can't remember.
I think it's a very short one.
Let's see.
I don't know if I'm even using a prompt
for that one because I'm using the structured data. Like with OpenAI, you can give it a
structured, you can give it a schema, effectively a JSON schema. Oh, no, I get the user to provide
additional prompts. So when you're doing this, you can put in an extra prompt that says
only include restaurants that are at least two Michelin stars, like for example. And then I to provide additional prompts. So when you're doing this, you can put in an extra prompt that says,
only include restaurants
that are at least two Michelin stars, like for example.
And then I have a tiny,
I say extract data matching this schema.
And then I give it the schema in terms of,
you know, there's name string,
or it's an array of objects.
And each object has a string called name
and a string called location
and an integer called number of stars. That's the thing like we can stick in the show notes it's
very very simple because this is so sort of fundamental to what these things can do but yeah
and then i let my users add additional like prompt instructions if they need to one something i use
a lot is um when you're extracting the date, format it as year, year, year, year, month, month, day, day.
Little clues like that.
Well, that's it.
It's spectacularly powerful for how simple the underlying system is.
That's actually a good example of a UI for these models
that isn't just a chat UI as well.
It's a paste in some text, or it accepts images as well.
You can drop in an image, give it a schema you select from,
you sort of type in the name and you select text from a dropdown, then type in a name and select integer from a dropdown.
That's effectively it.
And it works.
It works really well.
And then my third feature is I have a feature where you can basically run a prompt against every row in your database table. So you might have a table with 100 restaurants in and you'll say, in a, you can say, enrich this data for each of these 100 rows, write a haiku about this
restaurant and stick it in the haiku column. I, haikus come up a lot for this stuff. And that's
it, it works. So yeah, that, those are the three things, the very, very sort of early, like steps
in what's possible with this but yeah the applications to
finding to data analysis data cleaning finding stories and data it's almost overwhelming how
much potential there is there so i want to try something today uh but one of the things that
we do uh before we record a podcast is we research about the guest to educate ourselves
to inform the conversation as well and guang has been building an amazing tool that helps
us collect a lot of this information guang can describe more of what that does but i'm
curious if you've been exploring lms a lot so i'm curious to get your input on this if
our goal is to, given an
internet famous person, we want to know more about what they've done in the recent past or let's say
over the years and we want to get notes for where the conversation could go and obviously we want
to dig more into it. How would you go about doing something like this with LLMs? So for this
particular thing, the one thing I would not rely on is them doing the
research, them knowing about, because like we said earlier, for people who are internet famous,
it will make stuff up all the time. What's way more interesting, it's find reliable information
and dump it into the LLM. So go and like grab their RSS feed from their blog or all of their
recent tweets, which is harder now because Twitter doesn't really have an API you can use.
Really frustrating.
But yeah, or transcripts from other podcast episodes that they've been in, anything like that.
And then what I'd do, I'd use Google Gemini because Google Gemini's signature feature is that it's got a one million token or even 2 million token context, which like clawed and opened AI cap out about 200,000.
So it's like five times the amount of stuff that you can pipe into it.
Plus Gemini can accept audio clips,
which I haven't really played with very much yet.
It accepts video.
So what I would do is I'd experiment with audio and video,
but out of interest.
I wouldn't necessarily
trust those to be the most effective way of doing it. I'd basically try and gather as
many tokens about that person as possible. So copy and paste crap out there, Wikipedia
bios and anything that they've written, all of that kind of stuff, copy and paste all
of that into Google Gemini and then prompt it with, we are interviewing this person,
what are some themes that we should do?
I think that will work amazingly well.
I think you'd probably,
as long as you're feeding it the source data
so that you know that the source data,
again, don't even trust it to go and read web pages
because who knows what it's going to do.
But copy and paste is the best API, right?
Copy and paste half a million tokens of information about that person in.
I am certain you'd get good results out of that.
I'm going to give that a shot.
That feels like it would work really well.
The prompting trick that I use a lot is, especially with these longer context things,
is I always prompt and say, identify core themes for topics we should talk about illustrate each one for each one
provide two illustrative quotes from the source material so then it'll say you should talk to
simon about his llm tool simon said quote llm is my something tool for something something something
partly as a fact checking mechanism because then you can take the quotes it gave you and you could
search in the source material and see if it made them up in my experience it doesn't make those up if you ask for
direct quotes it might even like fix a type of fix the punctuation or something but i can't remember
having asked it for direct quotes where it did completely invent a quote which is useful it's
not to say it wouldn't do it, but it's a good trick.
Yeah, that's super helpful. Thanks for sharing that. And I wanted to talk a little bit more about
LLM-enhanced development of sorts. So I like this quote that you had in one of your talks,
where you said, LLMs kind of make you more ambitious and the way you go about thinking about technology or any new
technology is how it makes things possible which are impossible before or how it makes you faster
or build things faster given all of this that's going on can you share a few examples of where
LLMs have made you more ambitious or have you tried things which you
wouldn't otherwise? Some of the recent examples that you're most... So many. Yeah. I mean, so many.
This is the thing is that, so as a software engineer, when I'm building a project, I like
to have confidence that I've got most of what I need to build that thing, right? If I'm going to
have to like learn Objective-C from scratch to do a project,
I can't necessarily justify investing the time.
I will try, I will find a different project to do.
LLMs have kind of changed that equation for me.
My earliest example of this is I've like had a Mac for 20 years.
I've never learned AppleScript because AppleScript is a weird, weird programming language.
Like I've heard AppleScript described as the world's only, it's a read-only programming language.
If somebody shows you some AppleScript, you can go, oh, I get what that does.
And then you sit down to write it yourself and you have literally no idea what you would do to make it do anything useful.
ChatGPT, it turns out, is so good at AppleScript, right?
It knows AppleScript.
The thing I wanted to build is I wanted to export all of my Apple Notes into a plain text format.
And I asked for the AppleScript to do it.
And it knocked out six lines that looped through every Apple Note.
And for each one, I output the title and the body.
And I ended up writing a little Python program on top of that that embedded AppleScript in a Python program.
And now I've got a command line tool that can export my notes to a SQLite database.
That project was impossible. It was impossible for me to build that previously,
because I would have had to spend, realistically, probably a solid week getting my head around AppleScript, which is not a well-documented language either. And instead of that full week,
I got a working prototype in five minutes that proved to me that the thing I wanted to build
could be done.
And once you've got, like, my style of development is all about research and prototypes. Like,
you build a prototype to prove that the thing is possible and to fill in those gaps in your
knowledge about what you need to know. And then writing the software around it's easy once you've
figured out the Apple script you need to get the notes out, whatever it is. So that was an early
example. And that just keeps on going.
I have production code written in Go right now,
despite if you asked me for a for loop in Go,
I would have to go and look it up.
I'm not fluent in Go, but the code that I wrote in Go with the help of, I think that was Claude 3 Opus I used for that one,
it's fully unit tested.
It's got continuous integration,
so when I commit to GitHub, it runs the test. It has continuous deployment, right? If the test pass, it deploys the thing. All of these things which I see as essential for production grade software.
And I feel good about it. Like despite the fact that I could not sit down and write it off the top of my head, I know that when I go and look at that code, it's good code. It's well tested. I've thought about the edge cases it's it's like and it's been running in production for six months and serving quite a decent volume of traffic that's really cool
like being able to no no i no longer look at a problem think well ideally i'd use go for this
but i don't know go so i'm gonna just cross that off the list just the other day i um what was the
thing i was working on recently i built a a little Django application that was a,
it's like a webhooks debugging application.
When you're working with webhooks,
the thing you really want is just set up an endpoint
that logs everything.
And then you tell Stripe, hit my endpoint,
and you get logs in your database showing what it sent you.
And then you configure things out from there.
And I've always wanted a Django app for doing this,
but it would take like a day to build that. And I couldn't quite justify spending a day on it. I got Claude 3.5 Sonnet to write the
entire thing. And it took two hours from idea to having deployed working software with unit tests
in production that was solving this problem for me. And it's a great example of a project where
I could just about justify two hours on that problem. I couldn't justify any longer
than that. Like I should just use something off the shelf at that point. So yeah, time and time
again, all of these little projects would not exist without LLMs. Not because I couldn't build
them, but because I couldn't build them fast enough to justify the effort. So this is fascinating
because one thing that we see LLMs being really good at is code because you can test it, you can verify it.
Text, prose, not as much because it's hard to verify.
In a lot of these projects,
what does your typical workflow look like?
So you mentioned like you sometimes have chat,
you have to write code while you're walking your dog,
which is amazing.
For something like this where you spend,
let's say, a couple hours. So can you walk us through what that looks like from prompting to actually
getting the thing in production? I mean, it definitely varies.
There are two types of projects. There are the projects where I know it's possible already,
like building a webhooks endpoint for Django. I know that's possible. I could absolutely just sit
down and write that. So that doesn't need the exploratory prototype, right? Whereas there are other projects like
exporting my Apple Notes. The number one question is, can I even do this? And so if it's got those
unknowns, that's when I'll jump straight into a prototype. And that's normally just have an idea,
prompt an LLM a few times, say, hey, can you write me? Oh, a great tip with LLMs, always ask for
options. So I'll say things like, what are my options for exporting Apple Notes? And it might
say, you could do this, or you could use Apple Script, or you could do this, or you could do
this. That's the best way to work with them. Because if you ask for one option, if you ask
it a question, they'll give you an answer. And if you're lucky, it'll be a good answer. But maybe
it's not ideal. If you ask it for options, one of those four or five options is almost always the best thing.
And you're better equipped to evaluate than it is.
Because, I mean, it's just a random number generator, essentially.
But, you know, it can spit out the...
So I'll often start with, okay, what are my options for solving this problem?
Sometimes I'll say, write me the code for option three.
And I'll do that in...
Normally, I'll have it write it in JavaScript or Python,
because those are my two daily driver programming languages.
Occasionally, I'll try it in Bash, if it's something I can use on the terminal.
That kind of thing.
So if there's a prototyping phase, I'll be using the LLMs as part of that prototyping
to answer those questions.
The moment it turns into a project I'm actually going to try and commit to, I start a GitHub issue for it. And sometimes I'll
start a GitHub issue just for the research, like maybe in my private notes, like figure out if I
can export Apple Notes, and I'll just copy and paste things that I learned along the way.
If it's going to turn into actual software, most of the software I build is Python. And it's mostly, it's Python packages
that I can publish to the Python packaging index. And those come in basically three shapes. They're
either a Python library that I'm going to import and use. It's a Python CLI tool. So something
where I type LLM space, whatever. Or it's a plugin for one of my other projects where I install it
into Dataset
and it adds new functionality.
I've got cookie cutter templates for all three of those.
So cookie cutter is this great little Python tool
that will spin up the directory structure
and the readme and the setup.py or pyproject.com,
all of that kind of junk
based on a few questions that it asks you.
So I've got three public open source
cookie cutter templates that I use to get me started on that. Those set up the initial file structure. They set up the GitHub
actions workflows for testing. They set up the workflow for publishing the package to PyPI.
So if I've picked a name for it, I can write a bunch of code, push it to GitHub, click a button
in Git, or I post a release on GitHub and that will be
published to PyPI. So that entire workflow of writing the code, testing the code, documenting
the code, publishing the code is all automated for the most part, which is a huge productivity
boost. Like I can, I've written, I've got like command line tools that I've published
to PyPI where I had them live on the package index
within an hour of the idea of the tool. That's something... And that's because I've done it 250
times now. So you've got the automation in place. It's just a very, very quick habit.
I love the idea of release early, release often for open source things. If it's an open source
package, I will often... If I'm not confident yet, I'll put it as an open source package i will often if i'm if i'm not confident yet i'll
put it as an alpha i'll say okay this is the 0.1 a zero alpha release and i won't release code that
doesn't run at least but you know if i'm not quite confident that the design's right or whatever
and um some of my projects languish in alpha state for far too long i'm also trying to get better at
committing to a 1.0 release. I've still got
my main dataset projects on version 0.65 right now, I think. So I've had like 65 releases and
I still haven't done the 1.0 and I really need to do the 1.0 for it. But yeah, so that's the
process. I've written quite a bit about this. I've got some good sort of write-ups on how that all
works. Oh yeah, we would love to link that in the show notes i think what's helpful here to note is that it's not just using
llm to tell you what to do but in a way you were in the driving seat you're kind of having it just
assist you uh where you still have a lot of structure around it to make you more productive
yes where it's not just like i call it i call it my weird intern i'll take it i'll say to my wife
i'll say to my wife and Natalie sometimes,
hey, so I got my weird intern to do this.
And that works, right?
It's a good mental model for these things as well
because it's like having an intern
who has read all of the documentation
and memorized the documentation
for every programming language
and is a wild conspiracy theorist
and sometimes comes up with absurd ideas
and they're massively overconfident.
It's the intern that always believes that they're right but it's an intern who you can i hate to say you
can kind of bully them you can be like do it again do that again no that's wrong no that's wrong and
you don't have to feel guilty about it which is great like sometimes when you're working with
other people and like they're like they've done five iterations and you're like you know what i'm
still not entirely happy with this bit but come on i, I'm not going to make them do a sixth.
That's just not fair.
The LLM, you can do that, right?
You can just keep on having to say, oh, actually, you know what?
Rewrite that whole thing in Go.
Or one of my favorite prompts, one of my favorite prompts is you just say, do better.
And it works.
It's the craziest thing.
It'll write some code.
You say, do better.
And it goes, oh, I'm sorry.
And then it will churn out better code,
which is so stupid that that's how this technology works.
Oh, yeah.
But it's kind of fun.
It reminds me of our friend Austin.
So we have a common friend, Austin.
If you tell, if anything,
let's say if you're struggling with anything
and if you go to him for advice
and if you ask him, hey, Austin,
what do you think I should do?
He has one answer for every damn thing and that's try harder and i think it works really well
nice yeah sorry i think you were saying something no no that's not very true
so in terms of interns like the good thing is you don't have just one you have many of them with
like chad gbtpt cloud and whatnot and as
you were describing some of your projects you mentioned have you used different ones uh for
different things i'm curious how do you go about like using one over the other is it more try what
works or do you have a pattern at this point that you go to it's so hard it's so hard right it's um
i've been calling this it's vibes vibes-based evaluation, right? Because
the only way to figure out if a model is any good is you have to use it repeatedly a bunch
of times and try different things about it. And some people are really like, they're really
sophisticated about this. They have like a document full of all their test prompts that
are run through the new models. I'm not doing that. I should be doing that. I sort of, I
have a few, like I have a few prompts that i always run
against a new model just to try and get a feel for it but a lot of the time i go sort of based
on vibes from other people like if a whole bunch of people are saying no seriously i was all about
claude sonnet but now google gemini 1.5 is is better for these things then i'll i'll start
experimenting with that one as well at the moment my daily driver is Claude 3.5 Sonnet.
I think that's the best model,
but the new Gemini 1.5 from like two weeks ago
is getting massive buzz.
So I need to spend more time with that one.
I still use ChatGPT for walking my dog.
The voice mode is amazing.
And for code interpreter.
Like if I'm writing Python code
and I want it to actually test that Python code for me and fix any bugs that it finds i'll go to chat
gpt for that claude also the claude artifacts thing where it can build little interactive web
apps it's amazing like i i i'm using that i use that to prototype up little like things that i'm
actually building i use it to build little one-off tools like a little pricing calculator for something just for me to use um i really love that feature then and on the
command line i'm i love playing with the local models the ones that run on my laptop the problem
is that they are never going to be up to the standards of like cloud 3.5 sonnet so for actual
real work that i'm doing i I tend not to use them.
But because of my LLM project,
I'm constantly tinkering around with them.
I also, I think they're really good
for people learning LLMs
because using a kind of crap one
that runs on your laptop,
it hallucinates way more often.
It makes more mistakes.
It helps you get that mental model
of what they're good at much better
than working with the really good models.
So I always recommend people like five three um gemini uh gemma gemma 2b is really good
llama 3.18b is currently my favorite local model it's quite easy to run it's a four gigabyte
download if you get the quantized version it's it's genuinely useful like it's shocking it's definitely it feels
equivalent chat gpt 3.5 at least and it's really amazing to me that a four gigabyte file can be
that useful running on my own laptop like the compression of these things is extraordinary
um but yeah so it's vibes it's it's it's it's vibes based it's frustrating i wish i had better
benchmarks of my own to try these things out.
And a lot of it also comes down to prompting style.
Like some people will say, oh, no, I tried Claude and it sucked.
And it's like, yeah, but maybe that's because the way you prompt LLMs and the way I prompt LLMs, it's not like I'm doing it right and you're doing it wrong.
It's that your way is more compatible with ChatGPT and my way is more compatible with Sonnet in ways that I don't fully understand.
So you've been writing a lot about LLMs
over the last few years.
And as we were going through your blogs,
there's a lot of new stuff that's coming out.
And in general, one thing that at least I struggle with
is just keeping up to speed
with everything that's happening.
Yep.
It's like two weeks, work's busy,
our life's busy, and then suddenly something has
changed. And I'm not spending as much time building things on top of LLMs, but I'm curious
to just learn more and see what the capabilities are and how it can be useful. I'm curious how you
stay up to date, one, and also filter signals from noise because there's just so much of it. Right. The big one is, so Twitter is still the,
like I tried moving to Mastodon.
I'm very active on Mastodon.
Mastodon is mainly AI skeptics who don't like this stuff.
All of the AI people hang out on Twitter still.
So I maintain presence on Twitter
because that's where the AI conversations are happening.
So I've, following a bunch of people helps.
There are a few accounts that I turn on notifications for.
So I get a push notification whenever Anthropic or OpenAI put out a tweet
because it's always like that's where the big news comes from.
The other thing is like private groups.
You know, I'm on a couple of WhatsApp groups.
I'm in a bunch of different discords.
Those are great.
Like those are sort of the highest signal stuff
will come from a discord I'm in
with like 15 other people
who are very engaged with this stuff
and we'll be sharing notes with each other in there.
And then it's blogs.
Like I blog, having a blog
means a lot of this stuff comes to me.
People will like tag me and say,
hey, have you seen this new thing?
It's relevant to what you were talking about last week. That's super useful. And that's it. And I've got an RSS reader
that's subscribed to a bunch of things and sub stacks and so forth. But the other thing is like,
I don't have a, I'm not employed by anyone else. So if I want to spend a couple of hours because
a big thing just happened, I want to research it. It's nobody to tell me not to, which isn't
necessarily beneficial for my own projects. know it's i don't have that
accountability but yeah i'm in a privileged position in that i can afford to invest the time
in figuring this stuff out as well so uh you mentioned that you're an independent open source
developer and this is something i want to talk about uh but one question that i wanted to ask
before was we're talking about using llms to kind of improve your productivity and being able to build things
faster. But there's one thing which comes up, which is like learned helplessness. In other words,
it's more like your kind of muscles are atrophied. And in this case, your skills are maybe atrophied,
where you can't just write things off the top of your mind uh for example like i remember some time ago in the
recent uh this year itself wi-fi was out and i was writing some code and copilot wasn't working
i knew what i wanted to write but i was frustrated because the damn thing were just not autocomplete
and i was like why is this not working and it's like like, oh, Wi-Fi's out. So I'm curious how you think about that in general.
Yeah, I felt a little bit of that, to be honest. The other day, I went and reported a bug against
GitHub Actions for like, I was saying, hey, I'm running a Windows GitHub Actions thing,
and the version of Python can't load SQLitelite extensions and i thought you'd fix that this is really frustrating and then after i'd filed the
bug i realized that i'd got clawed to write my test code and it had just written sqlite code
that doesn't it had hallucinated the sqlite code for loading an extension and i'd gone and i'd
literally i'd reported a bug and i had to close that bug and say, no, sorry, this was my fault.
That code is wrong.
And that was a bit embarrassing.
I should know more than most people that you have to check everything these things do,
and it had caught me out.
And I'd lost like half an hour of time as well to trying to figure out what was going on.
It turns out it just hallucinated the wrong way to use SQLite.
Python and SQLite are my bread and butter.
I really should have caught that one.
So yeah, this has happened that my counter to this is I feel like my overall capabilities are expanding so quickly. I can get so much more stuff done that I'm willing to pay with a little bit of
my soul, right? I'm willing to, I'm willing to accept a little bit of atrophying in some of my
abilities in exchange for, honestly,
like a two to five X productivity boost on the time that I spend typing code into a computer.
And that's like 10% of my job. So it's not like I'm two to five times more productive overall,
but that is a very material acceleration. And like I said, it's making me more ambitious.
I'm writing software I would never have even dared to write before. So I think that's worth
the risk. A lot of people are worried about the impact this has on new programmers. And I've
sort of got two conflicting opinions there. One opinion is, like I said earlier, people like the
fact that you've got a semicolon, you lose half a day to figuring out the semicolon. That sucks,
right? That's just inexcusably miserable. fixing that for people is is a wonderful thing i
think it opens i think way more people are going to learn to program and i think that the people
who are learned to program will be able to learn faster but i think there are skills they're not
that they're going to skip over i heard a kind of terrifying anecdote from a friend recently where
they had a they knew somebody who was a new programmer. They were just getting
started. They were a professional programmer or just like very early stages. And they,
they were calling code. They used the word, something like goop. And they said, so I got,
I got, I got a chance if you'd spit out some goop and I paste it in, it seems to work.
And then, and this didn't work. So I'm going to spit out more goop and I paste that. And
then now that's working. And they were asked, well, how are you going to maintain the goop
in the future? And they said, oh, I'll just get it to write more goop. i pasted that and now that's working and they were asked well how are you going to maintain the goop in the future and they said oh i'm just going to write more goop and that idea
the idea that code is now goop as a programmer that offends my very soul like that's that's
sort of horrifying but you know it's if you get working maybe maybe we are going to have to like
we have we currently live in this in a in a world where half of the world runs on Excel spreadsheets with no unit tests, with four, which with no backup, no version control, no unit tests.
And anyone can muck up a formula and the valuation of a company goes down by half overnight because that's the world we live in today.
Right. Excel spreadsheets are kind of goop already.
And somehow society functions.
So maybe those of us who are like, no, every line of code has to be perfect.
Maybe we're wrong.
Maybe actually goop is the way forward.
But that's a little bit terrifying, you know.
It is. I think that's precisely what I was thinking about.
That with all of these LLM tools, the amount of goop is just
increasing and not just in code, but in almost everything else. And I think you talked about
slop as well, which is like the unwanted and not good AI content, especially images coming out of
many, many countries right now. In general, if you think about how these LLMs have been helpful from a usability standpoint,
one is they're super cool and exciting and they have way more potential than what we are seeing today.
They've truly been impactful in increasing productivity for software engineers,
where someone who knows what they are doing.
And I think we were speaking with Steve Yeagy week and he mentioned like LLMs are way more safer
or way safer in hands of senior engineers
who know what they need to be doing
as opposed to someone who doesn't.
But we don't control who uses this and how they use it.
And if you think about the quality of software
that's actually coming out,
and I was having a discussion
with one of my friends recently,
and we were talking about this, where these tools are amazingly helpful to make us productive who are quote-unquote some sort of an expert in a domain where you kind of know what's
right but a lot of the other tools that come out they are super nice from a prototype standpoint
from a demo standpoint but you don't see quite as good tools when it comes to a production system
that you would fully rely on. I know RAG is a thing that people talk about too, where demos
are amazing, but then production is like, well, would you trust it to give it to customers?
So I'm curious, what's your take on that in terms of the sheer quantity of things being
built from a prototype standpoint, but the quality isn't quite there yet?
It's really interesting, isn't it?
Yeah, like that.
I mean, so many of these things are completely open questions to me.
I still don't like will society overall in like 10 years time look back on this and say, OK, this technology had more pros than cons.
Or will we just be flooded in slop and be like, wow, I wish nobody had ever invented this stuff at all. And it's harder for
me to evaluate that because I think programmers are the best equipped to use these tools. Like
hallucinations in code don't matter because when you run it, you get an error and you fix it,
right? Like we are, and they're better at code than they are at anything else. So I'm getting
enormous productivity boosts out of this stuff and it looks amazing is that just because i happen to be in the one profession
in this world that is most attuned to the benefits these things can can bring you and then yeah in
terms of quality one thing i've been thinking is you keep every now and then you hear a story of a
company who've got software built for them and it turns out it was the the boss's cousin who's like a 15 year old
who's good with computers and they built software and it's garbage software the quality is absolutely
awful but you know it's it's how these things happen and maybe we've just given everyone in
the world the overconfident 15 year old cousin who's gonna claim to be able to build something
and build them something that maybe kind of works and maybe society is okay with that maybe that because that this is why i don't feel
threatened as a senior engineer because i know that if you sit sound down somebody who doesn't
know how to program with an llm and you sit me with an llm and ask us to build the same thing
i will build better software than they will right that that's there's no question about that at all
um but yeah mate so hopefully sort of market
forces come into play and people the demand is there for software that actually works and is
fast and reliable and so forth and so people who can build software that's fast and reliable often
with llm assistant use responsibly benefit from that that seems okay to me but yeah i don't know
it's um one of my a big big frustration i have is i want like lots
of computer science papers come out about llms i want sociology papers i want all of the sort of
humanities doing research into the impact on these things how do people learn to use them
all of this kind of stuff is and i think that research is happening but in academia it takes
two to three years to get a paper out so like we're seeing papers come out today that is talking about GPT 3.5 from like December of 2022, which is so outdated at this point.
So outdated at this point.
But yeah, it's frustrating.
There's so many open, big questions like this that we don't have good answers to.
Yeah.
We're starting a family pretty soon.
So these are questions that at least I'm thinking about these days and struggling with and don't know the answers to. And I would love to get some
of those research papers as well, which I may ask these tools to summarize for me, which is a
different problem. Oh, yeah. I read academic papers now. I never used to read academic papers. But you
can copy and paste the app. I built a GPT called Dejargonizer. And it's just a prompt that says,
yeah, you paste a prompt that says,
yeah, you paste text and it says,
find all of the jargon terms and define what they mean.
And so I can grab an academic paper abstract
and paste it into de-jargonizer,
and then I'll understand it.
Because they inevitably use like five terms
I've never heard before, but it, yeah, that's so good.
It's so useful for that kind of thing.
So we've been talking about how these systems
are beneficial for senior engineers. And I've been having some conversations with some friends who have kids who are either starting school or already just starting computer science or looking for a job.
Which job market has been much tougher this year and the last year in general.
At least for entry-level engineers.
But let's skip the job market problem for a second.
In general, for junior engineers who have these tools at their disposal
to be productive and learn things much faster,
what advice do you have for them,
for them to also develop some of the skills
that you only develop through making mistakes
or just
building things in production? So I'm not qualified to answer this question because I was a junior
engineer 25 years ago. So I do not have the learner's mindset. I will answer it anyway.
I think it's all about projects. I think build things that do something and ship them.
There is my very strong hunch,
and this is going back throughout my entire career,
the fastest way to learn anything in software is to build something with it.
And also to get beyond tutorials.
You know, tutorials are fine.
You can go through a tutorial and build that thing.
Those will not have nearly as much of an impact on you
as saying, okay, I'm going to build a thing that does this
or take the
inspiration from the tutorial and build something else. It's also, it's great for hiring, right?
I've been a hiring manager in the past. If a candidate can show me stuff that they've built,
that's worth more to me than any degree. Like I've hired people where we hired them. And then
at the end of the process, I realized I never even asked them if they went to university
because it didn't matter because they showed me cool stuff that they'd built,
and they could talk through it.
If you've got a great demo and I can ask,
oh, how did you solve this problem?
What else did you try?
We can have an amazing interview.
Also, there's that whole the fizz, buzz, leet code side of interviewing.
I hate that stuff.
I absolutely hate that.
If you've got code on GitHub, which I can read through and I can look through your commit history and see evidence that you know how to fix a bug in a for loop or whatever, I can, if I can hit it in a web browser
and see that you've built something.
On that basis, like I love, I'm a massive power user of GitHub.
I love GitHub pages.
So you can like just build a little static web app, host it on GitHub pages.
It'll live forever, right?
And it's a URL that people can click on and they can start using it.
If you're doing server side code, it gets a bit trickier.
I've been, I was, I've used Vercel a lot in the past.
Vercel, if you don't give them a credit card
so that you can't get accidental denial of service billing problems,
Vercel can be really good.
There's always places that you can host code online
if you look around for them.
But yeah, having live demos of things that you've built
that are hosted online,
I think is the best possible sort of resume
and it's the best way of learning.
And so to this day, I've got a tag on my blog called project
and every time I do a project, I tag it project.
And right now it has 404, oh, good number,
404 items tagged projects. And that's over the course of 20
years, you know. So it's, but every single project that I do, I learn just the tiniest
new thing. And it's also like, if I want to remember how to like do a screen, take a screenshot
using the Playwright framework, I've written code on GitHub that does that and I can go and look at it.
Or if somebody asks me, how do I take a screenshot with that?
I can send them a link to the code that I wrote in GitHub.
So it almost becomes an external memory of everything that you've ever learned to do.
But yeah, for me, I think that's it.
If you're a new programmer, knock out projects.
It doesn't matter what they are, weird little things, fun little things.
It's also a great excuse to do writing
because one of the two easiest forms of blogging
are something that I learned or something that I built.
You can do a blog entry where you just say,
I wanted to solve this particular problem,
so I built this.
Here's a screenshot of it.
Screenshots are amazing because they never break.
I love screenshots.
If you build hosted software, it's going screenshot of it. Screenshots are amazing because they never break. Like I love screenshots.
So if you build hosted software, it's going to break eventually.
Take a little video, take a screenshot, stick those up.
Like when I've coached people going through boot camps before,
and one of the things I always tell them is they always do this sort of end of boot camp project.
And they'll have a GitHub repository with their project in it. And I say, invest in the readme.
The readme needs screenshots of your thing. Like if I'm a hiring manager and I click through, I'm not going to check the code out.
I'm not going to try and run it. But if there are screenshots and a couple of paragraphs saying how
it works, that puts you in the top 1% of candidates if you've got a readme with a screenshot in it.
So do that, right? Yeah. So I think my advice is do lots and lots of projects.
Small, weird projects, whatever
is best. If you can get them deployed, that's excellent.
Then have a readme with a
screenshot in, and that's a really good way
of learning.
That's good advice.
Talking about projects, I wanted to
jump on to life
as an independent developer.
Now, when it comes to someone working at an employer, at a company, for example,
the kind of projects you work on are typically driven by some business priority.
And there are just problems to solve, and you don't need to go look for them very often.
People kind of tell you what the problems are.
Well, let me put it this way.
If you're lucky, you don't need to go look for interesting problems they kind of come to you uh at a company
and you're going to work on those and there's always it's not your job at a company to figure
out what what is the important thing to do that's what the management chain is for exactly so you
always have the steady input of things to do and And at least these days, everyone I speak with has more work to do than they have the time for.
But when it comes to working independently and having to define how you spend time, one needs to be very disciplined.
Also have a way of identifying what you work on.
So I'm curious how you do that, which is like figuring out how do you work on and
keep a structure that keeps you going. Honestly, that is the hardest problem. It's really,
really difficult. So I'm in a very privileged position in that my wife and I ran a startup
for a few years. We sold that startup. It made us enough money that I don't, it's not that I don't
ever have to work again, but I have a substantial runway where I don't have to worry about an income,
which is almost like a requirement
for if you want to go out independently,
especially doing open source stuff, right?
It's very, very difficult to make.
And I'm starting to spin up sort of consulting things
and so forth, because I want to extend that runway.
And ideally, I want to do what I'm doing right now
for the rest of my life, right?
To do that, it needs to be funded.
It needs, I need to have a repeatable source of income for, right? To do that, it needs to be funded.
I need to have a repeatable source of income for it. So I've been building a software as
a service version of my main dataset open source project. That feels like, in open source,
that's one of the most proven business models. It's like WordPress, right? WordPress is open
source or you pay automatic and they run the hosting for it. And they built a really successful
business around that. I essentially want to do exactly that,
because also it's kind of lonely, right,
working on your own projects.
I would like to be able to employ a full team of people
to work on stuff with me.
Like, that's the sort of big ambition.
But that said, honestly,
like, what to work on next prioritization is so difficult
when you don't have any external sort of forcing
factors one of my big thing my i mentioned my week notes earlier just forcing myself to be
accountable every couple of weeks to write the stuff up i don't care if anyone reads them or
not like the week notes are entirely for me they're for me to track what i've been working on
and the progress towards things um i try and set myself deadlines. I occasionally do conference-driven development
where you sign up to give a talk at a conference
and you're like, this project needs to be in a state
where I can actually present it on stage.
The dataset AI features are almost all conference-driven development.
I was speaking at a journalism conference
about ways to use AI in journalism
where I better have the features ready by then.
So yeah, it's really difficult, especially since in the AI space and in software engineering
generally, everything is interesting. In the language model space, I've been calling it
recursively interesting because any aspect of it that you look at, like audio models that can
process images, or how does the training work, or how does fine tuning work just raises more
questions you can just keep on getting deeper and deeper and deeper into any of these spaces
so I don't think I have a good answer to that question to be honest like I've been
kind of coasting on the fact that I don't have financial incentives that that force me to do
something and letting myself go run wild with all of these different projects and i would my number one goal to be honest is i'd like to be more disciplined in terms of saying
okay here are the big goals how can i go after those i do have a goal at the moment and so my
main software data set it's for journalists to try and find stories and data my ambition is i want
someone to win a pullet surprise where for a piece of investigative
reporting where my software was one of the tools that they used so I want dataset to
be part of the mix in some pullet surprise winning investigative reporting and that's useful because
I can say to myself okay am I building the right features am I engaging with the right people am I
making sure it's easy enough to use and all of that kind of stuff? So that's the sort of like one of my sort of guiding ideas at the moment is that.
And so I can ask myself, is the thing I'm working on right now on the path to somebody else winning a Pulitzer using my software?
But, yeah, it's I could do with I sometimes think I sometimes wish I was like raising money from investors just so I had somebody breathing down my neck saying,
you said you were going to get this thing done this is the focus have you done it yet
but yeah I'm still still completely I'm free of all influence at the moment
it seems like in many of these cases creating some sort of uh force and function helps could
be like you said content development absolutely uh by the way you mentioned the startup
that you uh ran with your wife which got acquired congratulations uh i think it got acquired by
eventbrite if if i remember it correctly that's right yes um and you also were at eventbrite for
some time after that and or is that true i think yes six years at eventbrite yes yeah and then you
decided to go independent so i'm curious what prompted that decision to then not continue on the job
because looking at your career, I think you could have gotten any job
if you didn't want to continue at Eventbrite, but you chose to be independent.
I'm curious what prompted that decision.
So what happened is I was at Eventbrite for six years.
I was a director of engineering, focusing on APIs and scaling and internal platform stuff and so forth.
And then later on, I moved into more of a prototyping and R&D role, which sort of suits my interest a little bit better.
I had this opportunity come up where the University of Stanford have a fellowship program for journalists, where the idea is, it's called the JSK Fellows.
And the idea is they take sort of mid-career journalists and they pay them to spend a year
on campus at Stanford, effectively working on a project that is beneficial to the future of news.
And that's a very, very loosely defined. And I heard about this thing and I got in touch and
I said, well, I'm not technically a journalist, but I've worked in a lot of newsrooms.
I've worked for newspapers. I build tools for journalists. I've sort of I'm effectively a data journalist.
Could I be a good fit for this program? And so I ended up being the person on this program who was a bit of the sort of the wild card.
Right. I wasn't formerly a journalist, but I was working on journalist adjacent projects.
And it was amazing. And it completely ruined me because they paid me to spend a year working on whatever adjacent projects and it was amazing and it completely ruined me
because they paid me to spend a year working on whatever I thought was most interesting
and once you've done that it's very difficult to go back to having somebody else set the define
what it is that you were going to do so basically that was the problem is that I experienced freedom
for a year and I'm like I do not want to this up. I'm having so much fun working on these things.
So that's amazing.
And the last question I had on this topic was you mentioned running the startup with your wife.
Now, in many cases, this equation doesn't always work as productively where people who are partners or who live together don't always end up working
well together because of all sorts of frictions i'm curious how so we've we'd been together for
at least 10 10 years at that point um before we got married and we had worked on projects
together before we'd worked at the same companies in some situations um we had a whole bunch of
little side projects that we'd built collaborating together, which meant that, and that was really important, because we already knew that we
could work together. We knew we had very complementary skills. I do backend development and systems
operations. She does design and frontend engineering. So between the two of us, we can build a really
good web application together.
And the project that ended up being our startup was it started as a side project actually started on
our honeymoon we um we got married and we set off on honeymoon where the plan was to travel around
the world for like a year plus met with our laptops occasionally maybe doing a little bit of
like um like freelancing work to remotely to to to to keep to keep money coming in um and we got as far as morocco um
amazing place place to travel and we got food poisoning in casablanca and it was during ramadan
and in casablanca casablanca is not really tourist trail in morocco so during ramadan
everything shuts down um and so we basically we basically went we had to rent ourselves an
apartment so we could cook for ourselves to try and get through this.
And since we were stuck there for two weeks, we said, OK, we've got this idea for a website to show what conferences our friends are going to.
Let's build that as a little project and put it live.
And it was built. This was 2010 we were building this.
And we built it on top of Twitter.
We're like, hey, Twitter knows who our friends are and we follow people who we were building this. And we built it on top of Twitter. We're like, hey, Twitter knows who
our friends are. And we follow people who we like on Twitter. So you could sign it. It was called
Lanyard. And the idea was you sign it to Lanyard Twitter. And it goes, oh, you follow these 50
people. They are attending or speaking at these 10 conferences. Here are conferences you should
know about. And it works extremely well because it turns out people who speak at conferences have a lot of Twitter followers.
And we actually built the database where we'd say,
oh, at so-and-so is speaking at this conference,
even though they weren't a user of the site yet.
So when we launched, we had like a hundred speaker profiles
and zero users, like just the two of us.
But that was enough that if anyone signed in
who followed one of those speakers,
they'd get a recommendation, which felt like magic.
People like, oh my God,
this thing knows everything.
It's a hundred rows in a MySQL database.
That's the whole thing.
But it worked, right?
And so that,
we ended up applying to Y Combinator,
the startup accelerator from Cairo.
No, from Luxor in Egypt.
So there's a video out there,
which is Natalie and myself
standing in front of
this ancient Egyptian temple, pitching our YC idea. We don't mention the temple at all. We just
played it completely cool that there was this, no, that was in Aswan. It was the Aswan temple behind
us. That was kind of fun. Right. And so we applied to Y Combinator. We got in, our honeymoon turned
into three months in Mountain View in California doing Y Combinator, which was a little bit different.
And then we raised money from that.
We hired a team in London.
We spent three years sort of building the startup before we got acquired by Eventbrite, who moved us out to California.
So that's how we moved to America.
But yeah, so it was a fun startup experience.
But yeah, the whole thing.
And like I said, it started on our honeymoon.
It was a good thing that we'd been together 10 years already and we knew we worked on projects because
it's it's it's a tough thing you know the the when you're literally married to your co-founder you
have to set rules like no talking like no no talking about the company beyond like six in
the afternoon six in the evening that kind of thing um which we did not stick to but it's hard
that was the thing.
And so Natalie wrote up a really good story of the whole sort of startup story,
which I can share a link to as well.
Oh yeah, for sure.
That'd be really cool.
That's a fascinating story.
Thanks for sharing.
So Simon, this has been an amazing conversation.
Thanks for spending way more time with us
than we actually planned for.
We had a blast.
We got to learn a lot,
listen to a lot of good stories and learn about how you use these tools.
Is there anything else you would like to add before we go?
I think, yeah, the one thing I'll add is,
as practitioners using LLMs and using AI,
we understand this stuff better than 99% of the population,
which I think puts a responsibility on us to figure out the positive
ways of using this and then to share that. Like my sort of overall approach to ethics around this
is that we're not going to un-invent this technology. So if we can figure out what are
the things we can do that generally enhance people's lives that make the world a better
place, those positive impacts, and if we stay away from generating like garbage slop and dumping that on people, that feels right. So I feel good about
the way I'm interacting with these tools, mainly because I'm trying to help other people learn how
to use them effectively and sort of get over the kind of weird science fiction fear of this stuff
and say, okay, these are quite dumb. They are good at these certain things. If you learn,
if you put the work in to learn how to use them,
they can have a really positive impact on what you're doing.
That's really well said. Thank you so much, Simon.
This has been an amazing conversation.
Thanks a lot. This has been really fun.
Oh, by the way, I didn't, didn't mention this before,
but I would say two things that I saw in your talk and we'll link to that in the show notes too uh instead of generative ai i think you called
it transformative ai which is pretty amazing and instead of artificial intelligence you said
imitation intelligence which i was which i thought is so accurate uh so thank thank you for those
terms oh and now i'm thinking of the third thing too. You also coined the term prompt injection,
which was-
Oh, yes.
We haven't talked about that yet.
Yeah.
Prompt injection, it's the security attack
against applications built on top of models.
We won't go into it now,
but if you aren't unaware of prompt injection,
you will build stuff with horrifying security holes in.
So you need to learn about this one. then yeah the um imitation intelligence i i owe the
world a full write-up of this it's an idea i threw out in a pycon talk a few months ago yeah i feel
like artificial intelligence has all of these sort of science fiction ideas around it people will get
into heated debates about is i don't think this is artificial intelligence at all all of that kind of stuff i like this so i've been thinking about it in terms of imitation
intelligence because everything these models do is just imitating something that they saw in their
training data like and that actually really helps you form a mental model of what they can do and
why they're useful and it means that you can think okay if the training data has shown it how to do
this thing it can probably help me with this thing if you want to, if the training data has shown it how to do this thing, it can probably help me with this thing.
If you want to cure cancer, the training data doesn't know how to cure cancer.
So it's not going to come up with a novel cure for cancer just out of nothing.
And then what was the other one?
The other one was...
Transformative AI.
Oh, yes.
I like...
I feel like when you call something generative AI, that instantly makes people think, oh,
it just generates random rubbish, right?
It's OK, it'll cheat and write an essay for you, but it'll create horrifying images.
But is that really that valuable?
The most interesting application to these tools are transformative.
It's when you feed in the transcript of a podcast and say, hey, pull out all of the show, anything that should be in the show notes, which I always do for these kinds of things. Now, that kind of stuff is so much more interesting to
me. And so, yeah, I like that idea of emphasizing that it really is like what you get out is as good
as what you put in, but you can put in a lot of stuff. There's a lot of interesting applications
that just pump in a bunch of things, ask the right questions, and you'll get much more reliable and
interesting results out of it that
way sure well as we're finding out there's a lot more to talk about and we hope there is a second
time and we bring you you come back on the show uh but today thank you so much simon this was
this was amazing thanks a lot
hey thank you so much for listening to the show.
You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com.
You can also write to us at hello at softwaremisadventures.com.
We would love to hear from you.
Until next time, take care.