Software Misadventures - LLMs are like your weird, over-confident intern | Simon Willison (Datasette)

Starting point is 00:00:00 I call it my weird intern. I'll say to my wife Natalie sometimes, hey, so I got my weird intern to do this. And that works, right? It's a good mental model for these things as well because it's like having an intern who has read all of the documentation and memorized the documentation

Starting point is 00:00:15 for every programming language and is a wild conspiracy theorist and sometimes comes up with absurd ideas and they're massively overconfident. It's the intern that always believes that they're right. But it's an intern who you can, I hate to say it, you can kind of bully them. You can be like, do it again, do that again.

Starting point is 00:00:33 No, that's wrong, no, that's wrong. And you don't have to feel guilty about it, which is great. Or one of my favorite prompts, one of my favorite prompts is you just say, do better. And it works. It's the craziest thing. It'll write some code and you just say do better and it works it's the craziest thing it'll write some code you say do better and it goes oh i'm sorry i should and then it will churn out better code which is so stupid that that's how this technology works oh yeah it's kind of fun welcome to the software misadventures podcast. We are your hosts, Ronak and Guan.

Starting point is 00:01:07 As engineers, we are interested in not just the technologies, but the people and the stories behind them. So on this show, we try to scratch our own edge by sitting down with engineers, founders, and investors to chat about their path, lessons they have learned, and of course, the misadventures along the way. Simon, so you've been building tools for doing data analysis in the past few years, but you also started playing with LLMs before it was cool. I think you started writing about GPD-3 like two years ago, and I'm sure you had different expectations after you started playing with it. Has there been any big surprises in the last two years?

Starting point is 00:01:51 Big surprises in the last two years? Hasn't been a big surprise, right? The last two years have been completely wild. Yeah, so I started playing with GPT-2 back in 2020, which was the very early precursor. And it wasn't, it was, there was clearly something there. I tried to use it to generate New York Times headlines for current affairs based on the style of headlines from different decades. So I like fed in like New York Times 1950s, 1960s, 1970s. And I mean, I didn't really get anywhere with it. I

Starting point is 00:02:26 sort of abandoned that project, but it felt like there was something interesting, but certainly not sort of life shattering. And then GPT-3 became available. And I started really playing with that sort of 2021, 2022. And that thing was extraordinary because it was this weird situation where the only way you could use it was either through the OpenAI, through their API, or through their weird little playground interface. And so nobody was using it, right? Like, I actually, like, I put up a tutorial. Here's how to use this thing because nobody was experimenting with it. And because nobody else was using it, there was very little, there wasn't much information about what it could do.

Starting point is 00:03:04 Like, you sort of poke around with it. It was also, GPT-3 was a completion model. So you didn't get to like chat with it. You'd have to give it a sentence and then put a colon at the end and have it complete the sentence. So you'd discover things like, one of the things that really clicked for me early on was the JQ programming language for manipulating JSON. I discovered that GPT-3 could write that so i could say hey here is a json document the jq program for extracting an array of names from this list of objects is colon and it would spit out working code and that was a bit of a revelation you know because i could never remember the syntax for jq um and so i was poking around with it and it was

Starting point is 00:03:42 increasingly clear that there were all sorts of things it could do that you wouldn't have expected something less to be able to do. But it never felt really like an AI. It wasn't like you were conversing with something. It was just this this sort of very weird tool that could complete things if you prompted it in the right way. And then ChatGPT came along and what that was November 2022, right? Yes, it was November 2022. And all they did, all they did is they slapped the chat interface on top of their existing model effectively. Like they tweaked it and trained it a little bit more. But you know, chat GPT was an experimental prototype and a bunch of people inside of OpenAI thought it was a bad idea. They're like, hey, this is a waste of time. GPT-4 is coming. We should just hold off until then. They didn't expect it to take off at all.

Starting point is 00:04:26 And it was, I think it's the fastest growing consumer application in the history of the world. It is. Which for a very obscure, weird thing is sort of astonishing, you know. But it was fun because this sort of this rocket just took off. The entire world swiveled and started paying attention to this field. And then because you've got millions of people experimenting with trying things out, that's when we really started figuring out what it could do and what things it was good at and all of that.

Starting point is 00:04:52 And so, yeah, I've been documenting and exploring that for the past couple of years. I also had an advantage in that I've got a blog and most people just don't bother blogging anymore. They might like tweet or post on LinkedIn, but very few people are writing sort of long form content about what they're learning. But because I was doing that bad at AI, I very quickly sort of established myself as a person that you go to and talk about this stuff with,

Starting point is 00:05:17 which is great because then you get all of these people who are figuring things out, talking to you directly, and you can learn much faster. Is like having a blog sort of like an accountability mechanism to have you yourself like go out and then find these sort of things that are maybe not working super well so maybe back like you know gpt2 back in the days just as like a new source of you know inspiration to write more posts so that in the process enormously interesting yes and that's actually that's the thing I've been doing this year is I very quietly started a streak. I'm trying to, like inspired by Duolingo and actually Tom Scott on YouTube

Starting point is 00:05:54 did this 10 year streak of making a video once a week, which I found incredibly inspiring because wow, like what a thing to manage to do. And so since January 1st, I've been trying to post something on my blog every single day. And I've done that. And it means that I do have that little extra incentive to make sure I find something interesting. So that's been helping. My blog, it's been an accountability mechanism for me for wider work for a few years now. Because I'm now sort of independent. I don't have an employer.

Starting point is 00:06:25 And so I started doing this thing I call week notes, where once every two or three weeks, I post a blog entry just saying, here are the things that I've worked on in the past couple of weeks. And that means that when I'm thinking about what to do, occasionally I'll think, you know what, I haven't done anything I can write about yet,

Starting point is 00:06:39 so I should really invest in one of my open source projects or do something so i can actually i've got something to show for it and yeah i i love that i think writing is thinking and it's such a great way of of forcing you to structure your thinking you know it's if if the best way to learn something is to try and explain to somebody else so if you've got a blog and even like my shortest little um like link blog things where it's like a link and two sentences of text, I always try and put something in there that's valuable that partly it's like to prove that I read the thing

Starting point is 00:07:12 that I'm linking to. But also it's like, if you read the summary in my blog and read the article, do you get something slightly extra from my perspective on it? And it might just be saying, maybe I'll link it back to something else and say, the Claude Trump caching they came out with a few days ago. And when I wrote about it, I linked back to Google Gemini, which has a similar feature. And I could compare

Starting point is 00:07:34 how Google Gemini pricing works and how Claude pricing works. And that's a little bit of extra perspective that you won't get from Anthropic. They're not going to write about Google Gemini in their announcement of the feature. So it's that kind of thing. It's having, it's forcing you to engage with the material just a tiny bit more thoughtfully so that you can try and say something interesting about it as well as linking to it. So when it comes to blogging, I think you had this tweet at one point, which was something like blogging is like planting a beautiful cactus. The best time to do it is 18 years ago. but the second best time to do it is today.

Starting point is 00:08:09 I think, especially when it comes to LLMs today, when generating content has become way easier, not necessarily good content, but there is just way more out there. How do you think about adding enough quality to the content where someone would actually read the post? The other part is also having an accountability mechanism to just do interesting thing is one perspective on writing blog posts. And I'm also curious to hear like, what are some of the other things that keep you going? Because after a point this, it takes a lot of work to write blogs. Well, that's the secret of blogging is that it takes a lot of work to write blogs well that's the secret of blogging is that it

Starting point is 00:08:46 takes a lot of work at first but i've been blogging for 22 years and you just get faster you know if you write every day you get faster at writing most days i will spend 10 to 15 minutes on my blog and that's it you know it's like two links maybe a quote it's a very very quick process to turn things around i actually have a second blog my um my til blog today i learned where the idea there is um it's the it's it should really be part of my main blog it's partly to play with different technology that i'm running it as a separate site but the idea there is anytime you learn anything new it's worth putting it out there and saying hey these are the things i learned it's all it is it's worth putting it out there and saying, hey, these are the things I learned. It's all it is. It's my personal notes, but very slightly cleaned up so that I can publish them. And actually, as a result of this habitually, when I'm writing personal notes, I sort of write

Starting point is 00:09:34 them well enough that I could copy and paste them into a public document, which is a good habit to be in anyway. But part of the reason I do the TILs is that it's the most low pressure form of writing that there is. Because with a regular blog, you feel like when you write something, you have to say something new. You've got to add something new to the world. With a TIL blog, no, you don't. The barrier for writing on TIL is did you learn it today or recently? And if it's like how to do a for loop in bash that still counts that's fine like i'm publishing it i honestly it's mainly for me it's sort of my public notes i can go back and

Starting point is 00:10:12 find if somebody googles for how do i write a for loop in bash and they land on that document that's great for them it's also um i feel like i mean i've had like i've got like 25 years of software engineering experience i feel like it's important to outwardly demonstrate that when you've got 25 years of experience, it's still worth celebrating learning for loops in Bash. You shouldn't get into that. There's that pattern people get into where they don't want to admit

Starting point is 00:10:37 that they only just learned how to do something. It's sort of a shame that I didn't know how to do for loops in Bash. I like using my reputation to do for loops and bash. Doesn't, you know, that, that I like using my sort of my, my reputation to broadcast out that no, be proud of that. Right. You figured out for loops and bash. Fantastic. There's a million other things that still to learn about everything involving computers, right? It's,

Starting point is 00:10:57 it's no biggie that you didn't know that already. So one thing that I struggle with always is I want to do this a whole lot more of. I have a blog which has four entries right now and maybe 10 in my notes, which I've never gotten to polish and publish. And it always goes back to, well, OK, today I have maybe, let's say, an hour I can either spend on writing this up, cleaning it up, or, you know, I could just spend the time doing some work. So I struggle with that balance and I'm curious how you think about it. I've got a great trick for that. So the thing that the way that I work, all of my work that I do, software work and a lot of my other stuff as well is in GitHub issues, right? It's free, it's got, you can have private issues, public issues and so forth. So every

Starting point is 00:11:41 single one of my projects has a very active GitHub issues like setup and I've got dozens of private repositories. I've got one just called To Do's that I use for personal stuff. And the thing I love about GitHub, basically the idea is that anytime I'm doing any project at all, I open an issue and I stick in a sentence at the top saying, do this thing, and then most of the work that we do as software engineers, it turns out is research, right? Like you have to gather so much information to solve a problem. You have to be like, okay, where am I gonna do it?

Starting point is 00:12:10 I'm gonna do this file here needs modifying, the tests for it live over here. I need to use this library. Here's some example code I found on Stack Overflow that solves this problem. I asked Claude a few questions and got these answers. And so I will very quickly pepper in like two or three or four reply comments to my issue with the research that I've done. And then I'll do the implementation.

Starting point is 00:12:30 And it means that firstly, programmers often talk about how damaging it is to be interrupted, right? There's this idea that you carefully build up the context of everything that you need for your problem. And then somebody taps you in the shoulder and asks you a question, and it all comes tumbling down. It takes you half an hour to get back into it yeah the fix for that is to have very detailed notes right if you have written down everything as you were going along i can be distracted come back read the last three issue comments and have everything back in place again and that's amazing for productivity but it also means that I'm maintaining over 250 active open source projects at the moment. And a lot of them are very small.

Starting point is 00:13:10 They're like little command line tools or plugins for my projects or whatever. But they're all maintained in as much as if somebody reports a bug and I see their issue report in amongst all of my notifications, I will fix that bug and I'll ship a new release. And the only way to maintain 250 projects is to treat every single one of them like you're going to forget every detail of it. Like every project has to be as if it was somebody else's project that you occasionally drop into and maintain. The way to do that is with issues, right? Every project I have, every single design decision I ever made is in an issue comment somewhere in that repository. So I can search through them, I can use git blame and say, okay, why did I add this code? It was in this commit, this commit is linked to this issue, this issue tells me what I

Starting point is 00:13:54 was thinking at the time, what options I explored, all of that kind of thing. And so this is an enormous productivity boost. It feels like writing all of these notes should slow you down. It's the opposite. It speeds you up. It means when you want to publish something, you've already written the rough outline of anything that you want to publish. Most of my TILs are copied and pasted from my GitHub issue notes, and then I'll clean up the wording a little bit

Starting point is 00:14:18 and maybe add some formatting, and that's it. It's done. So that's been enormously... I gave a talk about this at JagerCon a few years ago about increasing your productivity on personal projects through documentation and unit tests, right? The two things that people would expect would slow you down. Turns out if you put the right habits in place, having comprehensive documentation

Starting point is 00:14:41 means you can work so much faster, right? I can drop back into a project I haven't touched in a year, read the documentation as if I didn't know what the project was, and then start working on it. That's fantastic. And the same thing with unit tests, right? If you've got tests, you can iterate so much faster because you get over that fear of accidentally breaking something. You make a change. Normally, you'd have to manually test every single feature of the software to make sure it didn't break. If your tests are doing that work for you, you can drop in, make a five-line change, add a new test, run the test suite, and then publish it to PyPI or ship a release of it. It just works, you know?

Starting point is 00:15:18 That's super interesting. Do you also have design docs? I'm curious about like you know having all these projects being able to kind of drop in if it's something that's not like so give blame super useful maybe just search through the issues that sounds super cool um yeah like what about like do you also write like design docs or like yes but the issues are design docs yeah absolutely so the issues are the design documentation effectively. And the only that because the problem with design documentation and all documentation has to be kept up to date, right? If it falls out of sync with the code, then the big problem people lose trust in it, right? Like I've worked at companies where we've had internal documentation and nobody

Starting point is 00:16:00 uses it because they know that it's not being actively maintained. And so the way I see it, there are sort of two key forms of documentation. There's the documentation that has to be up to date, which tells you how the thing works or how to use the thing. So if you're writing software libraries, it's the documentation that tells you which functions to call. If you've got a web API, it's the one that tells you what the API endpoints are. Command line tools, these are the options and what they do. That I keep in my repository with the code. So there's always a docs folder. It's got a bunch of markdown files in it. Anytime I update the code, I update the associated documentation. And if I'm collaborating with people, that's part of the pull request design process, the code review process. If you submit a pull request and it doesn't update the documentation,

Starting point is 00:16:45 I'll either put in a note saying you need to update the documentation, or sometimes I will update the docs as part of that pull request. The idea being that the moment you land it on the main branch, it's got the test, it's got the implementation and the documentation all in a single commit. Because then when you use git blame, the commit shows you the documentation change as well. But the other form of documentation is, I've been calling it temporal documentation, it's documentation that was true at a certain point, but isn't guaranteed to still be true today. And that's where issues shine, right? If I read an issue, and it says 2017, January the 5th, a bunch of stuff, I know that that's

Starting point is 00:17:21 not promising to be up to date. So it's still useful because I can say, okay, well, in January of 2016, this was true, but it's not sort of ruining my trust in my docs because I look at it, I'm like, hey, is this still true anymore? I don't know. And yet, so very occasionally, I will write design documentation that says, if you are a maintainer of this code, you should look here and here and here, and this is how it works. But I often don't do that. I sort of leave it to the issues. The idea being that if you can spelunk through the code with git blame and the issues,

Starting point is 00:17:52 you can get that same information. You might have to put a bit more work in. See, I don't think any of my projects have significant design docs like current architectural documentation right now. And I might start adding it. Also, a lot of them are, you know, if it's a software library, significant design docs at the like like current architectural documentation right now and i might start adding it um also a lot of them are you know if it's a software library the design documentation and the api documentation are kind of the same thing you know the the the the design

Starting point is 00:18:15 is sort of presented through how the the api is built yeah i was actually thinking this is a practice that would be useful for even teams at companies where what ends up happening, at least in my experience, is something new will come up that you need to implement. Someone will go do some research, try out a few things. So you see the code changes in the PRs, maybe the approach changes in the Google Docs, mostly at least at my workplace. But they don't always end up linked together, and one usually gets out of sync. But I think this idea of using issues to do that,

Starting point is 00:18:48 where you can do that in the repository itself, and the issue may link to the Google Doc that you have, which is easier for collaborating and commenting on, I think that would go a long way. So it's something I'm going to try, actually. And you know what? I've got issue threads that are over 100 comments long, and they're all me.

Starting point is 00:19:04 It's just me talking to myself. I just realized that issues are a blog, right? An issue thread is basically a one-off blog for the story of this change, the story of this feature. One of the reasons I love issues so much, I used to write really long commit messages, like I'd do six paragraphs in a commit message explaining what I was doing.

Starting point is 00:19:23 I've stopped doing that now. What I do instead is if there's stuff that should be in documentation, I put it in the documentation and then include that in the commit. So it doesn't go in the commit message. It goes in the actual code. And secondly, it's every commit always links to an issue thread. Because the great thing about an issue thread is I can add comments to it a year after the commit. Right, you're running git blame, you see a commit,

Starting point is 00:19:44 you click through to the issue thread, and there might be a comment saying, 12 months later it turned out this was a terrible idea for these reasons. Also, issues accept screenshots, so I can put screenshots of the feature. So if I'm doing CSS stuff, I always include screenshots of before and after. You can do animated GIFs or videos in issues, so I'll sometimes do a little GIF demo of the thing. Issues can link to each other. You can embed code in them. They're a really rich canvas for all sorts of aspects of document.

Starting point is 00:20:16 You can't put an image in a commit message, right? But you can put a screenshot in an issue. So yeah, I'm definitely a GitHubithub issues power user this is super helpful thanks for sharing the tricks we'll actually link your talk in our show notes as well so that people can find it easily cool uh by the way one thing about the blog so i was looking at your blog and bunch of educational posts where people can learn about how to do various things and you want to get into some of those but i also saw that you have some of these posts linked on Substack. So I was curious, how do you use one versus the other?

Starting point is 00:20:50 That is a cutting trick that I came up with. So I have a Substack newsletter, which I put out once every two or three weeks. And all it is, is the content on my blog since the last newsletter. With maybe a sentence at the top, with like maybe I'll add a tiny bit of text at the top, but it's, it's purely, it's basically I'm using Substack as a free mechanism to let people subscribe

Starting point is 00:21:12 to my blog via email because I didn't want to pay to send emails and build all of that kind of stuff. And Substack, it's great for that. I've got like over 6,000 subscribers now on Substack. Um, and it's, it takes me about two minutes per newsletter to send it out. So it turns out Substack, they don't have an API, but you can copy and paste stuff into your Substack edit panel. And so I built myself a little tool, which it's actually an observable notebook. But what it does is it pulls all of the content from my blog, reformats it into HTML rich text, and then gives me a big copy button that I can click, which puts all of that on my clipboard. And so I go to this notebook, I click copy, I switch to Substack, I hit paste, I set the title of the newsletter, and I pick a preview image, and that's it.

Starting point is 00:22:02 I'm done. Like literally two minutes to send that newsletter out, because it's using copy and paste, copy paste as an API, which it turns out is a really powerful trick. There's loads of stuff you can do with software that thinks it doesn't have an API, and you're like, yeah, but I can paste stuff into you, so. Yeah, and that's been great. My only regret with the newsletter

Starting point is 00:22:20 was I should have started doing it years ago, because I've been doing it for about just a year and a half maybe and it's it's brilliant you know it's it's it's a really great way of getting things out there to people who live in their email clients so there's an argument of uh using either systems like substats or medium or having your own personal blog and the argument that i've heard to keep your own personal blog is that these platforms may or may not exist in the future which has happened uh for many of these platforms is that the reason why you still have the personal blog and substack is mostly just an email distribution

Starting point is 00:22:56 service of sorts that's one of the reasons yeah i mean one of the reasons i chose substack is you can export your subscriber list so if substack ever say hey we're shutting down next week I can pull out a CSV file with all of the email addresses in and I can move to something else that's really important to me because yeah vendors absolutely come and go my I've owned my domain name for again 20 or 20 odd years it built and you know it builds up SEO credibility and stuff over time but But also it's just having, there's something sort of wholesome about having a little corner of the internet that's just for you. Like that, that's something I genuinely, I really enjoy. It feels a little bit subversive as well

Starting point is 00:23:36 in this day and age with all of these giant walled platforms and things. Yeah, no, I'm, I've got a domain name and i'm running a web advocate website when i'm yeah i so there's a and it's just fun you know as a software engineer it used to be like 10 15 years ago everyone's intro to soft to web development was building your own blog system i don't think people do that anymore and that's really sad because it's such a good project you get to learn databases and html and url design and all of these and seo and all of these different skills um and yeah i mean my my blog itself is running it's a django application because i helped create django 20 odd years ago so i want to have

Starting point is 00:24:17 something in my life that's like a django app that i'm building on and it's all open source like the code is on github it's is on GitHub. Over the past six months, I've started updating it a lot more, just making little tiny tweaks to it. I changed the default typeface that I'm using for headings a couple of weeks ago. And I started doing more things with images. And it's just really nice. It's nice being able to dive in and try out something new completely in that space. I run it on Heroku behind Cloudflare. The great thing about Cloudflare is if I get a giant spike of traffic, like if I'm linked off the Hack and Use homepage,

Starting point is 00:24:53 my tiny little cheap Heroku instance doesn't even notice because Cloudflare absorbs all of the traffic. And that's great. I bet that helps. That's a nice way to do it. I think Elon Musk maybe linked one of your posts in a tweet at one point. So I'm assuming this would have helped. Yeah, I got 1.4 million hits on a page from that one.

Starting point is 00:25:11 And yeah, without Cloudflare, I would have instantly melted. Well, by the way, I think that also resulted in your first ever TV appearance, right? That's right. It was last year. It was when Microsoft Bing added their chat thing. And it was February. It was February last year. It turns out it was GPT-4, which hadn't been released yet. So Microsoft Bing was our first glimpse of GPT-4. And they hadn't quite figured out the personality. It was some early prompt engineering and it went completely wrong and it started threatening people

Starting point is 00:25:46 and it tried to break up. Kevin Roos from the New York Times, it tried to break up his marriage. It was just joyfully bizarre. And yeah, so I, there were all these people, these posts on Reddit from people going, yeah, it just told me that it wanted to have me arrested and all of this kind of stuff.

Starting point is 00:26:03 So I put up a blog post where I just collected together a bunch of examples of this and yeah and elon musk tweeted a link to it and it was on the hack news home page and yeah i got interviewed on live live tv news out of chicago talking about trying to get trying to reassure people that this thing wasn't going to steal the nuclear code even though it said it was this This is no Terminator. Yeah, that was deeply entertaining. Oh, I can imagine. By the way, coming back to LLMs, and before we actually go there, you mentioned this tool that you use to move your blocks from your site to Substack. Is this tool something open source that people can use, or this is something that-

Starting point is 00:26:40 Yes. Well, it's an observable notebook. And yeah, I think it's an observable notebook um and yeah the i think i think it's probably linked to from about page we can stick it in the show notes it's a little observable right now it only works against my blog so it's useful for me to create my own newsletter but because the observable notebooks is a it's a platform where you can create basically write a sort of interactive document using JavaScript, and the code is visible in it. So you can dig through and see exactly how it's working.

Starting point is 00:27:10 It's quite complicated because it actually pulls the content from a dataset instance. Like Dataset is my major open source project, which lets you build a JSON API on top of a SQLite database. But my blog is running a Postgres database in Heroku. So there's actually a whole chain of things that make this work where I've got code I wrote that does a backup of my blog from Heroku into JSON. And then it loads that into a SQLite database and then it publishes the SQLite database with dataset,

Starting point is 00:27:41 which gives it a JSON API. And then my notebook can use fetch calls in JavaScript to run SQL queries against the JSON API to pull in the content, to assemble it into Markdown, which it renders as HTML, which I copy out. It's beautiful. Like it's this giant convoluted chain of things that somehow works really well.

Starting point is 00:28:00 It's basically sounds like Unix, but in a notebook where you pipe the input. Oh, totally. And I'm big into the... The Unix philosophy is big in a lot of the work that I do. I love tools that just... You can pipe tools together. I said I've got 250 projects.

Starting point is 00:28:16 That's because they're all little tiny things that you can then plug together to pipe things from one to another. So talking about the Unix philosophy, I want to talk about the LLM CLI tool that you developed. And I actually recently came across it while researching for the episode. And I found it to be super cool because a lot of times I don't want to leave the terminal that I'm on. I just want to ask the question there. So having something like this, where you can choose the model you use, or sometimes even run it just locally when you don't have Wi-Fi, for example, that sounds super cool. Can you tell us a little more about this tool?

Starting point is 00:28:52 Yeah, so this was, I built this last year. I started this project last year. The idea was originally OpenAI were effectively the only interesting game in town for quite a while. You know, with GPT-4, they had such a lead. Today, that's not true at all. There are a bunch of amazing, like, competing models that I'm often using instead of OpenAI. But yeah, so my initial idea was,

Starting point is 00:29:15 you can, the OpenAI API is kind of cool. I like hanging out in the terminal. It would be great if I had a way of not just running prompts from the terminal, but also piping data to and from the model, because the Unix piping idea is always like, you get some content, you pipe it into another thing, which transforms it, you pipe it back out again.

Starting point is 00:29:32 That's all language models are, right? They're a function where you can give it some stuff, it does something, and it gives you back, you give it input, it gives you output. And so the original idea was to build a little API client for OpenAI where you could basically say, LLM space, double quotes, how do you do a for loop in bash? And then you hit enter, and then it spits out the answer on your terminal. But you can also pipe things to it. So

Starting point is 00:29:56 you can say, cat, hello, world.py, pipe LLM, and then give it an extra prompt saying, explain what this code does, or rewrite this in C or whatever. And I noticed that nobody had the PyPI, the Python package index, nobody had reserved the LLM name on there yet. And I'm like, oh, I've got to have this. So I grabbed this beautiful three-letter, so pip install LLM is how you get it. And that was fun. And then a few months later, I wanted to start playing with all of these new models, the Lama models that run on your laptop and so forth. And I'd already built plugin systems for other software that I've developed. So I thought, okay, what if I had plugins so you could install a new plugin for this command line tool,

Starting point is 00:30:39 and now it can talk to Anthropics Claude, or it can talk to Google Gemini, or it can run Lama on your computer directly. And so I built that. And now there's over 200 different models that this one command line tool can run if you install the right plugins for it. Other people have written plugins. The great thing about plugins is it's a way of building an open source project where you don't have to review people's code to add features to your thing. Like I can wake up one morning and my software can do a new thing because somebody else released a plugin for it. It's amazing, right? It's the best form of open source contribution as well. Because if you write a plugin for my project, you're not asking any time of me to sort of review your code and interact

Starting point is 00:31:19 with you. You're just putting it out there into the world. yeah so you can llm install um all of these different plugins for all of these different models and the other big feature of llm is that everything it does is logged to a sqlite database so anytime you prompt it the prompt is logged and the response is logged and it records which model was used so you can actually use this for model research where you can run the same prompt against five different models and now you've logged all of the responses and you can go and compare them later on i've got like 3 000 prompt response pairs that i've recorded just in my own local database from tinkering out around with this thing which is i'll be honest i don't go back and use it to compare the models as much as i want to

Starting point is 00:31:59 but the data's there you know i've hoarded the data at some point i can and it's also just useful to be able to say okay okay, show me the logs of my conversations, search those logs, export the log of this conversation and publish it somewhere. So yeah, that's been super, super fun. It does, it's very distracting because it means that whenever a big new model comes out,

Starting point is 00:32:18 I lose half the day to spinning up a new LLM plugin so that I can try it out. But it's been really, and it's super useful. Like I use it several times a week. Most of my personal usage is still through the web interfaces. I use claude.ai and I use chatGPT daily, like every single day for the last year and a half, basically. But yeah, having it on the command line as well,

Starting point is 00:32:43 it gives you all of these other options. It's also just really fun for hacking things together. Like you could write a bash script that implements a retrieval augmented generation against something like by scraping a webpage with a curl command and then piping that to LLM and running it against Lama and all of that kind of stuff. It's really, really fun.

Starting point is 00:33:01 It's like some of the guests that we've talked to, there's this sort of idea of, oh, you need to kind of build up this, what do you call it? This habit of like going to the LLM first before you like try anything else. I was curious, yeah, like what has it been like in this one and a half years for you

Starting point is 00:33:18 for like kind of using it daily? Did you, when you started, do you have to kind of force yourself to be like, hey, you know, I need to, even though I know the answer, I'm going to still still go to the lm just to see how you how did that like evolve over time that's interesting i don't think i've not been turning to lms for things i already know the answer to but you know i'm as a software engineer every single day i run things that i don't know and it might be a for loop in bash,

Starting point is 00:33:49 it might be something a lot more interesting and complicated than that. So I don't, I mean, the problem with LLMs is they're actually really difficult to use, which is very unintuitive. Everyone assumes that they're easy because it's a chatbot. You type things, it says things back to you. But to use them effectively, you need to build this really really really deep model of what they can and can't do like i would never ask an llm to count all of the the instances of something in a paragraph because i know they can't count which is totally non-obvious right it's a super sophisticated computer system how can it not count computers are great at counting that's like what they've been doing since we've invented them um i know that it can't look think i know that if i i've got a question where if i if i have a like a sort of if a friend of mine could read read a wikipedia page and then answer my question then i know that the llm will be able

Starting point is 00:34:35 to answer that question but if it's the kind of thing which the wikipedia page probably isn't going to cover it's less likely that the llm will be able to answer it but it's difficult because you really do just have to put the time in like you've got to spend um a friend of mine says it's less likely that the LLM will be able to answer it. But it's difficult because you really do just have to put the time in. Like you've got to spend, a friend of mine says it's 10 hours is the minimum you have to spend with a GPT-4 model before it really starts to click what these things even are and how to use them. And I think to develop that level of expertise

Starting point is 00:34:59 where I can look at a prompt and I can, 90% of the time, I will predict correctly if it's going to work or not. Like I look at somebody's prompt and say, yeah, of the time, I will predict correctly if it's going to work or not. Like I look at somebody's prompt and say, yeah, you're asking it to count things. That's not going to work. Or you're asking it about like quantum physics, but it's the kind of basic question that an undergrad like student in quantum physics would answer straight away. It'll definitely get that one right. Right. But having that intuition is, it takes a long, long time to build up and it's not transferable. Like I love teaching people to use this stuff, but I can't just dump my intuition into their head.

Starting point is 00:35:32 I can't be like, boom, here you go. Now you'll be able to use these things effectively. Like one of the lessons I think people need to learn as quickly as possible is you've got to run prompts where it gets the answer wrong in a really confident way like that earlier do that the better because otherwise you you can go into the decided that it is this sort of like science fiction AI that knows everything and so when I'm evaluating new model I always start with an ego prompt I ask it about myself I say provide a career outline for Simon Willison and because I've been blogging for like 20 odd years it knows a lot about me right there's a lot of stuff that ends up in the training data but it's still it makes it like they often say that I'm the CTO of github I have

Starting point is 00:36:12 never worked forget or that's I had one tell me the other day that I'd been to Oh a university that I hadn't been to like those kinds of mistakes so generally if you if you know somebody who's sort of internet famous right they're not like a a celebrity but they've been around on the internet for long enough that there's stuff about them in the training data asking questions about them very quickly exposes that these things are not knowledgeable that they're spitting out statistically likely text from their training data and that's so important like that it's it's crazy to me how to get the best results out of these things you need to have expertise in what they can do so experience using them you need to have a bit of expertise in how they work right you

Starting point is 00:36:56 don't need to understand the matrix multiplication and the key value pair and all of that kind of stuff but you do have to understand that they come from training data they're doing next token prediction you need to have that sort of basic level and you have to be a subject expert in what you're doing with them right like as a soft an experienced software engineer i can do amazing software engineering with an lm because i've got that expertise in what kind of questions to ask i can spot when it makes mistakes very quickly i know how to test the things it's giving me i like occasionally I'll ask it legal questions. Like I'll paste it in terms of service and say, Hey, is there anything in here that looks

Starting point is 00:37:28 a bit dodgy? I know for a fact that that's a terrible idea because I have no legal knowledge. Right. So I'm sort of like play acting with it and nodding along, but I would never make a life altering decision based on legal advice for an LLM that I got, because I'm not a lawyer. If I was a lawyer, I'd use them all the time because I'd be able to fall back on my actual expertise to sort of like make sure that I'm using them responsibly. By the way, I can attest to that one part where if you search for internet famous people, these things will, or LLMs, will very confidently tell you stuff which is not true. I've experienced that in doing research for a lot of our episodes, including

Starting point is 00:38:11 this one, where I saw this like, oh, and what I've started doing now is I would actually tell it, give me the source information. And at least, I mean, I use chat, GVT more than anything else. And I would actually say, give me the source for this information. And surprisingly, when it comes to fetching information from certain podcast transcripts, it's decent at doing that. But it's horrible at attribution because either transcripts are faulty or it just doesn't know who said what. And the other thing is it'll actually start sourcing things,

Starting point is 00:38:45 which look like links, but they're not clickable. If you search for that exact string. That's my favorite bug. Yeah. I wrote something about this last year. Because before ChatGPT had browsing mode, it would do that all the time. It was amazing. It would just hallucinate these URLs.

Starting point is 00:39:04 And one thing that you could do that's really fun is you could give it a URL and say, summarize this article. And even though it couldn't access the web back URL, it's a 404 page, and then you paste that in and it would confidently write a story as if it was a wired story about that happening. Like just utterly, like Claude now, because Anthropix, Claude can't access the web, they do at least have a little inline hint that shows up and says, by the way, I can't access the web. But yeah, it's, that's, that was a great one because people got so confused by that one. People who were absolutely convinced that ChatGPT could summarize webpages because they'd seen it do it dozens of times.

Starting point is 00:39:53 And you're thinking, wow, you've probably spent the last two months consuming summaries of webpages that were entirely made up. And you do not want to admit to yourself that you've got two months of crap. It's fascinating, right? There's so many traps in all of this stuff. And the interesting thing, I think perhaps you mentioned it in one of your talks that LLM interface is kind of interesting because it's just a simplified interface where you just get dropped into this chat box and you kind of have to discover the capabilities as well as limitations of the system like you can't find things out that it's capable of doing it's it's like taking a brand new computer

Starting point is 00:40:32 user and dumping them in a linux machine with the linux prompt and say there you go figure it out right like it's it's a joke it's an absolute joke that we've got this incredibly sophisticated software and we've given it command line interface and launched it to 100 million people what what were we thinking yeah one of the things i'm most excited about is alternative interfaces to these systems which we're beginning to see some really interesting stuff starting to crop up there um but i mean the chat interface it is really powerful and useful but it's such a bad way to onboard people and they've they've nodded like at least now these systems they'll at least give you a few ideas they'll be like why not try to get it to cheat on your homework or whatever but come on you know we could we could do so much

Starting point is 00:41:14 better oh so i gave uh i said i introduced chat gpd to my mom she doesn't speak english but recently uh she wanted to send a message to someone and she was asking me to help her format things a little bit. Format meaning like she had a rough draft and she's like, help me improve it. And I was like, you know what would be amazing at it? Chat GPT. So I just gave her the phone and I said, just speak into it. Forget about typing. And she's like, but what do I say? So just looking at that microphone prong, for for example like she had no idea what she could do but once she got started oh boy she won't charge a video on her phone now uh for all sorts of things she wants to do honestly people who don't who don't speak english who have english

Starting point is 00:41:56 as a second language this stuff is incredible right absolutely amazing and that's something like i feel like that's something that people often like there are lots of people who are very cynical about this technology and there are a lot of reasons to to there are a lot of like reasons to be concerned about it i feel like taking like we live in a society where if you have really good spoken and written english it puts you so it's such an advantage like you've got a problem which with, like, the streetlight outside your house is broken, and you need to write a letter to the council to get it fixed. That used to be a significant barrier. It's not anymore.

Starting point is 00:42:32 ChatGPT, if you get it to write a formal letter to the council complaining about broken streetlight, flawless. Absolutely flawless. And you can prompt it in any language, and I'm so excited about that i feel like the the and it's it also interesting it sort of breaks aspects of society as well because we've been using written english skills as a filter for so many different things like if you want to get into university you have to write one of those like like a formal letter and all of that kind of stuff which used to just it used to keep people out now it doesn't anymore which i I think is thrilling. But at the same time, if you've got institutions that are designed around the idea that you can evaluate everyone and filter them based on,

Starting point is 00:43:13 on written essays, and now you can't, we've got to redesign those institutions. That's going to take a while. What does that even look like? It's so disruptive to society in all of these different ways. I think this is like a nice plug that I saw on another podcast. You mentioned that, you know, the thing that I want to spend my life doing is helping people make the most use of these computers. And we want people to be able to automate their lives. Right. This is what computers are for. Right. Computers are supposed to automate tedious things in our lives. Right. And if you are a programmer, you can do that. Right. This is what computers are for, right? Computers are supposed to automate tedious things in our lives, right? And if you are a programmer, you can do that, right? If you've got a software engineering degree, there are so many problems in life that you can automate away.

Starting point is 00:43:54 The vast majority of people can't do that, right? They didn't spend two years getting a software engineering degree, which means that they frequently end up having to spend all day copying and pasting things i actually um last year i i was at an event where i encountered i heard from a fire chief right the guy who runs the a fire station who had just spent the last day and a half copy and paste copying and pasting names and phone numbers from one crm system into another crm system because it needed to be done and i'm'm like, this is, how are we taking people with like jobs of that much importance and leaving them so that they have to do this kind of manual copying and pasting

Starting point is 00:44:33 because computers are really, really frustrating to use. And there's no easy way to do that. If that guy had a computer science degree, he could have automated the export from the CRM system to the other CRM system and saved a day and a half of work and that's the thing which it feels and like there's this idea of end user programming for years we've been wanting to solve it so that users can actually like program computers without spending six months learning how to do it like apple like hypercard and apple script and microsoft excel is probably the best version of this, right? So

Starting point is 00:45:05 many people are programmers every day using Excel and they don't think of themselves as programmers, but honestly, if you can use Excel, if you can spin up formulas and stuff, that's programming, that's software, you are building software and automating things. I feel like language models could be the key to unlocking this. Like we're just beginning to see little hits for ChatGPT Code Interpreter and Claude Artifacts are two of the most exciting things in the AI space. And I continually hear from people who, firstly, people who really are using these tools on a daily basis who've never programmed before,

Starting point is 00:45:38 but now they can do stuff. And the other thing that's exciting is I talk to people who tried to learn to program in the past and they didn't get over that initial six months of misery where you forget a semicolon and you get an obscure error message and you get stuck for two hours. And a lot of people give up. They're like they assume they're not smart enough to learn to program. And that's not the case. It's that they were nobody warned them how tedious and frustrating it was. They weren't patient enough to get over that miserable initial learning curve. Those people, a lot of them are learning to program now because if you get that semicolon

Starting point is 00:46:10 error and paste it into ChatGPT, it tells you the fix. So it's like having a teaching assistant on hand 24 hours a day who you can call over and they go, yeah, you put the semicolon there. Amazing, right? Absolutely amazing. I was talking to somebody just the other day who had who's a very experienced professional in their own field and they've spent the last two months programming and really enjoying it having tried and failed to learn a dozen times because

Starting point is 00:46:36 they've got this new assistant that can help them that's amazing right that's as a professional programmer there's a little tiny aspect where you're like, OK, does this mean that our jobs are going to dry up? I don't think the jobs dry up. I think more companies start commissioning custom software because the cost of developing custom software goes down, which I think increases the demand for engineers who know what they're doing. But I'm not an economist. Maybe this is the death knell for six-figure programmer salaries, and we're going to end up working for peanuts. I don't know. I guess we'll just find out.

Starting point is 00:47:10 So there's a lot to unpack there, and I want to take a couple of directions. But before we go forward, there's one thing you mentioned when you were talking about the LLM being the interface for talking to these models. So I wanted to read one of your tweets where you're asking a question on Twitter or X. What are the LLM driven products that people use

Starting point is 00:47:32 which don't have this chat interface? I'm sure you would have gotten fascinating answers, but I'm actually curious, what are some tools that you use which don't have this chat interface on top, but are built with LLMs? That's a really good question. The most obvious one, GitHub Copilot was the first mainstream non-chat based. And actually, GitHub Copilot predates ChatGPT. Like that was a thing before ChatGPT came along. And that interface, the sort of gray text which you get to approve, seems so simple and obvious now.

Starting point is 00:48:06 They iterated on that a lot. Like the team that built GitHub Copilot, they were the first to sort of figure out how you do LLM integration into IDEs. They put a heck of a lot of research and work into that. They came up with something. It's one of those things where a lot of the really obvious ideas weren't obvious at all until somebody did the work to get there. So GitHub Copilot is my favorite example. I'll be honest, on a day-to-day basis, I'm not using anything that's not chat-driven that I can think of, but I do use the alternative inputs a lot. I use the voice mode on ChatGPT, and I've been playing with the Google Gemini one

Starting point is 00:48:42 a lot. I can go on a walk with my dog with AirPods in and I can write code walking my dog because I get ChatGPT to do it over the audio thing. It's amazing, right? So I use that. Images. I love image inputs. I feel like image inputs are actually still quite new.

Starting point is 00:48:59 GPT-4 Vision was announced in November last year. So we've only had, and these days, all of the models have amazing image inputs, but that's still like not that, it's still quite a new capability. So I will drop in screenshots of like a rough mock-up of a thing and get it to do HTML and CSS. I'll drop in screenshots of error messages,

Starting point is 00:49:19 all of that sort of stuff. The coolest demo still that I've seen of alternative UI is the TL draw guys. The TL draw team did this thing called make it real, where you've got this browser based vector editing software. So you can draw boxes and lines and add text. And they added a feature where you can then select a mockup and click make it real.

Starting point is 00:49:42 And it sends a screenshot of that to GPT-4 gets back um like tailwind HTML CSS JavaScript and it pops in a working version of the thing and you can literally like you can draw a you can draw a calculator like a Fahrenheit to centigrade Celsius calculator just draw the boxes put c f and a calculate button and you don't even tell it what's supposed to happen you say make it real and, oh, I bet clicking that button should calculate that from Fahrenheit into Celsius and update the two boxes. It's extraordinary, absolutely extraordinary. And that feels like there's so much more to be explored around that,

Starting point is 00:50:19 this idea of, okay, so we've got an interface that lets you draw something and we can pipe that through an LLM and turn it into working code, that kind of stuff. In my own work, I've been experimenting with the... So Dataset is software for data analysis. It loads... You have a SQLite database full of data, and it gives you a UI for exploring it, a JSON API for running queries against it, that kind of thing.

Starting point is 00:50:39 So I've got one plug-in for it, which is a ask a question in English, and have... That one's using claude haiku at the moment have that turn it into a sql query and then it'll run that sql query and then a lot of people who build those systems give you the answer straight away so you'll say how many records were in california and it'll say 230 records were in california that i think is a bad idea because in my experience with it it gets the right answer four out of five times. But what in five times? It'll like do it where state equals CA, but in the data, it was where state equals California. So it gets zero results, right? And that's a

Starting point is 00:51:16 disaster, right? You've just given somebody the wrong answer to their question. So instead, I'm redirecting them to the SQL query page. So you at least see this. So if you're SQL literate, you can look at that and go, oh, it's search for CA, not California. I'll fix that. If you're not SQL literate, it's not great. I'm trying to figure out, okay, do I do a human explanation of the query? Should I show like a join diagram? What are the other things that I can do to try and make this more obvious? I like the idea of showing you're working with these systems. But yeah, so that's one of my experiments is ask a question that gets into the SQL query, you get shown the SQL query, all of that kind of stuff. There's so many more things like that, that we can be experimenting

Starting point is 00:51:55 with. So that approach is fascinating. I think in this case, what you're the way you are at least building this application is assisting people right at the place where they would ask the system a question so like what typically happens when i am interacting with sql databases is so i use db where for example to connect with some of our internal my sql tables i used to be really good at sql a few years back i haven't written sql in at least the last four years i can write simple queries but when it comes to doing things beyond joins, where you need like a bunch of unions and join the other things and so on and so forth,

Starting point is 00:52:32 I'm like, I can do that, but I'm lazy. So I would go to something like chat GPT, give it a simple prompt and say, give me the answer. And then I run it. So I like the way you described what you're building, because in this case, in the same prompt, a user can say, I want to do this. It's like you see, it's kind of a debug log of sorts.

Starting point is 00:52:52 You see what it generates. Exactly. You click it right there. Also, so I use language models for SQL queries just all the time because they're so good at SQL. Like they're really, really good at sort of advanced SQL queries, all of that kind of thing. The problem is

Starting point is 00:53:06 you have to copy and paste the schema in first. You've got to give it the schema so that it knows what to do. And I'll do that, but again, when I'm building it myself, invisible to the user is I'm sending the schema. I can actually... I've also started experimenting with sending example rows. The thing where the state

Starting point is 00:53:21 column might be CN, it might be California, send three example rows, and the language model cotton's on. It's like, oh, okay, I should search for Florida because I know that it's full state names in this column. So, yeah, tricks like that are super important. I feel like generally if you're a developer working with these models, it's all about the context, right? What matters is it's all about the prompt. And the most interesting thing about the prompt is that you can slap in a full copy of the SQL schema, five examples of queries that have run in the past, those kinds of things. That gets really interesting. I'm a big fan of the term prompt

Starting point is 00:53:54 engineering, which is a term that a lot of people make fun of. A lot of people are like, come on, it's chatting to a chatbot. How is that engineering? But I feel like those people are missing the craft of this thing. Like, forget about chatbots. For me, prompt engineering is about figuring out, yeah, okay, for a SQL thing, we need to send the full schema. And we need to send these three examples and these three responses. We need to prompt it in this specific way. That's engineering. It is engineering.

Starting point is 00:54:22 It's complicated. It's very, the hardest part of prompt engineering is evaluating it's figuring out okay of these two prompts which one is better i still don't have a great way of doing that myself like that to me is the the people who are doing the most sophisticated development on top of llms are all about evals they've got really sophisticated ways of evaluating their prompts i aspire aspire to get to that point. Like I'm still trying to figure out the best way to do that. Yeah, reading your post was really helpful.

Starting point is 00:54:52 Like I love how you include, it's like, hey, this is my first prompt. This was like the code that you split it out, you spit it out. And then it's like, this is what's like how I changed it. Right, like I feel like that modification process is exactly the most important. That's super important. was like how i changed it right like i feel like that modification process um is uh exactly um the most important that's super important like as an end user of an llm it's all about the follow-up

Starting point is 00:55:13 prompts like a lot of people who are disappointed in lms will stick in a single prompt say write me code that does this and it'll spit out a bunch of code and they'll look and go well that was crap and sure it was crap so now you tell it you say refactor that to not to write some tests about or this doesn't work but you paste in the error message and that's all of the work the substantive work that I do with these things ends up being like 20 or like actually to be honest often I'll get there with two or three follow-ups but sometimes you you you you go you go longer than that so I will always try I love sharing my prompts because these

Starting point is 00:55:44 things are so hard to use. I feel like it's beneficial to show people what you did. And so I'll very frequently, I'll share chat GPT transcripts. I built my own tools for sharing Claude's transcripts because they don't have a good,

Starting point is 00:55:57 like full transcript sharing thing. My LLM tool makes it easy to pipe out the logs into Markdown format. I paste those into a GitHub gist and then share that. A little habit I've got is that when I'm sharing these things, I like to put them in private gists because GitHub private gists aren't indexed by search engines, but you can link to them. So it's a way of avoiding polluting the internet with giant mounds of LLM

Starting point is 00:56:22 generated text, but still letting people like giving people links. They can go and see it. That's just a little habit that I've got. By the way, are there any prompt engineering resources that you've found to be useful? One, just one, the Claude documentation. Anthropic are the only team who have really invested in good documentation on how to prompt their models there are i mean there are a million sources of like millions of people on twitter will tweet like crazy prompting tricks like threaten your grandmother and all that kind of stuff uh and and honestly some of those some of those are good tips the problem is filtering through them so if you want to read something which is reliable and

Starting point is 00:57:05 like i trust the anthropic prompting guide a lot that's not to say there aren't other good prompting guides out there but that's the one that if you want one resource that's the one i send people to and you were describing a data set and using llms to power some of the features so for folks who haven't been paying attention i want to say there's been a theme of SQLite in a bunch of things that you do with your blog, with LLM, the CLI tool, with Dataset as well. And you've built a lot of data analysis tools and worked on them over the last few years.

Starting point is 00:57:39 How are you thinking about this integration? Because at least when I first just learned about LLMs and I thought, well, having them answer random questions is cool. But I want them to do things on either my data or the context that I provide. And this idea of context was bizarre. At least it didn't make sense to me very initially. I thought you always had to just fine tune things on top. So, and I was discussing some of the ideas with my wife and she was like, well, you're not thinking about, she works on some of the LLM stuff. So she was like, you're not thinking about it in an LLM first way. That's just not how

Starting point is 00:58:17 you build applications on top. A lot of it is just prompting to build stuff on top. So I'm curious when you're thinking about building some of the features in Dataset, how do you go about building these features? And is that different from doing traditional software engineering where you rely more heavily on prompts than APIs, for example? Yeah, I mean, this is, like, as a software engineer, LLMs are incredibly frustrating because they are non-deterministic, right?

Starting point is 00:58:47 You give them, you tell them to do something, and there is no guarantee that if you'd say the same thing twice, you'll get the same answer back. Even if you fix the seed and turn the temperature down, you still might get slight differences. Unit testing, how do you unit test something which has a random number generator almost built into what it spits out really frustrating and difficult um i it's it's working with the computer that sometimes

Starting point is 00:59:10 just straight up says no right it might refuse to do a thing that's really difficult because the sort of larger theme of my work is around data journalism this idea of um like helping journalists analyze data and find stories in it data Dataset was originally designed for data journalists. It turns out it's applicable way outside of that field as well. But that's always been the sort of framing that I hold for this. And a challenge that journalists have is that if you're a journalist, some of the source material you work with is nasty, right? It's police reports about violent incidents.

Starting point is 00:59:39 It's fascist message boards, all of this kind of stuff. Right now, if you've got an LLM that's helping process these things and you like ask it to summarize the themes from this fascist notice board, it's going to say no, right? A lot of the LLMs will just straight up refuse to process that, which as a journalist kind of makes them not, it doesn't make them useless, but it greatly limits how useful they can be in all sorts of different things. Like if you analyze 10,000 documents and it 9,999 of them, it does analyze and one of them it rejects. Maybe there was something important in the one that it rejected. Like this is very frustrating. But yeah, so working with things that sometimes say no is really confusing. It means that you have to, you always have to keep the human in the loop. Like I feel like anytime you have an LLM

Starting point is 01:00:25 doing something for you, and then the result of that is used for something, and there's at no point could anyone spot if something had gone wrong, that's going to almost certainly lead you into difficulties. But then there are things they are good at. My favorite application of LLMs in journalism, and I'm getting the impression this is one of the most important business applications generally, is this idea of structured data extraction so you've got a document that's just typed up or even handwritten and you need to pull out who are the people and what dates are there and like that that and what are the job titles they are so good at this so good at this and that's like data entry is one of the most frustrating aspects of anything

Starting point is 01:01:06 involving computers data analysis like journalists often need to do data entry on thousands of documents but they can't do that they haven't got the the person power to to go ahead and actually do all of that work giving them access to an llm that can do that data entry and with data entry if the llm gets it 95, that's probably what you'd have got if you got a room full of interns doing the same data entry. The accuracy is not perfect, which is unfortunate, but a lot of these things not being completely perfect is still incredibly valuable. So the AI features I built for Dataset, there's actually three. There's the one I talked about, the ask a question, get back a SQL query. There's one called Dataset Extract.

Starting point is 01:01:49 And the idea there is that you can define a SQL-like table. So you can say, I want a table with restaurant name, restaurant address, number of Michelin stars. And then you paste in the copy of an article that talks about new restaurants and Michelin stars, and it will populate the database for you. That works so well. That's like absolutely fantastically effective. I have a question on that. So in this case, a user is just sharing two things. One is, here's a document that talks about Michelin star restaurants and kind of prompting the system to say do X. But that's what's happening behind the scenes is a little more than that. So what are the, maybe I'm using the word wrong,

Starting point is 01:02:31 but system prompts that you then add to what the user provides to make it do the right thing? I'm going to have to look. I will look that up right now because I can't remember. I think it's a very short one. Let's see. I don't know if I'm even using a prompt for that one because I'm using the structured data. Like with OpenAI, you can give it a

Starting point is 01:02:53 structured, you can give it a schema, effectively a JSON schema. Oh, no, I get the user to provide additional prompts. So when you're doing this, you can put in an extra prompt that says only include restaurants that are at least two Michelin stars, like for example. And then I to provide additional prompts. So when you're doing this, you can put in an extra prompt that says, only include restaurants that are at least two Michelin stars, like for example. And then I have a tiny, I say extract data matching this schema. And then I give it the schema in terms of,

Starting point is 01:03:17 you know, there's name string, or it's an array of objects. And each object has a string called name and a string called location and an integer called number of stars. That's the thing like we can stick in the show notes it's very very simple because this is so sort of fundamental to what these things can do but yeah and then i let my users add additional like prompt instructions if they need to one something i use a lot is um when you're extracting the date, format it as year, year, year, year, month, month, day, day.

Starting point is 01:03:45 Little clues like that. Well, that's it. It's spectacularly powerful for how simple the underlying system is. That's actually a good example of a UI for these models that isn't just a chat UI as well. It's a paste in some text, or it accepts images as well. You can drop in an image, give it a schema you select from, you sort of type in the name and you select text from a dropdown, then type in a name and select integer from a dropdown.

Starting point is 01:04:10 That's effectively it. And it works. It works really well. And then my third feature is I have a feature where you can basically run a prompt against every row in your database table. So you might have a table with 100 restaurants in and you'll say, in a, you can say, enrich this data for each of these 100 rows, write a haiku about this restaurant and stick it in the haiku column. I, haikus come up a lot for this stuff. And that's it, it works. So yeah, that, those are the three things, the very, very sort of early, like steps in what's possible with this but yeah the applications to finding to data analysis data cleaning finding stories and data it's almost overwhelming how

Starting point is 01:04:53 much potential there is there so i want to try something today uh but one of the things that we do uh before we record a podcast is we research about the guest to educate ourselves to inform the conversation as well and guang has been building an amazing tool that helps us collect a lot of this information guang can describe more of what that does but i'm curious if you've been exploring lms a lot so i'm curious to get your input on this if our goal is to, given an internet famous person, we want to know more about what they've done in the recent past or let's say over the years and we want to get notes for where the conversation could go and obviously we want

Starting point is 01:05:36 to dig more into it. How would you go about doing something like this with LLMs? So for this particular thing, the one thing I would not rely on is them doing the research, them knowing about, because like we said earlier, for people who are internet famous, it will make stuff up all the time. What's way more interesting, it's find reliable information and dump it into the LLM. So go and like grab their RSS feed from their blog or all of their recent tweets, which is harder now because Twitter doesn't really have an API you can use. Really frustrating. But yeah, or transcripts from other podcast episodes that they've been in, anything like that.

Starting point is 01:06:14 And then what I'd do, I'd use Google Gemini because Google Gemini's signature feature is that it's got a one million token or even 2 million token context, which like clawed and opened AI cap out about 200,000. So it's like five times the amount of stuff that you can pipe into it. Plus Gemini can accept audio clips, which I haven't really played with very much yet. It accepts video. So what I would do is I'd experiment with audio and video, but out of interest. I wouldn't necessarily

Starting point is 01:06:45 trust those to be the most effective way of doing it. I'd basically try and gather as many tokens about that person as possible. So copy and paste crap out there, Wikipedia bios and anything that they've written, all of that kind of stuff, copy and paste all of that into Google Gemini and then prompt it with, we are interviewing this person, what are some themes that we should do? I think that will work amazingly well. I think you'd probably, as long as you're feeding it the source data

Starting point is 01:07:14 so that you know that the source data, again, don't even trust it to go and read web pages because who knows what it's going to do. But copy and paste is the best API, right? Copy and paste half a million tokens of information about that person in. I am certain you'd get good results out of that. I'm going to give that a shot. That feels like it would work really well.

Starting point is 01:07:34 The prompting trick that I use a lot is, especially with these longer context things, is I always prompt and say, identify core themes for topics we should talk about illustrate each one for each one provide two illustrative quotes from the source material so then it'll say you should talk to simon about his llm tool simon said quote llm is my something tool for something something something partly as a fact checking mechanism because then you can take the quotes it gave you and you could search in the source material and see if it made them up in my experience it doesn't make those up if you ask for direct quotes it might even like fix a type of fix the punctuation or something but i can't remember having asked it for direct quotes where it did completely invent a quote which is useful it's

Starting point is 01:08:21 not to say it wouldn't do it, but it's a good trick. Yeah, that's super helpful. Thanks for sharing that. And I wanted to talk a little bit more about LLM-enhanced development of sorts. So I like this quote that you had in one of your talks, where you said, LLMs kind of make you more ambitious and the way you go about thinking about technology or any new technology is how it makes things possible which are impossible before or how it makes you faster or build things faster given all of this that's going on can you share a few examples of where LLMs have made you more ambitious or have you tried things which you wouldn't otherwise? Some of the recent examples that you're most... So many. Yeah. I mean, so many.

Starting point is 01:09:12 This is the thing is that, so as a software engineer, when I'm building a project, I like to have confidence that I've got most of what I need to build that thing, right? If I'm going to have to like learn Objective-C from scratch to do a project, I can't necessarily justify investing the time. I will try, I will find a different project to do. LLMs have kind of changed that equation for me. My earliest example of this is I've like had a Mac for 20 years. I've never learned AppleScript because AppleScript is a weird, weird programming language.

Starting point is 01:09:44 Like I've heard AppleScript described as the world's only, it's a read-only programming language. If somebody shows you some AppleScript, you can go, oh, I get what that does. And then you sit down to write it yourself and you have literally no idea what you would do to make it do anything useful. ChatGPT, it turns out, is so good at AppleScript, right? It knows AppleScript. The thing I wanted to build is I wanted to export all of my Apple Notes into a plain text format. And I asked for the AppleScript to do it. And it knocked out six lines that looped through every Apple Note.

Starting point is 01:10:13 And for each one, I output the title and the body. And I ended up writing a little Python program on top of that that embedded AppleScript in a Python program. And now I've got a command line tool that can export my notes to a SQLite database. That project was impossible. It was impossible for me to build that previously, because I would have had to spend, realistically, probably a solid week getting my head around AppleScript, which is not a well-documented language either. And instead of that full week, I got a working prototype in five minutes that proved to me that the thing I wanted to build could be done. And once you've got, like, my style of development is all about research and prototypes. Like,

Starting point is 01:10:50 you build a prototype to prove that the thing is possible and to fill in those gaps in your knowledge about what you need to know. And then writing the software around it's easy once you've figured out the Apple script you need to get the notes out, whatever it is. So that was an early example. And that just keeps on going. I have production code written in Go right now, despite if you asked me for a for loop in Go, I would have to go and look it up. I'm not fluent in Go, but the code that I wrote in Go with the help of, I think that was Claude 3 Opus I used for that one,

Starting point is 01:11:20 it's fully unit tested. It's got continuous integration, so when I commit to GitHub, it runs the test. It has continuous deployment, right? If the test pass, it deploys the thing. All of these things which I see as essential for production grade software. And I feel good about it. Like despite the fact that I could not sit down and write it off the top of my head, I know that when I go and look at that code, it's good code. It's well tested. I've thought about the edge cases it's it's like and it's been running in production for six months and serving quite a decent volume of traffic that's really cool like being able to no no i no longer look at a problem think well ideally i'd use go for this but i don't know go so i'm gonna just cross that off the list just the other day i um what was the thing i was working on recently i built a a little Django application that was a, it's like a webhooks debugging application.

Starting point is 01:12:09 When you're working with webhooks, the thing you really want is just set up an endpoint that logs everything. And then you tell Stripe, hit my endpoint, and you get logs in your database showing what it sent you. And then you configure things out from there. And I've always wanted a Django app for doing this, but it would take like a day to build that. And I couldn't quite justify spending a day on it. I got Claude 3.5 Sonnet to write the

Starting point is 01:12:29 entire thing. And it took two hours from idea to having deployed working software with unit tests in production that was solving this problem for me. And it's a great example of a project where I could just about justify two hours on that problem. I couldn't justify any longer than that. Like I should just use something off the shelf at that point. So yeah, time and time again, all of these little projects would not exist without LLMs. Not because I couldn't build them, but because I couldn't build them fast enough to justify the effort. So this is fascinating because one thing that we see LLMs being really good at is code because you can test it, you can verify it. Text, prose, not as much because it's hard to verify.

Starting point is 01:13:12 In a lot of these projects, what does your typical workflow look like? So you mentioned like you sometimes have chat, you have to write code while you're walking your dog, which is amazing. For something like this where you spend, let's say, a couple hours. So can you walk us through what that looks like from prompting to actually getting the thing in production? I mean, it definitely varies.

Starting point is 01:13:35 There are two types of projects. There are the projects where I know it's possible already, like building a webhooks endpoint for Django. I know that's possible. I could absolutely just sit down and write that. So that doesn't need the exploratory prototype, right? Whereas there are other projects like exporting my Apple Notes. The number one question is, can I even do this? And so if it's got those unknowns, that's when I'll jump straight into a prototype. And that's normally just have an idea, prompt an LLM a few times, say, hey, can you write me? Oh, a great tip with LLMs, always ask for options. So I'll say things like, what are my options for exporting Apple Notes? And it might say, you could do this, or you could use Apple Script, or you could do this, or you could do

Starting point is 01:14:14 this. That's the best way to work with them. Because if you ask for one option, if you ask it a question, they'll give you an answer. And if you're lucky, it'll be a good answer. But maybe it's not ideal. If you ask it for options, one of those four or five options is almost always the best thing. And you're better equipped to evaluate than it is. Because, I mean, it's just a random number generator, essentially. But, you know, it can spit out the... So I'll often start with, okay, what are my options for solving this problem? Sometimes I'll say, write me the code for option three.

Starting point is 01:14:43 And I'll do that in... Normally, I'll have it write it in JavaScript or Python, because those are my two daily driver programming languages. Occasionally, I'll try it in Bash, if it's something I can use on the terminal. That kind of thing. So if there's a prototyping phase, I'll be using the LLMs as part of that prototyping to answer those questions. The moment it turns into a project I'm actually going to try and commit to, I start a GitHub issue for it. And sometimes I'll

Starting point is 01:15:09 start a GitHub issue just for the research, like maybe in my private notes, like figure out if I can export Apple Notes, and I'll just copy and paste things that I learned along the way. If it's going to turn into actual software, most of the software I build is Python. And it's mostly, it's Python packages that I can publish to the Python packaging index. And those come in basically three shapes. They're either a Python library that I'm going to import and use. It's a Python CLI tool. So something where I type LLM space, whatever. Or it's a plugin for one of my other projects where I install it into Dataset and it adds new functionality.

Starting point is 01:15:46 I've got cookie cutter templates for all three of those. So cookie cutter is this great little Python tool that will spin up the directory structure and the readme and the setup.py or pyproject.com, all of that kind of junk based on a few questions that it asks you. So I've got three public open source cookie cutter templates that I use to get me started on that. Those set up the initial file structure. They set up the GitHub

Starting point is 01:16:10 actions workflows for testing. They set up the workflow for publishing the package to PyPI. So if I've picked a name for it, I can write a bunch of code, push it to GitHub, click a button in Git, or I post a release on GitHub and that will be published to PyPI. So that entire workflow of writing the code, testing the code, documenting the code, publishing the code is all automated for the most part, which is a huge productivity boost. Like I can, I've written, I've got like command line tools that I've published to PyPI where I had them live on the package index within an hour of the idea of the tool. That's something... And that's because I've done it 250

Starting point is 01:16:50 times now. So you've got the automation in place. It's just a very, very quick habit. I love the idea of release early, release often for open source things. If it's an open source package, I will often... If I'm not confident yet, I'll put it as an open source package i will often if i'm if i'm not confident yet i'll put it as an alpha i'll say okay this is the 0.1 a zero alpha release and i won't release code that doesn't run at least but you know if i'm not quite confident that the design's right or whatever and um some of my projects languish in alpha state for far too long i'm also trying to get better at committing to a 1.0 release. I've still got my main dataset projects on version 0.65 right now, I think. So I've had like 65 releases and

Starting point is 01:17:31 I still haven't done the 1.0 and I really need to do the 1.0 for it. But yeah, so that's the process. I've written quite a bit about this. I've got some good sort of write-ups on how that all works. Oh yeah, we would love to link that in the show notes i think what's helpful here to note is that it's not just using llm to tell you what to do but in a way you were in the driving seat you're kind of having it just assist you uh where you still have a lot of structure around it to make you more productive yes where it's not just like i call it i call it my weird intern i'll take it i'll say to my wife i'll say to my wife and Natalie sometimes, hey, so I got my weird intern to do this.

Starting point is 01:18:07 And that works, right? It's a good mental model for these things as well because it's like having an intern who has read all of the documentation and memorized the documentation for every programming language and is a wild conspiracy theorist and sometimes comes up with absurd ideas

Starting point is 01:18:21 and they're massively overconfident. It's the intern that always believes that they're right but it's an intern who you can i hate to say you can kind of bully them you can be like do it again do that again no that's wrong no that's wrong and you don't have to feel guilty about it which is great like sometimes when you're working with other people and like they're like they've done five iterations and you're like you know what i'm still not entirely happy with this bit but come on i, I'm not going to make them do a sixth. That's just not fair. The LLM, you can do that, right?

Starting point is 01:18:48 You can just keep on having to say, oh, actually, you know what? Rewrite that whole thing in Go. Or one of my favorite prompts, one of my favorite prompts is you just say, do better. And it works. It's the craziest thing. It'll write some code. You say, do better. And it goes, oh, I'm sorry.

Starting point is 01:19:05 And then it will churn out better code, which is so stupid that that's how this technology works. Oh, yeah. But it's kind of fun. It reminds me of our friend Austin. So we have a common friend, Austin. If you tell, if anything, let's say if you're struggling with anything

Starting point is 01:19:20 and if you go to him for advice and if you ask him, hey, Austin, what do you think I should do? He has one answer for every damn thing and that's try harder and i think it works really well nice yeah sorry i think you were saying something no no that's not very true so in terms of interns like the good thing is you don't have just one you have many of them with like chad gbtpt cloud and whatnot and as you were describing some of your projects you mentioned have you used different ones uh for

Starting point is 01:19:49 different things i'm curious how do you go about like using one over the other is it more try what works or do you have a pattern at this point that you go to it's so hard it's so hard right it's um i've been calling this it's vibes vibes-based evaluation, right? Because the only way to figure out if a model is any good is you have to use it repeatedly a bunch of times and try different things about it. And some people are really like, they're really sophisticated about this. They have like a document full of all their test prompts that are run through the new models. I'm not doing that. I should be doing that. I sort of, I have a few, like I have a few prompts that i always run

Starting point is 01:20:26 against a new model just to try and get a feel for it but a lot of the time i go sort of based on vibes from other people like if a whole bunch of people are saying no seriously i was all about claude sonnet but now google gemini 1.5 is is better for these things then i'll i'll start experimenting with that one as well at the moment my daily driver is Claude 3.5 Sonnet. I think that's the best model, but the new Gemini 1.5 from like two weeks ago is getting massive buzz. So I need to spend more time with that one.

Starting point is 01:20:56 I still use ChatGPT for walking my dog. The voice mode is amazing. And for code interpreter. Like if I'm writing Python code and I want it to actually test that Python code for me and fix any bugs that it finds i'll go to chat gpt for that claude also the claude artifacts thing where it can build little interactive web apps it's amazing like i i i'm using that i use that to prototype up little like things that i'm actually building i use it to build little one-off tools like a little pricing calculator for something just for me to use um i really love that feature then and on the

Starting point is 01:21:31 command line i'm i love playing with the local models the ones that run on my laptop the problem is that they are never going to be up to the standards of like cloud 3.5 sonnet so for actual real work that i'm doing i I tend not to use them. But because of my LLM project, I'm constantly tinkering around with them. I also, I think they're really good for people learning LLMs because using a kind of crap one

Starting point is 01:21:55 that runs on your laptop, it hallucinates way more often. It makes more mistakes. It helps you get that mental model of what they're good at much better than working with the really good models. So I always recommend people like five three um gemini uh gemma gemma 2b is really good llama 3.18b is currently my favorite local model it's quite easy to run it's a four gigabyte

Starting point is 01:22:19 download if you get the quantized version it's it's genuinely useful like it's shocking it's definitely it feels equivalent chat gpt 3.5 at least and it's really amazing to me that a four gigabyte file can be that useful running on my own laptop like the compression of these things is extraordinary um but yeah so it's vibes it's it's it's it's vibes based it's frustrating i wish i had better benchmarks of my own to try these things out. And a lot of it also comes down to prompting style. Like some people will say, oh, no, I tried Claude and it sucked. And it's like, yeah, but maybe that's because the way you prompt LLMs and the way I prompt LLMs, it's not like I'm doing it right and you're doing it wrong.

Starting point is 01:22:56 It's that your way is more compatible with ChatGPT and my way is more compatible with Sonnet in ways that I don't fully understand. So you've been writing a lot about LLMs over the last few years. And as we were going through your blogs, there's a lot of new stuff that's coming out. And in general, one thing that at least I struggle with is just keeping up to speed with everything that's happening.

Starting point is 01:23:20 Yep. It's like two weeks, work's busy, our life's busy, and then suddenly something has changed. And I'm not spending as much time building things on top of LLMs, but I'm curious to just learn more and see what the capabilities are and how it can be useful. I'm curious how you stay up to date, one, and also filter signals from noise because there's just so much of it. Right. The big one is, so Twitter is still the, like I tried moving to Mastodon. I'm very active on Mastodon.

Starting point is 01:23:51 Mastodon is mainly AI skeptics who don't like this stuff. All of the AI people hang out on Twitter still. So I maintain presence on Twitter because that's where the AI conversations are happening. So I've, following a bunch of people helps. There are a few accounts that I turn on notifications for. So I get a push notification whenever Anthropic or OpenAI put out a tweet because it's always like that's where the big news comes from.

Starting point is 01:24:17 The other thing is like private groups. You know, I'm on a couple of WhatsApp groups. I'm in a bunch of different discords. Those are great. Like those are sort of the highest signal stuff will come from a discord I'm in with like 15 other people who are very engaged with this stuff

Starting point is 01:24:31 and we'll be sharing notes with each other in there. And then it's blogs. Like I blog, having a blog means a lot of this stuff comes to me. People will like tag me and say, hey, have you seen this new thing? It's relevant to what you were talking about last week. That's super useful. And that's it. And I've got an RSS reader that's subscribed to a bunch of things and sub stacks and so forth. But the other thing is like,

Starting point is 01:24:55 I don't have a, I'm not employed by anyone else. So if I want to spend a couple of hours because a big thing just happened, I want to research it. It's nobody to tell me not to, which isn't necessarily beneficial for my own projects. know it's i don't have that accountability but yeah i'm in a privileged position in that i can afford to invest the time in figuring this stuff out as well so uh you mentioned that you're an independent open source developer and this is something i want to talk about uh but one question that i wanted to ask before was we're talking about using llms to kind of improve your productivity and being able to build things faster. But there's one thing which comes up, which is like learned helplessness. In other words,

Starting point is 01:25:37 it's more like your kind of muscles are atrophied. And in this case, your skills are maybe atrophied, where you can't just write things off the top of your mind uh for example like i remember some time ago in the recent uh this year itself wi-fi was out and i was writing some code and copilot wasn't working i knew what i wanted to write but i was frustrated because the damn thing were just not autocomplete and i was like why is this not working and it's like like, oh, Wi-Fi's out. So I'm curious how you think about that in general. Yeah, I felt a little bit of that, to be honest. The other day, I went and reported a bug against GitHub Actions for like, I was saying, hey, I'm running a Windows GitHub Actions thing, and the version of Python can't load SQLitelite extensions and i thought you'd fix that this is really frustrating and then after i'd filed the

Starting point is 01:26:28 bug i realized that i'd got clawed to write my test code and it had just written sqlite code that doesn't it had hallucinated the sqlite code for loading an extension and i'd gone and i'd literally i'd reported a bug and i had to close that bug and say, no, sorry, this was my fault. That code is wrong. And that was a bit embarrassing. I should know more than most people that you have to check everything these things do, and it had caught me out. And I'd lost like half an hour of time as well to trying to figure out what was going on.

Starting point is 01:27:00 It turns out it just hallucinated the wrong way to use SQLite. Python and SQLite are my bread and butter. I really should have caught that one. So yeah, this has happened that my counter to this is I feel like my overall capabilities are expanding so quickly. I can get so much more stuff done that I'm willing to pay with a little bit of my soul, right? I'm willing to, I'm willing to accept a little bit of atrophying in some of my abilities in exchange for, honestly, like a two to five X productivity boost on the time that I spend typing code into a computer. And that's like 10% of my job. So it's not like I'm two to five times more productive overall,

Starting point is 01:27:35 but that is a very material acceleration. And like I said, it's making me more ambitious. I'm writing software I would never have even dared to write before. So I think that's worth the risk. A lot of people are worried about the impact this has on new programmers. And I've sort of got two conflicting opinions there. One opinion is, like I said earlier, people like the fact that you've got a semicolon, you lose half a day to figuring out the semicolon. That sucks, right? That's just inexcusably miserable. fixing that for people is is a wonderful thing i think it opens i think way more people are going to learn to program and i think that the people who are learned to program will be able to learn faster but i think there are skills they're not

Starting point is 01:28:15 that they're going to skip over i heard a kind of terrifying anecdote from a friend recently where they had a they knew somebody who was a new programmer. They were just getting started. They were a professional programmer or just like very early stages. And they, they were calling code. They used the word, something like goop. And they said, so I got, I got, I got a chance if you'd spit out some goop and I paste it in, it seems to work. And then, and this didn't work. So I'm going to spit out more goop and I paste that. And then now that's working. And they were asked, well, how are you going to maintain the goop in the future? And they said, oh, I'll just get it to write more goop. i pasted that and now that's working and they were asked well how are you going to maintain the goop in the future and they said oh i'm just going to write more goop and that idea

Starting point is 01:28:48 the idea that code is now goop as a programmer that offends my very soul like that's that's sort of horrifying but you know it's if you get working maybe maybe we are going to have to like we have we currently live in this in a in a world where half of the world runs on Excel spreadsheets with no unit tests, with four, which with no backup, no version control, no unit tests. And anyone can muck up a formula and the valuation of a company goes down by half overnight because that's the world we live in today. Right. Excel spreadsheets are kind of goop already. And somehow society functions. So maybe those of us who are like, no, every line of code has to be perfect. Maybe we're wrong.

Starting point is 01:29:33 Maybe actually goop is the way forward. But that's a little bit terrifying, you know. It is. I think that's precisely what I was thinking about. That with all of these LLM tools, the amount of goop is just increasing and not just in code, but in almost everything else. And I think you talked about slop as well, which is like the unwanted and not good AI content, especially images coming out of many, many countries right now. In general, if you think about how these LLMs have been helpful from a usability standpoint, one is they're super cool and exciting and they have way more potential than what we are seeing today.

Starting point is 01:30:15 They've truly been impactful in increasing productivity for software engineers, where someone who knows what they are doing. And I think we were speaking with Steve Yeagy week and he mentioned like LLMs are way more safer or way safer in hands of senior engineers who know what they need to be doing as opposed to someone who doesn't. But we don't control who uses this and how they use it. And if you think about the quality of software

Starting point is 01:30:41 that's actually coming out, and I was having a discussion with one of my friends recently, and we were talking about this, where these tools are amazingly helpful to make us productive who are quote-unquote some sort of an expert in a domain where you kind of know what's right but a lot of the other tools that come out they are super nice from a prototype standpoint from a demo standpoint but you don't see quite as good tools when it comes to a production system that you would fully rely on. I know RAG is a thing that people talk about too, where demos are amazing, but then production is like, well, would you trust it to give it to customers?

Starting point is 01:31:19 So I'm curious, what's your take on that in terms of the sheer quantity of things being built from a prototype standpoint, but the quality isn't quite there yet? It's really interesting, isn't it? Yeah, like that. I mean, so many of these things are completely open questions to me. I still don't like will society overall in like 10 years time look back on this and say, OK, this technology had more pros than cons. Or will we just be flooded in slop and be like, wow, I wish nobody had ever invented this stuff at all. And it's harder for me to evaluate that because I think programmers are the best equipped to use these tools. Like

Starting point is 01:31:53 hallucinations in code don't matter because when you run it, you get an error and you fix it, right? Like we are, and they're better at code than they are at anything else. So I'm getting enormous productivity boosts out of this stuff and it looks amazing is that just because i happen to be in the one profession in this world that is most attuned to the benefits these things can can bring you and then yeah in terms of quality one thing i've been thinking is you keep every now and then you hear a story of a company who've got software built for them and it turns out it was the the boss's cousin who's like a 15 year old who's good with computers and they built software and it's garbage software the quality is absolutely awful but you know it's it's how these things happen and maybe we've just given everyone in

Starting point is 01:32:37 the world the overconfident 15 year old cousin who's gonna claim to be able to build something and build them something that maybe kind of works and maybe society is okay with that maybe that because that this is why i don't feel threatened as a senior engineer because i know that if you sit sound down somebody who doesn't know how to program with an llm and you sit me with an llm and ask us to build the same thing i will build better software than they will right that that's there's no question about that at all um but yeah mate so hopefully sort of market forces come into play and people the demand is there for software that actually works and is fast and reliable and so forth and so people who can build software that's fast and reliable often

Starting point is 01:33:16 with llm assistant use responsibly benefit from that that seems okay to me but yeah i don't know it's um one of my a big big frustration i have is i want like lots of computer science papers come out about llms i want sociology papers i want all of the sort of humanities doing research into the impact on these things how do people learn to use them all of this kind of stuff is and i think that research is happening but in academia it takes two to three years to get a paper out so like we're seeing papers come out today that is talking about GPT 3.5 from like December of 2022, which is so outdated at this point. So outdated at this point. But yeah, it's frustrating.

Starting point is 01:33:56 There's so many open, big questions like this that we don't have good answers to. Yeah. We're starting a family pretty soon. So these are questions that at least I'm thinking about these days and struggling with and don't know the answers to. And I would love to get some of those research papers as well, which I may ask these tools to summarize for me, which is a different problem. Oh, yeah. I read academic papers now. I never used to read academic papers. But you can copy and paste the app. I built a GPT called Dejargonizer. And it's just a prompt that says, yeah, you paste a prompt that says,

Starting point is 01:34:25 yeah, you paste text and it says, find all of the jargon terms and define what they mean. And so I can grab an academic paper abstract and paste it into de-jargonizer, and then I'll understand it. Because they inevitably use like five terms I've never heard before, but it, yeah, that's so good. It's so useful for that kind of thing.

Starting point is 01:34:40 So we've been talking about how these systems are beneficial for senior engineers. And I've been having some conversations with some friends who have kids who are either starting school or already just starting computer science or looking for a job. Which job market has been much tougher this year and the last year in general. At least for entry-level engineers. But let's skip the job market problem for a second. In general, for junior engineers who have these tools at their disposal to be productive and learn things much faster, what advice do you have for them,

Starting point is 01:35:20 for them to also develop some of the skills that you only develop through making mistakes or just building things in production? So I'm not qualified to answer this question because I was a junior engineer 25 years ago. So I do not have the learner's mindset. I will answer it anyway. I think it's all about projects. I think build things that do something and ship them. There is my very strong hunch, and this is going back throughout my entire career,

Starting point is 01:35:49 the fastest way to learn anything in software is to build something with it. And also to get beyond tutorials. You know, tutorials are fine. You can go through a tutorial and build that thing. Those will not have nearly as much of an impact on you as saying, okay, I'm going to build a thing that does this or take the inspiration from the tutorial and build something else. It's also, it's great for hiring, right?

Starting point is 01:36:10 I've been a hiring manager in the past. If a candidate can show me stuff that they've built, that's worth more to me than any degree. Like I've hired people where we hired them. And then at the end of the process, I realized I never even asked them if they went to university because it didn't matter because they showed me cool stuff that they'd built, and they could talk through it. If you've got a great demo and I can ask, oh, how did you solve this problem? What else did you try?

Starting point is 01:36:34 We can have an amazing interview. Also, there's that whole the fizz, buzz, leet code side of interviewing. I hate that stuff. I absolutely hate that. If you've got code on GitHub, which I can read through and I can look through your commit history and see evidence that you know how to fix a bug in a for loop or whatever, I can, if I can hit it in a web browser and see that you've built something. On that basis, like I love, I'm a massive power user of GitHub. I love GitHub pages.

Starting point is 01:37:14 So you can like just build a little static web app, host it on GitHub pages. It'll live forever, right? And it's a URL that people can click on and they can start using it. If you're doing server side code, it gets a bit trickier. I've been, I was, I've used Vercel a lot in the past. Vercel, if you don't give them a credit card so that you can't get accidental denial of service billing problems, Vercel can be really good.

Starting point is 01:37:38 There's always places that you can host code online if you look around for them. But yeah, having live demos of things that you've built that are hosted online, I think is the best possible sort of resume and it's the best way of learning. And so to this day, I've got a tag on my blog called project and every time I do a project, I tag it project.

Starting point is 01:37:59 And right now it has 404, oh, good number, 404 items tagged projects. And that's over the course of 20 years, you know. So it's, but every single project that I do, I learn just the tiniest new thing. And it's also like, if I want to remember how to like do a screen, take a screenshot using the Playwright framework, I've written code on GitHub that does that and I can go and look at it. Or if somebody asks me, how do I take a screenshot with that? I can send them a link to the code that I wrote in GitHub. So it almost becomes an external memory of everything that you've ever learned to do.

Starting point is 01:38:35 But yeah, for me, I think that's it. If you're a new programmer, knock out projects. It doesn't matter what they are, weird little things, fun little things. It's also a great excuse to do writing because one of the two easiest forms of blogging are something that I learned or something that I built. You can do a blog entry where you just say, I wanted to solve this particular problem,

Starting point is 01:38:59 so I built this. Here's a screenshot of it. Screenshots are amazing because they never break. I love screenshots. If you build hosted software, it's going screenshot of it. Screenshots are amazing because they never break. Like I love screenshots. So if you build hosted software, it's going to break eventually. Take a little video, take a screenshot, stick those up. Like when I've coached people going through boot camps before,

Starting point is 01:39:18 and one of the things I always tell them is they always do this sort of end of boot camp project. And they'll have a GitHub repository with their project in it. And I say, invest in the readme. The readme needs screenshots of your thing. Like if I'm a hiring manager and I click through, I'm not going to check the code out. I'm not going to try and run it. But if there are screenshots and a couple of paragraphs saying how it works, that puts you in the top 1% of candidates if you've got a readme with a screenshot in it. So do that, right? Yeah. So I think my advice is do lots and lots of projects. Small, weird projects, whatever is best. If you can get them deployed, that's excellent.

Starting point is 01:39:50 Then have a readme with a screenshot in, and that's a really good way of learning. That's good advice. Talking about projects, I wanted to jump on to life as an independent developer. Now, when it comes to someone working at an employer, at a company, for example,

Starting point is 01:40:10 the kind of projects you work on are typically driven by some business priority. And there are just problems to solve, and you don't need to go look for them very often. People kind of tell you what the problems are. Well, let me put it this way. If you're lucky, you don't need to go look for interesting problems they kind of come to you uh at a company and you're going to work on those and there's always it's not your job at a company to figure out what what is the important thing to do that's what the management chain is for exactly so you always have the steady input of things to do and And at least these days, everyone I speak with has more work to do than they have the time for.

Starting point is 01:40:49 But when it comes to working independently and having to define how you spend time, one needs to be very disciplined. Also have a way of identifying what you work on. So I'm curious how you do that, which is like figuring out how do you work on and keep a structure that keeps you going. Honestly, that is the hardest problem. It's really, really difficult. So I'm in a very privileged position in that my wife and I ran a startup for a few years. We sold that startup. It made us enough money that I don't, it's not that I don't ever have to work again, but I have a substantial runway where I don't have to worry about an income, which is almost like a requirement

Starting point is 01:41:28 for if you want to go out independently, especially doing open source stuff, right? It's very, very difficult to make. And I'm starting to spin up sort of consulting things and so forth, because I want to extend that runway. And ideally, I want to do what I'm doing right now for the rest of my life, right? To do that, it needs to be funded.

Starting point is 01:41:44 It needs, I need to have a repeatable source of income for, right? To do that, it needs to be funded. I need to have a repeatable source of income for it. So I've been building a software as a service version of my main dataset open source project. That feels like, in open source, that's one of the most proven business models. It's like WordPress, right? WordPress is open source or you pay automatic and they run the hosting for it. And they built a really successful business around that. I essentially want to do exactly that, because also it's kind of lonely, right, working on your own projects.

Starting point is 01:42:10 I would like to be able to employ a full team of people to work on stuff with me. Like, that's the sort of big ambition. But that said, honestly, like, what to work on next prioritization is so difficult when you don't have any external sort of forcing factors one of my big thing my i mentioned my week notes earlier just forcing myself to be accountable every couple of weeks to write the stuff up i don't care if anyone reads them or

Starting point is 01:42:35 not like the week notes are entirely for me they're for me to track what i've been working on and the progress towards things um i try and set myself deadlines. I occasionally do conference-driven development where you sign up to give a talk at a conference and you're like, this project needs to be in a state where I can actually present it on stage. The dataset AI features are almost all conference-driven development. I was speaking at a journalism conference about ways to use AI in journalism

Starting point is 01:43:01 where I better have the features ready by then. So yeah, it's really difficult, especially since in the AI space and in software engineering generally, everything is interesting. In the language model space, I've been calling it recursively interesting because any aspect of it that you look at, like audio models that can process images, or how does the training work, or how does fine tuning work just raises more questions you can just keep on getting deeper and deeper and deeper into any of these spaces so I don't think I have a good answer to that question to be honest like I've been kind of coasting on the fact that I don't have financial incentives that that force me to do

Starting point is 01:43:41 something and letting myself go run wild with all of these different projects and i would my number one goal to be honest is i'd like to be more disciplined in terms of saying okay here are the big goals how can i go after those i do have a goal at the moment and so my main software data set it's for journalists to try and find stories and data my ambition is i want someone to win a pullet surprise where for a piece of investigative reporting where my software was one of the tools that they used so I want dataset to be part of the mix in some pullet surprise winning investigative reporting and that's useful because I can say to myself okay am I building the right features am I engaging with the right people am I making sure it's easy enough to use and all of that kind of stuff? So that's the sort of like one of my sort of guiding ideas at the moment is that.

Starting point is 01:44:28 And so I can ask myself, is the thing I'm working on right now on the path to somebody else winning a Pulitzer using my software? But, yeah, it's I could do with I sometimes think I sometimes wish I was like raising money from investors just so I had somebody breathing down my neck saying, you said you were going to get this thing done this is the focus have you done it yet but yeah I'm still still completely I'm free of all influence at the moment it seems like in many of these cases creating some sort of uh force and function helps could be like you said content development absolutely uh by the way you mentioned the startup that you uh ran with your wife which got acquired congratulations uh i think it got acquired by eventbrite if if i remember it correctly that's right yes um and you also were at eventbrite for

Starting point is 01:45:16 some time after that and or is that true i think yes six years at eventbrite yes yeah and then you decided to go independent so i'm curious what prompted that decision to then not continue on the job because looking at your career, I think you could have gotten any job if you didn't want to continue at Eventbrite, but you chose to be independent. I'm curious what prompted that decision. So what happened is I was at Eventbrite for six years. I was a director of engineering, focusing on APIs and scaling and internal platform stuff and so forth. And then later on, I moved into more of a prototyping and R&D role, which sort of suits my interest a little bit better.

Starting point is 01:45:56 I had this opportunity come up where the University of Stanford have a fellowship program for journalists, where the idea is, it's called the JSK Fellows. And the idea is they take sort of mid-career journalists and they pay them to spend a year on campus at Stanford, effectively working on a project that is beneficial to the future of news. And that's a very, very loosely defined. And I heard about this thing and I got in touch and I said, well, I'm not technically a journalist, but I've worked in a lot of newsrooms. I've worked for newspapers. I build tools for journalists. I've sort of I'm effectively a data journalist. Could I be a good fit for this program? And so I ended up being the person on this program who was a bit of the sort of the wild card. Right. I wasn't formerly a journalist, but I was working on journalist adjacent projects.

Starting point is 01:46:42 And it was amazing. And it completely ruined me because they paid me to spend a year working on whatever adjacent projects and it was amazing and it completely ruined me because they paid me to spend a year working on whatever I thought was most interesting and once you've done that it's very difficult to go back to having somebody else set the define what it is that you were going to do so basically that was the problem is that I experienced freedom for a year and I'm like I do not want to this up. I'm having so much fun working on these things. So that's amazing. And the last question I had on this topic was you mentioned running the startup with your wife. Now, in many cases, this equation doesn't always work as productively where people who are partners or who live together don't always end up working

Starting point is 01:47:26 well together because of all sorts of frictions i'm curious how so we've we'd been together for at least 10 10 years at that point um before we got married and we had worked on projects together before we'd worked at the same companies in some situations um we had a whole bunch of little side projects that we'd built collaborating together, which meant that, and that was really important, because we already knew that we could work together. We knew we had very complementary skills. I do backend development and systems operations. She does design and frontend engineering. So between the two of us, we can build a really good web application together. And the project that ended up being our startup was it started as a side project actually started on

Starting point is 01:48:08 our honeymoon we um we got married and we set off on honeymoon where the plan was to travel around the world for like a year plus met with our laptops occasionally maybe doing a little bit of like um like freelancing work to remotely to to to to keep to keep money coming in um and we got as far as morocco um amazing place place to travel and we got food poisoning in casablanca and it was during ramadan and in casablanca casablanca is not really tourist trail in morocco so during ramadan everything shuts down um and so we basically we basically went we had to rent ourselves an apartment so we could cook for ourselves to try and get through this. And since we were stuck there for two weeks, we said, OK, we've got this idea for a website to show what conferences our friends are going to.

Starting point is 01:48:53 Let's build that as a little project and put it live. And it was built. This was 2010 we were building this. And we built it on top of Twitter. We're like, hey, Twitter knows who our friends are and we follow people who we were building this. And we built it on top of Twitter. We're like, hey, Twitter knows who our friends are. And we follow people who we like on Twitter. So you could sign it. It was called Lanyard. And the idea was you sign it to Lanyard Twitter. And it goes, oh, you follow these 50 people. They are attending or speaking at these 10 conferences. Here are conferences you should know about. And it works extremely well because it turns out people who speak at conferences have a lot of Twitter followers.

Starting point is 01:49:27 And we actually built the database where we'd say, oh, at so-and-so is speaking at this conference, even though they weren't a user of the site yet. So when we launched, we had like a hundred speaker profiles and zero users, like just the two of us. But that was enough that if anyone signed in who followed one of those speakers, they'd get a recommendation, which felt like magic.

Starting point is 01:49:44 People like, oh my God, this thing knows everything. It's a hundred rows in a MySQL database. That's the whole thing. But it worked, right? And so that, we ended up applying to Y Combinator, the startup accelerator from Cairo.

Starting point is 01:49:59 No, from Luxor in Egypt. So there's a video out there, which is Natalie and myself standing in front of this ancient Egyptian temple, pitching our YC idea. We don't mention the temple at all. We just played it completely cool that there was this, no, that was in Aswan. It was the Aswan temple behind us. That was kind of fun. Right. And so we applied to Y Combinator. We got in, our honeymoon turned into three months in Mountain View in California doing Y Combinator, which was a little bit different.

Starting point is 01:50:27 And then we raised money from that. We hired a team in London. We spent three years sort of building the startup before we got acquired by Eventbrite, who moved us out to California. So that's how we moved to America. But yeah, so it was a fun startup experience. But yeah, the whole thing. And like I said, it started on our honeymoon. It was a good thing that we'd been together 10 years already and we knew we worked on projects because

Starting point is 01:50:48 it's it's it's a tough thing you know the the when you're literally married to your co-founder you have to set rules like no talking like no no talking about the company beyond like six in the afternoon six in the evening that kind of thing um which we did not stick to but it's hard that was the thing. And so Natalie wrote up a really good story of the whole sort of startup story, which I can share a link to as well. Oh yeah, for sure. That'd be really cool.

Starting point is 01:51:13 That's a fascinating story. Thanks for sharing. So Simon, this has been an amazing conversation. Thanks for spending way more time with us than we actually planned for. We had a blast. We got to learn a lot, listen to a lot of good stories and learn about how you use these tools.

Starting point is 01:51:28 Is there anything else you would like to add before we go? I think, yeah, the one thing I'll add is, as practitioners using LLMs and using AI, we understand this stuff better than 99% of the population, which I think puts a responsibility on us to figure out the positive ways of using this and then to share that. Like my sort of overall approach to ethics around this is that we're not going to un-invent this technology. So if we can figure out what are the things we can do that generally enhance people's lives that make the world a better

Starting point is 01:52:01 place, those positive impacts, and if we stay away from generating like garbage slop and dumping that on people, that feels right. So I feel good about the way I'm interacting with these tools, mainly because I'm trying to help other people learn how to use them effectively and sort of get over the kind of weird science fiction fear of this stuff and say, okay, these are quite dumb. They are good at these certain things. If you learn, if you put the work in to learn how to use them, they can have a really positive impact on what you're doing. That's really well said. Thank you so much, Simon. This has been an amazing conversation.

Starting point is 01:52:38 Thanks a lot. This has been really fun. Oh, by the way, I didn't, didn't mention this before, but I would say two things that I saw in your talk and we'll link to that in the show notes too uh instead of generative ai i think you called it transformative ai which is pretty amazing and instead of artificial intelligence you said imitation intelligence which i was which i thought is so accurate uh so thank thank you for those terms oh and now i'm thinking of the third thing too. You also coined the term prompt injection, which was- Oh, yes.

Starting point is 01:53:08 We haven't talked about that yet. Yeah. Prompt injection, it's the security attack against applications built on top of models. We won't go into it now, but if you aren't unaware of prompt injection, you will build stuff with horrifying security holes in. So you need to learn about this one. then yeah the um imitation intelligence i i owe the

Starting point is 01:53:30 world a full write-up of this it's an idea i threw out in a pycon talk a few months ago yeah i feel like artificial intelligence has all of these sort of science fiction ideas around it people will get into heated debates about is i don't think this is artificial intelligence at all all of that kind of stuff i like this so i've been thinking about it in terms of imitation intelligence because everything these models do is just imitating something that they saw in their training data like and that actually really helps you form a mental model of what they can do and why they're useful and it means that you can think okay if the training data has shown it how to do this thing it can probably help me with this thing if you want to, if the training data has shown it how to do this thing, it can probably help me with this thing. If you want to cure cancer, the training data doesn't know how to cure cancer.

Starting point is 01:54:09 So it's not going to come up with a novel cure for cancer just out of nothing. And then what was the other one? The other one was... Transformative AI. Oh, yes. I like... I feel like when you call something generative AI, that instantly makes people think, oh, it just generates random rubbish, right?

Starting point is 01:54:27 It's OK, it'll cheat and write an essay for you, but it'll create horrifying images. But is that really that valuable? The most interesting application to these tools are transformative. It's when you feed in the transcript of a podcast and say, hey, pull out all of the show, anything that should be in the show notes, which I always do for these kinds of things. Now, that kind of stuff is so much more interesting to me. And so, yeah, I like that idea of emphasizing that it really is like what you get out is as good as what you put in, but you can put in a lot of stuff. There's a lot of interesting applications that just pump in a bunch of things, ask the right questions, and you'll get much more reliable and interesting results out of it that

Starting point is 01:55:05 way sure well as we're finding out there's a lot more to talk about and we hope there is a second time and we bring you you come back on the show uh but today thank you so much simon this was this was amazing thanks a lot hey thank you so much for listening to the show. You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com. You can also write to us at hello at softwaremisadventures.com. We would love to hear from you. Until next time, take care.

Your Ad Here

Software Misadventures - LLMs are like your weird, over-confident intern | Simon Willison (Datasette)

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.