Python Bytes - #473 A clean room rewrite?

Episode Date: March 16, 2026

Topics covered in this episode:
- chardet, AI, and licensing
- refined-github
- pgdog: PostgreSQL connection pooler, load balancer and database sharder
- Agentic Engineering Patterns
- Extras
- Joke

Watch on YouTube

About the show

Sponsored by us! Support our work through:
- Our courses at Talk Python Training
- The Complete pytest Course
- Patreon Supporters

Connect with the hosts
- Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
- Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
- Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

Michael #1: chardet, AI, and licensing
- Thanks Ian Lessing
- Wow, where to start?
- A bit of legal precedent research.
- Chardet dispute shows how AI will kill software licensing, argues Bruce Perens on The Register
- Also see this GitHub issue.
- Dan Blanchard, maintainer of a Python character encoding detection library called chardet, released a new version of the library under a new software license. (LGPL → MIT)
- Dan is allowed to make this change because v7 is a complete “clean room” rewrite using AI
- BTW, v7 is WAY better:
  - The result is a 48x increase in detection speed for a project that lives in the hot loops of many projects. That will lead to noticeable performance increases for literally millions of users (the package gets ~130M downloads per month).
  - It paves a path towards inclusion in the standard library (assuming they don’t institute policies against using AI tools).
  - Thread-safe detect() and detect_all() with no measurable overhead; scales on free-threaded Python 3.13t+
- An individual claiming to be Mark Pilgrim, the original creator of the library, opened an issue in the project's GitHub repo arguing that Blanchard had no right to change the software license, citing the LGPL requirement that the license remain unchanged.
  - A 'complete rewrite' is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a 'clean room' implementation).
- Blanchard disagreed, citing how version 7.0.0 and 6.0.0 compare when subjected to JPlag, a library for detecting plagiarism.
- Blanchard told The Register he had wanted to get chardet added to the Python standard library for more than a decade since it’s a core dependency to most Python projects.

Brian #2: refined-github
- Suggested by Matthias Schöttle
- A browser plugin that improves the GitHub experience
- A sampling
  - Adds a build/CI status icon next to the repo’s name.
  - Adds a link back to the PR that ran the workflow.
  - Enables tab and shift tab for indentation in comment fields.
  - Auto-resizes comment fields to fit their content and no longer show scroll bars.
  - Highlights the most useful comment in issues.
  - Changes the default sort order of issues/PRs to Recently updated.
- But really, it’s a huge list of improvements

Michael #3: pgdog: PostgreSQL connection pooler, load balancer and database sharder
- PgDog is a proxy for scaling PostgreSQL. It supports connection pooling, load balancing queries and sharding entire databases.
- Written in Rust, PgDog is fast, secure and can manage thousands of connections on commodity hardware.
- Features
  - PgDog is an application layer load balancer for PostgreSQL
  - Health Checks: PgDog maintains a real-time list of healthy hosts. When a database fails a health check, it's removed from the active rotation and queries are re-routed to other replicas
  - Single Endpoint: PgDog can detect writes (e.g. INSERT, UPDATE, CREATE TABLE, etc.) and send them to the primary, leaving the replicas to serve reads
  - Failover: PgDog monitors Postgres replication state and can automatically redirect writes to a different database if a replica is promoted
  - Sharding: PgDog is able to manage databases with multiple shards

Brian #4: Agentic Engineering Patterns
- Simon Willison
- So much great stuff here, especially
  - Anti-patterns: things to avoid
  - And 3 sections on testing
    - Red/green TDD
    - First run the test
    - Agentic manual testing

Extras

Brian:
- <code>uv python upgrade</code> will upgrade all versions of Python installed with uv to latest patch release
  - suggested by John Hagen
- Coding After Coders: The End of Computer Programming as We Know It
  - NY Times Article
  - Suggested by Christopher
  - Best quote: “Pushing code that fails pytest is unacceptable and embarrassing.”

Michael:
- Talk Python Training users get a better account dashboard
- Package Managers Need to Cool Down
- Will AI Kill Open Source, article + video
- My Always activate the venv is now a zsh-plugin, sorta.

Joke: Ergonomic keyboard
- Also pretty good and related: <code>Claude Code Mandated</code>

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 473, recorded March 16th, 2026, and I am Brian Okken. And I'm Michael Kennedy. And as often lately, this episode is sponsored by you and us, for everybody that supports the show through Patreon or, mostly, through a lot of our offerings like the courses at Talk Python Training and pythontest.com, and books. Got lovely books coming along. I might bring up a book later in the show, but we'll talk about it anyway. Thanks a lot to everybody supporting us. It keeps us going. And also thank you to everybody that sends in topic ideas, either by going to pythonbytes.fm and submitting something through the contact form or going ahead and sending it to us on
Starting point is 00:00:53 socials. So we're on Bluesky and Mastodon, and all those links are in the show notes, or at pythonbytes.fm. You also can watch the show if you'd like, either in real time or after the fact. You can join us at pythonbytes.fm/live and be part of the audience. And one of the fun things about that is while we're recording this, you can add comments and we might comment back or highlight your comment. That's fun. Anyway, the last thing I want to bring up is that you don't have to take any notes while we're talking, because all the stuff is in the show notes, with the links. But if you'd like that delivered right to your inbox, plus a little background information,
Starting point is 00:01:35 some extra stuff, especially helpful if we're covering a topic that you're slightly thinking about, but maybe not. We'll send you some extra information. And you just sign up to be a friend of the show at pythonbytes.fm and say join the newsletter. With that, what do we have to start with? Well, I've got a doozy. Oh, as they say, a doozy.
Starting point is 00:01:57 And this one comes to us from Ian Lessing. So thank you for sending it in, to your point about sending us ideas for the show. This one somehow missed my radar, but it shouldn't have. So, yeah, it's a big one. So you probably have seen chardet, as in "char-det". Maybe it's "care-det". I don't know. I'm always, you know, sidebar.
Starting point is 00:02:17 It's always funny to think about abbreviations like "lib" or "libe", you know. If you pronounce it, L-I-B, I think it goes "lib". But if it's an abbreviation of library, shouldn't it be "libe"? I don't know. So anyway. No, because that's weird. I know it is weird. So, chardet is a library that, I believe, was originally created by Mark Pilgrim, but now is maintained by Dan Blanchard.
Starting point is 00:02:43 So I think this alone makes it interesting, because there's something happening with this project. And a lot of people are pushing back, saying the maintainer can't make a change to it, but they're the maintainer. Like not, I don't want you to, but you can't. So here's the headline: "Chardet dispute shows how AI will kill software licensing, argues Bruce Perens." And the subtitle, this comes from The Register: "Alarm bells are ringing in the open source community,
Starting point is 00:03:10 but commercial licensing is also at risk." So, told you it's a doozy. What is going on? So earlier this week, Dan Blanchard, and I want to point out, the maintainer of the library, released a new version of the library under a new software license. It was LGPL and he released it under MIT. In doing so, he may have killed copyleft because, well, MIT is a do whatever you want, just don't sue me about it, you know what I mean,
Starting point is 00:03:33 license, which almost all the stuff that I do is MIT as well. I'd rather have people just have access to do whatever. Like, I don't want people to say, hey, could I get an exception to use this for this thing? Like, nothing I'm doing is that important, you know? Yeah, and I think of the MIT license as a good license if you don't care if people use it in their commercial product. Yes. And if it's okay, I was not familiar with what this is. I still don't quite get it.
Starting point is 00:04:00 What does chardet do? So it is a character detection library that's used by millions of projects. Let me see. It has 130 million downloads a month. So it's used by a lot of things. Okay. Character encoding detector. Yes, yeah, exactly.
Starting point is 00:04:19 Like UTF, I think. You know, is this UTF or Unicode or is it what? I have bytes. What am I going to do, you know? So it takes a guess, right? Right, you're getting bytes and you're not really sure. It's like, it wasn't declared or whatever. So previously this was an LGPL project. Dan Blanchard wanted two things, from what I can read between the lines, putting words into their mouth. One, wanted to dramatically improve this library to make it better. Check. You'll see he did that. Two, wanted to set the stage such that this can just be part of Python. Like, could this just be part of the standard library?
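To make "I have bytes, what encoding are they?" concrete, here is a minimal sketch using only the standard library. It is not chardet's algorithm: it simply tries a fixed, ordered list of candidate encodings and returns the first that decodes cleanly. The function name `guess_encoding` and the candidate list are assumptions made for illustration. (chardet itself exposes `chardet.detect(data)`, mentioned in the show notes, which also reports a confidence.)

```python
def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `data` without error.

    Naive illustration only (hypothetical helper, not chardet's method):
    the order of `candidates` matters, and latin-1 accepts any byte string,
    so it acts as a catch-all when listed last.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return name
    return None  # nothing in the candidate list fit


print(guess_encoding(b"hello"))
print(guess_encoding("café".encode("utf-8")))
print(guess_encoding("café".encode("latin-1")))
```

A real detector has to do much more than this, because many byte strings decode "successfully" under several encodings; that ambiguity is why libraries like chardet return a guess with a confidence rather than a definite answer.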
Starting point is 00:04:56 So previously, there's this move to say we should have less in Python, and I agree, but detecting whether or not something is Unicode or ASCII or whatever, maybe that does belong in the library. Anyway, that was the goal. Like, could we put it in there? Well, LGPL says no. It would change the license of Python, I believe, right? So as long as it's a GPL-based license, you can't.
Starting point is 00:05:18 You can't move this library into the standard library. I don't know if the core developers, or even if Dan is a core developer, was interested in this, but that was one of the goals, right? So, no problem, we're gonna change it. Well, an individual claiming to be Mark Pilgrim, because, yeah, verify people on the internet for sure, the original creator of the library.
Starting point is 00:05:37 So it's a little bit like Flask, where Armin Ronacher created it and now David Lord maintains it. And David Lord gets to do whatever he wants with it. It's his project now as the maintainer. But I'm sure Armin still has influence over the community's opinion if he were to take a strong position one way or the other, right? Neither of them chimed in on this as far as I know. I don't think. Maybe Armin did. I can't remember. There's a lot of chat about it. I still think it's kind of like giving somebody a puppy and then telling
Starting point is 00:06:03 them where they have to take it, what vet they have to take it to. Yes, exactly. Yeah, exactly. I'm leaning on the side of Dan Blanchard here. Just setting the stage. There's a lot of complexity. And I'm not, like, totally ready to just say this is how it is. But let's keep setting the stage. So Mark says, you can't do that. You can't change an "LPGL" (I believe that's a typo), LGPL license requirement. It requires that the license remain unchanged. Licensed code, when modified, must be released under the same GPL license. But I get that when somebody gets it from the source. They make a change. It must be released under the same license. As the owner of the project, I thought you could change the license
Starting point is 00:06:48 on new code. I don't know. It's your software, effectively. If you want to change the license of it, I don't know. This is a little bit of shaky ground here, to say that you can't change the license as the owner of the license. You know what I mean? Anyone else in the world, they have to follow what I just read. But as the owner, is that true? Well, so here's what Dan did. Dan said, I'm going to create a new, better version. I'm going to rewrite this entire project from scratch, not using any of its source code, and re-release it into the same package channel as the old one. Okay?
Starting point is 00:07:22 Now, one of the problems under that is, as the maintainer, he's deeply familiar with how it works. And one of the challenges is, if you know how it works, your idea, it's like hard to do a fresh from scratch rewrite if it's burned into your mind, how it works, you know what I mean? So what he did is he just gave the specification to Claude
Starting point is 00:07:43 and said, Claude, write this so that the tests pass. And Claude wrote it. And it wrote it extremely differently. There's a plagiarism detection algorithm. So it's probably more for English, but whatever. It said it's only 1% similar to version 6. Version 7 is only 1% similar to version 6.
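As a rough illustration of what a "percent similar" number between two code bases can mean, here is a small sketch built on the standard library's `difflib`. This is not JPlag's algorithm and will not reproduce its figures; the helper name `token_similarity` and the whitespace tokenization are assumptions chosen purely for illustration.

```python
from difflib import SequenceMatcher


def token_similarity(source_a: str, source_b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two pieces of source text.

    Hypothetical helper: splits on whitespace and compares token sequences.
    Real plagiarism detectors normalize identifiers and structure first.
    """
    tokens_a = source_a.split()
    tokens_b = source_b.split()
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


old = "def detect(data): return guess(data)"
new = "def detect(data): return classify(data)"
print(f"{token_similarity(old, new):.0%}")
```

The ratio is twice the number of matching tokens divided by the total tokens in both inputs, so shared boilerplate such as argument-parsing setup can account for a small nonzero score even between otherwise unrelated implementations.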
Starting point is 00:08:03 So that means it's pretty different. Dan also said, like, structurally, the files are not named the same. They're not organized the same. It is basically not at all the same thing. The only thing that's in that 1% is like argparse structure and stuff. You have the same arguments, you know? And so they believe there's nothing here. This is a new project. This is what gets the MIT license. Now, to be clear, this is a mega improvement. It results in a 48 times improvement in detection speed. It now supports multi-threading
Starting point is 00:08:36 for Python T. You can do free-threaded Python and it supports that. There's a lot of benefits to this new version. So I don't think anyone is saying you've messed up the library. It's, like, clearly a better library. It's only this, we hate AI, or AI is theft, or, like, there's a lot of these different angles that are focusing in like a laser onto this change. You know what I mean? Yeah. So Dan says, I was just trying to accomplish these goals with the tools and time I had available. I never have been paid to work on this and I have a full-time job.
Starting point is 00:09:08 You know, software licensing and the laws around it haven't been tested a lot in this new world of AI system development, and as a long-time open source developer, I'm also curious how this is going to shake out. Yeah. But somewhere it says, yeah, after maintaining this library for years, I've wanted to make these improvements, but I couldn't. Claude gave me the ability to do this in roughly five days, right? So I think this is also really interesting. But why change the license?
Starting point is 00:09:33 Because he wants to put it into Python. Oh, okay. Yeah. Or maybe he just wants people to be able to more freely use it. And he just doesn't care about copyleft. I don't really know. But I believe the article implies he wants to put it into Python, and LGPL makes that not possible.
Starting point is 00:09:48 And Armin, Armin Ronacher, actually did post about this, saying that he welcomed the license change and he's wanted it for years. Which is what I expected he'd want to say as well. Now there's an issue that has been created saying that version 7 presents unacceptable legal risk to users due to the copyright controversy. There's so much.
Starting point is 00:10:11 There is so much going on here. I don't know, because to me the license only goes from more restrictive to more permissive, and if it turns out that it's the old version, you know, you're back to where you started. So I don't know. I'm telling you, there's a lot. This may be the bigger issue: there's issue 327, filed by, well, Mark Pilgrim. "Hi, I'm Mark Pilgrim." The title is "No right to relicense this project." And it's absolutely, toxic is the word. So I don't know. It's very interesting here. I mean, version 6 is still there. People can just keep using that, or fork it if they want, if they still want the old license.
Starting point is 00:10:50 I think what's happening is this is becoming a lightning rod for the debate of licensing intersecting with agentic AI. I mean, how many people actually care that much about character detection? You know what I mean? It's a utility. Apparently a lot of people. I know, a lot. It's extreme. There are just pages of stuff to go through on all of this, like page after page. It's crazy. So, okay. I understand the sensitivity of it. I forgot my popcorn. I know. I know.
Starting point is 00:11:18 And I get it that AI ingesting the world's work and then turning that into automation, I'm not even sure where that sits legally. At the time, it felt like a lot of theft. I'm not sure if it's a good tradeoff or not. I don't get how that relates to this project, though. Okay, so let me add one more detail. Like I said, there's a lot going on here. We'll wrap this up pretty soon, but it's a super interesting discussion.
Starting point is 00:11:42 I think one of the reasons is that Dan said, well, I did it with Claude Code. They said, well, it doesn't matter. Claude Code trained on GitHub. Therefore, it trained on the original source code of chardet. Therefore, it's not a clean room re-implementation. So I asked. I don't know if that matters, because I can leave one company, go work at another company. As long as I don't take the source code, if I just take what I remember and do similar work, I'm allowed to do that.
Starting point is 00:12:06 Yes, you're a human being that gets to interact in the world. Yeah, exactly. Yeah. It's not like, well, I saw a picture once, and it was of a tree, and it was copyrighted, so I can never ever create a picture of a tree again because I looked at it, right? Yeah. That's why I said I feel like I'm on Dan's side here. So let me look. I didn't write it, I asked Claude for some research on, like, well, what is the legal precedent for this? Here's the situation. At least in the US, are there rulings that have come down previously? So I put a little
Starting point is 00:12:38 document up for people to look at, but it says the closest precedent is this Thomson Reuters v. Ross Intelligence, where somebody, if I can remember here, took a bunch of Westlaw headnotes for legal advice and then did their own custom AI training on it and built a tool for legal research. That turned out to be a violation. But this is interesting. The existing copyright framework requires two things to prove infringement. Access to the original work. Check, Claude did have access to the original work. And substantial similarity in output. No.
Starting point is 00:13:13 Not even close. Not even 1%. I know it's at 1.3%, but that was like structure of argparse. You know what I mean? That's argparse structure. That's not chardet structure, effectively. So I think this strongly fails.
Starting point is 00:13:28 Those are the two criteria that you have to have to prove similarity. But there's other stuff. The emerging judicial consensus that is developing is that training a general-purpose AI model is highly transformative, and therefore is fair use. But there were some specific examples where it wasn't. The U.S. Copyright Office's position is that using copyrighted materials for AI model development may constitute prima facie infringement. And what's really crazy, Brian, is if things like that said, no, this is copyright infringement,
Starting point is 00:14:02 like, what happens to everything created by AI, period? You know what I mean? And I don't know how that's going to shake out, but it's so far down. I mean, let's take, like, the extreme case, and you go, well, you know what? All the current models have been trained on licensed stuff. So let's just, like, not. Just start over.
Starting point is 00:14:20 It's going to cost a ton of money to retrain a model. But do it right. Only train on the stuff that's available license-wise. Right. You could just look and say, is it a GPL license? We're not training on that. Is it an MIT license? It's on.
Starting point is 00:14:34 You know what I mean? Yeah. There's probably plenty of information still out there to build out your models. Anyway, it's pretty wild. I think people can have a look. I would certainly say the folks who took the time to comment are very much against this. There's a lot of toxicity, and Dan, support going out to you just mentally, because I've been on the receiving end of these types of things and they're not fun. But I kind of think Dan has a point here.
Starting point is 00:15:02 However, this could all be solved if he just said, okay, version 7 is chardet 2, new project, and just put like a straw more, and like, we will never change chardet 1 ever again except for security patches. And then all the things that depend upon it go, fine, we'll just take this one. Like, I want 48 times faster, and multi-threaded sounds better to me, let's just do that. Yeah, and if we push it too hard, though, one option is he just stops maintaining it and doesn't transfer maintainership to anybody else. Yeah, yeah, and we don't want that either. So yeah, I certainly think this debate has far outgrown character encoding concerns. It's its own special lightning rod,
Starting point is 00:15:41 like I said. Yeah. All right. How are you? Can we talk about, I got just a small tool that I, this is a small tool suggested by Matthias. Well, it's not a small tool, but it's quick to cover. Refined GitHub.
Starting point is 00:15:59 And this is awesome. I didn't know about this. So this is a web page or website. Web browser plug in. Browser extension. Browser extension. Thank you. That does some cool stuff if you work with GitHub a lot.
Starting point is 00:16:15 And I, you know, looking through this, I'm like, what's wrong with GitHub right now? Well, there's a bunch of stuff. There's some highlights at the top: makes whitespace characters visible. That's cool. I mean, that's not cool enough to get this, but there's a lot more coming. It tells you whether you're looking at the latest version of the repository, if there's any unreleased commits. That's kind of neat. It shows how far behind a PR head branch is and tells you its base commit.
Starting point is 00:16:47 There's a bunch of stuff here. I'm going to highlight some of the stuff. One of the nice things, there's lots of features, but they put fire beside things that you might care about. Like, adds a build/CI status icon next to the repo name. Love that. Adds a link back to the PR that ran the workflow. That's cool.
Starting point is 00:17:08 This one, I installed it just, for this one feature, enables tab and shift tab for indentation in comment fields. Because if you're in a web browser, you hit tab, it goes to the next field. I just want to put a tab in the field. Anyway. For Python people, it might not matter that much.
Starting point is 00:17:28 But if you're doing C++ or something, you don't want to make spaces. I still hit tab. I just expect it to add four spaces. But anyway, let's see. Auto-resizes the comment fields. Adds reaction avatars showing who reacted to a comment.
Starting point is 00:17:46 That's interesting. The other one that I want to highlight just to, because I think it's cool, is highlights the most useful comment in an issue. So it'll, you know, if there's a lot of people talking about a comment or whatever, it'll, you know,
Starting point is 00:18:00 highlight that. So, you know, just scroll around. And actually, I haven't really noticed. I've turned this on. And it just sort of stays out of the way. There's just more features.
Starting point is 00:18:10 and more, it's just a nicer experience. So, yeah, kudos to them. This is an absolutely mega. So what's notable about this is you wouldn't look at your UI and know anything is different, but there's like a hundred little changes, right? Yeah. So, yeah. Anyway, I'm always nervous to install browser extensions.
Starting point is 00:18:29 I have maybe five or six that I really love from places I trust. But go to the top. See how many stars this has? 30,000. Yeah, you know, at that level, I think it's all right. it's probably totally trustworthy, right? So let's, yeah, you know, I think it's good. I think it's good.
Starting point is 00:18:44 I think it's good. I would probably install it. I had to look and see if it'll inspire me. But yeah, I don't know. I'll play with it for a while. I'll see if my entire computer blows up. But yeah, if you, if your computer gets. It's also up and around for a while.
Starting point is 00:18:59 It looks like nine years or so. Wow. Really? No kidding. Well, at least in the front top, there's the, the editor config is nine years ago. So at least there's some commits from nine years ago. Yeah, yeah, exactly. I would imagine it is. Yeah, that's very, very wild. Awesome. Okay. Let's move on to talk about databases, and in particular, Postgres. So this project I want to talk about, I want to feature I ran across,
Starting point is 00:19:24 and I think it's been around for a little while, but it's called PgDog. Okay. And what it is, is it's a performance-enhancing layer for Postgres. So, you know, maybe you're using MySQL, not MySQL, using SQLite in dev, but then in production, you're using Postgres, right? Something like that. And it's starting to outgrow its performance. So either it needs better uptime, the database is getting too large, or something like that.
Starting point is 00:19:51 Postgres doesn't have certain features, like connection pooling and other stuff, that could make for better, higher performance. So you don't have to reconnect as much. This thing handles a whole bunch of those. So we go down here to their repo. It, by the way, has 4,000 stars. And its age is a year, two years.
Starting point is 00:20:09 Two years, it looks like; last year is probably where its most recent things are. So there's been other projects like this as well. For example, PgBouncer is a friend, a colleague piece of software, I guess, another thing that does the same thing. So what this is, is it's a proxy for scaling Postgres, and it does connection pooling, load balancing for queries, and it does sharding of databases, which sounds bad but is actually a potentially good thing. So you just create a TOML file to set it up and then off it goes. I got a bunch of notes here for all these little things, kind of spread around. So for starters, it's a load balancer across Postgres. So you can run Postgres in a replica network configuration. So I can have a Postgres database, but then I can
Starting point is 00:20:52 have, let's say, four other Postgres databases that are all copies of that same data and they stay in sync. Okay. And from a read perspective, you could read from all five of them if they all have the same data, right? And that basically five X's your database query performance, right? Okay. just by simply sending them to different machines with exactly the same data, the same database, yeah. But the problem is the consistency, right? So it knows which one is the primary database, and it can do writing to that and make sure that it propagates to the others before it tells you that it's committed, which is kind of the magic of replicas. Because if you write to it and then immediately do another read, but it happens to have gone to this time it's round robin to a different database server. That's bad because it might not be there, right?
Starting point is 00:21:37 Like I saved the database, I queried, and it wasn't there, and I went to test why it wasn't there. Then it was there. I don't get it. What is going on with the world, right? So you want it to definitely manage that kind of stuff. It also does health checks. And if you've got this read primary replica configuration that I'm talking about, if one of them goes down, it will just take it out of its rotation. And if it's the primary one, it'll pick another primary, I believe.
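The read/write split described here (writes go to the primary, reads rotate across replicas) can be sketched in a few lines. This is a deliberately naive illustration and not how PgDog itself is implemented: the `Router` class, the keyword check, and the server names are all hypothetical, and a real proxy has to handle cases this ignores, such as transactions, `SELECT ... FOR UPDATE`, and replication lag.

```python
import itertools

# Assumption for this sketch: only statements starting with these keywords
# are treated as reads; everything else is conservatively sent to the primary.
READ_KEYWORDS = {"SELECT", "SHOW"}


class Router:
    """Hypothetical router: one primary for writes, round-robin replicas for reads."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        words = sql.strip().split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word in READ_KEYWORDS and self._replicas is not None:
            return next(self._replicas)  # rotate reads across replicas
        return self.primary  # writes, DDL, and anything unrecognized


router = Router("primary", ["replica-1", "replica-2"])
print(router.route("INSERT INTO visits VALUES (1)"))
print(router.route("SELECT * FROM visits"))
print(router.route("SELECT * FROM visits"))
```

Defaulting unknown statements to the primary is the safer choice in a sketch like this: a read sent to the primary is merely slower, while a write sent to a replica fails.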
Starting point is 00:22:01 So it has a single endpoint behavior, which I talked about. So, you know, it understands the Postgres structure, like, basically, the T-SQL. And so it knows if it sees an update or insert or create table and things like that, and sends that to the primary, and then leaves the other ones chilling to do their thing. It has the failover I talked about, and it has sharding, which is really cool, and it does a bunch of stuff to manage and keep that in sync. You can even have different sets, different clusters of databases, and say keep this one in sync
Starting point is 00:22:30 with that one. So for example, imagine you've got an e-commerce site and it's starting to go too slow. People do a request for, I don't know, let me give you an example that probably resonates more with people: a health provider database. I don't know about yours, but whenever I go to figure out something with my next doctor appointment or something, it's like the page slowly loads in and then it spins, checking records, checking records, checking records, and like five seconds later, a chunk will come in, and more, and like, what is going on with this? Why is it so slow, you know? And there's probably just some huge database with a bunch of insane joins
Starting point is 00:23:07 and weird queries and stuff just to tell me that my appointment is at 10 o'clock. So what you could do is you could say, okay, your health record ID is going to be the shard key and we're going to have 20 different servers, right, running our cloud setup. And for that, we're going to somehow determine which database it goes.
Starting point is 00:23:25 So maybe we're going to say take the hash of the health ID and use the first two letters to figure out which database it actually goes in. So like A-A through B, B, whatever, right, goes to the first database server, and the second, the third, and fourth, and so on. So when you do a query, you say, I want the thing for this user, it just goes,
Starting point is 00:23:45 okay, great, well, that means I only query that one server. Instead of trying to query the 100 million records, you query, what did I say, 25? You query four million, which is way, way faster, right, on any given server. So that's a really cool aspect and one of its main features, the sharding capability. Okay.
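The shard-key idea above (hash the health record ID, use the first two characters of the hash to pick a server) can be sketched as follows. This is an illustration of the scheme as described in the conversation, not PgDog's actual sharding function; the name `shard_for`, the use of SHA-256, and the sample IDs are assumptions.

```python
import hashlib


def shard_for(record_id: str, num_shards: int) -> int:
    """Map a record ID to a shard index in [0, num_shards).

    Hypothetical helper: takes the first two hex characters of a SHA-256
    digest (256 possible buckets) and folds them onto the available shards.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:2], 16)  # 0..255
    return bucket % num_shards


# The same ID always lands on the same shard, so a lookup touches one server.
print(shard_for("patient-12345", 25))

# Rough arithmetic from the discussion: 100 million rows over 25 shards.
print(100_000_000 // 25)
```

Because 256 buckets do not divide evenly across most shard counts, some shards receive slightly more buckets than others; production systems typically use more of the hash, or a scheme designed for resharding, to even that out.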
Starting point is 00:24:04 So pretty neat, pretty neat. But if you're really trying to find out like health information, it might, the hash might be the problem. Stop doing hash, man. I don't know why these systems are so bad. They're so bad. Bad joke. Sorry. Yes.
Starting point is 00:24:18 Yeah. That's true. I get it. Now I get it. All right. Awesome. Over to you. Okay.
Starting point is 00:24:25 Well, this is partly a public service announcement. Maybe. This is, I want to cover Simon Willison. So we know Simon Willison's been playing with AI and agents and stuff since, like, since they came out or something. And I appreciate all of Simon's work. And I've been watching here and there and just learning from him and not having to do all the experimentation that he's doing. But he's really great at explaining it. So what I want to, he's got this sort of book-like thing together that we're going to link to called Agentic Engineering Patterns.
Starting point is 00:25:06 And this is a series of blog posts, but they're fairly concise and short, and it's really good writing as well. And I think anybody, especially, well, it might be useful for really everybody, but especially people with teams, it would be good to make sure that everybody's kind of one like good. I think the information here is good for everybody. So there's principles, getting started, like some intro on how agents work. Testing and QA, there's these three posts about that, which I love.
Starting point is 00:25:40 Understanding code, using it to walk through, using agents to walk through code and stuff. Even these are, didn't notice these when I was looking at this the other day. An appendix of prompts I used, that might be interesting. But also, GIF animation tool using WebAssembly and GIFSicle, annotate. That might be fun, but maybe not appropriate for everybody. But the one that I love right here is anti-patterns. So in principles, there's some anti-patterns. Well, everything.
Starting point is 00:26:14 Everything in the principles, definitely go read. Writing code is cheap now. What is agentic engineering? Hoard things you know, basically, not basic, but like keeping track of, like, for instance, making tools and doing snippets, doing little tools, having those available not only for you to remember, but you can also tell an agent to say,
Starting point is 00:26:36 hey, I already kind of solved this over here in this project. So use that, but apply it to this other project here. Super cool idea. And also, these two, AI should help us produce better code. So if you're having AI produce your code, I think it should be better code than you would produce by yourself, not worse. I don't like this notion of people not reading their code at all.
Starting point is 00:27:03 And I think that's going to blow up on us. And especially if you're working in teams, a bunch of anti-patterns to watch out for. And the top one is about inflicting unreviewed code on your collaborators. This anti-pattern is common and deeply frustrating, both in open source, and I'm dealing with it at work myself. Don't file pull requests with code you haven't reviewed yourself. I'm tired of reviewing reams and reams and reams of code that I know nobody actually read. So why do they expect me to read it?
Starting point is 00:27:36 So anyway, great resource here. I love the cheat sheet on red green refactor, it's pretty great also. And I promise to highlight that since the testing is kind of my thing, right? And this is the all he has tested it and the phrase, use red slash green. So "use red/green TDD" is a pleasingly succinct way to get better results out of your coding agent. So you can tell your coding agent to do this and it will know to write a test first and make changes until it's green. What's interesting is normally we think of TDD as red green refactor. The refactor part, that's when you need to get involved.
Starting point is 00:28:16 So you can have the agent do the red green part, which is come up with a test that describes what you want to do, write code until you have code that does that. Now you go and review that code and you can talk to the agent. You don't necessarily have to change it. You can talk to the agent and go, this part of the code is weird. Can we change it to a different pattern? Or is there some way to clean this up?
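As a rough illustration of the red/green split described above, here is a tiny Python sketch. The function `slugify` and its behavior are made up for the example and are not from Simon's posts; the point is only the order of work: the test exists first (red), then just enough code is written to make it pass (green), and the refactor review is left to the human.

```python
# Red: the test is written first and describes the behavior we want.
# Run on its own, before slugify exists, it fails with a NameError.
def test_slugify_lowercases_and_joins_words():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Python   Bytes ") == "python-bytes"


# Green: the simplest implementation that makes the test pass.
def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens."""
    # str.split() with no arguments drops leading, trailing, and
    # repeated whitespace, so no extra cleanup is needed here.
    return "-".join(title.lower().split())


# Refactor: this is the step where a person reads the code and asks
# whether a clearer pattern exists; nothing is automated about it.
```

With pytest installed, a file containing this sketch would be collected and run by the `pytest` command, since the test function name starts with `test_`.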
Starting point is 00:28:37 And I've had really good results with that actually to just say, kind of good, but this part, why did we do that? And it's surprising to me to have the agent come back and say, oh, yeah, that's weird and change it to what I would expect. Why didn't it just do that in the first place? But it doesn't. So, and maybe it will in the future. And the future might be next week.
Starting point is 00:28:59 Who knows? But for right now, these are great engineering patterns. Great things to watch. So thanks, Simon. And I trust him to keep these up to date. So anyway. Yeah, this looks super interesting. I definitely want to check it out.
Starting point is 00:29:11 I've already spread this around at work. And especially to the people that have sent me code reviews that I'm like, you didn't read this. I know you didn't. So I think that's part of the pushback as well, like people are lazy or they don't know what they're doing and they're just, here's 2,000 lines of code that fixes what I was asking for. You're like, no, go away. Yeah. Where if they spend some time, you're like, actually, can you narrow this down to a 10 line change?
Starting point is 00:29:37 This is all I want. Please don't go do other things. Like just help me understand this and why this needs to change. And then I think we're still learning how all this stuff works. And there are engineering practices, but the stakes are so low for getting started. You know, normally you're like, okay, we're going to set up our build tool chain, and then we're going to learn the language and the syntax and the structures and the keywords. And now it's just like, oh, just use regular English to just tell it stuff and it'll probably figure it out, right? That gives the sense that I don't need to learn this as a skill, but you do.
Starting point is 00:30:06 Yeah, I also think that we're getting a lot of advice about how to utilize agents from startups. And startups have a different field. Startups are greenfield for the most part. They're writing new code. Whereas a lot of software jobs are maintaining existing code bases that have been around for decades, possibly, or at least years. And you can't just not care what goes into it. You've been handed this thing that is making your company money. You can't make it worse just because the agent decided to rewrite everything.
Starting point is 00:30:42 Yeah. Yeah. Anyway. For sure. Well, do we have any extras? I got a few extras. Why don't you go first? Okay, a couple ones.
Starting point is 00:30:51 The first one comes from John Hagan. Thanks for mentioning this, because I almost made this a top-level story, but there's not much to say about it other than this is awesome. Upgrading Python versions with uv. So we know that to get any new features from uv, you have to say uv self update. I think, is that right? I think self update.
Starting point is 00:31:14 Yeah, uv self update. Yep. But after you've done that, now you can say uv python upgrade. You can give it a specific one. So like for instance, if you say uv python upgrade 3.12, it updates 3.12 to the latest version, the latest dot release, which is cool. But if you leave that off, which is what I do, it just looks at all of the Python versions that you have installed on your computer through uv and updates them
Starting point is 00:31:45 all to the most recent, like, bug fix release. And, like, why not? We should be doing that all the time. I'm going to set this up as a cron job or something. I don't know. Yeah, so that's cool. And yeah, so thanks, uv, for making things easier once again. Awesome, awesome job. I've already incorporated it into my little updater scripts that I run periodically. Next is also something that's suggested by a reader, and I understand New York Times Magazine is behind a paywall. But for some reason, I was able to read this fine. I don't know. I do have a New York Times newspaper subscription.
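For the uv upgrade steps mentioned a moment ago, a periodic updater script might look roughly like the following Python sketch. The helper names `uv_upgrade_commands` and `run_updates` are hypothetical; the only uv commands used are the two named in the episode, `uv self update` and `uv python upgrade`. One assumption worth flagging: `uv self update` generally applies when uv was installed with its standalone installer, so a uv installed through another package manager may need to be updated that way instead.

```python
import shutil
import subprocess
from typing import List, Optional


def uv_upgrade_commands(python_version: Optional[str] = None) -> List[List[str]]:
    """Build the two commands discussed: update uv, then upgrade Python."""
    upgrade = ["uv", "python", "upgrade"]
    if python_version:
        # e.g. "3.12" limits the upgrade to that minor version's latest
        # patch release; leaving it off upgrades every uv-managed Python.
        upgrade.append(python_version)
    return [["uv", "self", "update"], upgrade]


def run_updates(python_version: Optional[str] = None) -> None:
    """Run the commands in order, stopping at the first failure."""
    if shutil.which("uv") is None:
        raise RuntimeError("uv was not found on PATH")
    for command in uv_upgrade_commands(python_version):
        subprocess.run(command, check=True)
```

Calling `run_updates()` from a cron job or a scheduled task would cover the "do that all the time" idea; calling `run_updates("3.12")` would restrict it to one minor version.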
Starting point is 00:32:22 So maybe that's it. Anyway, coding after coders, the end of computer programming as we know it. This is a description of basically talking about whether or not, like, it's not just whether or not like AI is the end of coding jobs. I don't, you know, we don't think it is. The conclusion here is it's not, but it's also more about that. It's more than that. It's talking about basically kind of some different, different lives, like different changes.
Starting point is 00:32:48 And it also talks about, I believe it talked about the differences between percent of efficiency improvement of greenfield versus legacy code, whereas, like, a lot of startups say they're 100 times faster. But Amazon has said it's on average 10 percent faster. That's not nothing, you should still get excited about 10 percent faster, but don't expect, I mean, people maintaining your old code to be 100 times faster. The reason why it was passed to me was because there's this great line. If I get, see if I can find it.
Starting point is 00:33:22 pytest. Pushing code that fails pytest is unacceptable and embarrassing. Apparently, this is like an instruction that somebody has in their markdown files to instruct Claude to always run pytest and be embarrassed if they don't. I love it. This is good. But anyway, those are my extras. I actually think this is a well-written article for somebody that doesn't understand.
Starting point is 00:33:51 Apparently, the author has been covering the tech world for a while. So nice. And also, pytest got into the New York Times. Yeah, that's pretty cool. Anyway, what you got extra for us? All right. Well, I've got a few. Let's start with Talk Python Training.
Starting point is 00:34:08 Byrd, per request from one of the users, they said, hey, it would be really great if I could, when I log into my account, have more information. So I updated the people who have accounts there. If you go log into your account, it will show you all the courses you are actively learning. I have 48 of them. I haven't finished a bunch of them. People might be like, Michael, you have courses on the website. You haven't finished.
Starting point is 00:34:30 By the time it gets to the website, I've watched the videos two to three times. I don't have to watch them a fourth time in sequence and have the system record me watching them. So no, they're not all done. But it'll show you things like the ones you're working on, how far are you through, and when did you last watch it? And when did you start? Apparently, this is things like if you're submitting this as a training evidence for your employer, knowing when you started, when you finished, and so on, and whatever, how far you are. And there's also a whole bunch. It shows you completed ones.
Starting point is 00:34:57 I'm going to be generating certificates for people. It's easy enough for me to make PDF downloads, but I want to make stuff that you could, say, post to your LinkedIn profile as an accomplishment, you know, like, I've done the FastAPI course at Talk Python, as part of your, like, LinkedIn record and other places. You can put those kinds of things. So it's not as simple as just a PDF. But hopefully stuff like that comes. Anyway, this was fun to build. I think it looks really neat.
Starting point is 00:35:21 I think especially if somebody is buying the bundle, they have access to a ton of courses, and they might not remember, like, what course was I taking last month? Yeah, exactly. You didn't even buy it, but you took it, then you forgot which one you're doing. This totally solves that problem. Yeah. Exactly. Yeah. That's what the request was like.
Starting point is 00:35:38 I know I took a course. I don't remember which one I was working on, in which order. You know, help me get back to that. And then of course, when you're in a course, it has a resume button. So you just click that to resume where you left off. But it doesn't have a cross-cutting resume. You know what I mean?
Starting point is 00:35:51 I do like that how you split it up so that if, like you said, if I took the whole course and there's something I want to go back and review, I can just look through and go and watch those videos. It's labeled well. Yeah, thanks a bunch. All right. I talked about using latency.
Starting point is 00:36:07 to increase security for supply chain stuff, right? Like, hey, if I do a pip, a uv pip update or upgrade sort of thing, or similarly with sync and add and so on, just doing like an exclude-newer then, or whatever, give it seven days or a week or however you do it. There's this article by Andrew Nesbitt that says package managers need to chill. And right at the top, we have this post requested by Seth Larson, the security guy at the PSF. So yeah, anyway. It talks about all the different, how you make your dependency manager chill, like uv has an exclude-newer, which I've been using. And it's mostly awesome, except for when there's a vulnerability that appears in one and you get a notification that you've got to fix it. But the fix just came out.
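The "give it a week" idea can be sketched in Python as a cutoff date plus a filter. The release data below is invented for illustration and the helper names are hypothetical; the one real-tool detail assumed here is that uv's `--exclude-newer` option accepts an RFC 3339 timestamp, which is the format the last helper produces.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def cooldown_cutoff(days: int = 7, now: Optional[datetime] = None) -> datetime:
    """Return the moment before which a release counts as old enough."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


def eligible_versions(releases: Dict[str, datetime], cutoff: datetime) -> List[str]:
    """Keep only versions uploaded at or before the cutoff."""
    return [version for version, uploaded in releases.items() if uploaded <= cutoff]


def as_exclude_newer(cutoff: datetime) -> str:
    """Format the cutoff as a UTC RFC 3339 timestamp string."""
    return cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Made-up upload times for a made-up package.
now = datetime(2026, 3, 16, tzinfo=timezone.utc)
releases = {
    "1.0": datetime(2026, 3, 1, tzinfo=timezone.utc),
    "1.1": datetime(2026, 3, 15, tzinfo=timezone.utc),  # one day old
}
cutoff = cooldown_cutoff(days=7, now=now)
safe = eligible_versions(releases, cutoff)  # only "1.0" is old enough
```

The trade-off raised in the conversation shows up directly: a security fix published yesterday would be filtered out just like a malicious upload would, so a cooldown needs an escape hatch for urgent patches.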
Starting point is 00:36:52 So you don't want to exclude it. But in general, it's, I think, a better thing than not. What? Like, remind me why? Why would you want to exclude newer stuff? Because for popular packages, if somebody uploads a virus inside the package, like they take over the build chain or they phish the person who created it, like, say, chardet, they phish Dan, they get access to his GitHub and they
Starting point is 00:37:16 install a subtle thing that downloads some rootkit or whatever info stealer to your account. That usually gets found within the first couple days. And if you're always just going update, update, update, give me the latest, give me the latest, you know, the chances that you hit that are pretty high, right? Because they won't get found in the first hour. If it's found in the first hour, will people be able to react and communicate within the first hour to deal with it, you know? But if you just say, give it a week, like probably most of the popular ones, if there was something wrong, it would have been found out by then. But okay, what if it
Starting point is 00:37:49 was, it got found out and got fixed and the week boundary is there and I, like, upload the week-old one that has the bug? Or do they remove it completely from the... If there was a virus, they removed it from PyPI. Okay. It's not even there, even if somebody picks an old one. I knew that. I was just sort of playing along. Yeah, yeah. Yeah, yeah. Exactly. Yeah. Okay. So basically, just more people singing the same message, but this is a nice cross technology. Are you in .NET? Are you in Ruby? Are you in JavaScript? Here's how you make it chill. Okay. So back to AI real quick. Paul Everett and I did this video debate, although it was not that much of a debate, but it was more of a conversation, but kind of debate format about will AI kill open source?
Starting point is 00:38:33 not the licensing part of it, but just will it make open source unnecessary? Will it just stop using open source and so on? We don't think so, but we had a really nice chat and I did a little quick write-up, but mostly the write-up links to the video. So check out the video. I also did a write-up called Always Activate the venv, a shell script. So I talked about this before, I believe. This is not the thing.
Starting point is 00:38:55 This leads into the thing I want to talk about. And so as I change directories around my computer with just the terminal, it automatically finds and activates virtual environments. But there's, like, this was a thing in direnv, they said, well, we can't do this. What if somebody maliciously sends some kind of virus and, like, commits a virus called venv into the repo and, like, it runs the activate script? What if that activate script is malicious, you know, that kind of thing? So, with some nice feedback from Scott H., I made a much more secure version that whitelists,
Starting point is 00:39:32 And if it's not whitelisted, it says, hey, do you really trust this thing or do you not? Because you might just open up a folder and go, oh my gosh, there was a virtual environment somewhere and it activated it and ran some thing that I didn't know was gonna happen, right? All that I think is super polished and really nice, and I'm loving it.
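The trust check described above can be sketched as a small allowlist lookup. To be clear about assumptions: the actual tool is a shell script, and this Python version is not that script. The names `is_trusted` and `should_activate`, the in-memory set, and the `ask` callback are all hypothetical stand-ins for whatever the real script stores and prompts.

```python
from pathlib import Path
from typing import Callable, Set


def is_trusted(venv_dir: Path, allowlist: Set[Path]) -> bool:
    """True if this exact virtual environment was approved before."""
    # Resolve first so symlinks and relative paths cannot dodge the check.
    return venv_dir.resolve() in allowlist


def should_activate(
    venv_dir: Path,
    allowlist: Set[Path],
    ask: Callable[[Path], bool],
) -> bool:
    """Activate silently only for approved environments; otherwise ask."""
    if is_trusted(venv_dir, allowlist):
        return True
    if ask(venv_dir):
        # Remember the approval so the question is asked only once.
        allowlist.add(venv_dir.resolve())
        return True
    # Not approved: never run an activate script from an unknown folder.
    return False
```

A real implementation would persist the allowlist to disk and prompt in the terminal; here `ask` could be as simple as a function wrapping `input()`.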
Starting point is 00:39:47 So here's the news. Virage Kenwanda, or Kenwant, said, I wrote the antidote for ZSH plugin management, and da-da-da-da-da-da. I ran across Michael's Secure Aware Virtual Environment Activator script, which was pretty awesome. So this is now a Z-shell plugin.
Starting point is 00:40:05 Oh, my Z-S-S-S-H-Safe V-N-V-A-O is what it's called, which I thought it's pretty awesome. That's pretty cool. Yeah. All right, that's it for my extras. Cool. We each got a joke, right? Yeah, I took mine down, though.
Starting point is 00:40:19 So I'm going to have to rely on you to bring it into your thing. All right, I'll find it. No worries. So this one is so good, and it follows this AI theme that we've been going. Remember the Stack Overflow keyboard? And it's exactly the same vibe as the Stack Overflow keyboard. The Stack Overflow keyboard was like the coder's keyboard. And it had a Control and a C and a V for the joke of just copy and pasting from Stack Overflow.
Starting point is 00:40:44 Yeah. Well, if you've done anything with Claude Code, it often asks permissions to make changes. And it says, do you want to allow this once, allow this always, or do you want to reject this change? And so it's the super fancy Apple-looking keyboard that just says, allow once, always allow, or reject. So this is funny on its own. You all have to check out the picture. I put it in the show notes. It may or may not show up in your podcast player.
Starting point is 00:41:07 I don't know. Maybe I can, I'll just make it the poster art. But also, there's two too many buttons. I think all you need is allow always. I know. Well, let's review the comments because, oh my gosh, they're so good. There's 223 comments. Yeah, exactly.
Starting point is 00:41:25 Issue says, waste of two buttons. A truly productive agent should only have allow all. That it got somebody, this is like the, remember the joke last week that was like, so you're new to sarcasm? The person that, looks like an AI-generated image to me. Yeah, exactly. It didn't, obviously. And there's the secret button, dangerously skip permissions. Somebody added it to their Stream Deck for real, that it actually allows it. Yeah. And that says too many buttons. But if we go down, oh my gosh, there's, this is the one. The actual one, Brian, there's a used version.
Starting point is 00:42:04 It says update after day one. It shows the same picture, but the allow always is like cracked and smashed and just like, it's just been hit like brutally. Just yes, yes. Oh, this is really good. And so someone says you got to be safe. And they create a little like Rube Goldberg machine that just like automates hitting allow once but forever. Yeah, just a little bobber thing that just hits it all the time. That's funny.
Starting point is 00:42:30 Yeah, Devin says, no, no, no. We need the, you know, Claude code. We'll say, I got to ask you a few questions. Here's three options. Do you want one, two, or three, or sometimes there's four, or you got to choose others. So there's one that has like a second row that says, one, two, three, four, other.
Starting point is 00:42:45 I would actually use that. I know, it's so good. Another one has, like, the three, the allow once, allow always, reject, and it has a microphone button to dictate to it. These are so good. You got to look at the comments. That's funny.
Starting point is 00:42:59 Anyway. That's good. Yeah, that's my joke. But my favorite one of it is where it's like crushed with like after day one. Yeah. So, well, let me try to get mine up. Let's see. I can pull up for you if you don't have me.
Starting point is 00:43:13 Okay. Yeah, just go ahead and pull that link up or something or the picture. So this was submitted with something else submitted by Paul Cutler. Has some news about AI too. Are you getting it? You want me to? I get it. It's just slow.
Starting point is 00:43:27 Okay. It's just slow loading. There we go. So he said this over on Mastodon, Paul Cutler. Today it was mandated at work that we install Claude Code because, as they said, it has built-in PowerPoint creation capabilities. What a reason. FML. Yeah. Because you know what's coming next, hour-long meetings with lots of PowerPoint. You know, I thought this was super funny at first, but also like it drives me kind of nuts with it because,
Starting point is 00:43:57 because, you know, I'm a coder. So if I have to write a PowerPoint presentation, it's unusual. So this probably is a good idea that it could save some time and not waste time on creating PowerPoint. So yeah, no kidding. Well, it's actually, it was a, it's a pretty neat integration.
Starting point is 00:44:12 It's not just that it knows how to do PowerPoint, but if you open up the Claude desktop app, the same one that does co-work, it has like a little, what's new button? And you click it and it says, install in PowerPoint. And it actually adds like a Claude section inside of your PowerPoint presentation. Why?
Starting point is 00:44:31 So you could like highlight a picture and say, could we get a different picture for this? Or highlight the text? And could you animate this in from the left? Oh, like not while you're presenting, though. No, it's during the building time. Okay, yeah, that makes more sense. You know, you got like format picture, animation tab,
Starting point is 00:44:47 and then you've got Claude now. Yeah. It's actually pretty, as opposed to just read my PowerPoint file and do this, you know. Sorry, Paul. I want it to be during the presentation. So when you're presenting, you can go, Hey, Claude, did anybody in the audience stop at Starbucks before they got here?
Starting point is 00:45:03 Or something like that. I forgot what I'm talking about, Claude. Please tell the audience what this means. Yeah, anyway. Yeah. Awesome. Well, fun talking with you as always. And I don't know if we need to change the name of the podcast to like Claude Bites.
Starting point is 00:45:20 Or probably not that. I don't think so. But I mean, honestly, it's a good point. But as a meta comment for the audience out there, it's really challenging to cover this stuff, because so much of the energy in software development and tech in general is in AI. Yeah. But we obviously realize that there's plenty of stuff that's not really AI at all. At the same time, it's transforming the industry, basically like the web. When the web came around, it's like, well, now we have the web, but we don't talk about it because it's, you know, I don't
Starting point is 00:45:52 know. It's tough. It's a balance. Also, I just am aware that there's people that care about Python, but also they have to care about this right now, whether they want to or not. So it's something I'm willing to cover as well. So yeah. Yeah. It's just, and it's wild. And may we live in interesting times. Bye. Later, Brian.
