Python Bytes - #475 Haunted warehouses

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 475, recorded March 30th, 2026. And I'm Brian Okken. And I'm Michael Kennedy. And this episode, as is regular lately, is brought to you by us. All of the stuff, the books, courses, head on over to pythonbytes.fm. We have links to everything.

Starting point is 00:00:24 But there's also talkpython.com. That's right. talkpython.com will get you there. It'll just redirect to .fm. It's all good. Okay. talkpython.fm.

Starting point is 00:00:33 Right. Okay. And Talk Python Training, of course. I've watched and done so many courses on there. It's a great resource. And if you'd like to learn pytest, there's a course there. But there's also pythontest.com. And thank you to our Patreon supporters, as usual.

Starting point is 00:00:50 And also, thanks to everybody who subscribes to the newsletter, because it's fun to put together. And we've got a lot of background information. So we like to send out all of the links to everything we talk about on there. And you can reach us to send us topics that you'd like us to talk about, or topics you'd like us to stop talking about, whatever. The contact stuff is on pythonbytes.fm, but we're on Mastodon and Bluesky and, yeah,

Starting point is 00:01:19 and there's also a contact form there that you can use. And if you're listening to this, thank you. And also, if you'd like to watch the show live, or at least watch the recording later, you can go to pythonbytes.fm/live and either be part of the audience or like a ghost. Like a ghost. Let's lock the ghost. How about that? So there's this interesting article at cert.at, I'm guessing that is the way to say it. And this one is super relevant to us. This is a security place, a security website. Lock the ghost. In the software world, removed is not always equal to

Starting point is 00:01:57 completely gone. This is crystal clear. There's always a good reason for that, but even the best reasons do not have to be intuitive or expected by the users. Let's take a short trip through how the Python Package Index handles removals and how we can lock the ghost in a uv lock file forever. Forever. So this is a security thing, and it's specifically, uniquely an issue for uv and the uv lock file in particular. So if you're using uv like I do, with uv pip compile and then requirements.txt, that kind of thing, it doesn't apply. It's uv.lock. We're both huge fans of uv, and one of the reasons we are fans is because of the performance, right? It's so fast and it bundles so many tools together. Some of these are making really interesting tradeoffs. Often, those tradeoffs are certainly fine, you know, like a short caching period.

Starting point is 00:02:52 So if you ask it to install something and it did it 10 seconds ago, it's not going to go and ask the APIs for it again, and that sort of thing. Or uv python install, which is awesome. It gets you Python in a couple of seconds instead of forever with a bunch of buttons. You know, next, next, next, confirm, next, yes. You know, like that installer experience. So those are all good, but I guess this is a bit of a negative consequence of having some of these optimizations. So I'll read my notes here. The essence is that the uv lock file points directly to the final file on the CDN, I'm guessing, or even the storage. But, you know, even if you remove something from the storage, it doesn't necessarily remove it from the CDN, Fastly and so on, right? So however it is,

Starting point is 00:03:37 it points to the very final file. When something is yanked or removed from PyPI, it goes out of the listing. You can't find it. You ask pip to install it. It's not there. But the underlying file is still hanging around. So if you have a direct URL to the resulting file, instead of following the redirects or whatever, that file doesn't necessarily get removed. That's what that opening was about, right? So that's basically the problem. If the file is still there, the file is still there, even if it gets yanked, right?

Starting point is 00:04:05 So there's a couple of interesting knock-on effects. uv.lock uniquely preserves these ghost packages, as they call them, in this file. So instead of removing them, it just links directly to them, as an optimization, I presume. However, no other tool like Hatch or PDM or whatever links to them, right? So they don't do that.

Starting point is 00:04:25 This is specifically about uv. So it creates an interesting supply chain problem. I mean, that's just like the security problem du jour, or of the year, whatever. The problem is all these things are getting some level of takeover, and then, you know, that's flowing into packages and other libraries that are built into code, and then obviously that amplifies them massively.

Starting point is 00:04:50 So in this case, an attacker could upload a malicious package and then immediately remove it, but still have the uv lock file point at it. Okay? Yeah. So if you immediately remove it, you might outrun the scanners. The automated scanners go, let me scan the new inbound PyPI packages, because that package doesn't exist anymore.

Starting point is 00:05:12 We don't need to scan it. But you could craft a specific uv lock file that still points to the ghosted remnant. You know what I mean? Yeah, but aren't the lock files on the client side? So it would be just people that had created a lock file. I created the client lock files during the... Yes, that seems possible, but imagine this. I create MaltinClaw or whatever, like the world's third most popular GitHub project out there,

Starting point is 00:05:37 put it up, get it working normally, and then after it gets really popular, I update a lock file, not even the input, not the pyproject.toml or anything. I just update the lock file itself to point at this ghosted malicious file. So anybody who installs it, well, they uv sync, that installs everything in the lock file, and off it goes. So it's not that you ran and installed the thing. It's that somebody could craft a lock file such that if you sync that project, then it's installed on your machine and off to its regular badness, you know, with its setup.py or whatever. So beware, folks.

Starting point is 00:06:17 Beware. I'm not sure exactly what the solution here is, but it's something that could happen, and maybe the Astral team... I'm sure the Astral team has already heard about this. This was from last week. Okay. Interesting. Well, we'll wait to hear back. Yeah, I haven't heard anything.

Starting point is 00:06:32 I mean, I guess if I go to the end, there's not like an update. How should I live? This is a funny one. How should I live? To sum up, I presented that removed packages could still be... I don't know. Yeah, well, I mean, there's a lot. Security is a big thing.

Starting point is 00:06:51 Anyway. Yeah, and supply chain security is extra, because it's not even necessarily the things that you're using. It could be the things that the things you're using are using, you know? Right? And something could change there. Like I'm not checking on, I don't know, chardet, for example. Just to pick something out of thin air, because I'm not using it directly. I'm not tracking its releases.

Starting point is 00:07:08 I happen to maybe be using something that uses chardet, and then, you know, if something happened to that package, I'm not saying it has, right? Just thinking of really popular third-party, third-level dependencies. Yeah. And anyway, we've got more security topics coming up. So we're not going to run out, are we? No.

Starting point is 00:07:29 So the next step, I want to talk about a little bit more security, but this is how to rein in your AI a little bit. So what am I going to talk about? This was suggested by Martin Hecker. I think it's Hecker. It's a German name, H-A-C-K-E-R. Anyway, thanks, Martin. For context, this seems so long ago, June of 2025, but it was less than a year ago.

Starting point is 00:08:01 Simon Willison wrote a blog post about the lethal trifecta for AI agents, which is giving them access to private data, exposure to untrusted content, and the ability to externally communicate. That's pretty much what coding agents are like now, especially if you run them in YOLO mode or dangerous mode. And it seems like people wouldn't do that, right? But it's so much faster. If you have your agents on, like, ask mode, it's just like, hey, can I run this command? Yes. Now can I run this other command? Yes. And so you can say, just stop asking right now, I trust you. But should you? I don't know. So if you've got private data on your device, there's something to be concerned about. So one of these

Starting point is 00:08:51 solutions is sandboxing. Or one of the solutions is to create a VM and only put the stuff on the VM that you want the AI to use. That's a little extra. And for people that are normally using virtual machines, or those other things, containers, right, if they're normally using containers, great. But if that's not your normal workflow, it's a tough ask. So Claude Code has sandboxing. I haven't tried it out to see how clear it is. It apparently works great on macOS, Linux, and WSL2, and uses bubblewrap. So if you're using WSL2 for Claude Code, that might be okay. But what about other agents and stuff?

Starting point is 00:09:48 So what we got as a suggestion was that Claude Code has this built in. I'm not sure if it's really restricted or if it's suggestions. Anyway, I haven't tried it out. So I'd love to hear what other people think about the sandboxing stuff. Anyway, the same kind of idea that Claude Code uses is pulled out as something else you can use with different AI agents if you want. So this is a project called Fence. It's lightweight sandboxes for terminal agents, and it uses the same sort of stuff that Claude Code does.

Starting point is 00:10:24 And this is pretty exciting, to be able to restrict what it has access to, like file permissions. You can restrict how much of your file system it has access to. You can restrict the network access, which websites and stuff it can access. And even GitHub repos, restrict which repos. That's all cool.

Starting point is 00:10:49 And it's also really cool that this is open source. So this is Go code, but it's a project that people can contribute to. And it's very active right now. So I'd be excited to hear what other people think of Fence, if you think it's safe enough. Anyway, I'm definitely going to try it out, because I was actually considering buying an extra computer so that I could run it isolated. I mean, I know that a container is way cheaper than an extra computer, but also an extra computer's not that much either. So anyway, what do you all think about this?

Starting point is 00:11:25 What do you think, Michael? Yeah, it's interesting. I mean, a Mac Mini is very cheap, right? It's 400 bucks or something like that. It's a pretty cheap computer if you want to have a separate machine. But also, a VM potentially would work if you wanted to have some isolation. I think this is a neat idea. I like that it's open source.

Starting point is 00:11:42 The one thing I don't like, and I don't know that there's necessarily a great fix for it, just given the way that it works, is it seems like you can have it work on any terminal command, right? So like Claude Code or Codex CLI or Gemini CLI, whatever. But say VS Code, Cursor, PyCharm, you want to run one of those, but have the agents that run in those more proper editors limited, that seems harder, you know? It doesn't seem like it supports that. Yeah. So, interesting. That's the way I like to work. Honestly, this might be a minority opinion. I think Claude Code and friends, the way that they work, are an anti-pattern for how real software

Starting point is 00:12:25 developers should be coding. And what I mean by that is Claude Code and other CLI ones encourage you to just have the code just like rip by, like, do this. And it's just like, you see the code screaming by. And it's like, okay, I'm done. And then your job is just like, accept that or whatever. Or you wait 10 minutes for it to do a thing. I was doing a project a few days ago.

Starting point is 00:12:44 Claude Code spun up five agents that all ran for 15 minutes in parallel. And then it gave me the result. So that's a lot of code changes. And that's a lot of my credits, in addition to just the time to wait 15 minutes and see how it came out. So what I much prefer is to have some kind of editor, VS Code, PyCharm, whatever, where the work is happening. And as it's making changes, I can roll up, okay, made this change. Let me look. Actually, it's going down the wrong path.

Starting point is 00:13:13 Hey, stop, stop, stop. No, look, you did this wrong. Go that way. You know, you're not following the patterns of this. So with it just streaming by like a social media feed, it encourages you not to review it while it's working. And I think that that is not right. I know the trend is to not review code at all, but the trend is also to get a bunch of unstable software. So thank you.

Starting point is 00:13:33 Anyway, I don't like the CLI ones because of that. Therefore, I probably won't be using this, but I would like to. That's my take. Yeah. It's interesting, because this is similar to, you know, hiring somebody to do work for you, or having an intern or a new hire or something that you don't quite trust yet, and saying, hey, I want you to do this job, but I'd like you to, you know, work for like four hours at most and then

Starting point is 00:14:02 check in. Right, right. Like, work on it this morning and then check in with me after lunch. Something like that, right? Yeah. Yeah. So you wouldn't want like four hours of Cursor or Claude Code to run, but you might go, you know, use this many tokens or something and then check in to make sure that you're on the right track. Yeah. Also, testing helps. Testing absolutely helps. It does.

Starting point is 00:14:27 But the problem is sometimes the agents are like, that test doesn't seem relevant. It was also hard to fix. So we took it out. You know, that's happened to me. And if you've got enough tests, it's like, oh, there's some 1,100-something number of tests. You don't notice that the one that you really needed is gone. Yeah. Yeah, we're getting off on a tangent, but I was listening to a podcast this morning, or an

Starting point is 00:14:50 interview with somebody that had used a like clause, which I haven't done any clause yet or anything, but having a thing that controls lots of agents to do things like control his house, with the pool temperature and lights and everything. And I'm like, if I want my lights on in my room, I turn the light switch on. I haven't coded anything. In theory, I want a smart home. In practice, I'm like, boy, that's not really that helpful. Buttons are really easy, though.

Starting point is 00:15:20 Okay. Well, let's go on to the next thing. What do you got? Indeed, let's go on to the next thing. And this one is called Malus. And it's also an AI one. So I know some people are overwhelmed or uninterested in AI stuff, but I don't think this is the AI in the sense

Starting point is 00:15:41 that you're thinking about. This is crazy. So this is an open source copyright concept. And it doesn't necessarily have to do with AI. It just happens to be that AI is the workhorse of it. So check this out. And I don't know if this is a real project that people are making real money on. You can. There's like real pricing here. So what is the idea? I don't know if this is a real project, because it could be put out here to cause such a backlash that it causes a lawsuit. That's what I'm saying. But there is real pricing. So here's the thing. Remember how there was this big debate, I think just last week, about chardet, right?

Starting point is 00:16:17 Yeah. That the current maintainer, who is not the original copyright holder, had one AI generate the description and the specifications of the library, and then another one that has never seen any of the code take that and turn it into the new project, 7.0, and then change the license, because this new bit of code is no longer the same thing. Right. Basically, this is that as a service. Interesting. Yeah. So it calls itself Clean Room as a Service. Finally, liberation from open source obligations. It's pretty shady, you guys. This is bad news. Our proprietary AI robots independently recreate open source projects from scratch. The result: legally distinct code with corporate-friendly licensing. No attribution, no copyleft, no problem. And there's pricing for this. I know. It's really crazy.

Starting point is 00:17:11 Pricing is transparent, per-kilobyte pricing. So it's focused on JavaScript at the moment. Every package is priced by its unpacked size on npm. How about that? So for example, left-pad, if you wanted a copyright, not copyleft, left-pad, it would cost 50 cents. If you want Express, the Node.js-powered web framework, 73 cents.

Starting point is 00:17:38 You want Moment? I don't know what Moment is. Apparently it's pretty big. It costs $42. What do you think about this, Brian? This is nuts, huh? Is this real? I mean, it could be. Like I said, I don't know if this is real or not, but I think it is a real copyright conversation, and it is a... Malice. I know. M-A-L-U-S. Yeah, I think we need to create a competing one that's called Spite. Spite and Malice. Anyway, amazing. Liberate open source is the H2. Like, how nuts. So like I said, I think it could be something that's just trying to get

Starting point is 00:18:17 attention to this problem and cause some kind of final legal decision to come down about it, or it could be something people are just paying money for. Well, yeah, we'll take it. Yeah, I honestly don't know. You know what's creepy is, an evil but decent business model might be to do something like this and just keep track of all the companies that have paid you to steal from open source. And then, you know. And then, like, you know, sue them or, you know, anyway.

Starting point is 00:18:49 Well, I leave this here for people to ruminate about, but I do think it's pretty wild. I think it's pretty wild. Also, I guess it's good to talk about it because people are going to do this anyway, right? People are going to try to do clean room solutions and round stuff. Clean room solutions have worked.

Starting point is 00:19:08 I mean, there was Miguel de Icaza. I'm not sure how to spell it. The guy created Mono, which was the open source version of .NET and C# when it was still completely commercial. And they just made sure that whoever they hired to work on it had never looked at the source code or worked on it, you know, and they rebuilt it. Ultimately, the outcome was that Microsoft bought them because they,

Starting point is 00:19:36 they thought that open source was better later, instead of a virus or whatever they called it at the time. So, I mean, that's a historical precedent for this clean room concept. But the difference is that took multiple people six months to a year. Whereas this is like an afternoon. You know what I mean? If you turn Claude Code loose on it. Such is the world right now. Yeah.

Starting point is 00:19:56 Yeah, such is the world right now. But anyway, I honestly don't know how I feel about this. I mean, it seems like a really crappy thing to do. At the same time, it seems like you should be able to, you know, in the Google versus, I think, Oracle case, the case about Java, and I think it was Java and Android, the Supreme Court, or whatever the highest court it went to, ruled that APIs, the signatures of the APIs, are not copyrightable, right? So that's part of the precedent, but this is the internals. But if you take something and scrape out, these are all the APIs, and here's the description of what it does, you know, and you feed that to an AI, that's pretty close to doing what Google did, but they had a team of hundreds of people or something. You know what I mean? Like I said, I don't know how to feel about this.

Starting point is 00:20:40 I'm just going to put this out there for people's awareness and move on to your next topic, Brian. Well, I want to just change it up a little bit and talk about security. So this one comes to us from Matthias Schoettle, I think. Anyway, thanks, Matthias. He sent it in through

Starting point is 00:21:03 email, which, yeah, we have a very easy to find email. So the article, this is kind of fun, because in the email he said, you know what? I wanted to suggest this topic, but also I'm trying to get better about writing blog posts. And I appreciate that, because we like blog posts. I like to read blogs. So he's got an article called Harden your GitHub Actions Workflows with zizmor, dependency pinning, and dependency cooldowns. So there's three topics. And actually this came up because he was looking at an article, please let me get this. Okay. From StepSecurity, saying an AI-powered

Starting point is 00:21:45 bot actively exploited GitHub Actions involving Microsoft, Datadog, CNCF projects, lots of things. So basically you have to make sure your GitHub Actions are secure also, not just whatever thing you're building; your actions might have a problem. So we had actually covered zizmor, and I went and looked to see when it was. It was episode 408, November 2024, that we covered zizmor.

Starting point is 00:22:19 And then look at the repo. So the zizmor GitHub repo. zizmor is a static analysis tool for GitHub Actions. I thought it was pretty cool. So we covered it. And it's got a bunch of sponsors now. And look at the star count. Hmm, we covered it in November 2024.

Starting point is 00:22:36 And right after that, it kind of took off. That thing totally hockey sticked. How about that? Maybe it's because of us. Who knows? Probably not. But anyway, so that's pretty cool. I'm sure at least one of those stars is from us.

Starting point is 00:22:50 At least one of the stars. Yeah, like the one I put on there, maybe. Anyway, so what can you do? So there's supply chain issues. Doing static analysis of your GitHub actions, definitely something to do. And this is not, what I'd like to put out is this is not just, if it's business critical stuff, it's really anything that you're putting out on GitHub,

Starting point is 00:23:16 and especially things that you're releasing through PyPI, because even your little left-pad thing might get exploited, whatever. You might not think about it, but somebody else could take advantage of it. So it's good to lock stuff down. So we've got the static analysis. The other thing he brought up is dependency pinning. And this is related to the LiteLLM exploit from last week, which I don't think we covered, but hopefully everybody heard about this.

Starting point is 00:23:45 And this one is creepy because apparently, even if you pinned the dependency with version numbers, that wasn't enough, because a malicious package overrode the binary with the same version number. So you really should be checking the SHA. Is that shah or S-H-A? I don't know how to pronounce that. I think it's typically said shah, but if you talk about the hash, I think people say S-H-A, so it could go either way, right? But some of those are a little bit hard to deal with. It's not really hard, but it's more of a pain than just typing out the version. So there's a tool

Starting point is 00:24:35 apparently called Renovate that helps with that part of it. And, you know, uv pins, like, I was going to say uv locks, but now we have a problem with uv locks. So, whack-a-mole. It's like whack-a-mole. It's definitely whack-a-mole. So using things to check those SHAs also. And then dependency cooldowns. I think you brought this up either last week or recently, to be able to say, hey, I'm going to update everything, but don't update if anything's newer than seven days or something like that.
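To make the hash-checking idea concrete, here is a minimal Python sketch of comparing a downloaded artifact against a pinned SHA-256 digest. It is not how Renovate, uv, or the article do it. The `sha256:<hex>` pin format and the function names are assumptions for illustration, and in-memory bytes stand in for a real downloaded file.

```python
import hashlib
import hmac
import io


def sha256_of(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_pin(stream, pinned):
    """Return True if the stream's digest equals a pin like 'sha256:<hex>'."""
    algo, _, expected = pinned.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo!r}")
    return hmac.compare_digest(sha256_of(stream), expected.lower())


# In-memory stand-ins for a downloaded artifact (hypothetical content).
original = b"pretend this is a wheel"
pin = "sha256:" + hashlib.sha256(original).hexdigest()

print(matches_pin(io.BytesIO(original), pin))           # same bytes as pinned
print(matches_pin(io.BytesIO(original + b"!"), pin))    # same "version", different bytes
```

The point is that a version number is only a label, while the digest is tied to the actual bytes, so a swapped file under the same version fails the check.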

Starting point is 00:25:13 I would like to point out that I do not do this. I do not. When I say it, I say one week. Oh, you just. That's an improper fraction right there is what that is. No, literally, mine says one week rather than seven days, but whatever. Same idea. I think it solves the problem that it solves, because after seven days, that thing's not going to exist on the package

Starting point is 00:25:32 manifest, right? And it solves a problem here. It's a super simple thing, and it's not perfect, but it's a layer of defense. Yeah, so I don't think this is too much. I've got a project where I'm going to try this out. I'm going to try these things. And my guess is it's going to take me longer to figure out what to do than to actually implement everything. So, yeah, that's how a lot of stuff is. Like, I did change one line, but it took me two days of research to figure out what the right choice of that one line was.
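The cooldown idea can be sketched in a few lines of Python. This is only an illustration of the rule "ignore anything uploaded in the last N days", not the configuration or logic of any particular tool. The release data is made up, and picking by upload time rather than by version ordering is a simplification.

```python
from datetime import datetime, timedelta, timezone


def newest_outside_cooldown(releases, now, cooldown=timedelta(days=7)):
    """Pick the most recently uploaded version that is at least `cooldown` old.

    releases is a list of (version, upload_time) pairs with timezone-aware
    datetimes, in any order. Returns None if every release is too new.
    """
    cutoff = now - cooldown
    eligible = [(uploaded, version) for version, uploaded in releases if uploaded <= cutoff]
    if not eligible:
        return None
    return max(eligible)[1]


# Hypothetical release history for some package.
now = datetime(2026, 3, 30, tzinfo=timezone.utc)
releases = [
    ("1.0.0", datetime(2026, 1, 10, tzinfo=timezone.utc)),
    ("1.1.0", datetime(2026, 3, 20, tzinfo=timezone.utc)),
    ("1.2.0", datetime(2026, 3, 29, tzinfo=timezone.utc)),  # only one day old
]

print(newest_outside_cooldown(releases, now))
```

Here the one-day-old 1.2.0 is skipped and 1.1.0 is chosen, which gives scanners and maintainers a week to notice and pull a bad release before you pick it up.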

Starting point is 00:26:03 I mean, and let's get real. I'm just going to point an agent at this article and say, could you do all this stuff for my project? This seems like a problem. Read it, fix it. Research it, fix it. Yep, exactly. Maybe.

Starting point is 00:26:14 You can get a non-GPL version if you pay a few cents and send it through Malus. All right. So, real-time follow-up. I forgot to credit Paul Bauer, who sent in the thing about Malus. So thanks for that. And you mentioned left-pad. I was curious, is there a Python left-pad? Yes.

Starting point is 00:26:29 In fact, there is a Python left-pad. Really? Yes. Inspired by the famous left-pad package on npm that broke the Internet. It's a joke. I mean, but it works. You can pip install it. It calls itself a port of the infamous left-pad npm package.

Starting point is 00:26:45 Interesting. Okay. Yeah. Okay. I think we're on to extras. I just said, I have one. Do you have some extras? Yeah, I'll go ahead and go first and time with my screen.

Starting point is 00:26:57 Yeah. All right. So I want to talk about a new SaaS that I released, Brian, that people have seen me using, but they don't necessarily know it had anything to do with me, called interview queue. So this is a Python-built platform for doing podcasts. So if people are out there, they're content creators, they're podcasters, they do interviews, whatever, give this thing a look. The whole idea is from starting out with, like, brainstorming

Starting point is 00:27:21 about an idea all the way until you push something out as a final bit of audio file or video or whatever. It's there to like make every step a little bit easier and guide that. So I knew I was going to talk about that this week. So last week I pressed a stopwatch start stop when I, from the time I had downloaded the audio files from our interview last week until I had shipped it with chapters with album art, all that kind of stuff.

Starting point is 00:27:43 Edited, final, like raw audio downloaded to final audio in the podcast feed: 18 minutes, 51 seconds. Oh, wow. So, super excited about this. Mostly, I built it for myself, but I thought, you know, I'll put in some extra effort.

Starting point is 00:27:57 Keep fine. I actually had to rewrite it three times because I'm like, yeah, this is the right UI metaphor for how this works. And I tried it on a few podcast episodes. I'm like, nope, no, it's not. This is horrible. I can't. It's just so, it's worrying.

Starting point is 00:28:09 Do it again. I think it's really nailed now. So people are doing podcasts or interviews. I know that's not most people listening, but it's a really cool Python app. It's a mega app. It's like 75,000 lines of Python or something. It does a bunch of stuff.

Starting point is 00:28:20 Okay. Nice. Yeah, thanks. Good dog fooding. Yes, dog fooding. I built for myself. One of the things that I learned as part of that is so that gives people 250 megs of free storage. Unlimited, it does free transcripts.

Starting point is 00:28:34 It does all that kind of stuff. One of the things that makes that work is you need to be able to store stuff in a way that's not too expensive. So if you store something on S3 or something like that, Azure Blob Storage is probably the same price. They all seem to copy each other, except for DigitalOcean, which is a little bit cheaper at seven. I don't know. It's at one cent per gigabyte per month for regular, like, S3 storage. But they just came out with this thing for Spaces, which is their S3 cold storage. So you can put something up and say, I'm not going to access it very much. And if I do access it, it costs a little tiny bit more. Like instead of,

Starting point is 00:29:14 it costs a cent per gigabyte when you access it, which is, you know, more than their default pricing or whatever. But if you don't access it, it's 0.007 cents per gigabyte per month. Think how cheap that is. That is awesome. And you don't have to have, like, oh, we have Glacier, which is its own storage system, and then if we want to, we can move it back into S3 and out of it. It's literally the same API as S3. You just use boto to talk to it. But your access pattern is very infrequent, which, you know, it is: you record a podcast, maybe you touch it once or twice. There's like a little cool trick with disk cache. So most of the time, when it's sort of in an active mode, it doesn't even go to the internet. It just works with like a local volume at Hetzner.
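The "local cache in front of cold storage" trick can be sketched as a read-through cache. This is a generic illustration, not the app's actual implementation, which isn't described here. `ReadThroughCache` and `fetch_cold` are hypothetical names; `fetch_cold` stands in for whatever call downloads an object from the S3-compatible bucket, and a plain dict plays the bucket so the example runs without a network. Expiry and eviction are left out.

```python
import tempfile
from pathlib import Path


class ReadThroughCache:
    """Serve objects from a local directory, falling back to cold storage."""

    def __init__(self, cache_dir, fetch_cold):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_cold = fetch_cold  # callable: key -> bytes (hypothetical)
        self.cold_reads = 0           # how many times we paid for a cold read

    def get(self, key):
        # Keys are assumed to be flat file names (no slashes) in this sketch.
        local = self.cache_dir / key
        if local.exists():
            return local.read_bytes()
        data = self.fetch_cold(key)
        self.cold_reads += 1
        local.write_bytes(data)
        return data


# A dict stands in for the remote bucket.
bucket = {"episode-475.vtt": b"WEBVTT\n"}

with tempfile.TemporaryDirectory() as tmp:
    cache = ReadThroughCache(tmp, bucket.__getitem__)
    cache.get("episode-475.vtt")  # first read goes to cold storage
    cache.get("episode-475.vtt")  # second read is served from local disk
    print(cache.cold_reads)
```

While a file is being actively edited, every read after the first stays local, so the per-access charge on cold storage is only paid when the cache has been cleared.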

Starting point is 00:29:56 And then if it needs to go back, it's still pretty cheap. Isn't that cool? So what would you put in the cloud that you don't access very often? Um, backup files. So, for example, let's go back to interview queue as something concrete, right? Just so it's concrete. One of the things that it will do is generate transcripts for you. So it could take that VTT or SRT file or whatever, like a text file, put it into this cold storage, and also put it in like a 30-day local cache where it works with it, but after that, or if it runs out of space, it throws it away. So maybe it's in this little local cache for like the two days that you're editing the podcast, but how often do you go back to a podcast you did last year and then pull up the

Starting point is 00:30:36 transcript segment and want to look at it most people who would use a service like this would just go like well once I've produced it and downloaded the final transcript like they don't go back and mess with it again right so it's that kind of thing. It's like when you're creating something or you're actively editing it, then you want those files there, you want that access, but then pretty soon it's going to fall into like, I just want it historically kept for me. I think there's a lot of access patterns for that. All right, back to Fire and Forget. So I talked about this last week, this fire and forget pattern and how this was pretty sketch that I thought. I still believe that to be true. I have two things

Starting point is 00:31:08 on it. One, I'm sorry, I can't remember who sent me this message, but thank you for sending it. I said that starting in Python 3.12, this has been a problem. What they said is that starting in Python 3.12, what happened is the documentation pointed out that this was a problem, whereas previously it was a silent, sort of unknown issue. So they think that it has been there since 3.4,

Starting point is 00:31:32 whenever create_task got defined, and asyncio got defined the year before async and await, which I think is 3.5. Anyway, it has been there for a long, long time, but in 3.12 the documentation was updated to say, hey, this is a problem, be aware of it. So it could be that this has always been a problem. And, you know, for people who don't know, if you just go and say, hey, I want to fire something off in the background to let it run on the event loop, asyncio.

Starting point is 00:31:58 create_task, and you give it the async function, that's not enough. That is not enough to keep it from potentially getting garbage collected, because the loop itself doesn't hang on to it. Okay, so that's the issue, right? They think that that's been the case forever and they just documented it in 3.12. So thanks for pointing that out. I don't know whether that's true. I looked into it and didn't find a great answer. The next thing, though, is another person, Richard, pointed out that Will McGugan wrote an article called The Heisenbug lurking in your async code. What does it talk about? Well, if you do create_task, guess what? It could be garbage collected. It may disappear without warning

Starting point is 00:32:34 during garbage collection. Da da da da da da. And so that's all well and good. Thanks, Will, for writing that. So I did another post that sort of talked about that. But what's interesting is, luckily, Will added numbers and concrete search values. So if I go here, there are, wait for it, 586,000 separate code files that have this pattern. Because people have been telling me, it's not a problem, Michael. This is some weird edge case that only you care about. Me and the 586,000 other people, right?
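
[Editor's sketch] The fix for the fire-and-forget problem discussed here is to keep a strong reference to each task until it finishes, which is the pattern the asyncio documentation recommends. A minimal sketch, assuming Python 3.9 or later; the names background_tasks, fire_and_forget, and record_visit are hypothetical and only for illustration:

```python
import asyncio

# Strong references live here so a running task cannot be garbage collected.
background_tasks: set[asyncio.Task] = set()


def fire_and_forget(coro) -> asyncio.Task:
    """Schedule coro on the running loop and hold on to the task until it is done."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task completes so the set does not grow forever.
    task.add_done_callback(background_tasks.discard)
    return task


async def record_visit(log: list, page: str) -> None:
    await asyncio.sleep(0.01)  # stand-in for real background work
    log.append(page)


async def main() -> list:
    log: list = []
    # The risky version would be a bare asyncio.create_task(...) with the result dropped.
    fire_and_forget(record_visit(log, "/home"))
    await asyncio.sleep(0.05)  # give the background task time to finish
    return log


result = asyncio.run(main())
print(result)
```

The set is the only thing keeping the task alive from the caller's side, so the done callback that discards it matters just as much as the add.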

Starting point is 00:33:00 Look at this. The very first hit is like, boom. They're not putting it into like, hmm. So it's not every one of these 586,000. Actually, like, this is actually a documentation line here. This one, they are holding the task. But even on the first page, which is like a very small sample out of those half a million, there's five instances where they're doing the thing that you said

Starting point is 00:33:19 you're not supposed to do. So, all right. That's it for my extras. But I thought that would be a fun follow-up on two counts. Yeah. I just have one extra. And that is that I went to GitHub this morning and noticed that on April 24th, GitHub Copilot is going to start recording interaction data for their AI model

Starting point is 00:33:40 training unless you opt out. So a company is actually asking before they spy on you. So that's nice. But they're going to spy on you. Yeah. Well, you can opt. Apparently you can opt out. Yes, I've already opted out.

Starting point is 00:33:52 Have you? Yeah. I was going to, and I'm like, do I really care how they use my GitHub interactions? And honestly, it's kind of a no-op for me or, you know, a tree falls in the forest with no one there to hear it. Like, actually, the tree does still fall. That's a pretty human-centric perspective of the world. But this is GitHub Copilot interaction, not your

Starting point is 00:34:14 repository data, right? That's what it says. On April 24th, we'll start using GitHub Copilot interaction data for AI model training unless you say no. I don't use GitHub Copilot. So maybe they can have all my interactions or none of them, they'll be the same. When I first saw that, I thought, oh, they're asking for permission to use my code in my repository and my issues and stuff for training. But that doesn't sound like what it is. What are they, okay, the GitHub Copilot interactions. Yeah. So probably the ones I'm responsible for. Like, when am I using GitHub Copilot? Okay. Yeah. And like if you go to the GitHub homepage, there's an Ask Copilot sort of thing and there's other, you know, there's other areas where

Starting point is 00:34:55 if you do a search, I think, there's some Copilot stuff, and in the PRs. You might believe, especially if you're a paid user of Copilot, that's a much bigger thing. Yeah. One of the interesting things is you can ask, where'd it go? I think you can ask an agent to, like, oh, yeah, here we go. If I'm looking at an issue, you can assign it to an agent to have them fix it. I haven't tried this. I might try this on this one. I've already been having mine do that, but not through Copilot.

Starting point is 00:35:24 In Claude Code, I just say, hey, Claude, issue 199 of this repository. I would like to work on that. Can you plan that out with me and have a conversation? It just goes, logs into GitHub, uses the gh CLI, pulls it down, understands it, and then keeps working with it. So it's not exclusive to GitHub and Copilot if you have the gh CLI installed, which is very cool. Okay. Yeah, that looked more scary to me before. And now I'm like, actually, I don't care.

Starting point is 00:35:54 I don't care. Should we talk about something funny? We shall make a joke. So for an interview queue, I'll press Marcus asks. There we go. So I can't tell for sure if we did this before, but if so, it's been long enough that I think it'll be fun. Okay. All right.

Starting point is 00:36:08 So Will Smith in I, Robot, I think that's a good sort of future, but looking back to like now type of thing, right? So Will Smith is talking to one of these robots: can an LLM write maintainable code? The robot stares back with its, like, mechanical eyes. Can you? Oh, snap. Oh, snap. Yeah. I mean, it's a funny joke.

Starting point is 00:36:36 I think it's a funny joke just because of the time and so on. And there's a lot of variations that you could have on it. I haven't read the comments. We have to read the comments. But there are certainly co-workers I've had in the past who I would take Claude over that coworker for working on my code together. Yeah, definitely. Yeah.

Starting point is 00:36:52 Yeah. Not saying that Claude Code is perfect. I just wanted to let it run loose. But I've had some people who are, like, pretty bad, especially people taking some of my training classes. And how did you get into this? I mean, this company? I had some, I'll tell you.

Starting point is 00:37:04 I don't want people to feel like I'm making fun of people or being, like, elitist. This is a person who worked at, let's say, a bank, something like a big enterprise company. And this was when I was teaching C# way back in the day. And we would do like an hour's worth of presentation and demos. And then it was, okay, now you guys for the next hour work on this thing that's like a derivative version of what we've been talking about. Right. And this person, who has been employed at this company for six months as a software developer, professionally, at a bank, read the instructions and said, Michael, I need help. I said, no problem. What's going on here? Like, well, I can't get this to work. And they had variable name equals some sentence. No quotes around it. I said, oh, you've got a couple of problems here. That's a string. So you need to put quotes around the string. What are you talking about? Like, I don't know what to tell you. Like, you need to put the quote character at the beginning, so the compiler knows that this is actually a string bit, not just other keywords and stuff. Like, see the key left of Enter? Hold Shift, press that, and put it at the beginning. And it was like a challenge to get those quotes in there.

Starting point is 00:38:08 And then it still wouldn't work. I'm like, oh, you have to declare the variable as a string. Like, so you have to say string, space, email equals whatever, or whatever it was, right? What do you mean? I'm like, six months as a professional developer in this language. This is not like they're just starting this language. I'm like, okay, I will take Claude Code all day. I will take this robot thing all day over that as a coworker. So I don't think I'm being harsh to say that that's out of bounds.

Starting point is 00:38:34 You should have gotten past that step after six months, eight hours a day. So lesson out there, if you know what quotes are, you might be able to get a job. Yes. You know how to make a string in a programming language. Okay. While we're on the tangent, I'll just add one more tangent. So I had an interview once where somebody came in, and it was a contract position. But still, I usually start with a real lowball question just to make

Starting point is 00:39:07 sure. And I usually say something like, okay, write a function in Python that takes a string, or actually, what is it? Write a function that takes two numbers and adds them and returns the answer. It took a while to get to the point where I could say, let's actually, let's stop. And I don't want to try to be cold. So I usually, like, ask about their background and whatever and fill out the hour. But it was clear that this wasn't going to work, because at first they started out with, like, print statements to standard out and using the input command to get user data.

Starting point is 00:39:53 And I'm like, no, it's a function. It just has parameters. That's it. Oops. So yeah. Anyway, lots of different backgrounds that get into software. So yeah.
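
[Editor's sketch] For reference, the answer the interview question is looking for is a plain function with two parameters and a return value, with no input() or print() inside it. A minimal sketch; the name add is just an illustrative choice:

```python
def add(a, b):
    """Return the sum of two numbers passed in as parameters."""
    return a + b


# The caller decides where the numbers come from and what to do with the result.
total = add(2, 3)
print(total)
```

Keeping input and output outside the function is what makes it reusable and easy to test.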

Starting point is 00:40:05 Yeah. Definitely some that I would take an agent over. So but that's funny. Can you read? Let's look at the comments real quick. Okay. John says, man, this is going to slay on LinkedIn. Oh my gosh.

Starting point is 00:40:22 Yeah. Right. Everyone acting like they're Linus Torvalds. Yeah. So would you, LinkedIn's weird. Every time I poke my head into LinkedIn, I, like, try to back out, because I think it's all just full of bots. I don't think there's any people left there. So yeah. Well, you haven't embraced your 100 day ones attitude.

Starting point is 00:40:44 Guess not. Anyway, a good episode. Fun talking with you. Thanks to everybody that showed up to listen, and we'll see you all next week. Bye, everyone.
