Two's Complement - Speed of Thought

Starting point is 00:00:00 I'm Matt Godbolt. And I'm Ben Rady. And this is Two's Complement, a programming podcast. Hey, Ben. Hey, Matt. Been a while. How you doing? Good.

Starting point is 00:00:23 Yeah, good. So I'd like to abuse this podcast to ask you something that I would like to have asked you if we'd have met up in person. And it strikes me that it might actually be a good podcast episode. So I can kill two stones with one bird, or whatever. Stone two birds with one kill. I don't know, one of those things. I feel like that's the fundamental basis of this podcast: we just talk, and then it's an enforced meeting between you and me. Enforced? Wow. Sounds awful. Encouraged. Yeah, like, I don't like this. No, but there's a point on our calendar where we will sit and talk to each other even though, yeah, we don't work together

Starting point is 00:01:06 anymore, which makes this less of a thing that we just do naturally. But anyway, my question, last I spoke to you, we were talking and you had some clever ideas about how you were sort of starting to use carefully AI to do some of the things that, yeah, I guess some of the more humdrum work or, I don't know, how do you, how would you explain what you're doing? But I know that you have a clever way of using it. Yeah, well, I've gone through many. It seems like every few weeks or every month, you know, everything with AI and coding agents is changing so incredibly quickly right now. So it seems like every month or every few weeks, I have a new iteration on.

Starting point is 00:01:55 So my mental model of where you are may be, like, several iterations behind, given that we haven't really caught up in a while. I mean, you know, I don't think it's that far behind. I think you're maybe one version behind. Okay. The last, and it's just, you know, it's this whole progression. I think we've talked about it on prior podcasts. I'm not going to recap from the beginning, but the last iteration I was on was creating, I had created basically, like, a Docker image that I could run either locally or on remote machines or in the cloud,

Starting point is 00:02:32 out in various places. And the way that I would use it is, it had an embedded Git server, which, you know, for those that don't know this, a Git server is just SSH, right? Right. There's really not a thing as a Git server. There's not a process that you run. It's like, oh, I killed my, you know, I pgrepped for my Git server.

Starting point is 00:02:53 It's like, no, that's just SSH. SSH is your Git server. So it's like, what, SCP kind of thing? Like, there's only a few things it can do. There's a Git remote command that it runs, I think, right, effectively on your behalf as... Well, they recommend that you use a special shell for the Git user, but you don't have to, actually.

Starting point is 00:03:15 You can just make a bare directory on another machine and use that as a Git remote, and that makes, yes, uh-huh. Not that kind of bear. The Chicago Bear Directors. The Chicago Bear Neighbor Directory. The Chicago Bear Naked Lady directories. Whoa. Not to be confused with the Territory.

Starting point is 00:03:32 I'm never, no, I'm not going on the tangent. There's a bridge involved. We're not going to talk about that. Okay, but anyway, Git is, yes, simpler than you might think. It's relatively, it's real, it's real simple. There's no long-running process that needs to be managed with the Git server, right? But anyway, there are directories and permissions and sometimes shell and SSH configuration, perhaps, that should be done if you want to, you know, have the thing be particularly secure. So I just created this Docker container and an SSH server you can get into inside the Docker container. And on the other end of that, it has all the things it needs to be a Git server. Yeah. Yeah. And honestly, the only hard part about any of this is credential management,

Starting point is 00:04:28 which I feel like is true 80% of the time anyway, right? Like, how do you get your public key registered, how do you get the credentials for the processes that you may want to run? But long story short, and actually there's one other smaller tangent, there is actually an open source version of this that Aquatic released under the Aquatts organization. I was going to ask you about that. Yeah, and it's, like, 60% complete at this point, right? Like, I wouldn't necessarily, I mean, I would say it's great inspiration for your own implementation, which you can probably create with Claude in about an hour.

Starting point is 00:05:08 So there's not a real reason to necessarily use it out of the box, but it is there and it does kind of do this thing, which is, finally, the punchline. The idea is that you set up Git remotes that point to the running instances of the Docker container, and the post-receive hook for these Git remotes basically just runs the headless form of Claude, you know, claude -p, with a very simple prompt. And that prompt is something like, make this code better, right? And the idea is that you have enough, you know, CLAUDE.md and skills and other things, perhaps checked into whatever repository it is that you're pushing, such that you can have like

Starting point is 00:05:53 a local working copy and you're like, yeah, this code is, like, close, but it kind of sucks. You know, there's some commented-out tests that fail and there's some stubbed-out methods and classes, and it's, like, the general skeleton of sort of what I'm trying to do, but it's not really there yet. And I'm just going to commit that. And I'm going to push it to this remote that I've set up to run this, or maybe a few of them. And then it's going to just make it better. And then when it's done, after a few minutes perhaps, and, like, the instructions that I put in there tell it to do it in very incremental steps. So it's like, you know, for each TODO comment, for example, fix it and commit it with a description in the commit message of what it

Starting point is 00:06:35 is that you did. This is committing into its own local copy of the repository. So it has no way of getting out from that, right? It's in the Docker container, right? So this is like, the post-receive hook for the sort of main repo directory in the Docker container creates a working copy. And then Claude is operating on that working copy, and there can be multiple Claudes working on multiple working copies if you want to configure it that way.
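
A rough sketch of the hook described above, in Python. This is an illustration under assumptions, not the actual implementation discussed in the episode: the function names, the `agent-work` branch name, and the temporary working-copy location are all hypothetical, and how the agent's commits get back into the bare repository is left out. The only parts taken from the conversation are the post-receive hook, the per-push working copy, `claude -p` with a one-line prompt, and running with the dangerous permissions flag.

```python
"""Sketch: a post-receive hook that hands each push to a headless agent."""
import os
import subprocess
import tempfile

PROMPT = "make this code better"  # the simple prompt quoted in the episode


def parse_pushed_refs(stdin_text):
    """Git gives post-receive one '<old-sha> <new-sha> <refname>' line per updated ref."""
    updates = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            updates.append(tuple(parts))
    return updates


def agent_commands(bare_repo, work_dir, new_sha, prompt=PROMPT):
    """Commands for one push: private clone, branch at the pushed commit, run the agent."""
    return [
        ["git", "clone", bare_repo, work_dir],
        # 'agent-work' is a hypothetical branch name for the agent's commits.
        ["git", "-C", work_dir, "checkout", "-b", "agent-work", new_sha],
        ["claude", "--dangerously-skip-permissions", "-p", prompt],
    ]


def handle_push(bare_repo, stdin_text):
    """Run the agent once per pushed ref. Intended to be called from hooks/post-receive."""
    # Hooks run with GIT_DIR set; drop it so git commands target the new clone.
    env = {k: v for k, v in os.environ.items() if k != "GIT_DIR"}
    for _old_sha, new_sha, _ref in parse_pushed_refs(stdin_text):
        if set(new_sha) == {"0"}:
            continue  # an all-zero new sha means the ref was deleted
        work_dir = tempfile.mkdtemp(prefix="agent-")
        for cmd in agent_commands(bare_repo, work_dir, new_sha):
            # Only the agent needs to run inside the working copy.
            cwd = work_dir if cmd[0] == "claude" else None
            subprocess.run(cmd, check=True, env=env, cwd=cwd)
```

The hook file itself would read standard input and pass it to `handle_push` along with the bare repository path. Keeping the command list separate from the code that runs it makes the sketch easy to inspect without Git or the agent installed.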

Starting point is 00:07:03 You might use worktrees on your local machine, but you're effectively just doing clones, clones all over the place, for each Claude. And so one thing is, like, usually they need a lot of guidance, and/or they will say, hey, I'd like to edit this file now, and, like, there's nobody on the other end of that. So how do you... Well, so part of running it in Docker is that I run it with the dangerous permissions, right? And, you know, a more evolved form of this might have some, like, iptables rules or other things to try to limit its internet access.

Starting point is 00:07:28 The general idea here is to create a safe container so that it can run without. Like, the whole design of this is to not interact with it. That's the whole reason I created this in the first place, because I was sick and tired of babysitting a bunch of Claudes. And so I'm like, no, I'm going to try to structure things and change the way that I work such that the intention is I send you crappy code and then you send me commits. And then I can review those commits and decide whether I liked what you did, but at no point should I ever have to interact with you in between those two points, right? Got it. And so how, yeah, that's, so let me sort of read that back to you.

Starting point is 00:08:06 Yeah, this is a sort of self-contained world. And the only way that it starts up is you inject from the outside world something new into it by git pushing. Mm-hmm. Mm-hmm. And then it churns away and churns away and churns away. And you don't even, you're off walking the dog, you're off making tea. Right. And then when it's finished, how do you know?

Starting point is 00:08:29 And how does that process? So usually when I'm running these things, I'm running them in tmux so I can see the output of each one, just for debugging purposes. I'm not interacting with them, but I'm, like, looking at what they're doing. If one of them misbehaves or something goes wrong, I can poke it or restart it or look at what it did or whatever. With claude -p, that's like one-shot mode, right? You can't talk back to it or anything like that.

Starting point is 00:08:53 It just keeps running and running and. Uh-huh. Yeah, that's right. I've never really used that. I mean, I've used it, like, to say, claude -p "tell me a joke", you know, that kind of thing, just to see that it works.

Starting point is 00:09:02 But I always felt like something would need to be more babysat. But you're saying, no, give it the permissions, let it run, and then just mark it at the end, right? You don't have to look at all the intermediate steps that it went through necessarily. You don't have to micromanage it, which is, of course, the exhausting and frustrating part. What you're trying to avoid is this whole babysitting part.

Starting point is 00:09:26 Yeah, and I've tried this with different variations, like, you know, with the agent teams, where you have an agent, the lead of the agent team, that is responsible for watching the commits go in and out and, you know, having different agents working on different things. And all of it is, it's all just different variations on the same thing. And none of it is really any more amazing than anything else. You know, there are advantages and disadvantages maybe. Again, this is one of those things where it's like,

Starting point is 00:09:55 I'm not even doing this anymore because it's been obsoleteified. Right. Right. That's what you were doing. That's what I remember you explaining to me patiently in the pub. Yeah. That's what you were doing. And I was like, cool, that sounds great because, yeah, when I do use Claude,

Starting point is 00:10:10 it feels like a powerful refactoring tool, but I have to be with it all of the time. And, you know, then if I try and do anything more complicated with it, I have done some pretty complicated things with it. But like, you just say, hey, simplify the code or whatever, then you have to sort of watch it. And it feels like, as I say, it's exhausting. And you feel like you should be doing something else at the same time. But context switching is burning you out.

Starting point is 00:10:33 And it would be nice to just throw it over a fence and say, I'll have a look in half an hour. Without the worry, inevitably, when you say, here's a big piece of work to do, one second after tabbing away from it, it's stopped on a question saying, hey, would you like me to do this? And you come back to it. Right, right. Yeah. Yeah.

Starting point is 00:10:49 But the general model has not changed. The general model for me that I'm trying to work towards, which, you know, happens in fits and spurts, and I run into problems with it. And, you know, but the thing I'm trying to move toward is a world where I create, it's a little bit like getting the right answer on the internet by posting the wrong answer. I create a vague, partially specified, broken-in-pieces skeleton of what it is that I want to exist. And then rather than creating, like, complicated prompts, I just put comments in the code, because that sort of creates, like, the most context for Claude, right? Like, instead of describing, like,

Starting point is 00:11:27 oh, in this function in this file, maybe you should consider doing this. It's like, no, right. TODO: do this. TODO: do that. And then it can tell from the, like, surrounding function structure and maybe what's calling it and, like, the tests that are calling it. Like, it gathers more context and it, like, focuses, because it's all about managing the context window. Right. Like, you just have to get the right information in the context window, no more, no less. And so I feel like the best way to do that is to say, okay, I'm going to take a bunch of crappy things. I'm going to have one agent break up the work that it sees in those crappy things. And then each piece of crap is going to be worked down by one agent, which will create one

Starting point is 00:12:08 commit. And then all of those commits together, then come back to me as we clean up all your mess, right? Go to it. And that is the workflow that I'm trying to move towards. and eventually I kind of want to get to a point where after that process is complete, I don't even need a PR. I don't even need anyone to review it because the PR is baked into the continuous deployment process. And that's just something that because here's the thing is we've created all these amazing tools for writing more code and making more changes.
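
To make the "one TODO, one agent, one commit" split described above a little more concrete, here is a minimal Python sketch of the first step: turning TODO comments into separate work items. Everything in it is an assumption for illustration (the `WorkItem` type, the regular expression, the prompt wording); in the episode this splitting is done by an agent, not by a script.

```python
"""Sketch: split a skeleton into one work item per TODO comment."""
import re
from dataclasses import dataclass

# Matches '# TODO: ...' and '// TODO ...' style comments (an assumed convention).
TODO_PATTERN = re.compile(r"(?:#|//)\s*TODO[:\s]\s*(.*)")


@dataclass
class WorkItem:
    path: str
    line: int
    note: str


def find_todos(path, source):
    """Return one WorkItem per TODO comment, with 1-based line numbers."""
    items = []
    for number, text in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(text)
        if match:
            items.append(WorkItem(path, number, match.group(1).strip()))
    return items


def prompt_for(item):
    """A hypothetical per-item prompt: fix one thing, make one commit."""
    return (
        f"Resolve the TODO at {item.path}:{item.line} ({item.note!r}). "
        "Make exactly one commit and describe what you did in the message."
    )
```

Because each item carries its file and line, the agent that picks it up can pull its context from the surrounding code, which is the point being made about comments beating long prompts.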

Starting point is 00:12:43 But as you know, if you have a pipeline with multiple stages and you increase the throughput of one stage without increasing the throughput of the next stage, all you do is create a queue at the next stage, right? You don't actually increase the throughput of the overall system at all. This has been my life of, like, it used to be that I would open up, like, say, Compiler Explorer and look at the 40 open pull requests and go, oh gosh, I really have to get through these. And this is when humans were generating these pull requests, right?
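
The pipeline point above is simple arithmetic, sketched here in Python. The numbers in the usage note are made up for illustration and the function names are hypothetical: overall throughput is capped by the slowest stage, and anything a faster upstream stage produces beyond that piles up as a queue.

```python
"""Sketch: a faster upstream stage only grows the queue in front of the next one."""


def pipeline_throughput(stage_rates):
    """Items per day the whole pipeline can finish: the slowest stage's rate."""
    return min(stage_rates)


def queue_growth(upstream_rate, downstream_rate, days):
    """Backlog that builds up in front of the downstream stage after `days`."""
    return max(0, upstream_rate - downstream_rate) * days
```

For example, if agents open 10 pull requests a day and a reviewer gets through 4, `pipeline_throughput([10, 4])` is still 4, and `queue_growth(10, 4, 5)` says 30 pull requests are waiting after five days.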

Starting point is 00:13:08 And I want to make sure that I give them the time and the thought to interact with them and say, yeah, this is great or can you change this, all that kind of stuff. But it's a lot of work and ultimately is great. I don't want to sound like I don't like that thing, right? But now when I ask an agent to do like, hey, can you fix bug 27? It goes off. Finds bug 27. Often it can if they're a certain form.

Starting point is 00:13:34 And then now instead of me working on bug 27 and effectively pre-reviewing it because it's my own code, right? And, you know, CE is not a professional product, so, like, you know, I can merge straight to main if I want. I generally choose not to, but still, you know, the review is kind of advisory more than anything else. I can fix Bug 17 and we're done. But now it's like, no, I've asked the robot to do the bit that I enjoy doing. And then I get stuck with the bit that I don't like so much, which is the review.

Starting point is 00:14:06 And then you're like, oh, now I have 80 open PRs because I've got some, oh, oh, dear. Yeah. Yeah. Yeah, I mean, so, yeah. But, you know, without, without human interaction at some point, how do you prevent wildly spinning out of control things? I'm sure you've seen this, you know. Oh, yeah, yeah, yeah. You know, I mean, models are getting better at this, but, you know, I actually was working with one just a few minutes ago.

Starting point is 00:14:38 And I saw it several times go, no, let me take a big step back. No, let me take a huge step back as it was kind of floundering and realizing it was floundering. and whatever magical crafting in the prompts and whatever they're doing, notices that and then tries to do what a human would do, which I think is clever. But you know, but you do need someone to say, wait a minute,

Starting point is 00:14:56 otherwise you're going to be in like some kind of like millions of paper cuts all over your code base. How do you deal with that without a human going, wait, wait a second. Something needs to look. I have the context and I can. Yeah. I mean,

Starting point is 00:15:08 so I don't think that the way to do this is to give it a large chunk of work to do and then go and make tea and then come back in 30 minutes. That's not the model that I'm trying to move toward. The model that I'm trying to move toward is one where I'm doing enough work to, you know, sort of seed the work of some number of agents. And while those agents are churning away, I'm kind of thinking about what the next step is and maybe creating those pieces, some of those pieces. And one advantage of this model is that you can do a git pull at any time.

Starting point is 00:15:49 And so if, like, the easy ones start coming in first, right? So, like, if you have a TODO comment that's like, you know, simplify this method, right? There's some unnecessary complexity in here. Like, you'll see that one finish and then it's like, all right, that took all of 10 seconds. Git pull. Okay, cool, I'll get that and I'm going to look at that.

Starting point is 00:16:09 And so it's not like a, you know, create a big bunch of work, push off a big bunch of work, get back a big bunch of work, review a big bunch of work. It's more like, create a bunch of individual things, or maybe just a handful, push them out, and then start pulling them in and reviewing them and reading through them and maybe even iterating on them. Or I'd be like, you know, I'm going to revert this commit.

Starting point is 00:16:31 We're going to try again. Or I'm going to add another comment in that says, like, this still needs more work. So just so I get my, maybe I missed this part. You can pull from this repo, and it's just a different remote for you. So you, as in you, Ben, the main repo is still on GitHub or GitHub Enterprise or whatever it is. And then you are just pulling from yet another source, or pushing and pulling to another source, much as if it was, like, a vendor branch.

Starting point is 00:16:58 It's just like, this is the agent. Right. Right. I see. Yeah, it's a different. It's a different Git remote. And usually I'll just name the remote for whatever the host name is that the Docker image is running on. Right.
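
As a small illustration of the remote-per-host setup just described, the Python sketch below builds the Git commands involved. The SSH user, the repository path inside the container, and the branch name are assumptions made up for the example; only the idea of naming the remote after the host comes from the conversation.

```python
"""Sketch: one Git remote per agent host, named after the host."""


def remote_url(host, repo_path="/srv/git/project.git", user="git"):
    """SSH URL for the bare repo in the container (path and user are hypothetical)."""
    return f"ssh://{user}@{host}{repo_path}"


def add_remote_cmd(host):
    """Register the agent host as a remote named after the host itself."""
    return ["git", "remote", "add", host, remote_url(host)]


def push_cmd(host, branch="main"):
    """Hand the rough skeleton to the agent on that host."""
    return ["git", "push", host, branch]


def pull_cmd(host, branch="main"):
    """Fetch whatever the agent has finished so far; fast-forward only."""
    return ["git", "pull", "--ff-only", host, branch]
```

Each list can be passed to `subprocess.run` as-is. Pulling with `--ff-only` is a design choice for the sketch, not something stated in the episode: it refuses to merge silently if local work has diverged from what the agent committed.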

Starting point is 00:17:08 Makes sense. Makes sense. Yeah, no, I get it. So, and then ultimately, those changes make it through into the main line through you, having pulled them and looked at them. And so you've looked at them locally rather than through GitHub. So that's kind of, that's interesting. That's interesting.

Starting point is 00:17:23 So that's different from the model that I've been sort of trying to work towards myself in my own way, because for me, the goal, and this is very Compiler Explorer-centric, is that many, many, many of the open issues in Compiler Explorer are "please install compiler X". And compiler X is, and we have READMEs for this, people can send pull requests for it, but usually it's a multi-step process. You know, it can be up to four steps, which is a pain, but, you know, it's, and it's almost automatable.

Starting point is 00:17:58 You know, someone says, hey, I have this Clang build over here. I'm like, great, what GitHub repo is it? Oh, yeah, which branch is it in? Oh, yeah, whatever. And you go and you go and pull it down. It's like, yeah, it didn't build. I said, oh, yeah, you need these extra flags because we need these things on our build.

Starting point is 00:18:12 Okay, that can be done, right? You know, there's no. But I had to try the build out, which is, you know, like a 45-minute build on a beefy machine and then discover that actually, you know, I need to change it. And I have to iterate a few times

Starting point is 00:18:23 to get the exact flags. Then, having built it, it needs to be installed, and the install usually picks up a few things, and then it has to be configured, then it has to be tested, whatever. Each of these is a different repo, for unfortunate legacy, I don't know if they're unfortunate, for reasons, right?

Starting point is 00:18:39 You know, most people who pull the Compiler Explorer repository aren't actually installing three terabytes of compilers. They're about to use it on their local machine, and they don't want to have anything to do with all of the things to do with the orchestration and infrastructure of Compiler Explorer, the website. So that's a separate thing. They just have the code, right?

Starting point is 00:18:58 And the code is just versioned, so that we pull in version 12 of the code into, you know, that kind of stuff. But it does mean that now there's two repos you have to manage. And then similarly, even in the infrastructural thing, it doesn't make sense to have, like, every Docker container we've ever had, to build every possible compiler ever, get rebuilt every single time a push happens. So they're in their own repositories and so on.

Starting point is 00:19:18 And so it all makes sense, but it's also painful and it requires orchestration and multiple steps. Anyway, popping the stack, it is nearly automatable, but you need something with half a brain to watch it and go, oh, yeah, maybe we need to change this flag. Oh, it has completed now. Now I can test-install it locally. Oh, it does work locally.

Starting point is 00:19:36 Okay, cool. Now I can make the pull request, all these things. And so that workflow is what I would like to manage, so that I can sic something on that and say, can you just do this thing and send me the pull request when it's done. And I can review it on my phone while I'm walking the dog and go, yes, that's fine, and merge it. Because most of the time I can see if it's okay.

Starting point is 00:19:57 And in particular if I can see that it's been through the test steps and the CI build has given it a clean bill of health. So that's what I'm working towards, to try and design away that kind of thing. Yeah, it's kind of a different model, I think. Yours is more of a, I want to use this as a pair programmer or a mob programmer with me. And I'm kind of pulling and pushing with it. Is that a fair observation or would you say it's different than that? You know, I don't know that it really, I wouldn't really compare it to, like, pair programming, or I wouldn't compare it to

Starting point is 00:20:32 many practices that we have used in the last 30 years, honestly. It's just a new and different thing. It's just a different way to build software. The closest analog is, I've had situations in my career where I have intentionally created an incorrect PR and sent it to someone. The same, you know, someone's wrong on the internet kind of. Yeah, like, you get into a discussion with someone, like,

Starting point is 00:21:01 I think you should do this. And I said, you know what, the easiest thing for me to describe what I'm trying to do here is to do a third of it. And then you'll see the other two thirds. So I'm just going to do that. Now, you know, it's very limited in the situations where I've done that, right? But now it's basically that all the time. Or at least that's what that model was, which, again, is not what I'm actually doing right now. I was going to say, yeah, you just made me think of something else that I've found is a useful thing for me to do now, is that the cost of prototyping

Starting point is 00:21:32 with, now we're, I'm just talking generally about this, this is going to be a meandering conversation. Sorry, listener. The cost of exploratory work is so low now that I will happily, so I now have a restricted user on my machine and I run it in a sandbox and I tmux into it like you're suggesting. And it allows me to run in YOLO mode as well. So then I can kind of say,

Starting point is 00:21:55 oh, knock yourself out, go and, you know, I don't care. The worst you can do is delete your entire directory and everything you have in here, which in any case will inconvenience me not one bit. You know, you'll stop working and then I'll come find you. And it has, you know, it's a separate Unix user. It has no credentials, all the things, all those good things. I don't necessarily internet-sandbox it as well. I mean, if I told it to hack my network, I'm sure it could. But anyway, the point is, I can point it at something and say, like, I don't know how to do this. Can you just have a spirited attempt at it? And then I can see where you get stuck and where

Starting point is 00:22:26 the problems lie, right? The cost of prototyping, of spiking out an idea and going, would it be possible to do X? Like, hey, what if we were to completely replace this? So in the window behind you, my JavaScript BBC Micro emulator, it's like a 400K download in the web thing because it has two separate gunzip libraries for, like, very ancient reasons, right? There's just, like, two, you know, silly reasons. And nowadays, browsers have DecompressionStream, but they don't support the one thing that I kind of need in it. And I'm like, is it possible to hack the heck out of it and make it work reliably on both Node and Chrome and Firefox? Go. Yeah, yeah, yeah.

Starting point is 00:23:03 And it came back and I can see behind you now. It's like, yeah. And now I haven't looked at the code. It may be an atrocity, in which case, I'll throw it away. And we'll pretend it never happened. But if it's okay, that was great. Yeah, and that's not the kind of thing that I particularly want to do. And again, this is something interesting.

Starting point is 00:23:21 Maybe this is a pivot now, right, away from the kind of things that would be automated. It's like, and maybe you're feeling this too, because you said about the way software engineering is changing, the way that you're developing, this is something new that is not pair programming or mob programming, whatever it is, whatever this is nowadays. It's LLM-assisted, agent-assisted coding. And, you know, there's a lot of people who quite reasonably have a lot of issues with it. There are people who have, you know, issues in terms of the power that it's using and things like that.

Starting point is 00:23:49 And that is a very valid concern, but not what I want to talk about here, for all the complicated reasons that I've not thought well about. And honestly, I've a little bit put my fingers in my eyes and am hoping that some clever person solves this. I suppose I'm pinning my hopes on the fact that there is at least one prior piece of art that says that you can think and not take an entire data center's worth of power, because it's between my ears at the moment, right? And it's apparently not taking more than a few tens of watts.

Starting point is 00:24:18 I don't know what a brain takes, but it's certainly not hundreds of kilowatts, right? Anyway, that to one side, the folks of our age, I think, are having a bit of an interesting experience here because we can see the utility of being able to do stuff very quickly in prototype, like I just described. We're seeing the utility of getting stuff done really quickly, and maybe we're sort of still discovering what really quickly is, whether or not it is a silver bullet or it's just a sort of copper bullet

Starting point is 00:24:52 that's not quite as good as we think it is, whatever. something like this, right? It's a good thing. Maybe it's not as, we're not as effective as we think we are. Maybe that will play out in the medium term and we'll discover that. But what it means is that we're not programming as much anymore. Like literally hand on keyboard,

Starting point is 00:25:11 open bracket, type some things, close bracket. I don't feel like I'm doing that. There have been days recently where I've spent the entire day going. And then at the end of the day, I was like, oh, let me just look at the code.

Starting point is 00:25:21 Oh, I haven't even opened Visual Studio Code. I've just been talking to, you know, various things. It's like I'm chatting in Slack and I'm chatting in windows and things are happening. And I'm reviewing them on, like, pull request level things, but I'm not necessarily programming in the same sense. And, like, there's definitely a part of me that's like, oh, I like that. Yeah. So I'm, you know, maybe I'm just using this as, you know, a therapy session.

Starting point is 00:25:48 I'm still trying to work out how I feel about it. Like I can't deny that I'm excited to get things done. that are exciting to me. And I enjoy the outcome, but I feel like maybe I'm missing something on the journey, you know, and maybe that will bite me in the future as well. Yeah, no, I've thought about that a lot,

Starting point is 00:26:09 you know, in the last three to six months. And, you know, the conclusion that I kind of came to is that the career that I enjoyed for 25 years is now over. And my number one priority in exploring all these things, and the reason why I change it every month, less than a month, is because the number one priority is that the new thing is fun. The productivity gains, I don't care. I'm sure they'll come.

Starting point is 00:26:36 It's kind of inevitable that they'll come in some form. I'm not worried about that at all. The sort of efficiency, the like, you know, resume fodder of like, did you use this? I don't care about any of that. The only thing I care about is that I create a new way of working that is fun. Because if it's fun, I'll do it. And if it's not fun, then I won't do it. And if I do it, then I'll get better at it. And if I don't do it, then I won't. So the only thing that matters is create something that's fun. And the great thing about this is that it's like a brand new world.

Starting point is 00:27:07 Like, you get to reinvent the rules all over again. You know, I've been having a lot of discussions with people lately about maybe rewrites aren't a bad idea anymore. Yeah, maybe not. Maybe language choice doesn't matter anymore because you can just have it, rewrite it in whatever language you want, right? Like, all of the rules might be out the window. Some of them are not going to be, but we don't know, right? Some of the rules that are closest to your heart are probably still very valid. You know, the fact that having a strong framework around the outside about

Starting point is 00:27:40 sort of intersubjective and test for, is this what we want it to be? I said test. I was trying to avoid the word test. It's test, right? It's verifying correctness, right? Yeah, yes, thank you. The requirement that you verify that the software that you produce works correctly, for the definition of correctly that you probably are in charge of maintaining, right?

Starting point is 00:28:03 Yeah. That's your job. What does it mean to be correct, right? Like that's why you're there is to figure that out. None of that has changed. The tools that we use to do it, I'm sure, are going to change dramatically. Right. Yeah.

Starting point is 00:28:16 Right. It's interesting. So I was having a discussion about how the kind of, if you cast your mind back to one of our first guests, Clare Macrae. And the, I always want to call them golden tests. Acceptance tests. Is that the right one? Approval tests, perhaps. Approval tests, thank you. Oh, gosh, yeah. But, you know, those tests seem to be, no, let me try and think. I'm trying to remember what the discussion was. You know, effectively I prefer a well-designed unit test because then, you know, like you, right, if something breaks, one test breaks in one place and you don't have this collateral damage of, like, well, I changed the log line and now everything is different, apparently. Yeah, not everyone thinks that way, and someone made a reasonably compelling argument to me, that I'm forgetting exactly, but of like

Starting point is 00:29:12 in an LLM driven world, the value of having a targeted unit test that tells you what the problem is is not as great as the fragility of like, oh, I'm changing my API and now everything has to be updated, whereas like the bloody output of your program

Starting point is 00:29:28 should be the same. Kind of makes sense to me then. But it's, yeah, you know, it doesn't sit comfortably with me yet, but I could see it. And I was like, this is the first time I'm nodding a little bit towards this. No, I,

Starting point is 00:29:42 It's funny because, again, all the rules are up for debate, let's debate. I could make the argument that the sort of isolation of those tests, in the sense of like one bug produces one failing test, what I would have called an informative test, is less important, but only to the extent that the tokens don't cost you too much, right? Like if you've got Claude or many Claudes in a very tight loop running tests over and over and over again, and they're trying to make changes. And every time they make a change,

Starting point is 00:30:11 we get hundreds of failing tests. That's going to fill up the context window pretty fast. But the thing that might matter more is speed, right? Like, the faster your tests run, the more iterations you can go through. And if you can automate that, then you can get things done very quickly, right? These are things that we're going to have to find out. Yeah. Yeah, no, I think that's very, very valid.
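The trade-off discussed here, between a targeted unit test and an approval (golden) test, can be sketched roughly as follows. This is only an illustration, not code from the episode: `line_total`, `render_report`, `ORDERS` and `APPROVED` are hypothetical names, and a real approval-testing library would keep the approved output in a reviewed file rather than an inline string.

```python
def line_total(quantity, unit_price):
    """Cost of one order line (hypothetical internal helper)."""
    return quantity * unit_price


def render_report(orders):
    """Format (item, quantity, unit_price) tuples as a plain-text report."""
    lines = [
        f"{item}: {qty} x {price:.2f} = {line_total(qty, price):.2f}"
        for item, qty, price in orders
    ]
    total = sum(line_total(qty, price) for _, qty, price in orders)
    lines.append(f"TOTAL: {total:.2f}")
    return "\n".join(lines)


ORDERS = [("widget", 2, 1.50), ("gadget", 1, 4.00)]

# Approved ("golden") output. Assumption: inlined here for brevity;
# normally it lives in a file that a human has reviewed and approved.
APPROVED = "widget: 2 x 1.50 = 3.00\ngadget: 1 x 4.00 = 4.00\nTOTAL: 7.00"


def test_line_total_unit():
    # Targeted: fails only if this arithmetic breaks, so the failure is
    # informative, but the test is coupled to the internal API.
    assert line_total(2, 1.50) == 3.0


def test_report_approval():
    # Broad: survives internal refactors (renaming or inlining line_total),
    # but any change to the output format makes it fail.
    assert render_report(ORDERS) == APPROVED
```

If `line_total` were renamed or inlined, only the unit test would need updating; if the report format changed, only the approval test would fail. That is the fragility-versus-precision trade being weighed above.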

Starting point is 00:30:34 It's, in fact, one of the more interesting things I have seen people do again recently is change the model that they're using to a smaller, dumber one because it can go faster. Yeah. It's like I don't care that it's dumb because, you know, it's protected from itself. And then I can use a slower model at the end to kind of like clean it up or whatever. And that was like, oh, that is interesting. So let me ask you another.

Starting point is 00:30:55 This is really far-reaching now. Sorry, we've reached half an hour into this conversation. We did, we had literally zero plan. How do you feel about the fact that you're working and then you get a, you know, 500 error from the Claude endpoint, and now suddenly everything comes to a slamming halt? And you're like, oh, oh, I can't work. And as a proxy for, oh, what does it mean for my hobby that I could previously do with an air-gapped computer?

Starting point is 00:31:30 And obviously there's 100 reasons why that's not true. But let's just go with it for now. And I'm on the plane. I have no internet access, although also that's not true anymore. I'm just tinkering around and I can do my thing, and I'm enjoying it. Suddenly it's not only the fact that there is no internet there, it's the fact that there is a vendor that is the thing that is preventing you. That, you know, as much as I am an Anthropic fanboy, that does scare me.

Starting point is 00:31:57 Yeah. And I'm like, oh, shucks. What do you? Yeah, that's a real serious problem. That's a real serious problem. And you know, you can see why folks are looking at like, you know, self-hosting solutions. Yeah. Again, dumber models.

Starting point is 00:32:09 It's like, well, look. Yep. I have a rack of GPUs behind me or whatever, and it's running there, and I know where I stand with it. You know, it's my own electricity bill, my own. Well, this leads me, and the local models thing leads me, to the next iteration of things that I am doing. And one of the reasons why I am doing it.

Starting point is 00:32:30 The next iteration is I am just fully leaning into remote development environments, and I am trying, I have right now a project where I'm trying to just set up an Amazon, AWS account. It's really just the EC2 instance and a few other attendant things right now. But basically, like, the way I want to think about this is that,

Starting point is 00:32:55 like, I do all of my programming in a cloud account, and I'm doing that for multiple reasons. Which mirrors the kind of way that you and I've had to work in finance for a while. Yeah, it's not far from that. Exactly. Yeah, we've kind of gotten used to it. And also, you know, IDEs are good at this these days. Remote development, yeah, over SSH works pretty well, yeah.

Starting point is 00:33:14 For me, it works a treat as well. You know, I'm forever just using Mosh to talk to the machine that's under my desk here, where the real development happens, for the same reason. And it doesn't matter which laptop I'm on or where I am, I'm dropped into the same environment, carrying on with the same sessions that were there from before. So it's, yeah. But, yeah, so you're setting this up.

Starting point is 00:33:30 Yeah, and the metaphor here, I think a useful way to think about this is like a long time ago in a galaxy far, far away. When we built systems, they ran on a computer. Like you'd go into a, you know, server closet in your office building and they'd be like, all right, this is the Dell computer that we got to run our, you know, whatever. It was on that computer. And then we built, you know, much bigger systems that run on lots of computers in the cloud and all of that.

Starting point is 00:33:57 This is what we're doing with the development environment. I think development environments are going to move from it's running on your computer that's like on your desk or your laptop to like, no, it's a whole system of interoperating, interconnected services and computers and things for you, for you individually, so that you can do all of these things with agents, right? And a key part of this is security, right? Because, you know, a lot of times in a lot of environments that I've worked in, they've had this sort of like, you know, hard exterior, soft interior security model. Yeah. Where you're protecting yourself from outside attacks, but then inside the network is fairly loosey-goosey.

Starting point is 00:34:38 Right. And I think when you have a structure of like, you know, very competent engineers internally and you are reasonable about taking precautions and various things, that model works really well. I don't want to set a whole bunch of Claude YOLO agents loose in an environment like that, right? Right. So my... Seen the tweets from people saying, oh, I, you know, Claude just deleted the database, whatever, you know. And like, the thing is, you and I, as humans, have made idiotic mistakes in our time, maybe not to that extent, and that's only to be expected. And the only thing that you can do to protect against that is to make sure those humans know the consequences and are scared of them and don't do them.

Starting point is 00:35:19 But you can't scare the agents, so you have to just bubble wrap them and say you're not allowed to do the things that you think you ought to do. Even when it would be extremely useful for them to be able to alter the live production database to do a rollout for you or whatever. You're like, no, that's not how we're doing things here.

Starting point is 00:35:34 Yep, yeah. And, you know, so to have fine-grained controls and say, like, you've got read-only to these things. You can do what you like with those. I'm with you. Sorry, yeah, I cut you off. And cloud providers are really good at fine-grained access controls.

Starting point is 00:35:47 It's something that they've been doing for decades, right? Yeah. And so I'm like, all right, my whole development environment now is going to live in AWS, Google Cloud, doesn't really matter. And it's going to be managed in Terraform, and it's going to have this, like, very particular setup to it. And that's going to do a number of things for me. One, it's going to solve a security problem for me. So I don't have to worry about, like, I don't have to reorganize or refit my company's internal network to have passwords everywhere, right?

Starting point is 00:36:14 You know, and then the second thing it's going to do for me is it's going to at least potentially allow me to start running my own models, right? So I can at least explore the idea of being less dependent on third parties in order to basically do my job. Because right now, when that happens, I have to drop into the slow mode and be like, all right, well, I'm going to fire up, you know, PyCharm or IntelliJ or Vim or whatever it is. And I'm just going to run the tests myself and I'm going to go back to the way that I worked for

Starting point is 00:36:43 25, 30 years. And that is slower. It's definitely a lot slower. And so I think this is just like, again, and I expect this to last like a month. Because things move so quickly. It's funny. I did a panel, AI agents and the use of them in C++, at CppCon. And they just released the video.

Starting point is 00:37:07 And it's like, you shouldn't have bothered. You know, that was six months ago. The landscape's completely different now. Yeah, yeah, yeah. You know, it's, it is, I want, yeah, I want to, so this is really interesting. And I know I asked you about this and that was your opinion and whatever and the thing. Obviously, you're now almost doubling down on the vendor lock-in with the fact that you're now, like, well, I'm running on AWS. Now, obviously, other providers are available, that kind of stuff. But like, I can put that Terraform somewhere else and it'll do just fine. You know what I mean? Sure. Sure. Sure. Yeah. IAM permissions are a little bespoke, but other than that.

Starting point is 00:37:38 Yeah, and, you know, yeah, there's a lot of things that, but yeah, having, you know, holding a thing in your hand and saying, this is my thing and I can do whatever I like with it. You know, in fact, until earlier I had my BBC Micro behind me, which is like the epitome of, like, literally everything is inside this box. That is, maybe we have to accept that it's going or, you know, like everything's a cycle. It used to be, you know, everyone would go and get timeshare on their timeshare machine. It's the only thing you had. You know, it's kind of come back around to that. But there's something about the models being so, you know, in principle, I can buy computers. I can rack them in my basement. I can connect them up with a network switch. And I can go on and I can be like, yeah, I can continue to do this, you know, what I like to do.

Starting point is 00:38:18 But I can't necessarily get access to the technology that lets me run an agent in the same way. Right. That's not just, you know, the GPU stuff. It's like the whole IP of it and whatever. And that just sort of, I suppose having been so open source for so long, where, you know, in principle, I can compile every piece of code that I use from source. Not that I ever do, but I could, on an operating system, on a machine that is mostly open to me and is well understood. That's one thing, to have it sort of like taken away. It'd be like, oh, yeah, you know, my company uses SQL Server from Microsoft.

Starting point is 00:38:55 You're like, wait. Oh, but it doesn't work. You're like, tough. Right. Sorry about that. It doesn't sit well, but I can't deny the, and this is what I wanted to go back to, as we kind of probably should finish up soon,

Starting point is 00:39:09 but like the joy, the fun aspect and making it fun, I am having fun, that's probably why I'm so invested in trying to find a way to make it work well for me so that I can enjoy the things that I'm doing. And that's why I think about this whole difference between outcomes when you just, I want the cool thing, make the cool thing, make the cool thing, and I don't actually care how the cool thing was made, and having the cool thing is more interesting to me than the making process. But then, you know, accepting that some of me enjoys the making process,

Starting point is 00:39:40 but knowing that I can choose, certainly in my personal life, I can choose to make the cool things. And the hope is that the thing maker, the automatic thing maker, can do the things that only, sorry, that I can't do, or I don't want to do, rather, and then I'm just left with the things that I can do and I am good at, my USP. And that's what I hope. We're not there necessarily yet. And now I can't remember where I was going with this big long meander.

Starting point is 00:40:07 But yeah, the joy. Oh, yeah, it's burnout. I want to just talk about that because I get so, you know, I'm reminded of the times when I first got into programming and I was burning the midnight oil, doing all sorts of things and I would go to sleep and I would wake up early in the morning and my mind was still buzzing with like, oh, I could do this thing.

Starting point is 00:40:24 Oh, I wonder if we could do that, whatever. And I'm finding that about programming again. but it's perhaps got a slightly darker tone to it because all the time I'm not able to feed the agents that are doing useful work for me I feel like it's wasted time and so I you know and somebody was saying to me

Starting point is 00:40:45 someone I worked with was saying like, oh, I woke up in the night the other night and I got up and I was like, well, I might as well just check on what it's doing, and kind of logged in to just hit enter a few times on some Claude sessions. And there's a certain, like, there's a dark pattern around that, you know. But there's a meme went around about the fruit machine where, I'm sorry, fruit machine being a slot machine, to translate for you, where, you know, you pull the crank,

Starting point is 00:41:07 which is like hitting the yes, do this, and maybe you get the right answer this time. And there is a bit of that. And that, you know, I don't know how I feel about that either. It feels, you know, so I feel like overwhelmed, overworked, burned because I'm context switching all the time, but also excited. And I'm trying to pick through that. And I wondered if, yeah, having just talked at you for like 10 minutes, what do you have?

Starting point is 00:41:27 If you could somehow. That's, I did it. Yes. All right. I think you're not the, you're not the only person that is feeling that right now. Right. One of the things that we're going to have to figure out,

Starting point is 00:41:40 you know, I have used the phrase, I think, on this podcast before is sustainable pace. What does sustainable pace look like when you basically, you're building software at the speed of thought, right? Like there's literally no barrier. I mean, you know,

Starting point is 00:41:53 maybe this is a few years ahead of us, but it's like there's literally no barrier between you coming up with an idea and implementing it. Right. Yeah. One of the, you know, we had a mutual friend that pointed out that he thought that this sort of agentic programming would separate out programmers that love to code and programmers that love to make things, right?

Starting point is 00:42:13 I think that's a little bit more of a two dimensions in a plane for me. And one thing that I have, because I, I like, I enjoy both aspects of it, right? Like there's a puzzle solving aspect to writing code that I enjoy. That's why I play games like Factorio. But I also love building things for the sake of building things. And if I could build them at the speed that I could think of them, that would be great. One of the things that I've noticed is that I think actually this has diminished the number of personal projects that I take on. Because in the past, I would choose projects because I would be getting something from both of those dimensions.

Starting point is 00:42:54 Like, oh, this would be a useful thing, and it would be a fun puzzle to solve, right? And now the puzzle aspect is completely gone, right? There's no puzzle enjoyment from it at all. And so there's a certain category of them that just don't meet the bar anymore, right? Interesting. Yeah. But I'm also taking on personal projects that I never would have dreamed of doing before because it would have been too time consuming, right?

Starting point is 00:43:20 I think the net number of them is actually lower, but the quality is, like, not better, but more use, because it's a personal project. What does better even mean? Yeah. Right. It's more useful, right? Like there's more of a chance that I'm going to actually use it and it will not just be a thing that I built three years ago and then shoved in a folder somewhere and then forgot about. It is, yeah, it's definitely changing. You reminded me of a tweet I saw by Ellie Huxtable, who is the person who writes Atuin, the shell history thing. I don't know if you know it. Yeah, I can't live without it now. That's another sort of crutch I have to live on as my memory disappears

Starting point is 00:43:56 and I have to use my infinite searchable shell history as the most important intellectual property I create. Yeah, she was retweeting somebody who was like some high school student who was like trying to build a web browser using agents to help them do it. And I think her point was something along the lines

Starting point is 00:44:17 of like, if you aren't doing this kind of thing, are you even exploring the space anymore? Now that the barriers have come down so much, you know, like this is an amazing kind of thing you can do. Now, whether you're successful or not, this is the new thing that you can, this is a new achievable goal, right? Or at least, you know, you can learn a whole bunch of stuff about why it's not trivial to do this kind of thing. I was fascinated by that. And I thought, wow, you know, what would it be like to be a teen in this environment? But I would like

Starting point is 00:44:46 to put a massive thumbprint thing, which, I've now made a horrible noise at the microphone, and I'm not going to like this in the edit. I come at this as somebody who can afford to pay for the access to these systems at the level which means effectively I don't have to think about it so much. I don't hit the limits very much, right? And that is different from somebody who is, you know, unable to afford even, I'd say, the most minimal access to Claude. And that is not a good feel, actually, compared to like, you know, if you could get a computer in the old days, even some time on your friend's computer, you could learn to program.

Starting point is 00:45:28 And now it's like, nope, you need a $20 a month subscription at the absolute minimum to even start the whole process. Honestly, yeah, those $20 a month subscriptions are purely being subsidized and are not sustainable and are, yeah. I mean, especially what's been happening with some of the rate limiting lately, like it's getting. So I understand. Yeah. Like, so like, yes, it's introducing this whole other like barrier to entry for programming. Yeah. Which is not great.

Starting point is 00:45:57 Not great, but, you know, also if you, you know, if you're, I guess, the craftsman, you can buy a set of tools and, you know, whittle away a piece of wood and make something beautiful, and that's wonderful and it's a thing. But, like, you know, I can't operate a mill because I can't afford it. It's like, well, it's fine. You know, maybe that's okay anymore. But it's just that it's always been something that is so emotionally important to us in our backstory. The reason that you and I are even talking on this podcast is

Starting point is 00:46:24 because we were whittling away. Yeah. It's possible that history will tell of a brief period of time where unlike pretty much every other professional engineering industry, civil engineering, mechanical engineering, maybe not mechanical engineering. You could essentially for the cost of a laptop, have everything that you needed to apply your profession.

Starting point is 00:46:48 And that was incredible. What an amazing thing to have occurred. That's not how normally things occur. No, that's true. You know, you can't be a chemical engineer at home unless you're just brewing beer. And even then, it's sort of like, you know, how are you going to scale that? I mean, that's true. You're not going to get a job at, you know, oh, God, what is the name of that?

Starting point is 00:47:11 Chemical company is like, we don't make the thing. We make the thing that goes, whatever. Some big chemical company whose name I can't remember right now, because you know how to brew your own beer. Isn't your wife a chemical engineer? Well, that's why I brought that up. I mean, you're from the petroleum industry for a little while, and you definitely can't do that at home. No, that's true.

Starting point is 00:47:30 Nor should anyone ever for any reason. True. Oh, boy. Somehow we're at like a very long time. Yeah. We should probably stop talking about it. Yeah, this is a topic that could go on forever. I know, but it's really interesting to check in.

Starting point is 00:47:45 And I mean, I think unlike most of our podcasts, this one will probably go out somewhat contemporarily. So maybe this will still be up to date by the time people hear it. You know, like, we won't be so ridiculously out of date, rather.

Starting point is 00:47:58 So, I mean, I'd be interested, you know, it is, it's a, it's a tough and interesting and very deep topic and has a lot of dimensions that we're not used to dealing with as programmers,

Starting point is 00:48:07 you know, like, which sort algorithm is better, is like, where my, you know, what cache lines should be pulled, how the branch predictor works, these are things that I'm kind of more comfortable with.

Starting point is 00:48:16 But when it's like, you know, hey, this may change the whole nature of what we do as a profession. And it may exclude people. It may cause environmental damage, but it's also very exciting. I don't know how to navigate it other than like.

Starting point is 00:48:27 It is a kind of ride in the way. World. We're in a crazy world. Well, friend. Let's leave it there for this time and we'll let our listeners let us know if they want us to continue ranting and rambling about this or whether we should go back to something a little bit more traditionally Two's Complement-y. But well, either way, this has been fab. Thank you for bringing me up to date with what you're doing.

Starting point is 00:48:49 Right. And we'll have to try and find a time to grab a coffee in person sometime. The bar. That'll be the plan. That works. That works for me. All right, friend. Until next time. Until next time. You've been listening to Two's Complement, a programming podcast by Ben Rady and Matt Godbolt. Find the show transcript and notes at www.org.

Starting point is 00:49:11 Contact us on Mastodon. We are at twoscomplement at hachyderm.io. Our theme music is by Inverse Phase. Find out more at InversePhase.com.


Ben has stopped talking to Claude directly. Matt hasn't opened his editor in days. They try to work out whether this is fun, programming, or a very expensive slot machine....
