Algorithms + Data Structures = Programs - Episode 277: High on AI Update

Episode Date: March 13, 2026

In this episode, Conor and Bryce give a "High on AI" update, chat about the AI tools they're using, their workflows and more!

Link to Episode 277 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)

Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: LinkTree / Bio
Bryce Adelstein Lelbach: Twitter

Show Notes
Date Recorded: 2026-03-10
Date Released: 2026-03-13
ADSP Episode 244: High on AI (Part 1)
ADSP Episode 245: High on AI (Part 2)
Cursor
Claude Code
Artificial Analysis
Enter The Matrix
podgod.ca

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Transcript
Starting point is 00:00:00 5.3 Codex, or I guess 5.4 Codex now. And then, you know, Sonnet and Opus. And whenever you do that, one of them always solves it. And so that's where I'm at. What is your, you know, daily workflow? We know you're on Cursor. What's your, what's your model that you use? I use the, the Claude 4.6 Opus high thinking. I use like the slowest, like, best thing, because I don't care how long it takes. I want it to be as good as possible. Welcome to ADSP: The Podcast, episode 277, recorded on March 10th, 2026.
Starting point is 00:00:48 My name is Conor, and today with my co-host, Bryce, we revisit the topic of AI and do a deep dive for the first time since August of 2025. Sorry, I was, I was distracted by the AI. I was too busy talking to it to notice you were back. Understandable. Understandable. I got auto run working. Maybe.
Starting point is 00:01:19 I don't actually know, because it still asks me quite a lot. So there's a couple weird things with the new auto run. First of all, the old auto, like the old, or, sorry, the non-auto-run mode, if it needed to search the... We're talking about Cursor, folks, we're just hopping right into it. We're going to talk about AI, lots of stuff today, but Bryce was telling me I needed to focus all my energy on getting sandboxed auto run with Cursor 2.2 or whatever and higher. I'm on 2.6. And I finally, thanks to, well, actually, let's get their names, just their first names. Thanks to, why isn't Slack open? Slack's not open? Because we got issues
Starting point is 00:01:57 with my computer folks, and I had to restart it. Was that the root cause the problem? No, no, no. I just, I thought I'm having SSD issues, but I asked Kerr, cursor about it. It said that my SSD is fine. It ran some F-Trim command, and that freed up 76 gigabytes of stuff, and it said you don't have F-Trim running. And so I auto-set that up, but every once in a while I run into an issue where I lock my computer screen, and then just like the UI becomes awfully slow. Like, you know, I will type in a character, and then I'll see it like a minute later show up. And so I just have to restart. And I thought it was an SSD issue. Curser seems to think my SSD is fine. Anyways, thanks. Shout out to Pavo and Scott on the CDD Code Assistant cursor thread in our Slack. The main cursor individual just pointed me at the docs. I'd already seen the docs, folks. And also was telling me that my colonel wasn't new enough, which it definitely was. It was. But yeah, yeah. Well, I don't know. It is confusing. Dot 11 versus dot 2, you know, Which is newer? Is it, is dot 1-1?
Starting point is 00:03:06 You know, technically that's, what do you call that? They call that lexicographically less than two. But if you interpret it as a two-digit number, then it is larger. Anyways, I got, thanks to these two individuals at Nvidia, I got the sandbox auto-run working. However, at first I thought it was working great because it wasn't asking me for stuff, but now it still asks me to run like Python commands every once in a while. And I'm like, that should definitely be inside the sandbox, you know, rules. So my experience with it has been interesting because it used to be that the way our cursor was set up, you know, if it wanted to search the internet for something, it would just do that. But then any command that it would need to run, it would ask for permission, which was like super annoying because my job just became sitting there and waiting two minutes for it to churn out the command and then pressing yes. And then inevitably I would go do something else and I would come back like 30 minutes or an hour later and remember it. And I'd be like, oh, this thing has just been waiting for my.
Starting point is 00:04:02 approval. But now what I find happens is that it'll run most stuff in the sandbox. The way that it works is basically it tries to run everything in the sandbox. And if it doesn't have enough permissions in the sandbox, then it will ask you to run it outside of the sandbox. But what I've noticed is that with the sandbox mode enabled, I get these requests to like fetch content from the web. And I don't think it's like downloading scripts. It's like, oh, like I want to look at the docs for this thing. So I think if you have it in sandbox mode that like any network request, it's going to ask for permission. So that is annoying me a little bit that like I don't mind approving the commands that actually do need approval. But like basically if you if if you're encountering that, you should look at the reasoning trace.
Starting point is 00:04:48 It probably means that the Python commands that's running in the sandbox for whatever reason don't have the right permissions. sometimes that happens if like if it needs to access something that's outside of the repo route so just like look at the reasoning trace it should like explain it like you should see you should see in the reasoning trace like the reason why it didn't like why it failed to run it in the sandbox but has it been a force multiplier for you yet not really not really i mean don't get me wrong cursor itself is a hundred X force multiplier but the auto run like until it i don't have to click
Starting point is 00:05:25 any buttons. Like, I don't care whether it's like 10% of some set of stuff or 50% of some other set. If it's any percentage of any set of stuff, that is like in the common path, a.k.a. Network requests, aka running scripts locally, aka, I mean, I think you should still be able to locally remove files if you're in a Git repo. You know, you can just, you know, get checkout dash, dash, dot, and you're back to where you were. I mean, removing stuff outside of the repo. Yeah, that should require, because, you're in a get, you know, because. that is something every once in a while now. And this is one of the great, we should back up in a second and start giving a full AI update.
Starting point is 00:06:03 Because I entitled last episode, The Mini Cursor slash AI Update. But anyways, we'll step back in a second and do that. We'll talk about the models we're using our. I'm trying to get statistics right now about how much more code I'm output in with the auto run than before. I was scared of the auto run for a long time. You know, like we're like old men basically in the world of AI. because we're still using cursor, you know, like we're not using ClaudeCode like everyone else.
Starting point is 00:06:30 No, no, no, that's not true. First of all, I'm not an old man. You're an old man. I was, I bootstrapped an auto-run OCR screen capture. Like, I was working with a project where I really needed, I like didn't, I needed it to be running while I was away. And I, and we didn't have access to like what they called Yolo mode at the time. So I wrote, I got, I bootstrapped using cursor.
Starting point is 00:06:52 I think it was 3.5 at the time. A script that would take script. screenshots of my main monitor, it would identify the, I think there was like three different things. One was run, one was approved, and it changes over time. It would find that. Then it would use auto hotkey, I believe, or some, you know, Python move your mouse module to move the most, go click the button. Beautiful, beautiful. You're, I'm not old man. And someone, I actually told someone this like way back, like a year plus ago. And they were like, well, aren't you worried it's going to like, you know, RM dash RF, your system? And I was like, that is a risk I am willing to take because of how
Starting point is 00:07:24 useful these tools are. And it never did, folks. It never did. So you may be an old man. Does it make us old? You said you tried cloud code, yet you're back to cursor. So what happened? So, okay, no, no, no. I didn't try cloud code. I downloaded cloud code. I tried to use our corporate login to log into cloud code. And because, like, with it, I couldn't get cursor auto run working similar to you. I followed all the instructions, but, like, I wasn't showing up. And so I was like, all right, I'll just try clog code instead. I actually didn't. I think actually, I didn't know that we had cursor auto run. I just knew about the ClaudeCode auto run. And so I was trying to download and install Claude code with like our corporate, you know, account. And when I tried to do like the single
Starting point is 00:08:06 sign on through Claude, like it gave me an error and I went to the Slack channel and like five other people had reported the error. So I just determined like Claude codes, you know, SSL, like SSO integration with us was not working that morning. But like I had to do stuff. So then I like search within the internet. And I was like, oh, there is a cursor auto run. I don't have to go learn any new things. Also, my, like, my first, like, two minutes interact, because cloud code is a two-y. It's a text user interface. You start it from the command line, and it pops up, like, a little text windowing system instead of just being a GUI. And my initial interactions with it solely from starting it to, like, the first few steps of the login process, I was like, oh,
Starting point is 00:08:48 this is so awkward. And, like, I keep in mind, I'm somebody for many years, Like did everything in VAM and like a command line. I never used an IDE before AI whatsoever. But like my initial reaction with this was like, ugh, you. Like I want to go back to cursor. And I feel like I'm going to give it another shot probably after GTC because everyone seems to be using it. And somebody made the point to me that because there's so many people using it, that there's a ton of like useful skills out there for Claude Code.
Starting point is 00:09:17 And that like you're like one of the one of the benefits using it is that you get like ecosystem effect of like everybody else using it. But I'm very happy with the cursor experience generally. I like having an IDE for this type of development because I find that the majority, the thing that I spend the majority of my time doing is not writing code. If I was just writing code, I would probably, you know, want to do that in VIM, but the thing I'm spending the majority of my time doing is browsing through the code base, you know, browsing through the structure of the code base, looking for particular files or something to give it a task to do, reviewing code. And I think if I'm spending most of my time reviewing code or browsing through
Starting point is 00:09:56 the sort of the file structure, I want an ID. I want a GUI for that. Not just solely a text interface. So I'll give it another shot and we'll see, we'll see what it's like. Yeah, we should put it both on our list of things to do. Admittedly, I've tried cursor CLI, which I think is modeled very closely after Cloud Code. And I have heard this about Cloud Code is that anytime you make a request, it like gives you a couple questions like, do you want me to use React or Flutter? or like something like that. And like the cursor CLY experience, I was like, this sucks compared to Cursor and I just went straight back to Cursor. No offense.
Starting point is 00:10:30 I mean, the lovely folks, Michael Truel, CEO of Ennisphere and you're doing great work. I'm just saying the CERL CLA experience was subpar compared to Cursor. And I just have had, I mean, I'm saying I have no need to switch. Yet here I am whining about the fact that we don't have auto run basically like 100%. But I don't think you won't have that with Claude, either. Like, like, the corporate auto run, I think, well, I do know that they have like a pilot program. I don't know if we're allowed to talk about it or talk about any of this, but I know that they're working on rolling out auto run with network access. So like sandbox, but with some amount of network
Starting point is 00:11:07 access with like an allow list. I've seen a couple emails about that. So we may be able to do that soon. The way that it works in like sandbox mode. So all these sandbox modes, they use like built in OS or kernel features to run your stuff in. I don't want to say it's like Docker, but it's kind of like similar to like a containerization like Docker where it's running it like on your OS but like in a separate environment. I think on Linux, the thing that like cursor and cloud code use is called Landlock. The difference with Docker is that like Docker runs it in like a different environment.
Starting point is 00:11:43 And like it's like Docker's like sandbox, but it's in a different environment. I think the thing that Claude Code and cursor are doing when they're sandboxing is it's not a different environment. It's your same environment, but it's still like sandboxed. So it has a different set of permissions, and it's got like, isolated access to your file system, to your networks. So like when it runs it in your repo, it's going to give it access to the files that are within your repo, but not like the whole, it doesn't have access to like the whole rest of the file system. Yeah. I think it's called Landlock is the thing that they, that they use. But, you know, like, I think the,
Starting point is 00:12:19 idea with the auto run with a network allow list is like by default right now today if you try to access anything in the sandbox like anything that has to you know access the internet it's like it won't work in auto run and then it'll fail and then it'll ask you for permission and with the allow list it's like it gives you a blessed set of a blessed set of things and one of the other things that's specific to invidia is that i think in in both cursor and claud code. The default sandbox setup does not have GPU access. So if you need to like run your tests and your, because you work at Nvidia, your tests are testing. That's only a small part of our business though, right? The GPUs. Waiting for the joke to register on Bryce's face there was hilarious. I could see
Starting point is 00:13:10 the gears turning. It's it's rather early in the morning right now. And Bryce was like, wait, is he talking about the GPUs? So this is just like a common problem with GPU development and containerization is that like you need, you need some amount of privileges to access the GPU. And I think by default in these tools, the sandbox environment doesn't have access to the GPUs. And so anytime you go to run your tasks, it has to ask for permission. Like, I've noticed that I have some pre-commit hooks in the repo that I've been doing most of my work on recently. And the pre-commit hooks don't run in the sandbox. But I actually kind of like that because I will have cursor write my commit messages.
Starting point is 00:13:49 for me. I'll tell it like, you know, commit this change, like push it to GitHub, like open up a GitHub PR, et cetera, because I don't want to deal with the Git command line. But I have noticed that sometimes if I tell it like, hey, go like fix this thing, sometimes it will like go fix the thing and then it will ask me if it wants to commit the change. And sometimes it will just commit the change. And it will do that before I've had a chance to review it. So it is actually useful that the, that the pre-commit hook fails in the sandbox because then it means that before it commits anything, it goes and asks me. Yeah, at one point on, not on my workstation, but on my work laptop, I somehow magically, for like, it was a month period, I downloaded some cursor,
Starting point is 00:14:31 and it just had yellow mode. And I was like, I was so fearful that cursor was going to force me to update at some point. But this was back when it was like Claude 3.5. So it's like, you win some, you lose some, right? Like, really what we need is it with 4.5 opus right now, like 3.5 sonnet. it wasn't as great, but still, it was, it was still amazing to not have to hit that button. And except for one time it did write me, like, write the commit message and like push. And I was like, whoa, whoa, whoa, whoa, whoa, whoa. Like, I was a little bit scared there. I was like, I did not ask you to do that. I do not know why you think I wanted you to do that. And then so I immediately was like, you know, added it to the custom rules or whatever they were called and was like, please never
Starting point is 00:15:13 get push or get, even even writing a commit message like is confusing, right? And like, because if you get ad and get commit, like the changes disappear. And then you're like, what happened? And you have to go get log. And if you don't go get log, then or get show, like, you're a bit confused. Like, did it just yeat all the changes? And then you're looking at the files and you're like, no, the changes are there. And it's like, did it go commit these? Anyways, so it happened once to me and it was a very terrifying experience. If that's not what you are. I know that there's some people that are just, you know, letting the AI do everything. But I do like writing my own very terrible commit messages. They're always just like two words or three words. But it gives me a little bit of joy. running get ad, get commit, and get push, which is the only reason I do it now, folks. So it's funny, I have, I've gone down a slippery slope here. I, historically, the commit messages that I've written have been one to two sentences in fairly descriptive, but not like a paragraph. And I typically what I do is I write like what section of the code base the changes in, like if I'm changing like something I've been working a lot with our tutorials lately. So if I'm changing a particular tutorial, I would put like tutorials slash accelerated Python slash the name of the
Starting point is 00:16:24 notebook or the stuff I'm working on, then colon, and then a description of the change. And I had a cursor rule that basically said, you know, when you, when you write a commit message for me, look at like the previous commit messages and follow the format that I used and like, you know, make it, like state it in my voice, basically. Like follow the style of what I've been doing. And I did that for a while and then like starting last week I've been doing a lot of changes to this repo and I started using the auto run and you know it started like wanting to it would follow that format for the the first sentence but then you know how get mess get commit messages you can have like one sentence and then you can have like a blank line and then like a paragraph with a longer description and then it started like writing like slightly
Starting point is 00:17:10 longer descriptions and like one time I was like all right this longer description is actually like helpful. And I was just like, all right, approve. And then what happened is because my cursor rule told it that it was supposed to look at the commits, instead of... It became verbose. Yeah. Instead of telling it exactly what format I want, I told it, just look at the style of my previous commits. But then, like, I wrote this one commit this way, and then it was writing all my commit messages. So over time, it started to evolve, and the commit messages have gotten a little bit longer. And now it's like, it's totally doing its own thing. And honestly, I don't look that much at the commit message descriptions that it is writing because they've
Starting point is 00:17:51 usually been good. I'm usually not that concerned about whether it's written a good commit message or not. I do like, I'm too busy like reviewing its code to spend time reviewing its commit messages. Yeah. Well, I mean, I said we were like, I was just thinking that you call that like extreme drift, commit message drift based on the AI models. But we haven't actually talked. So let's now like, I mean, we were doing this the reverse order that.
Starting point is 00:18:15 we should have, but, you know, Se la V. Chaos with Springs of Information. Let's give our, I'll go first and then you go, so what is the update? I mean, actually, before we do that, let's, when, when did we do the high on AI? I think it was August of 2025. Let's see if I am correct about that. High on AI, part one, part two was July 25th and August 1st of 2025. So that's the last time. I mean, obviously, it comes up from time to time, but we haven't really done a deep dive being high on AI again. So I'm, I mean, I'm probably not going to call it. Maybe I will.
Starting point is 00:18:47 I don't know. Maybe we're going to revisit. I thought you were going to call it idea person. No, no, no. Because we're going to split this up. We'll call one of the next one, the next part, the idea person one. So this one, we're just talking about our workflow now. So I think back in, and actually, let's go to the, what do they call it?
Starting point is 00:19:03 Is it intelligence index? Artificial, artificial analysis is what it's called. I don't want to resume browsing. Just bring me to the website. And actually, I'll share this. Let me move some stuff around. let us take Bryce's head, move that over here. And now I'm going to share, because we've shared this site before, but we will link it as well.
Starting point is 00:19:26 And now we are going to scroll down to what we want is this one. Why they show 16 different LLMs by default is beyond me, folks. Korea Telecom is an LLM. And all we care about is Google Anthropic, an open. AI. So right now we are staring at, I've showed this in a talk that I gave back in December. It is the evolution when they introduced the different models. So right now we're on Gemini 3.1, GPT 5.4, which is the most recently released, haven't taken it for a spin, folks, but I typically don't like the open AI models. And then Claude Opus 4.6. So back in July of 2025,
Starting point is 00:20:10 we clearly were on Claude 4, Sonnet, Gemini 2.4. 5 and 03 Pro. So back then we were using Claude 4, but it was Sonnet 4. And there's a huge difference, huge difference, folks, between when they released, it doesn't really show when they released, or I guess was 4.5 was actually, so 4 was good. But yeah, 4.5 was the huge jump in the tail end of November. So, you know, obviously we love these tools, was using them daily. and then they released 4.5 and Claude 4.5, Sonnet and Opus, late November.
Starting point is 00:20:49 What was the actual date? It was, well, it says Sonnet was December 28th, but then Opus was, yeah, November 23rd. And I remember this, and then this was like, look at how close these were. GPD 5.1, Gemini 3 Pro, and Cloud 4.5 Opus, the dots are literally like on top of each other. So it was probably OpenEI, released first on the 12th, and then they all, within like a couple weeks, drop their model increases. This was like, it was a massive jump. And the first thing that I ended up doing was basically creating my whole slide deck for that talk that I gave in December. I created the whole slide deck. I started by creating the title slide. And then I was like, oh, maybe I'll do the About Me slide.
Starting point is 00:21:32 And then I was like, wait a second, I can just keep on asking it to create a new slide. It created this nav.js thing. The whole thing was done in JavaScript. Oh, my God. It was beautiful. But that was not taking it to its full potential, folks. And since then, you know, 4.6 came out like just a month ago. And I mean, 4.6.
Starting point is 00:21:50 And so the thing is, here's one of the big things. I used to use Sonnet on a daily basis because Opus took too long. You couldn't interrupt it as easily. And I felt that working with Sonnet, actually, you could get further by, like, redirecting it whenever it went off the, you know, the railway that you wanted. it to go down. Now, the models are so good, you got to use the big ones, folks. You got to use the big ones. They one shot everything so well. And then you obviously want to design it a little bit differently, but probably up next in the idea episode, we're going to talk about all the different things.
Starting point is 00:22:25 Last episode, I mentioned probably in a weekend, on a weekend day, you could build a podcast app. That podcast app has been built. It is beautiful. It's better than all the other podcast apps out there. I've been using it. There's still a few. Oh, yeah. So I will. well, should I? Maybe I will give, if folks leave comments on the GitHub discussion, I will find a way to give people access to the APK. I'm not releasing it yet. I want this thing to be perfect. Or at least perfect for my daily use cases. Obviously, there's going to be edge cases that I don't run into because it's not the way that I use podcasting apps. But still, like, I use this. So I added right before I shared it with Bryce, I added a stats page. And that was last Friday. So recording this on Tuesday morning. Since then, I've listened to 20 episodes. episodes, 14 hours and 25 minutes, and it's got beautiful stats. But every day, I noticed something like this morning, I woke up, my podcasts were not there. And this is a common issue for other podcast apps. I imagine Apple Podcasts doesn't have this, but because it requires background, like, checking of is there new updates, a lot of phones in order to conserve battery will, like,
Starting point is 00:23:35 aggressively shut down apps' abilities to, like, do different things. while you're not using that app actively. The weird thing is, is when I opened it, there's supposed to be some code in there that says, oh, you've opened the app, definitely go and do a check, but I had to go shut down the app, reopen the app. And then only one of the three new podcasts had downloaded at that point.
Starting point is 00:23:55 I had to go manually. So there's still little bugs. 98% of the time, it works perfectly. And it works better than CastBox, which I was using before and other ones, just because I designed it the way I want to use it. Anyways, we're going to talk about that. We'll talk about Array Box.
Starting point is 00:24:07 You did give me access to the repo, right? I was hoping over the weekend to file some issues and fix some things, but I didn't have time. I didn't add you, but there's two different repos right now. One of them is a public one, so I just said file them on that. That is the one that backs the Podgod.ca website. There's another one that's privated, which I'm probably going to keep privated for now, because I think that this could actually like take off, because it is going to be a podcast player for podcast listeners.
Starting point is 00:24:40 All these other podcasts are not for listeners. Therefore, these companies that are trying to make money. CastBox, the one that I was using, is a VC-backed podcast like company that is like spamming you with ads all the times. That's why when you're looking at the artwork, it's always like rotating in between an ad and between the actual artwork of the episode. And like when you go in between episodes and downloads enough times, it shows you like a 30 second ad that you have to like, you know, skip. And then they've got their premium thing, which is like pay money. Guess what, folks, I don't want to make money off of this. I mean, don't get me wrong.
Starting point is 00:25:09 I'd be happy to make money off of this. But I'm not the same way that ADSP has never had an ad. I'm not going to say we never will have an ad. If there's a life-changing amount of money, I'm happy to sell out, folks. I'm happy to sell it. You tell me a number. And I'll let you know if that number is big enough. But, you know, I don't think anyone's going to give me a number that I'm going to be like,
Starting point is 00:25:27 I'll subject the thousands of listeners that we have to an ad. Because I personally hate ads. That's one of the other issues I ran into the other day. I was trying to skip an ad from the lock screen, and I accidentally hit, like, the tracker to the end. And when my podcast hit 100%, they automatically get deleted. And that was very unfortunate. But then what did I do?
Starting point is 00:25:47 I just went and blocked the ability. Like, how often do you ever use the tracker, like the slider, on the lock screen of a podcast in order to skip ahead? You never do that. Yeah. The only time you're trying to skip ahead is when you're trying to skip ads, and you use the little skip 30, skip 30. Anyways, we're talking too much about podgod.
Starting point is 00:26:04 But the point is, quad 4.6 opus, it's basically what I use as my daily driver. It's amazing. You have to go try it if you haven't tried it. Every once in a while, if I give it too difficult to task, because I am giving it the world now. It will fail. And after like a couple minutes, if I can't get it to, you know, get the right answer, I just go into multi mode where, or multi-agent mode, where I'll give a Gemini 3.1, you know, 5.3 codex or I guess 5.4 codex now. and then, you know, sonnet and opus. And whenever you do that, one of them always solves it. And so that's where I'm at. What is your daily workflow? We know you're on cursor. What's your model that you use? I think I, well, I was just checking.
Starting point is 00:26:47 I use the Claude 4.6 opus high thinking. I use like the slowest, like, best thing because I don't care how long it takes. I want it to be as good as possible. and I am not paying the bill. So, you know, I'm using it for work, for work stuff, and they didn't tell me to worry about the cost. So I just use whatever the top thing, which I think 4.6 opus is the top thing with the high thinking. Is that what you're using too, presumably? I believe so.
Starting point is 00:27:23 I've gone to cursor dashboard, which they definitely at one point used to tell you, I mean, there is a usage thing, but the usage. just shows you the cost of every single request that you make. And if I went, I went down to, I clicked on the left on analytics and it showed it there. It showed, when it showed my like ranking, it showed me what was the most common, the model I used most commonly. And see, mine shows me the activity graph. I mean, you're looking at my screen right now, scroll down, scroll down to the bottom. Yeah, you see right there, it should say opus high. So you're not using thinking mode as frequently.
Starting point is 00:28:00 Where does it, wait, where does it say open-eye? You see in the list of rankings, it shows where you are in the rankings. Oh, yeah, yeah. That's sad, though. They used to actually give you like a pie chart. Yeah, I mean, I do it mostly for, actually, I don't know why. I probably just clicked it. And maybe actually, I think I have switched back and forth because thinking does have, well, I don't actually know.
Starting point is 00:28:20 I think, well, that's why I want to see a pie chart because, yeah, I'm not sure. I'm not sure why I'm not using the thinking. Maybe I would have, those couple problems that I had, thinking would have fixed it for me. I never really changed the model. I just always used thinking. And I don't even, like, I haven't even, like, I don't even play around with other models that much. I'm mostly just, like, I'm too busy for that.
Starting point is 00:28:40 I'm too busy for that shit. I got too much stuff to do. But yeah, I always, I always just use the one that's, like, slowest and best. I mean, what I do is, have you started using the Git worktrees? Yeah, I mean, that's easy. You just click a button and it sets it up. And then you need that for the multi-agent mode. Yeah.
Starting point is 00:28:54 So I don't, so the multi-agent mode is the one where, like, you have it, like, multiple agents working on different solutions to the same problem. And then, like, it shows you the best one. I haven't used that. I mean, maybe I could try using that. But my primary, like, mode of operation is to, like, be working in parallel. So I'll typically have two or three different, you know, tasks that I'm working on at a time. So I'll have two to three different chat windows open. Sometimes, sometimes four, but usually it's two to three because, like, that's roughly the, like, length of the pipeline of, you know, I write, I write some prompt of, here's the next thing to go do, and then when it starts churning on that, I go to the next window and I start feeding the
Starting point is 00:29:39 next prompt in, and then I sort of like ping pong back and forth. And I found that like two to three is about the number of like tasks that I can have running concurrently because like it needs my attention at some points, right? Like it either needs my attention to approve something or it needs my attention because it's finished its response, and it needs more input from me. And I use, I'll typically use the Git worktrees. I have run into a couple issues with the Git worktrees. But then the other thing is I try to, I try to find separable tasks, tasks that are not going to have overlap within the code base. And that's actually challenging because I think typically when you're working on one thing, you'll realize that something else
Starting point is 00:30:19 has to be done in the same place in the code. But you don't want to serialize your process. And so if I'm working on like one file, let's say, and I realize some other change needs to be made within that file, to like the same place that the current, you know, chat, the current agent is working in. If I go and have, in a separate worktree, another agent work on that related task on the related code, then I'm going to probably end up with some sort of merge conflict later. And sure, it can resolve the merge conflict. But if I end up with a ton of merge conflicts, like it takes it time to go and resolve the merge conflicts, sometimes like more time than like other tasks. And so I typically want to avoid that. And so I try to find
Starting point is 00:31:06 like separate places in the code base where I can be working on like two completely separate tasks at the same time. And then what I do is like if I am working in one place and I realize, oh, hey, there's this other thing that I need to do, I will buffer up that other task in some way. I feel more like a project manager because I have like these, you know, lists of like to-do items or of like action items or GitHub issues or things that need to be done that I keep in various places. I was thinking about like filing a GitHub issue for each thing. But honestly, like that would be too time consuming. That would be too, that would be too process heavy. So what I...
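The setup Bryce describes, one Git worktree per concurrent agent so that each task gets an isolated checkout and their edits only meet at merge time, can be sketched with plain Git commands. This is a minimal illustration against a throwaway repo; the paths, branch names, and two-task split are hypothetical, not taken from his actual project:

```shell
set -e
# Throwaway repo standing in for the real project (paths are illustrative).
base=$(mktemp -d)
git init -q "$base/repo"
cd "$base/repo"
git config user.email "dev@example.com" && git config user.name "dev"
git commit -q --allow-empty -m "initial commit"

# One worktree per concurrent agent/task, each on its own branch:
git worktree add -q "$base/task-a" -b task-a
git worktree add -q "$base/task-b" -b task-b
git worktree list

# Once a task's branch is merged, clean up its worktree and branch:
git worktree remove "$base/task-a"
git branch -q -d task-a
```

Cursor's worktree button presumably does something like this behind the scenes, which is why merge conflicts only show up when two worktrees touch the same region of the same file.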
Starting point is 00:31:44 It is so funny that you say that because, like, I used to. That was my workflow. Anytime I would have a project and, you know, I'd encounter something that either needed to be fixed or changed or a feature that I wanted to add, I'd go, usually I have like, what do you call it? Like a plan, a to-do list. And then I would create sub-issues that would all link, because it looks nice whenever you finish one and they turn purple on GitHub. And now I don't, I don't create those anymore. I just have a to-do, like, text file. And then I just, I put them in because you can finish them so quickly. And one time, I looked at the list and I was like,
Starting point is 00:32:17 actually, you know what? I just pointed Opus 4.6 at it and said, there's my to-do list. I was in plan mode and I said make a plan to like do all these things. And as long as they're not like too big of tasks, it'll then go and create like a really nice like checkbox, checkbox, checkbox. It'll do them one at a time. And like, I remember one time I had like 14 things in a to do list, which like typically I would have gone and created 14 different issues and then closed them one at a time. And it just got, I think it got 13 out of 14 them like one shoted. And you know, you have to go and test them and verify that it did it. But I was just like, so it's so funny that you're saying you used to use issues and now you just use like a to do
Starting point is 00:32:53 list because that's exactly what I've started realizing too. It's a waste of time to go set these things up because you can basically fix them and add features so quickly. So you said 14. That's cute. That's cute. Now, admittedly, this is out of line in the repo because this is on a project that I'm working with other people on. And so I needed to share it with them and we use a lot of Google docs within Nvidia, although there are now very good tools for hooking up Google Docs to agents. But this is my list of things. And I don't know. How many items do you think are here? I mean, there's like 10 pages, so we're going to go definitely in the hundreds.
Starting point is 00:33:28 Yeah. So admittedly, this is like a backlog from, like, originally this was like in my personal to-do list from like the last like four or five months of like all the various stuff I'd written down. This is for our training material. And basically every time I do the training, I make a bunch of notes of like, we need to fix this, fix this, fix this. And I write it down.
Starting point is 00:33:49 And then like last week, I just like took them all and I merged them and organized. them a bit. But just like this set of things right here that's undercompleted, this is, I don't know, this is maybe, maybe 40 items. And like each one is like basically a sentence or so. Some of them are like descriptive sentences that say exactly what the change should be. And then like why. Some of them are just like, you know, like kernel correctness checks. But these are like 40 things, 40 like individual things. And these are all things that I fixed in like the last two days. and this is not even like all the things that I actually fixed because there's some set of things that I notice while I'm doing other things and I just queue it up in in cursor itself. One of the nice thing is with the cursor chat, if it's working on a prompt, you can send a follow up.
Starting point is 00:34:40 And the way that it works is it has a queue. And so by default, if you type something while it's working on something, you press enter, it doesn't send it to the agent immediately by default. It puts it into this little queue, and then once the agent has finished its current prompt, then it will send whatever the next thing is in the queue in the same chat session. But there is also a button for it called Send Now. So you can press that, and that will send the message immediately. And so if you're watching what it's doing and you see in its thinking trace that it's, like, going off in the wrong direction, you can send a message to it immediately, like, saying, you know, oh, don't do that.
Starting point is 00:35:19 And I do do that, you know, every now and then where I'm like, oh, you're heading in the wrong direction. Like, don't do that, do this instead. But what I will often do is, as I said before, like, you know, sometimes you'll be working on a project or working on one particular task. And then you notice that there's something else related, like, in the same place. And so what I'll often do is if I notice that while it's working on something, I'll just put the follow-up thing in the same chat and I'll cue up a couple of them. And like those, I don't even, I don't even write down those to-do. I've just like, I've queued up,
Starting point is 00:35:51 I've queued up this series of prompts that are going out to the agent. I, I am a little cautious about doing that because I don't want to like, you know, I don't want to have one chat context that has like a bunch of unrelated tasks, but I do find myself,
Starting point is 00:36:07 like in the early days, like when I start first started using cursor last year, and I guess it was like spring last year, maybe. Maybe it was earlier than that. I noticed, I would very frequently create a new chat because as the chats got longer, performance of cursor itself got worse, but also I found that it went off on more tangents. So I would almost always, for a new task, I would
Starting point is 00:36:31 start a new chat. I think mostly because of this problem where I had a long time where cursors, UI, like in the early days would like freeze up for me all the time. But now what I find myself doing is I'm more likely to continue work in an existing chat because usually the, you know, There's like relevant context there. Like I've told it to do, you know, like, I've told it to like do some task and I want to tell it to like extend or or fix something like related to like some function foo or some feature X. And if I was in a new chat, I could probably just say something vague like, you know, like, sorry, if I'm in a new chat, I can't say something vague. If I'm in a new chat, I can't tell it just like, oh, go add, like, go at, let me give a better example. Let's say that I'm like modifying some function to add a new parameter to the function.
Starting point is 00:37:24 And so I modify it in one place. And then I'm like, you know what, there's this other function that I should also add the same parameter to. If I do that in a new chat, then I have to tell it, go add a parameter to function fu, just like this parameter that was added to function bar. But if I do it in the existing chat, I can just tell it, go add the same thing to bar. I don't have to, like, explain more. I can type it out quicker. I don't have to give it as much context.
Starting point is 00:37:53 So I find myself doing that more and more, and I've had less issues with it getting confused with large contexts. Yeah, degradation is not as big a deal. And I found it's, yeah, it's funny. I think probably a lot of people that use these tools on the daily end up learning the same things because I have the exact same thing. like unless if it's an entirely new like thing that is going to require a long conversation, I almost always just keep it in the same chat because there's always at least like 10% of what you were just doing.
Starting point is 00:38:24 Even if it's just like for me, because I've been working on like the podcast player, like building the release APK and launching the emulator, like that is in the build thing. But like as soon as you have like a chat window open, you've done it once, like it's like so fast. Like, whereas anytime you open a new chat, you're going to see it go and, like, read the docs and figure out how to do it again. And, like, even that much, it's just like the model, like these, the conversation lengths don't lead to degradation the way that they used to. So what's the point in starting a new one?
Starting point is 00:38:56 And unless if there's some, like, good reason. There is, there is a disadvantage, which is I have found that for cursor rules that are set to always apply, which is most of my cursor rules, because most of my cursor rules are very short. So if you have a long cursor rule, then you want to have a cursor rule where you have this little header where you tell it, like, in what situations does this apply? If you've got one that's like a big, long, descriptive thing. But a lot of my cursor rules are very short. Like my cursor rule for the commit messages is just like when you commit a Git, when you make a Git commit, look at the existing Git commits to understand the, you know, format and style used in this repo. It's literally one
Starting point is 00:39:38 sentence. So there's no point in me having a like short description of that at the top. So I just said it to be an always apply rule. And what I've found at least maybe this was more true like two or three weeks ago before the latest cursor updates. But in longer chats, it would not look at the cursor rule or like it would just completely ignore the cursor rules. And what I think was happening is for the always apply rules, like it had them, it reads them in its like initial prompt. It gets included in the initial prompt, but then, like, if I had a long chat, like, the cursor rule was falling out of the context. I've noticed less of that recently, but I also, like, the main cursor rule that I had
Starting point is 00:40:21 was the get-commit message one. I don't know that I have any other, like, cursor rules that are super impactful. I know everybody, like, talks a lot about skills and cursor roles and stuff like that. Honestly, maybe I'm doing something wrong, but I have not found. as much of a need for that recently. Maybe it's just been the sort of task that I've been doing, but even with no rules, the reason why I would think you would need a rule or a skill would be if you're having to explain something multiple times to the model. If like you tell it to do something and it's like doing it wrong and so you have to tell it to do it in a more descriptive way. Like,
Starting point is 00:40:59 it used to be that whenever I would tell it to make a Git commit, you know, it would write a message that I didn't like. Or one another example, it used to be that if I would tell it to push a commit, I like to have my branches on the main repo, not on my fork. So if I would tell it to, you know, commit this change and push it and open a PR, it would push it to my forked repo instead of to origin, which would drive me nuts, a variety of reasons we can get into some other time. But that would be something that I would create a cursor rule for so that I don't have to explain every time that I wanted to push to origin in not to, you know, my forked repo. I wanted to push to the main repo, the branch to the main repo.
Starting point is 00:41:41 But I have not found that many things, that many things that I'm asking it to do where I need to explain to it how to do that thing in some way that I find myself repeating a lot. I typically, I write some description. Honestly, my prompts are not very good. Sometimes I can write something very vague. and that's usually good enough. Like, I don't find myself spending a lot of time writing cursor rules or skills or anything like that.
Starting point is 00:42:11 Like, I don't, I haven't felt the need for it. Yeah, I have zero cursor rules as of today. I mean, there should be probably one, which would be anytime you're using Python, activate the local UVVM. I do have that one, yeah. Because I constantly am typing that. But I don't know. I like my zero cursor rule workflow because half the time now the VM that I want is actually in some other folder because like it used to be painful if you're in a current repo to like navigate over somewhere else where I've installed a bunch of, you know, multi-gigabytes, you know, Kuta modules from Python.
Starting point is 00:42:52 But now I'll just be like, go like go get the VM from that folder and it takes, you know, the time it takes to to type that folder, right? And most of them are just like sitting in home, right? So it's just tilda slash the name of that folder. I don't have to do all the CDing, you know, source dot, activate, whatever the command is. And that's the funny thing, too, is, you know, I hear, maybe we should, because we're at the 46 minute. Let's end of episode one, start of episode two. Be sure to check these show notes, either in your podcast app or at ADSP thepodcast.com for links to anything we mentioned in today's episode, as well as a link to a get up discussion where you can leave thoughts, comments, and questions.
Starting point is 00:43:29 Thanks for listening. we hope you enjoyed and have a great day. Low quality, high quantity. That is the tagline of our podcast. It's not the tagline. Our tagline is chaos with sprinkles of information.
