The Standup with ThePrimeagen - What even is an AI Agent?!

Starting point is 00:00:00 Guys, have you guys checked out the hottest new site, balls. Dot yoga? What? Yeah, I have checked it out. Adam, check it out. Pals. Yoga. Check it out. Oh, my God.

Starting point is 00:00:10 What is this? It's so hot right now. So hot. This is you guys. What? In the world. I just sent in the chat the, we made an ad for Bolt further hackathon this month that we were doing. And it involved Prime being devastated that he was no longer able to make balls.

Starting point is 00:00:33 balls. yoga, which we just casually mentioned in the ad, but then we actually made it. So that there would be an Easter egg. Can you believe balls. Not yoga was available to buy though? Adam, are you okay? He's watching, and he's watching the ad. Yeah. I know.

Starting point is 00:00:49 Your face looked so disappointed. Oh, did it? I was just reading Twitter. I'm sorry, you guys sent me to Twitter and then I found that there's some stuff on Twitter. Ooh, the drama. I just want to read it so badly now. like let's just read it on stream i don't know if it belongs on your podcast like i don't think that's

Starting point is 00:01:08 necessarily where it goes bring it up can we give a proper intro to the episode before we start yeah yeah yeah yeah yeah yeah yeah yeah uh anyway sorry go ahead prime hey today we are having on what i would consider two experts adam and dax uh adam is currently looking over looking very disturbed i'm so sorry josh zoom in Zoom in, John. But they have been building out something called OpenCode, which is going to be an agent for your terminal. And the experience looks immaculate. Adam, you've done a fantastic job. Dax, I don't know what you do.

Starting point is 00:01:45 But it looks very, very, very good. And, of course, on this podcast, as always, is Teage. Teage, hey, he might be Steve Jobs, telescopic Johnson. I don't know what your full name is these days. You keep getting new ones. And so, yeah, today on the standup, we're going to be talking about how. how to build an agent, but not only that, some things about OpenCode itself, which hopefully you'll see a bunch of this on the stream coming up because this is really our kind of like our

Starting point is 00:02:11 area that we all seem to enjoy for whatever reason. All four people on this podcast use NeoVim, by the way. And so you can- That's how we started our company. That's actually why we started our company on ironically. It's the first time someone got hired because of Neovim. It got two, a double, two for one. But I mean, the reality is that this type of agent is built for people who,

Starting point is 00:02:31 want, I assume the more command line kind of experience, the ability to have the faster, more creamy experience that way. And so they're going to just kind of walk us through what it takes to actually build an agent because if you're not familiar, you probably read 1,000 AI articles at this point. That's like, yeah, spent an afternoon, I've built an agent. It's easy. You don't even need something like cursor. It's so simple, right?

Starting point is 00:02:49 Like you read this thing over and over again. You're just like, well, then what are people working on? It has to be harder, right? And so we would love to have someone on who's actually done it and can walk us through what it actually takes. Hackathon's over. But that doesn't mean the content is. Join us on July 26th at 10 a.m. PST on Prime Stream to live react to the award ceremony.

Starting point is 00:03:11 See what everybody built and we'll see you there. It's okay. You've worked at Netflix. You have everything they want. Anyway, so that's been a little frustrating, but on the other hand, we're really excited about the direction things are going in. I think we've been, I think the genesis of this is like there's been a lot of really cool tools out there. like cursor. A lot of people like a lot of people use a bunch but I don't want to give up neovim. Like I like working in the terminal. I don't want to switch my whole ID. So we're excited about seeing okay, what can we do for people like us? Like what what's the best possible thing we can put together that's complimentary to

Starting point is 00:03:48 whatever ID that you choose to use? Nice. Okay. Well that's some good background info because we're going to be mentioning open code a bunch probably while we're talking about this. So that clears up what where people should go when you guys are talking about that at least. So let's get to, what is an agent, Dax? Everyone's been asking.

Starting point is 00:04:10 I gotta be honest, I don't really know. Adam, do you want to take this? What's your concept of an agent? Well, I think the term is very overloaded, right? So outside of programming, it means a lot of things, probably. Really? Programming. Yeah.

Starting point is 00:04:25 James Bond. I meant within the AI bubble. So I guess there's like, in the programming world, It's just this idea of calling an LLM, giving it a bunch of tools that are related to programming. So editing, reading files in your code base. And then just like looping and letting it keep calling these tools, making changes, and then coming back with some kind of response. That's kind of the basics, like the formula for an agent. If you look at like cursor has an agentic mode, but then there's other tools in the terminal like ClaudeCode.

Starting point is 00:05:02 I think all the tools that fit into this category kind of have that backbone. And in particular, the looping is really what we like, the looping and the tools, right? I think like we can say agent equals LLM plus tools. Is that fine? Plus loops? Mm-hmm.

Starting point is 00:05:21 Plus some sort of system prompt that keeps it into, like there's some, I assume there's some sort of secret sauce to getting a good prompt that helps the agent understand that it needs to go through a bunch of files and kind of understand the code base. Is that what it is also? Or understand the task at hand? Yeah, and each of the models kind of like,

Starting point is 00:05:40 they respond differently to the system prompts. So there is like a bit of that, but it's not like a super sensitive secret. I mean, there's, you can, you can kind of see all the system prompts out there. I don't really know that any tool can like differentiate that hard on those fronts. Like the tools are pretty like set to the model at this point.

Starting point is 00:06:02 Claude expects certain tools, at least today, maybe someday all the models will be better at calling tools generically. But right now, like, it's really kind of like all these, all these agents have the same set of tools, the same roughly system prompts. It's all the stuff around it that I think we're focused on, like open code, having kind of like the share page, you can share sessions with people and you can kind of see the full history of a session and just having a really great Tui experience. We're going to build a mobile client so you can kind of like step away from your machine and continue, you know, conversations while you're on the toilet or whatever.

Starting point is 00:06:41 Yeah, I think like there's a lot of stuff around the actual agent that makes for a good experience in terms of programming with these things. And I think Dax and I have both done a lot of programming with these things. So we have a lot of opinions. Yeah. I think we're definitely like really firmly in the product zone. It's like less about pushing the. LLM capabilities or like finding clever ways to use the LM itself for making it perform better.

Starting point is 00:07:06 There's some of that, but most of it is just packaging it in a way that's nice to use. That's obviously very fun because when you're working a product that you yourself use every day, you're just like, oh, I wish this was like this, I wish this was easier. So yeah, it's nice to be able to iterate on that. And then some of the mobile stuff, so I think one differentiator with us is we're really in on this idea of the agent running on your machine with access to your stuff because that's where you presumably your dev environment works the best presumably you have everything set up some of these uh tools like codex or like devin they work remotely and they run in the cloud which can work but you need to

Starting point is 00:07:47 recreate your perfect environment in the cloud which some companies are disciplined and have that but nix fixes this right yes nix does fix it but you know how many people use nix so until everyone's using next several yes uh realistically practically most people's workable their environment is local so we want to do stuff like oh we want to have a mobile app that's just going to be connecting to a session that's you leave your laptop running and you step away go on a walk you can like get notification saying how the agent's done with this give a feedback but it's also just running on your laptop so you don't have to go set up this crazy cloud thing have you have you successfully been able to actually you know do a day of prompting

Starting point is 00:08:28 while out at the grocery store doing your own thing, just hitting some prompts from your mobile phone? No, we haven't built the mobile client yet. It's something I desperately want. I don't care if anybody else uses it, to be honest. There's just so many times at this season of life, I'm 38 years old, I have two kids. I'm out of my office a lot.

Starting point is 00:08:46 Like I step out 15 times a day. And to be able to like go on a walk and then when it's done, get a notification, you know, it's doing its loop and it needs input. But to be able to, like, continue that conversation without having to be sitting at my desk, it's just I want it very badly. And I think it turns out other people want it. We've seen some mentions on Twitter, cursors even designing a mobile app. So, yeah, it doesn't exist yet, but I'm very excited to have it. Okay.

Starting point is 00:09:14 So then kind of walk us through what it takes to actually build an agent in some sort of real capacity, not just a weekend warrior project. but like what what like why did it take you guys so long to release yours obviously you put a lot of work into it so it must be more than just a weekend project yeah so i think one dynamic for us is we're not coupled to a single AI provider so we support anthropic google open night we actually support like the full list of everything's available because we're built on a library that covers most of them what about like llama can i run it locally yeah so one of the i haven't fully tested this out yet but one of the providers, there's like provider support. One of the providers is for like a local model.

Starting point is 00:09:58 So one, just testing this stuff across everything, seeing which models actually work well for this. The reality is, is right now very few perform well. We all use Anthropic and Claude Sonic for as our defaults because it is it is the best. Some of us aren't poor and we use Opus, but sure. You can use Sonnet, that's fine. I use Sonic because I'm not paying.

Starting point is 00:10:20 Of all the time. Holy damn. He's like, I need a mobile app because I'm going to be doing this while I'm eating vegan sushi. It's literally five times expensive, I think. So, yeah, it's quite a difference. I like it more because it's expensive. Like, that's my thing. That is a very expensive.

Starting point is 00:10:39 That is a very adamantor. It's like a Gucci bag or something. Right. You're like, I get the same code out. It doesn't matter at all. The models are all the same. But mine was made with Opus. I do have a quick question, Mr. Gucci over there.

Starting point is 00:10:51 I noticed that you're not. subbed on my channel and so it's just like you're talking about spending all this money but you don't even have five dollars a month it hours of entertainment hang on hang on how do i do it how do i do it five i was at one time i haven't been on twitch in months i swear to you oh so before it wasn't even scheduled because you can set it up to auto renew adam a while i think it was it was it was at one time maybe it's a new i got a new card new credit card you know that happens and then it just no it It's crazy. You guys make Adam do all the work and you make him pay you.

Starting point is 00:11:26 He's getting the privilege of working with us. Dex. It's funny. I did see an ad a second ago. I've got Twitch chat up here. And I was like, why am I seeing an ad? What is this on Twitch?

Starting point is 00:11:38 That's why. I'm so tilted right now. I just, I'm sorry, it's so hard to focus. I've seen, like, in Twitch chat, people are confused about the open code AI slash open code.

Starting point is 00:11:49 It is super confusing. I'm so, upset about this. It's so like frustrating and annoying. And I want to just like focus on our conversation, but it's just very annoying. Do you need a Snickers? Yeah. I don't think that's vegan friendly.

Starting point is 00:12:04 I know. That's what he needs though. Okay. Well, all right. So I mean, I want to keep going on this agent thing. So you're just talking about integration and tools all this. I would assume that do you have to like do you have to like bespokely craft every prompts to be able to use every tool or does it? it just simply kind of work out of the box. You're just like, hey, you should search for this thing now.

Starting point is 00:12:26 Or does it tell it like, does it use reasoning in the sense that you're like, hey, what do you do? And it's like, this is what I should do. And you're like, okay, now follow step one. What do you do with follow step one? Like how does it start using these tools? So roughly what the model is good at is you give it a description of the tools and the list of tools and you give the task and it's very good at going to loop.

Starting point is 00:12:44 It's like, I'm going to figure out what to do. I'm going to tell you what tools to call, call them, give the results. I'm going to like do that. So it's, I think some models where this works. that does work really well. There is optimization, though, there always is. So the task tool descriptions, like the description of the East Task School, like how do you describe it?

Starting point is 00:13:01 Like, what's the schema for the input? Like, is that confusing? Does you get tripped up on things here and there? So what we do is for, say, something like Anthropic, you know, they have their own version of this called Claude Code. We just dump everything out from Cloud Code. And when you're using Anthropic, we use all the exact same task or the tool descriptions and stuff.

Starting point is 00:13:20 So on one hand, I'll say it does make a difference. If you just do a very naive approach, you're probably going to get worse results. I personally don't think it's like the thing that makes something 10x better. I think it's a thing that you can kind of play with and optimize. And also the end goal is like you don't really want it to have to be that precise because you can bring your own tools. You can say here's a tool to access my database. Here's a tool to do XYZ things. It needs to be able to be flexible.

Starting point is 00:13:47 So we think it'll just kind of get better along those lines anyway. Can you can you give an example of like how you hooked it up to like LSP for example? Because I feel like that will be like kind of illuminating, but like a particular example of how that, how it gets access to that. And how it even knows to call that also? Yep. So with LSP, so we experimented with a few different approaches to this. But the one that we're sticking with for now is whenever it makes an edit to the file, the response, we have a tool that's like edit file tool. Send us like a patch or like a, it's like old string and neutral.

Starting point is 00:14:20 string in the file where we'll place it. That's how it edits files. When it does that, the response to that tool will include any diagnostics that we found from LSP. So if it's a TypeScript file, where we had the TypeScript LSP running, we know that after this edit, there's these three errors. We say, here are the errors, please fix. And you'll see it instantly respond with a fix.

Starting point is 00:14:40 And this really helps hallucinations because when it, like, thinks that there's functions that don't exist on a library or any of those things, it correct itself right. way. And it's, it's quite good at responding to that feedback. So wait, so how does it, how does it kick all this stuff off? Because does that mean you actually run the LSP's yourself? Or do you, okay, so that means whenever I open a project, if I have open code plus VIM open, I might have two TypeScript servers. I may have two go pleases or whatever they're called and yeah, all the other ones. Okay. Yeah. So we originally were looking to see can we like hijack any running ones, because maybe you already have VIM open and you already have these LSP's configured, but you can't,

Starting point is 00:15:21 because they're standard input. T's big shaking. Big no. That's a big no for me, dog. That's a big no. So we run our stuff in parallel. And there's also no configuration. Like we have we ship out of the box support for a bunch of things and everything just downloads and runs.

Starting point is 00:15:37 Because you don't need like absolute precision for your exact LSP configuration that you like. It's more just giving the LM something that something to work with. And then you said that's like at least for the way you guys have it. It's part of the edit file thing. So it's like, okay, I know that I need to go and like edit. Does it also happen to feel like move a file or delete file? Like does it happen in every kind of like file? Because like I just interested to know how you put LSP into everything.

Starting point is 00:16:05 Or is it just like edits? That's a good point. Like we probably should give it information whenever it changes. All right. All right. Okay. And the right. Nice.

Starting point is 00:16:15 Open source contributor. Let's go Tj.J. Needs to do LSP diagnostics on file add. delete boom we do it on right so when you create a file we do show the diagnostics to the tool or we return the diagnostics so edit and write we're already returning them i guess a move i didn't i don't know we don't have yeah we don't often see i guess i haven't done much with it where it's running like a bash move command but yeah i think we also we also considered and we have this in there which have disabled for now there's a tool that just returns diagnostic so it can choose hey i want to look at

Starting point is 00:16:52 diagnostics right now and it can query for them. And we have seen it use that a bunch, but it wasn't well tested. So we're not shipping with that right now. Yeah, the, the models today, they're still very like tuned to call specific tools. Like we've played with a lot of tools and you can hand it a bunch of tools that's never seen before and it just, it doesn't call them. There's something to being like the post training process being catered to certain sets of tools.

Starting point is 00:17:17 So Anthropic is really a cloud for cloud three seven before that. those models are the best at calling tools from a programming standpoint. They'll actually keep trying and going for it. Other models can be really smart like Gemini 2.5, but it doesn't really, it doesn't call tools very eagerly. So there is still like this phase we're in right now where you kind of have to like provide the set of tools that the model expects.

Starting point is 00:17:41 I don't think that'll always be the case. But we've definitely given it a bunch of LSP tools. I've played with, you know, giving it go to definition and find references, things like that. And it just doesn't use them. I mean, You can get it to use them if you ask it to, but it doesn't like, it doesn't default to kind of thinking that way.

Starting point is 00:17:58 I think that'll change. So, like, how do you set it up in such a way to know to use that tool? Like if you say, hey, use Fides, find references. How does it know when to use it? Or because isn't this kind of like a prompt skill issue going on here where it's just like you don't have a comprehensive enough system prompt for it to be able to follow? Is that, is that what the system? I mean, it is a system prompt. That's if you look like at the cloud code system prompt, there's a lot of like specific tools called out.

Starting point is 00:18:26 So use the to do tool to set up your plan before you start executing. That is how you kind of like massage the model into using tools that it maybe doesn't have any awareness of. It's just, yeah, there is kind of like a finite number of tools before you kind of hit diminishing returns today. It just doesn't, it doesn't take advantage of all of them. But it gets really good at using the handful that it uses. and I think it's in a good spot right now. I mean, it's very effective. I think these agents today are

Starting point is 00:18:54 worth using and leveraging. But I do think it'll get better. I think more models will get better at calling tools and we'll have other options. So I've never built a model, so I just have kind of further up questions. But here, Dax, why don't you go?

Starting point is 00:19:07 Because then I just still want to keep on this. So one last thing on that. I just totally blanked out what I was going to say. It's okay. Thanks for interrupting him. It's prime. I don't know if I interrupted him. We kind of like race towards it.

Starting point is 00:19:22 I mean, Adam is still so upset about Twitter right now. I remember. I remember. I'm going to go. So one of the missing pieces here is you can add such as a system prompt and say, hey, use this tool. Max, were you waiting for Cluelly to load or something? What's going on?

Starting point is 00:19:39 Yeah, it was buffering. I need my cheat sheet. One day. It's hard. hard to tell right now whether when you're making something better, are you making something else worse? Because this L-LM is such a black box. It's not like a deterministic system they can see the inside of. So what is missing, and to me, the next major thing we work on is a set of consistent benchmarks that we can run whenever we make changes to system prompts. And these

Starting point is 00:20:08 benchmarks are probably not going to be very quantitative. We're thinking that we're going to come up with a very real-world looking code base. We'll have a bunch of like features that we needed to implement. And we'll have a bunch of standard prompts. have it do that. And we have this nice, like, way to see for anything we do, like, every single thing that happened, diagnostics-wise. So when we make a change, we can run it through this and see, like, the output and evaluate it somewhat qualitatively.

Starting point is 00:20:32 So it's still going to be a qualitative thing, like, did it get better or worse? But at least we have a consistent, we'll have a consistent set of things we're running so we know. Right now we're kind of flying in the dark. We're like, yeah, this made it better, but I don't know if, like, some other person is having a horrible experience because of this change. So you're saying now you're doing not only vibe coding, but Vibe coding, but Vibe Vib testing, Dax.

Starting point is 00:20:50 Vibed testing, vibe benchmarking. Yeah. It's really badly needed because there's so many benchmarks for the models, which everyone has opinions on. There's not really a benchmark. Well, I don't know. They're kind of worthless, I guess. But why?

Starting point is 00:21:07 Why are they worth those items? Say why. Why are the model? I mean, a lot of strong opinions. The model benchmarks, they just don't seem to correlate with like actual real world. But why? I mean, why?

Starting point is 00:21:17 I don't know. because I guess because they're training to like hit the benchmarks are you guys like quizzing me what is going on here no I'm trying to get you say what's wrong with the arch aGI benchmarks Adam what's wrong with the arch aegee i benchmarking i'm trying to get you to say because they're all benchmarking python oh well that's a thing yeah they're not all doing it but the major the sui bench benchmark is literally just python which blew my mind when I learned that uh yeah there's not like a benchmark today for there's like a a dozen agentic coding assistance. And there's no benchmark that says, like, given the same prompts and the same code base,

Starting point is 00:21:54 here's like, it's part of it's like, it's qualitative. But here's the one that did the best job. It did it the cheapest. It did it the most effectively. You had to have kind of like a grading system and there'd have to be humans in that process. And speeds another factor too, like how fasted it that even do this thing. Someone did this funny thing. Someone did me to something yesterday.

Starting point is 00:22:13 I was kind of interesting. They told the agent to like write a book. remember exactly what it was it was something where it would definitely like run in a loop for a long time and he uh checked like how many steps would i think the highest one which was claude uh did 187 steps like it's the lm tool call to the lm tool call before it before it finally stopped uh so that's a good way to like rank it to see like which one's the most persistent all these models are not very like they give up very easily they run to an issue just like humans yeah just like just like humans.

Starting point is 00:22:45 AGI has really been achieved internally. Like, I'm giving up on this. They're going to just ask me again later. It's fine. Yeah. All right. I got real questions here.

Starting point is 00:22:55 I also agree RKGI sucks. Okay. But now that we got that out of the way, with that stated, so, like, how do you start a loop? And how do you determine that a loop's done like?

Starting point is 00:23:06 Because like, I know you're going to say, here's what the user said. You have some sort of system problem to say, hey, you can accomplish this task via the, tools, whatever it says. How do you say, okay, I either repromp the system again to execute step one of a several instruction plan. Do you have to like manually parse it out? How do you like, what's

Starting point is 00:23:26 the interaction between the model and this like start this looping process? Because that's what's kind of confusing to me since I've never actually built the agent itself is that I imagine that there's some, you have to do some weird parsing. What's cool is you don't have to do anything because this is the responsibility of the of the model. So when you make a call to an out, it'll generate a bunch of text and it'll stop, right? There's reasons for why it stops. Sometimes it stops because, oh, I'm done generating text. I'm done with everything I need.

Starting point is 00:23:55 So that's the stop reason. But it could also stop because I can't continue until you execute these tools. So you don't have to figure out when to interrupt it and like do something else. It'll tell you. It'll stop and say call these tools and then continue with the responses to the tools. That's baked into the models. So that part's pretty easy. very easy to build like that loop.

Starting point is 00:24:16 So that's not even a loop then. We keep asking until it's done. Eventually it's like a wild true break if if the stop reason is done. It's part of the request that you literally send like to Claude is like, you know, I have these tools available. They can do these different things. And then the like Anthropic is the one, right?

Starting point is 00:24:39 If I'm understood, from what I've seen before, you literally send them the tool definitions. And then the model says, I'm going to run this thing. Then you're like, cool, you said I'm going to run that thing? Do do, do, do I run it? I send it back to Anthropic. And they said it just like you can do a conversation. Oh, okay.

Starting point is 00:24:54 So it's not you going, if you were to execute, what would you do? And it's like, I would do these things. And you're like, okay, I'm going to post on all that things. I'm going to execute it. Then resend effectively the whole previous conversation. Plus the current one's like, what's next, bro? Because that seems like it would just do this forever because it always comes up with new stuff. Like you and Devin, basically.

Starting point is 00:25:11 Exactly. me and devon are tight yeah yeah so it's actually it's very easy that's why building like a basic prototype of an agent yeah is like a weekend weekend project yeah because you just would define like edit file as like a tool and then you'd give that to the model and then you'd define like some other simple ones that you would need to do stuff locally search for files i don't know i don't know what other ones are like the normal ones yeah and that's that grip yeah yeah yeah fine grab said it runs scrap yeah it's got a bash tool like where they can actually run

Starting point is 00:25:47 bash commands that one's always good because it's like oh i'll let you run any bash command and then it just runs anything that's a really smart that's actually my biggest worry is like how okay so how do you protect against destructive operations do you just simply say hey user you must like approve of said operations because how do you know it's not going to like rmrf root no preserve so there's a lot of different approaches here most Most of these agentic coding assistants have a permissions model where basically, you know, certain

Starting point is 00:26:17 tools have to be granted permission. Most of them also have like a full auto mode where you can bypass all those things. If you know you're in a sandbox or whatever. And then there's like there's approaches with sandboxing. So like codec, CLA by OpenAI. They've got, they use like the Apple. I don't remember what seatbelt. It's some kind of a sandboxing thing.

Starting point is 00:26:37 Same with Linux has a sandboxing thing. where it basically constrains it to work only in this directory, like in the project directory, and then some network constraints as well. Yeah, some people have, some people will run cloud code in a Docker container. Dark containers aren't like a perfect, like security-wise,

Starting point is 00:26:55 they're not like a perfect sandbox, but they're pretty good, practically speaking. We're not thinking about that stuff too much right now because we need to build something that's fun to use first, and then we'll figure out how to, like, layer in some of these sandboxing things. sorry that was just such a fun statement okay so does that mean are you saying that windows was always right asking for permission for everything including deleting files

Starting point is 00:27:18 that's what open code does right now right so yes yes windows was correct just say yeah dax you got it yeah dax stop reading twitter i know you're upset i know you guys are upset about the twitter thing we can i have so many questions so i didn't even know that there was already this agentic loop effectively that exists for it.

Starting point is 00:27:42 That's what's inside of cursor. Well, yeah. I mean, I know that the cursor has it, but I thought that you wrote the loop, not that these, these models effectively can kind of break and say, hey, I need more information before continuing. I mean, you still do have a loop on your side because if it's a wild loop, it's still a loop where you're having to keep calling the LLM with all the messages until it tells you to stop, basically.

Starting point is 00:28:07 Yeah. And there's some tricks here too, right? There's like, there's obviously a limited context window on a lot of these models. Sometimes they're like pretty tight. So if that loop is going and a context window is starting to approach the limit, we will pause for a second, take the whole history, send it to LM, ask it to summarize,

Starting point is 00:28:30 and then we'll like continue the loop just with a summary. Is that dangerous? In what way? I don't know. I mean, dangerous in the sense that the moment you asked it to summarize, you do effectively a compression algorithm on top of it, a very lossy one.

Starting point is 00:28:45 Like how far does it get off? You get like warnings about it? Does Open Code give you like a warning? Like, hey, we have to do a lossy compression on your history. So therefore it may start going haywire. We have that information to show you, but it works so well that like it's kind of.

Starting point is 00:29:04 Okay. Yeah, like the experience feels like you have infinite context. effectively. Most of the tokens that it's taken up are like, here's the stack trace from one thing that I fixed eight minutes ago. Yeah. I mean, really a better practice is just to keep creating new sessions and to not let any

Starting point is 00:29:21 context window grow very far. Summary is like compacting it is basically doing that. It's basically forcing you to start a new session, but it's giving it a little bit of context, I guess, in the form of the summary. But it's generally like you don't want to keep a long context window filled up with a session that's,

Starting point is 00:29:37 you know, five messages long. It's like, it's getting less and less effective over time. Yeah. And most, I mean, if you think about like how much of the actual tokens you're typing in,

Starting point is 00:29:46 telling it and giving it direction, it's very little compared to how much it's like getting from reading file or like a, oh, I'm going to change this file. It was 250 lines long and I changed a bunch of the lines. Huge diff. Tons of tokens. We don't need that now.

Starting point is 00:30:02 That was an hour ago. Yeah, the effectiveness of the model is all tied to like, signal to noise. How much signal can you give it without noise? And the longer your session goes, just obviously there's going to be more noise. Make a new session for different things. That's what we're saying, right? So if I'm going to do a new task, I should start a new session. Yep. And sometimes, yeah, sometimes you want to do things in parallel. It's another reason to start a new session. You like kick off one thing, new session, kick off another thing, new session.

Starting point is 00:30:30 I found my brain could only handle so many of those, but people do like paralyzing a lot. So could you have an agent that spawns sessions? So it is actually the compression algorithm for your brain. Sub-agent. It's all the rage right now. Teage is on top of it. Oh, I know. I know everything there is to know about agents.

Starting point is 00:30:50 I came super prepared. I started drama on X.com. The everything application has been seating false websites in the chat. I have 17 misinformation bots in the chat right now, spreading fake news about open code. Nice. The subagents are an interesting concept. I think they should be used surgically. I think there's a trap of like having just sub agents that are general purpose just to like parallelize work because we're not like open code at least is not intending to be an asynchronous agent.

Starting point is 00:31:25 Like you are in the loop on purpose. Like we want you as a programmer able to interact with the agent and to steer it. in the right directions. I think if you get into like eight subagents doing all the work, there's very little visibility into what's happening. You lose the permissions model, like the ability to inspect what's going on. You just can't like your brain can't handle those eight subagents going in parallel.

Starting point is 00:31:47 So now you've kind of got like a background agent, right? You've got jewels or whatever else. And that's a different thing than what we're trying to create, which is very much human in the loop, human guiding the agent. So I think subagents like using them, so there's a task agent in clog code. or a search agent.

Starting point is 00:32:04 It basically is just a read-only agent that can search through files and limit the amount of context noise in your main context window. That one makes a lot of sense. I think like a code review or a planning agent that's a sub-agent, I think those can make sense. We're going to play with a lot of that stuff. But I think like just general purpose sub-agent that does everything and then you're just trying to parallelize all the tasks.

Starting point is 00:32:26 I don't know. I think that's kind of a trap. Can we get one that's like a zoomer version of it so that then the rest of the time it responds like a zoomer. I want Tanner in open code. You know what I'm saying? Like I want to write my prompt and then it turns it into Tanner and Tanner reads everything off to me. Hey guys, it's me Tanner here. Just letting you know we're about to edit the files. Okay, sweet. That's what I think. That's what I think you guys are missing. Like if you're going to differentiate on product, that's what you, oh, prime. This is way better.

Starting point is 00:32:53 Yours is way better. I'm a way better Tanner do. I don't know who Tanner is. I'm not going to lie. I was trying to laugh. It's an AI voice on Prime channel. Yeah. Oh, gotcha. Yeah, yeah. I'm out. It is not watching Twitch.

Starting point is 00:33:06 I'm sorry, guys. I love you guys. I don't watch Twitch. Someone gifted you a sub, loser. Yes. A senior variable would be the first name. Adam, let me be honest.

Starting point is 00:33:16 Let me be honest with you. And even though this is in public, I'm going to be very frank with you. If you were watching Twitch, I'd be really mad because I'd be wondering what the heck, who's working on Terminal. Dot shop right now. Yeah, exactly.

Starting point is 00:33:26 I've got three jobs. I can't be watching Twitch. You do not have time to be watching Twitch. Adam. You need to be building terminal. Dot shop. And as your co-founder, I reject the notion

Starting point is 00:33:39 of Adam on Twitch. Yes. If you knew who Tanner was, I'd be disappointed. It was a test all along. True. We're putting the individual, an individual contributor.

Starting point is 00:33:51 Man, I love that line. So that one. It was very good. Adam's building open code so that he can have it work on terminal. This is all like a straightforward plan. Oh, so we're investing.

Starting point is 00:34:03 We're sharpening the axe. Nice. We're big fans of sharpening the axe around here. Yeah. That's really smart. It's going to be so sharp when he's done with it in a few months. It's going to be so sharp. Oh, man.

Starting point is 00:34:17 Okay. So it really, that's what it takes to build an angel. What is the part that you didn't realize was going to be like the hardest when building an agent? Like, what's the thing that caught you most off guard? You know, it's just like every product. It's the 80%. is easy and you get into all the edge cases. It's like, what happens when someone cancels the loop and it's like in the middle of a tool

Starting point is 00:34:38 call? Like, it's that handled gracefully. What happens when it like hits the context window but didn't get a chance to summarize? Like there's just infinite of these like weird cases and it's multiplied by the fact that again, these things are not very deterministic. So discovering them is quite hard. But yeah, it's just to me, I think for me it's been all the edge cases. Adam's been mostly working on UI stuff.

Starting point is 00:34:59 Yeah, the twois are just hard. every time I work on a 2E. Would you say it's the way that Charm does the rendering that's hard? Or is it the way that Charm takes over the open code repo and release it that's hard? Like what would you say? Tewis are hard. Charms him on both sides right now. That's what I'll say.

Starting point is 00:35:20 Multi-front. It's like, that is not okay. Do not associate. I could be literally anybody. It could be someone else over here. I am not part of this podcast at all. That's Miami, TJ. I'm missing your chain, Dax.

Starting point is 00:35:37 That's what I need to complete it. It really changes to look at the turtleneck when you throw a chain on. It's so different. The turtleneck does become way, way different. Here's actually the tough. I have one tough question for Adam. Does watching Twitch make you not a vegan because you're consuming something made by animals? Aminals

Starting point is 00:36:01 I heard that Wow That was unexpected I love it Yeah Does using electricity Mean you're not Evasion because you're consuming

Starting point is 00:36:16 Gasol fuels You hate the environment Why'd you agree to come on? I was surprised Yeah Okay okay Okay we could probably do one more real question But is the towee hard

Starting point is 00:36:28 because of the way, because I know what we've worked on charm together and it has this thing where it has to render from top to bottom so it makes like modals and various things difficult. Like have you got, because I know we talked about this, were you considering actually still building like the non-moat like the or the non top to bottom style rendering that is currently, is it bubble tea? Is that the name of it? Yeah. You're wanting to change it out to something that's more like you have a scene and you layer it. I like bubble tea. I, yeah, it's very good. I've grown to really enjoy it. I enjoy it. I enjoy go more all the time. I kind of hated Go at first. Suck on a Twitter.

Starting point is 00:37:03 Suck on that. I'm actually enjoying Go quite a bit. It's nothing about the actual like developer. You can learn to love anything. Yeah, you can learn. Yeah, sure. If we spent six months telling you you had to write something in this and that it was good, Adam, you would say Rust is good.

Starting point is 00:37:18 Like, this is not a strong endorse. Do not over index on this. The thing that makes Tui's hard is like coming from being a web developer. I can't, There's so many more constraints, and that's nice, sure, in a lot of ways, like designing a to-y. It's kind of fun because you have all these constraints. But it also just sucks when I can't just like let the user do things you can just do on the web. And this vertical space is so constrained.

Starting point is 00:37:42 Like you're so just like you're trying to like minimize absolutely every pixel. It's just stressful. It's just like getting all the little things right and making a good to-e experience. It just kind of like stretches all your brain cells. It just sucks. And overlays are hard. Overleys are hard, yeah Okay, see, TJ, that was a real question

Starting point is 00:38:02 TJ thought I was going to make another joke I wasn't, I was in there because I'm actually curious about it because I wrote when I did when I did Flappy Bird I actually gave up using that and just wrote my own scene and did Z indexing because I wanted to be able to do layering and it's just easier

Starting point is 00:38:17 but then it takes care of all the colors and the spacing and how you do that is actually a pretty complicated subject at the end of the day Yeah, I was just imagining you asking exactly the same question. Nah, that joke's only funny ones. Oh, man.

Starting point is 00:38:35 Okay, well, anything else you guys want to let us know about agents or open code? We should, we said on the podcast, it's SST slash open code. I've been pasting it in the chat. I know. Links will be in the description. And even maybe in a pin comment, if you're lucky, Adam. Oh, that'd be amazing. SST slash open code.

Starting point is 00:38:51 I'm just already prepared for all the YouTube comments that are like, wait, it's not open code AI slash open code? That's not the one? Well, now that you said that, you've just guaranteed 90% of the comments will be the wrong link, Adam. Thanks, Adam. So that's, so we, maybe we'll help you out. Maybe we'll cut that part from the final thing. But I will post the links.

Starting point is 00:39:16 I'll make sure the real links are there so that people can find it and test it out. Yeah, thanks for asking you guys. I still have no idea how to build an agent, but I really appreciate the talk. It makes me feel better. That's okay. We're going to ask open code to build us an agent next week. Yeah, there you go. Or later.

Starting point is 00:39:36 It's going to write one loop. Yeah. It'll be like, you'll just see it. It'll be super obvious. It's the same thing as sending completion requests. It's just tool requests. That's it. Okay.

Starting point is 00:39:46 And then put it in a loop until it's done. Make it easy. Yeah. Okay. Cool. Okay. Okay. Okay.

Starting point is 00:39:52 Well, thanks Dax. Thanks Adam. Thanks everyone for watching and listening. it's been another episode of the stand-up hope that you enjoyed bye bye bood up the day

Starting point is 00:40:01 vibe code and errors on my screen terminal coffee in hand

The Standup with ThePrimeagen - What even is an AI Agent?!

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.