The Standup with ThePrimeagen - What even is an AI Agent?!
Episode Date: July 31, 2025Watch the https://bolt.new Reward Ceremony Live! July 26th 10AM PST on https://www.twitch.tv/theprimeagen #sponsored https://balls.yoga 📌 Chapters: 00:00:00 - balls.yoga, drama & intro 00:02:00 - ...Building OpenCode: terminal AI agents 00:03:00 - Bolt.new Reward Ceremony - July 26 10AM PST 00:03:18 - Agent architecture basics explained 00:04:08 - What is an AI Agent? 00:05:00 - Prompt engineering and tool calling 00:06:10 - OpenCode UX and mobile plans 00:07:27 - Why OpenCode runs agents locally 00:08:34 - Vision for mobile remote dev workflows 00:09:15 - The reality of build opencode 00:10:10 - Claude vs Opus vs others 00:13:32 - Tool usage & LSP integration 00:14:20 - LSP tool feedback loop 00:17:00 - Claude is best at calling tools 00:18:10 - Prompt tuning limitations and expectations 00:22:20 - Modeling agent loops with tool calls 00:24:40 - Why building a real agent isn’t just weekend work 00:26:20 - Security tradeoffs in early development 00:28:30 - Loop management & lossy compression 00:29:40 - Session design & managing context 00:30:40 - Parallel sessions and subagents 00:33:20 - Adam banned from Twitch by his cofounder 00:34:20 - OpenCode & Terminal.Shop development connection 00:35:00 - What surprised them building an agent 00:36:30 - Why TUIs are hard compared to web 00:37:30 - Layering, pixel constraints, UI struggles 00:38:30 - Final laughs & wrap-up 00:39:00 - Plug for OpenCode: sst.dev/opencode 00:40:00 - Outro: still no clue how agents work, but fun ride ⸻ Topics Covered: • AI agents in terminal environments • OpenCode’s CLI-first agent approach • Loop-based tool calling in LLMs • Integrating LSP feedback into agents • Prompt engineering & model tool awareness • Evaluating agents: benchmarks & metrics • Local vs remote dev environments • Designing for low-friction agent UX • Safety, permission models & sandboxing • The tradeoffs of TUIs vs web UIs • Session design, compression & memory limits • Humor, Twitch drama & voice AI characters
Transcript
Discussion (0)
Guys, have you guys checked out the hottest new site, balls.
Dot yoga?
What?
Yeah, I have checked it out.
Adam, check it out.
Pals. Yoga.
Check it out.
Oh, my God.
What is this?
It's so hot right now.
So hot.
This is you guys.
What?
In the world.
I just sent in the chat the, we made an ad for Bolt further hackathon this month that we were doing.
And it involved Prime being devastated that he was no longer able to make balls.
balls. yoga, which we just casually mentioned in the ad, but then we actually made it.
So that there would be an Easter egg.
Can you believe balls.
Not yoga was available to buy though?
Adam, are you okay?
He's watching, and he's watching the ad.
Yeah.
I know.
Your face looked so disappointed.
Oh, did it?
I was just reading Twitter.
I'm sorry, you guys sent me to Twitter and then I found that there's some stuff
on Twitter.
Ooh, the drama.
I just want to read it so badly now.
like let's just read it on stream i don't know if it belongs on your podcast like i don't think that's
necessarily where it goes bring it up can we give a proper intro to the episode before we start yeah yeah yeah yeah yeah yeah yeah yeah uh anyway
sorry go ahead prime hey today we are having on what i would consider two experts adam and dax uh adam is
currently looking over looking very disturbed i'm so sorry josh zoom in
Zoom in, John.
But they have been building out something called OpenCode, which is going to be an agent for your terminal.
And the experience looks immaculate.
Adam, you've done a fantastic job.
Dax, I don't know what you do.
But it looks very, very, very good.
And, of course, on this podcast, as always, is Teage.
Teage, hey, he might be Steve Jobs, telescopic Johnson.
I don't know what your full name is these days.
You keep getting new ones.
And so, yeah, today on the standup, we're going to be talking about how.
how to build an agent, but not only that, some things about OpenCode itself, which hopefully
you'll see a bunch of this on the stream coming up because this is really our kind of like our
area that we all seem to enjoy for whatever reason.
All four people on this podcast use NeoVim, by the way.
And so you can-
That's how we started our company.
That's actually why we started our company on ironically.
It's the first time someone got hired because of Neovim.
It got two, a double, two for one.
But I mean, the reality is that this type of agent is built for people who,
want, I assume the more command line kind of experience, the ability to have the faster,
more creamy experience that way.
And so they're going to just kind of walk us through what it takes to actually build an agent
because if you're not familiar, you probably read 1,000 AI articles at this point.
That's like, yeah, spent an afternoon, I've built an agent.
It's easy.
You don't even need something like cursor.
It's so simple, right?
Like you read this thing over and over again.
You're just like, well, then what are people working on?
It has to be harder, right?
And so we would love to have someone on who's actually done it and can walk us through
what it actually takes.
Hackathon's over.
But that doesn't mean the content is.
Join us on July 26th at 10 a.m. PST on Prime Stream to live react to the award ceremony.
See what everybody built and we'll see you there.
It's okay. You've worked at Netflix. You have everything they want.
Anyway, so that's been a little frustrating, but on the other hand, we're really excited about the direction things are going in.
I think we've been, I think the genesis of this is like there's been a lot of really cool tools out there.
like cursor. A lot of people like a lot of people use a bunch but I don't want to give up neovim.
Like I like working in the terminal. I don't want to switch my whole ID.
So we're excited about seeing okay, what can we do for people like us?
Like what what's the best possible thing we can put together that's complimentary to
whatever ID that you choose to use?
Nice.
Okay. Well that's some good background info because we're going to be mentioning open code a bunch
probably while we're talking about this.
So that clears up what where people should go when you guys are
talking about that at least.
So let's get to, what is an agent, Dax?
Everyone's been asking.
I gotta be honest, I don't really know.
Adam, do you want to take this?
What's your concept of an agent?
Well, I think the term is very overloaded, right?
So outside of programming, it means a lot of things, probably.
Really?
Programming.
Yeah.
James Bond.
I meant within the AI bubble.
So I guess there's like, in the programming world,
It's just this idea of calling an LLM, giving it a bunch of tools that are related to programming.
So editing, reading files in your code base.
And then just like looping and letting it keep calling these tools, making changes, and then coming back with some kind of response.
That's kind of the basics, like the formula for an agent.
If you look at like cursor has an agentic mode, but then there's other tools in the terminal like ClaudeCode.
I think all the tools that fit into this category
kind of have that backbone.
And in particular, the looping is really what we like,
the looping and the tools, right?
I think like we can say agent equals LLM plus tools.
Is that fine?
Plus loops?
Mm-hmm.
Plus some sort of system prompt that keeps it into,
like there's some,
I assume there's some sort of secret sauce to getting a good prompt
that helps the agent understand that it needs to go through a bunch of files
and kind of understand the code base.
Is that what it is also?
Or understand the task at hand?
Yeah, and each of the models kind of like,
they respond differently to the system prompts.
So there is like a bit of that,
but it's not like a super sensitive secret.
I mean, there's, you can,
you can kind of see all the system prompts out there.
I don't really know that any tool
can like differentiate that hard on those fronts.
Like the tools are pretty like set to the model at this point.
Claude expects certain tools, at least today, maybe someday all the models will be better at
calling tools generically. But right now, like, it's really kind of like all these, all these
agents have the same set of tools, the same roughly system prompts. It's all the stuff around
it that I think we're focused on, like open code, having kind of like the share page, you can share
sessions with people and you can kind of see the full history of a session and just having a really
great Tui experience.
We're going to build a mobile client so you can kind of like step away from your
machine and continue, you know, conversations while you're on the toilet or whatever.
Yeah, I think like there's a lot of stuff around the actual agent that makes for a good
experience in terms of programming with these things.
And I think Dax and I have both done a lot of programming with these things.
So we have a lot of opinions.
Yeah.
I think we're definitely like really firmly in the product zone.
It's like less about pushing the.
LLM capabilities or like finding clever ways to use the LM itself for making it perform better.
There's some of that, but most of it is just packaging it in a way that's nice to use.
That's obviously very fun because when you're working a product that you yourself use every day,
you're just like, oh, I wish this was like this, I wish this was easier.
So yeah, it's nice to be able to iterate on that.
And then some of the mobile stuff, so I think one differentiator with us is we're really in on this idea of the agent
running on your machine with access to your stuff because that's where you presumably your
dev environment works the best presumably you have everything set up some of these uh tools like
codex or like devin they work remotely and they run in the cloud which can work but you need to
recreate your perfect environment in the cloud which some companies are disciplined and have that but
nix fixes this right yes nix does fix it but you know how many people use nix so until everyone's
using next several yes uh realistically practically most people's workable
their environment is local so we want to do stuff like oh we want to have a mobile app
that's just going to be connecting to a session that's you leave your laptop running and you
step away go on a walk you can like get notification saying how the agent's done with this
give a feedback but it's also just running on your laptop so you don't have to go set up this
crazy cloud thing have you have you successfully been able to actually you know do a day of prompting
while out at the grocery store doing your own thing,
just hitting some prompts from your mobile phone?
No, we haven't built the mobile client yet.
It's something I desperately want.
I don't care if anybody else uses it, to be honest.
There's just so many times at this season of life,
I'm 38 years old, I have two kids.
I'm out of my office a lot.
Like I step out 15 times a day.
And to be able to like go on a walk and then when it's done,
get a notification, you know, it's doing its loop and it needs input.
But to be able to, like, continue that conversation without having to be sitting at my desk, it's just I want it very badly.
And I think it turns out other people want it.
We've seen some mentions on Twitter, cursors even designing a mobile app.
So, yeah, it doesn't exist yet, but I'm very excited to have it.
Okay.
So then kind of walk us through what it takes to actually build an agent in some sort of real capacity, not just a weekend warrior project.
but like what what like why did it take you guys so long to release yours obviously you put a lot of work
into it so it must be more than just a weekend project yeah so i think one dynamic for us is
we're not coupled to a single AI provider so we support anthropic google open night we actually
support like the full list of everything's available because we're built on a library that covers most
of them what about like llama can i run it locally yeah so one of the i haven't fully tested this out yet but
one of the providers, there's like provider support.
One of the providers is for like a local model.
So one, just testing this stuff across everything,
seeing which models actually work well for this.
The reality is, is right now very few perform well.
We all use Anthropic and Claude Sonic for as our defaults
because it is it is the best.
Some of us aren't poor and we use Opus, but sure.
You can use Sonnet, that's fine.
I use Sonic because I'm not paying.
Of all the time.
Holy damn.
He's like, I need a mobile app because I'm going to be doing this while I'm eating vegan sushi.
It's literally five times expensive, I think.
So, yeah, it's quite a difference.
I like it more because it's expensive.
Like, that's my thing.
That is a very expensive.
That is a very adamantor.
It's like a Gucci bag or something.
Right.
You're like, I get the same code out.
It doesn't matter at all.
The models are all the same.
But mine was made with Opus.
I do have a quick question, Mr. Gucci over there.
I noticed that you're not.
subbed on my channel and so it's just like you're talking about spending all this money but you don't
even have five dollars a month it hours of entertainment hang on hang on how do i do it how do i do it
five i was at one time i haven't been on twitch in months i swear to you oh so before it wasn't even
scheduled because you can set it up to auto renew adam a while i think it was it was it was at one time
maybe it's a new i got a new card new credit card you know that happens and then it just no it
It's crazy.
You guys make Adam do all the work and you make him pay you.
He's getting the privilege of working with us.
Dex.
It's funny.
I did see an ad a second ago.
I've got Twitch chat up here.
And I was like,
why am I seeing an ad?
What is this on Twitch?
That's why.
I'm so tilted right now.
I just,
I'm sorry,
it's so hard to focus.
I've seen,
like, in Twitch chat,
people are confused about the open code AI slash open code.
It is super confusing.
I'm so,
upset about this. It's so like
frustrating and annoying.
And I want to just like focus on our conversation,
but it's just very annoying. Do you need a Snickers?
Yeah.
I don't think that's vegan friendly.
I know. That's what he needs though.
Okay. Well, all right. So I mean,
I want to keep going on this agent thing. So you're just talking about
integration and tools all this. I would assume that do you have to
like do you have to like bespokely craft every
prompts to be able to use every tool or does it?
it just simply kind of work out of the box.
You're just like, hey, you should search for this thing now.
Or does it tell it like, does it use reasoning in the sense that you're like,
hey, what do you do?
And it's like, this is what I should do.
And you're like, okay, now follow step one.
What do you do with follow step one?
Like how does it start using these tools?
So roughly what the model is good at is you give it a description of the tools and
the list of tools and you give the task and it's very good at going to loop.
It's like, I'm going to figure out what to do.
I'm going to tell you what tools to call, call them, give the results.
I'm going to like do that.
So it's, I think some models where this works.
that does work really well.
There is optimization, though, there always is.
So the task tool descriptions, like the description of the East Task School,
like how do you describe it?
Like, what's the schema for the input?
Like, is that confusing?
Does you get tripped up on things here and there?
So what we do is for, say, something like Anthropic, you know,
they have their own version of this called Claude Code.
We just dump everything out from Cloud Code.
And when you're using Anthropic, we use all the exact same task or the tool descriptions
and stuff.
So on one hand, I'll say it does make a difference.
If you just do a very naive approach, you're probably going to get worse results.
I personally don't think it's like the thing that makes something 10x better.
I think it's a thing that you can kind of play with and optimize.
And also the end goal is like you don't really want it to have to be that precise because you can bring your own tools.
You can say here's a tool to access my database.
Here's a tool to do XYZ things.
It needs to be able to be flexible.
So we think it'll just kind of get better along those lines anyway.
Can you can you give an example of like how you hooked it up to like LSP for example?
Because I feel like that will be like kind of illuminating, but like a particular example of how that, how it gets access to that.
And how it even knows to call that also?
Yep.
So with LSP, so we experimented with a few different approaches to this.
But the one that we're sticking with for now is whenever it makes an edit to the file, the response, we have a tool that's like edit file tool.
Send us like a patch or like a, it's like old string and neutral.
string in the file where we'll place it.
That's how it edits files.
When it does that, the response to that tool will include any diagnostics that we found
from LSP.
So if it's a TypeScript file, where we had the TypeScript LSP running, we know that
after this edit, there's these three errors.
We say, here are the errors, please fix.
And you'll see it instantly respond with a fix.
And this really helps hallucinations because when it, like, thinks that there's functions
that don't exist on a library or any of those things, it correct itself right.
way. And it's, it's quite good at responding to that feedback. So wait, so how does it, how does it
kick all this stuff off? Because does that mean you actually run the LSP's yourself? Or do you, okay,
so that means whenever I open a project, if I have open code plus VIM open, I might have two TypeScript
servers. I may have two go pleases or whatever they're called and yeah, all the other ones.
Okay. Yeah. So we originally were looking to see can we like hijack any running ones,
because maybe you already have VIM open and you already have these LSP's configured, but you can't,
because they're standard input.
T's big shaking.
Big no.
That's a big no for me, dog.
That's a big no.
So we run our stuff in parallel.
And there's also no configuration.
Like we have we ship out of the box support for a bunch of things and everything just downloads and runs.
Because you don't need like absolute precision for your exact LSP configuration that you like.
It's more just giving the LM something that something to work with.
And then you said that's like at least for the way you guys have it.
It's part of the edit file thing.
So it's like, okay, I know that I need to go and like edit.
Does it also happen to feel like move a file or delete file?
Like does it happen in every kind of like file?
Because like I just interested to know how you put LSP into everything.
Or is it just like edits?
That's a good point.
Like we probably should give it information whenever it changes.
All right.
All right.
Okay.
And the right.
Nice.
Open source contributor.
Let's go Tj.J.
Needs to do LSP diagnostics on file add.
delete boom we do it on right so when you create a file we do show the diagnostics to the tool
or we return the diagnostics so edit and write we're already returning them i guess a move i didn't
i don't know we don't have yeah we don't often see i guess i haven't done much with it where it's
running like a bash move command but yeah i think we also we also considered and we have this in
there which have disabled for now there's a tool that just returns diagnostic so it can choose hey i want to look at
diagnostics right now and it can query for them.
And we have seen it use that a bunch, but it wasn't well tested.
So we're not shipping with that right now.
Yeah, the, the models today, they're still very like tuned to call specific tools.
Like we've played with a lot of tools and you can hand it a bunch of tools that's
never seen before and it just, it doesn't call them.
There's something to being like the post training process being catered to certain sets of
tools.
So Anthropic is really a cloud for cloud three seven before that.
those models are the best at calling tools from a programming standpoint.
They'll actually keep trying and going for it.
Other models can be really smart like Gemini 2.5,
but it doesn't really,
it doesn't call tools very eagerly.
So there is still like this phase we're in right now
where you kind of have to like provide the set of tools that the model expects.
I don't think that'll always be the case.
But we've definitely given it a bunch of LSP tools.
I've played with, you know,
giving it go to definition and find references,
things like that.
And it just doesn't use them.
I mean,
You can get it to use them if you ask it to, but it doesn't like, it doesn't default to kind of thinking that way.
I think that'll change.
So, like, how do you set it up in such a way to know to use that tool?
Like if you say, hey, use Fides, find references.
How does it know when to use it?
Or because isn't this kind of like a prompt skill issue going on here where it's just like you don't have a comprehensive enough system prompt for it to be able to follow?
Is that, is that what the system?
I mean, it is a system prompt.
That's if you look like at the cloud code system prompt, there's a lot of like specific tools called out.
So use the to do tool to set up your plan before you start executing.
That is how you kind of like massage the model into using tools that it maybe doesn't have any awareness of.
It's just, yeah, there is kind of like a finite number of tools before you kind of hit diminishing returns today.
It just doesn't, it doesn't take advantage of all of them.
But it gets really good at using the handful that it uses.
and I think it's in a good spot right now.
I mean, it's very effective.
I think these agents today are
worth using and leveraging.
But I do think it'll get better.
I think more models will get better
at calling tools
and we'll have other options.
So I've never built a model,
so I just have kind of further up questions.
But here, Dax, why don't you go?
Because then I just still want to keep on this.
So one last thing on that.
I just totally blanked out what I was going to say.
It's okay.
Thanks for interrupting him.
It's prime.
I don't know if I interrupted him.
We kind of like race towards it.
I mean, Adam is still so upset about Twitter right now.
I remember.
I remember.
I'm going to go.
So one of the missing pieces here is you can add such as a system prompt and say,
hey, use this tool.
Max, were you waiting for Cluelly to load or something?
What's going on?
Yeah, it was buffering.
I need my cheat sheet.
One day.
It's hard.
hard to tell right now whether when you're making something better, are you making something
else worse? Because this L-LM is such a black box. It's not like a deterministic system they can
see the inside of. So what is missing, and to me, the next major thing we work on is a set of
consistent benchmarks that we can run whenever we make changes to system prompts. And these
benchmarks are probably not going to be very quantitative. We're thinking that we're going to come up
with a very real-world looking code base. We'll have a bunch of like features that we needed to
implement. And we'll have a bunch of standard prompts.
have it do that.
And we have this nice, like, way to see for anything we do, like, every single thing that
happened, diagnostics-wise.
So when we make a change, we can run it through this and see, like, the output and evaluate
it somewhat qualitatively.
So it's still going to be a qualitative thing, like, did it get better or worse?
But at least we have a consistent, we'll have a consistent set of things we're running
so we know.
Right now we're kind of flying in the dark.
We're like, yeah, this made it better, but I don't know if, like, some other person is
having a horrible experience because of this change.
So you're saying now you're doing not only vibe coding, but Vibe coding, but Vibe
Vib testing, Dax.
Vibed testing, vibe benchmarking.
Yeah.
It's really badly needed because there's so many benchmarks for the models,
which everyone has opinions on.
There's not really a benchmark.
Well, I don't know.
They're kind of worthless, I guess.
But why?
Why are they worth those items?
Say why.
Why are the model?
I mean, a lot of strong opinions.
The model benchmarks,
they just don't seem to correlate with like actual real world.
But why?
I mean, why?
I don't know.
because I guess because they're training to like hit the benchmarks are you guys like quizzing me what is going on here
no I'm trying to get you say what's wrong with the arch aGI benchmarks Adam what's wrong with the arch aegee i benchmarking
i'm trying to get you to say because they're all benchmarking python oh well that's a thing yeah
they're not all doing it but the major the sui bench benchmark is literally just python which
blew my mind when I learned that uh yeah there's not like a benchmark today for there's like a
a dozen agentic coding assistance.
And there's no benchmark that says, like, given the same prompts and the same code base,
here's like, it's part of it's like, it's qualitative.
But here's the one that did the best job.
It did it the cheapest.
It did it the most effectively.
You had to have kind of like a grading system and there'd have to be humans in that process.
And speeds another factor too, like how fasted it that even do this thing.
Someone did this funny thing.
Someone did me to something yesterday.
I was kind of interesting.
They told the agent to like write a book.
remember exactly what it was it was something where it would definitely like run in a loop for a long time
and he uh checked like how many steps would i think the highest one which was claude uh did 187
steps like it's the lm tool call to the lm tool call before it before it finally stopped
uh so that's a good way to like rank it to see like which one's the most persistent all these models
are not very like they give up very easily they run to an issue just like humans yeah just like
just like humans.
AGI has really been achieved internally.
Like,
I'm giving up on this.
They're going to just ask me again later.
It's fine.
Yeah.
All right.
I got real questions here.
I also agree RKGI sucks.
Okay.
But now that we got that out of the way,
with that stated,
so,
like,
how do you start a loop?
And how do you determine that a loop's done like?
Because like,
I know you're going to say,
here's what the user said.
You have some sort of system problem to say,
hey,
you can accomplish this task via the,
tools, whatever it says. How do you say, okay, I either repromp the system again to execute step
one of a several instruction plan. Do you have to like manually parse it out? How do you like, what's
the interaction between the model and this like start this looping process? Because that's what's
kind of confusing to me since I've never actually built the agent itself is that I imagine that
there's some, you have to do some weird parsing. What's cool is you don't have to do anything
because this is the responsibility of the of the model. So when you make a call to an out,
it'll generate a bunch of text and it'll stop, right?
There's reasons for why it stops.
Sometimes it stops because, oh, I'm done generating text.
I'm done with everything I need.
So that's the stop reason.
But it could also stop because I can't continue until you execute these tools.
So you don't have to figure out when to interrupt it and like do something else.
It'll tell you.
It'll stop and say call these tools and then continue with the responses to the tools.
That's baked into the models.
So that part's pretty easy.
very easy to build like that loop.
So that's not even a loop then.
We keep asking until it's done.
Eventually it's like a wild true break if
if the stop reason is done.
It's part of the request that you literally send like to Claude is like,
you know, I have these tools available.
They can do these different things.
And then the like Anthropic is the one, right?
If I'm understood, from what I've seen before,
you literally send them the tool definitions.
And then the model says, I'm going to run this thing.
Then you're like, cool, you said I'm going to run that thing?
Do do, do, do I run it?
I send it back to Anthropic.
And they said it just like you can do a conversation.
Oh, okay.
So it's not you going, if you were to execute, what would you do?
And it's like, I would do these things.
And you're like, okay, I'm going to post on all that things.
I'm going to execute it.
Then resend effectively the whole previous conversation.
Plus the current one's like, what's next, bro?
Because that seems like it would just do this forever because it always comes up with new stuff.
Like you and Devin, basically.
Exactly.
me and devon are tight yeah yeah so it's actually it's very easy that's why building like a basic
prototype of an agent yeah is like a weekend weekend project yeah because you just would define like
edit file as like a tool and then you'd give that to the model and then you'd define like some other
simple ones that you would need to do stuff locally search for files i don't know i don't know what
other ones are like the normal ones yeah and that's that grip yeah yeah yeah
fine grab said it runs scrap
yeah it's got a bash tool like where they can actually run
bash commands that one's always good
because it's like oh i'll let you run any bash command and then it just runs
anything that's a really smart that's actually my biggest worry is like how
okay so how do you protect against destructive operations
do you just simply say hey user you must like
approve of said operations because how do you know it's not going to like
rmrf root no preserve so there's a lot of different approaches here most
Most of these agentic coding assistants have a permissions model where basically, you know, certain
tools have to be granted permission.
Most of them also have like a full auto mode where you can bypass all those things.
If you know you're in a sandbox or whatever.
And then there's like there's approaches with sandboxing.
So like codec, CLA by OpenAI.
They've got, they use like the Apple.
I don't remember what seatbelt.
It's some kind of a sandboxing thing.
Same with Linux has a sandboxing thing.
where it basically constrains it to work only in this directory,
like in the project directory,
and then some network constraints as well.
Yeah, some people have,
some people will run cloud code in a Docker container.
Dark containers aren't like a perfect,
like security-wise,
they're not like a perfect sandbox,
but they're pretty good, practically speaking.
We're not thinking about that stuff too much right now
because we need to build something that's fun to use first,
and then we'll figure out how to, like, layer in some of these sandboxing things.
sorry that was just such a fun statement
okay so does that mean are you saying that windows was always right asking for
permission for everything including deleting files
that's what open code does right now right so yes
yes windows was correct
just say yeah dax you got it yeah
dax stop reading twitter
i know you're upset
i know you guys are upset about
the twitter thing we can i have so many questions so i didn't even know that there
was already this agentic loop effectively that exists for it.
That's what's inside of cursor.
Well, yeah.
I mean, I know that the cursor has it, but I thought that you wrote the loop, not that
these, these models effectively can kind of break and say, hey, I need more information
before continuing.
I mean, you still do have a loop on your side because if it's a wild loop, it's still a loop
where you're having to keep calling the LLM with all the messages until it tells you to stop,
basically.
Yeah.
And there's some tricks here too, right?
There's like,
there's obviously a limited context window on a lot of these models.
Sometimes they're like pretty tight.
So if that loop is going and a context window is starting to approach the limit,
we will pause for a second, take the whole history,
send it to LM, ask it to summarize,
and then we'll like continue the loop just with a summary.
Is that dangerous?
In what way?
I don't know.
I mean,
dangerous in the sense that the moment you asked it to summarize,
you do effectively a compression algorithm on top of it,
a very lossy one.
Like how far does it get off?
You get like warnings about it?
Does Open Code give you like a warning?
Like, hey,
we have to do a lossy compression on your history.
So therefore it may start going haywire.
We have that information to show you,
but it works so well that like it's kind of.
Okay.
Yeah, like the experience feels like you have infinite context.
effectively.
Most of the tokens that it's taken up are like,
here's the stack trace from one thing that I fixed eight minutes ago.
Yeah.
I mean,
really a better practice is just to keep creating new sessions and to not let any
context window grow very far.
Summary is like compacting it is basically doing that.
It's basically forcing you to start a new session,
but it's giving it a little bit of context,
I guess,
in the form of the summary.
But it's generally like you don't want to keep a long context window
filled up with a session that's,
you know,
five messages long.
It's like,
it's getting less and less effective over time.
Yeah.
And most,
I mean,
if you think about like how much of the actual tokens you're typing in,
telling it and giving it direction,
it's very little compared to how much it's like getting from reading file or like a,
oh,
I'm going to change this file.
It was 250 lines long and I changed a bunch of the lines.
Huge diff.
Tons of tokens.
We don't need that now.
That was an hour ago.
Yeah,
the effectiveness of the model is all tied to like,
signal to noise. How much signal can you give it without noise? And the longer your session goes,
just obviously there's going to be more noise. Make a new session for different things.
That's what we're saying, right? So if I'm going to do a new task, I should start a new session.
Yep. And sometimes, yeah, sometimes you want to do things in parallel. It's another reason to start
a new session. You like kick off one thing, new session, kick off another thing, new session.
I found my brain could only handle so many of those, but people do like paralyzing a lot.
So could you have an agent that spawns sessions?
So it is actually the compression algorithm for your brain.
Sub-agent.
It's all the rage right now.
Teage is on top of it.
Oh, I know.
I know everything there is to know about agents.
I came super prepared.
I started drama on X.com.
The everything application has been seating false websites in the chat.
I have 17 misinformation bots in the chat right now, spreading fake news about open code.
Nice.
The subagents are an interesting concept.
I think they should be used surgically.
I think there's a trap of like having just sub agents that are general purpose just to like parallelize work because we're not like open code at least is not intending to be an asynchronous agent.
Like you are in the loop on purpose.
Like we want you as a programmer able to interact with the agent and to steer it.
in the right directions.
I think if you get into like eight subagents doing all the work,
there's very little visibility into what's happening.
You lose the permissions model,
like the ability to inspect what's going on.
You just can't like your brain can't handle those eight subagents going in parallel.
So now you've kind of got like a background agent, right?
You've got jewels or whatever else.
And that's a different thing than what we're trying to create,
which is very much human in the loop,
human guiding the agent.
So I think subagents like using them,
so there's a task agent in clog code.
or a search agent.
It basically is just a read-only agent that can search through files and limit the amount
of context noise in your main context window.
That one makes a lot of sense.
I think like a code review or a planning agent that's a sub-agent, I think those can make
sense.
We're going to play with a lot of that stuff.
But I think like just general purpose sub-agent that does everything and then you're
just trying to parallelize all the tasks.
I don't know.
I think that's kind of a trap.
Can we get one that's like a zoomer version of it so that then the rest of the time it
responds like a zoomer. I want Tanner in open code.
You know what I'm saying? Like I want to write my prompt and then it turns it into Tanner and
Tanner reads everything off to me. Hey guys, it's me Tanner here. Just letting you know we're about
to edit the files. Okay, sweet. That's what I think. That's what I think you guys are missing.
Like if you're going to differentiate on product, that's what you, oh, prime. This is way better.
Yours is way better. I'm a way better Tanner do.
I don't know who Tanner is. I'm not going to lie. I was trying to laugh. It's an AI voice on
Prime channel.
Yeah.
Oh, gotcha.
Yeah, yeah.
I'm out.
It is not watching Twitch.
I'm sorry, guys.
I love you guys.
I don't watch Twitch.
Someone gifted you a sub, loser.
Yes.
A senior variable would be the first name.
Adam,
let me be honest.
Let me be honest with you.
And even though this is in public,
I'm going to be very frank with you.
If you were watching Twitch,
I'd be really mad because I'd be wondering what the heck,
who's working on Terminal.
Dot shop right now.
Yeah, exactly.
I've got three jobs.
I can't be watching Twitch.
You do not have time to be watching Twitch.
Adam.
You need to be building terminal.
Dot shop.
And as your co-founder,
I reject the notion
of Adam on Twitch.
Yes.
If you knew who Tanner was,
I'd be disappointed.
It was a test all along.
True.
We're putting the individual,
an individual contributor.
Man,
I love that line.
So that one.
It was very good.
Adam's building open code
so that he can have it work on terminal.
This is all like a straightforward plan.
Oh, so we're investing.
We're sharpening the axe.
Nice.
We're big fans of sharpening the axe around here.
Yeah.
That's really smart.
It's going to be so sharp when he's done with it in a few months.
It's going to be so sharp.
Oh, man.
Okay.
So it really, that's what it takes to build an angel.
What is the part that you didn't realize was going to be like the hardest when building an agent?
Like, what's the thing that caught you most off guard?
You know, it's just like every product.
It's the 80%.
is easy and you get into all the edge cases.
It's like, what happens when someone cancels the loop and it's like in the middle of a tool
call?
Like, it's that handled gracefully.
What happens when it like hits the context window but didn't get a chance to summarize?
Like there's just infinite of these like weird cases and it's multiplied by the fact that again,
these things are not very deterministic.
So discovering them is quite hard.
But yeah, it's just to me, I think for me it's been all the edge cases.
Adam's been mostly working on UI stuff.
Yeah, the twois are just hard.
every time I work on a 2E.
Would you say it's the way that Charm does the rendering that's hard?
Or is it the way that Charm takes over the open code repo and release it that's hard?
Like what would you say?
Tewis are hard.
Charms him on both sides right now.
That's what I'll say.
Multi-front.
It's like, that is not okay.
Do not associate.
I could be literally anybody.
It could be someone else over here.
I am not part of this podcast at all.
That's Miami, TJ.
I'm missing your chain, Dax.
That's what I need to complete it.
It really changes to look at the turtleneck when you throw a chain on.
It's so different.
The turtleneck does become way, way different.
Here's actually the tough.
I have one tough question for Adam.
Does watching Twitch make you not a vegan because you're consuming something made by animals?
Aminals
I heard that
Wow
That was unexpected
I love it
Yeah
Does using electricity
Mean you're not
Evasion because you're consuming
Gasol fuels
You hate the environment
Why'd you agree to come on?
I was surprised
Yeah
Okay okay
Okay we could probably do one more real question
But is the towee hard
because of the way, because I know what we've worked on charm together and it has this thing where it has to
render from top to bottom so it makes like modals and various things difficult. Like have you got,
because I know we talked about this, were you considering actually still building like the non-moat like
the or the non top to bottom style rendering that is currently, is it bubble tea? Is that the name of it?
Yeah. You're wanting to change it out to something that's more like you have a scene and you layer it.
I like bubble tea. I, yeah, it's very good. I've grown to really enjoy it. I enjoy it. I enjoy
go more all the time. I kind of hated
Go at first. Suck on a Twitter.
Suck on that.
I'm actually enjoying Go quite a bit.
It's nothing about the actual
like developer. You can learn to love anything.
Yeah, you can learn. Yeah, sure.
If we spent six months telling you
you had to write something in this and that it was good,
Adam, you would say Rust is good.
Like, this is not a strong endorse.
Do not over index on this.
The thing that makes Tui's hard is like coming
from being a web developer. I can't,
There's so many more constraints, and that's nice, sure, in a lot of ways, like designing a to-y.
It's kind of fun because you have all these constraints.
But it also just sucks when I can't just like let the user do things you can just do on the web.
And this vertical space is so constrained.
Like you're so just like you're trying to like minimize absolutely every pixel.
It's just stressful.
It's just like getting all the little things right and making a good to-e experience.
It just kind of like stretches all your brain cells.
It just sucks.
And overlays are hard.
Overleys are hard, yeah
Okay, see, TJ, that was a real question
TJ thought I was going to make another joke
I wasn't, I was in there because I'm actually
curious about it because I wrote when I did
when I did Flappy Bird
I actually gave up using that and just wrote
my own scene and did Z indexing
because I wanted to be able to do
layering and it's just easier
but then it takes care of all the colors
and the spacing and how you do that is actually a pretty
complicated subject at the end of the day
Yeah, I was just imagining
you asking exactly the
same question.
Nah, that joke's only funny ones.
Oh, man.
Okay, well, anything else you guys want to let us know about agents or open code?
We should, we said on the podcast, it's SST slash open code.
I've been pasting it in the chat.
I know.
Links will be in the description.
And even maybe in a pin comment, if you're lucky, Adam.
Oh, that'd be amazing.
SST slash open code.
I'm just already prepared for all the YouTube comments that are like, wait, it's not
open code AI slash open code?
That's not the one?
Well, now that you said that, you've just guaranteed 90% of the comments will be the wrong link, Adam.
Thanks, Adam.
So that's, so we, maybe we'll help you out.
Maybe we'll cut that part from the final thing.
But I will post the links.
I'll make sure the real links are there so that people can find it and test it out.
Yeah, thanks for asking you guys.
I still have no idea how to build an agent, but I really appreciate the talk.
It makes me feel better.
That's okay.
We're going to ask open code to build us an agent next week.
Yeah, there you go.
Or later.
It's going to write one loop.
Yeah.
It'll be like, you'll just see it.
It'll be super obvious.
It's the same thing as sending completion requests.
It's just tool requests.
That's it.
Okay.
And then put it in a loop until it's done.
Make it easy.
Yeah.
Okay.
Cool.
Okay.
Okay.
Okay.
Well, thanks Dax.
Thanks Adam.
Thanks everyone for watching and listening.
it's been another episode of the stand-up
hope that you enjoyed
bye
bye
bood up the day
vibe code and errors on my screen
terminal coffee
in hand
