The Changelog: Software Development, Open Source - MCP on Code Mode (Interview)
Episode Date: May 15, 2026
This week I'm talking with Matt Carey about Code Mode and how most of us have been thinking about MCP all wrong. Matt works on the Agents SDK and MCP at Cloudflare. We discuss how server-side Code Mode lets one MCP server expose all ~2,500 Cloudflare API endpoints in about 1,000 tokens of context, the dynamic Worker loader that runs model-written code safely in a V8 isolate, Matt's own workflow with Claude, where memory fits into the future of agents, and his Zaggy git wrapper that keeps agents from force-pushing his repos.
Transcript
What's up friends, Adam here.
This is The Changelog.
I got an awesome show for you.
Matt Carey from Cloudflare, working on agents and MCP,
lots of fun things obviously happening at Cloudflare,
tons of releases, tons of momentum,
and even Matt will tell you here in this podcast,
it's moving fast and he's trying to keep up.
If you've been curious about Cloudflare's platform,
how they enabled all their APIs via their MCP
without destroying your token count,
well, this show's for you. A massive thank you to our friends and our partners at Fly.io.
That is the home of changelog.com.
Learn more at fly.io.
Okay, let's do this.
Well, friends, this episode is brought to you by our friends at coder.com secure environments
where developers and agents work in parallel.
And I'm joined by Nikki Pike, Field CTO for Coder.
Nikki, what is the field CTO?
So I get that question a lot and it's, you know, half the people understand it, half the people don't.
So a field CTO, I describe it very simply as we're dev rel for the C suite.
So we provide a bridge between the customer voice, between the C suite and the managers and the leadership teams of our customers back into our product.
And then we go through and we help enable our teams to have the same message to make sure that the message is correct.
And that we're building on something that people actually want, not just something that we think they want.
Okay.
So we're taking the laptop away from the developer.
Not really, though.
We're putting them in a cloud development environment, a secure environment where they can work with their agents in parallel.
These are blessed environments.
What's wrong with the laptop?
The laptop is the trap here.
And not only because the fact that it could be stolen, you could lose it, it breaks and you're out of work while you're waiting for a new one, but there's also just the consistency that you got there.
We all know developers.
Developers are going to be looking for some of the latest and greatest.
And if you're not really controlling how they get out there, that's where you get this.
It works on my machine.
It doesn't work in production.
It doesn't work anywhere else because you don't have that consistency.
You don't have that ability to really standardize what that environment looks like.
And this is a problem not only for new people coming in. You know, the onboarding estimate
on average, I think, is like four to five weeks for a new employee to really get their local
laptop set up and ready to start writing their first line of code.
And, you know, the time to first commit is a metric that almost everybody knows.
And the reason they can't do that is because there's a lot of tribal knowledge out there.
They got to go talk to other developers.
What are we using?
where do we get our dependencies? Are we getting them from public? Are we getting them from private
repositories? But there's also the security and the supply chain aspect of this. When you have
local machines out there, look at like the Shai-Hulud, you know, that virus that went out not long ago.
This was a compromise of the npm public repositories. They went and downloaded things. npm did
what it did. Next thing you know, you're compromised. But when you use something like what we're doing
with cloud development environments, then you can mandate and you can put restrictions on there to say,
hey, you can only go get your packages from our private repo.
Those packages are expected to have been thoroughly vetted.
We know that they're clean.
Now, does this stop everything like Shai-Hulud?
No, if that compromised package gets into your private repo,
you can still have that,
but it really reduces the surface area of the attack.
And it also reduces the blast area of the compromise should it happen,
because if your laptop gets compromised and you have to kill the laptop for whatever reason,
that's weeks out of work while you're either fixing that or you're getting a new laptop in,
a cloud development environment allows you to kill that, start back up fresh,
and you're back and running in five minutes.
You don't have to wait all that time.
Well, friends, the first step is to go to coder.com, install coder, self-hosted environments
for your teams to enjoy, to standardize around, and it's open source.
So you can try it out today.
Once again, coder.com.
Matt Carey, good to see you on the pod.
Thanks for taking my invite.
I saw code mode out there.
And I was like, you know what?
Let's talk about code mode.
So what do you think?
Yeah, well, thanks for having me.
You podcast often?
Yeah, so I actually have one with a friend of mine,
but we don't do it super often,
a couple of times a month.
A couple times a month.
What do you talk about?
What's the show called?
It's called You've Been a Bad Agent,
and we just chat absolutely rubbish about agents.
That sounds like fun.
How long is the show?
Well, it starts, and we just start rolling.
And I don't speak to him that much
because he's in San Francisco.
So I'm in Europe,
and so we just, like, use it.
We started having chats like every couple of weeks just because I like to catch up.
And then we were like, we should record this.
And that's where we started recording it as a podcast.
It's literally just a chat between two of us.
Sometimes it's 20 minutes.
Sometimes it's an hour and a half.
We don't really edit it and just gets dumped online.
So I'm probably going to get sued one day for something I say on that.
Don't do that, man.
That'll get you sued.
We don't want to get sued.
You know, one thing I'm really curious about really and why I want to talk to you
because, you know, obviously, code mode is cool.
and there's a misconception, I would probably say,
and you could probably agree with that,
with how folks are thinking about MCP.
And I think I've been in that camp, too.
I think we're all sort of just navigating this new world
and trying to figure out how these tools work.
And there's a race, obviously, Cloudflare is involved in that race.
You know, you've got blood out there in the water on X between you and Vercel.
I mean, you got stuff happening, you know.
And I just think about the sheer size and weight of Cloudflare.
Maybe you can or cannot speak to just how you personally feel about these outages.
Maybe some you can help, some you can't help.
But I just think about the state of AI and how it's being deployed, accepted and deployed.
And so the acceptance is one thing.
But the deployment of it is another in a large organization like yours.
And you're in charge of agents.
You're in charge of MCP.
You can share with the audience what you really do there.
That's kind of what I want to cover: that bigger landscape of the deployment and acceptance
of AI, the misconception about MCP, and that kind of stuff. What do you think?
Yeah, definitely. Let's go. I can chat a little bit about me, for instance. I work on agents,
specifically, like the agents SDK at Cloudflare. So I work on the open source stuff.
You'll have seen a bunch of my colleagues on X, Twitter, like if you're on Twitter. I also
work on some of the open source stuff for MCP and how we can support MCP at Cloudflare,
model context protocol.
It's kind of why I joined Cloudflare.
I was working a bunch on MCP, and I thought they had a really good, like, avenue there
to build the best agents with durable objects and give them the best tools via MCP,
and I was like, that looks super cool.
So it's kind of where I really wanted to join this team.
And, yeah, I've been here since October.
We released code mode in the summer of last year to do programmatic tool execution, basically.
You write code over your tools rather than calling tools.
And then Anthropic followed up with a bunch of cool stuff after that.
And then just a few weeks ago, well, a week or so ago, we released server-side code mode.
So running code mode inside an MCP server.
So the model, or the agent, doesn't need to call tools.
The agent can just write code that acts upon
the tools that we have in our back end. And then all of that code is executed
super safely, securely on dynamic workers on our server side. And so your agent that calls the
tool can just write code. And yeah, it all gets executed on the server. It meant that we could
put our whole Cloudflare API, all 2,500-odd endpoints. There was a number, but the number
keeps changing, so every time I remember the number, it's out of date. But yeah, so around
two and a half thousand endpoints, we can put all of that behind one MCP server that actually
works, that fills a thousand tokens of context. And there was a lot going around about like,
oh, fixing MCP. It's like, I don't think we're fixing MCP. We're just using MCP to the best of
its capabilities. And it was a really well-designed protocol, I believe. And I think it continues
to be well iterated on. And I think it was maybe not used as well as
it could have been initially.
Yeah.
Could you explain MCP a little bit for us?
I mean, I know I've dipped my toes in the water of MCP, of course.
But for the uninitiated or less initiated, what exactly is the model, what is it, model context
protocol?
MCP.
What exactly is that?
And maybe what are the myths about it that are incorrect?
And what are the things you like most about it?
Yeah.
So it came out in November
of 2024, from two guys at Anthropic, David and Justin. And the whole idea was how can we
let Claude, in this case, Claude Desktop actually, how can we let Claude desktop do things on
my computer? How can I let it access my Apple Notes? How can I let it access maybe my web browser,
maybe Figma, maybe like whatever was on my computer? How can I let it do that? How can I let it
read my code directly.
That would be pretty cool.
Now we have Claude Code and all of that stuff, but like,
it wasn't around yet.
So they came up with a protocol that consisted of tools, prompts, and resources.
And tools are the things that everyone talks about.
Tools are like the functions, like function calling.
The prompts are instructions that the server can hold that the client might want to use at
some point and can request. I think of them as instructions. They're kind of like directions.
They're almost like skills. There is a debate about whether skills are a prompt or a resource
at the moment. So we can get into that. But the resources are like documents that might be held
on the server that the client might want to use. And like tools have far and away the most amount
of usage. But initially it was like how can we define some tools that can be used by an agent
that I don't own and vice versa.
How can an agent give access to tools that it doesn't own when it's being built?
And for that, you need some sort of like standardized protocol.
And when these guys made it, it was just local first, communicating via standard I/O.
And then once it got some usage, they introduced a remote protocol, and then
that remote protocol has changed a couple of times.
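Editor's sketch: the three primitives described here (tools, prompts, resources) can be pictured as a small catalog that a server advertises to a client. The shapes below are simplified and hypothetical. The field names follow common MCP usage, but this is not the official SDK typing, and the example entries are made up.

```typescript
// Simplified, hypothetical shapes for the three MCP primitives.
interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema describing the arguments
}

interface Prompt {
  name: string;
  description: string; // instructions the client can request from the server
}

interface Resource {
  uri: string;
  name: string; // a document held on the server
}

interface ServerCatalog {
  tools: Tool[];
  prompts: Prompt[];
  resources: Resource[];
}

// Made-up example entries, for illustration only.
const catalog: ServerCatalog = {
  tools: [
    {
      name: "create_issue",
      description: "Create an issue",
      inputSchema: {
        type: "object",
        properties: { title: { type: "string" } },
        required: ["title"],
      },
    },
  ],
  prompts: [{ name: "triage", description: "Instructions for triaging a bug report" }],
  resources: [{ uri: "docs://contributing", name: "Contributing guide" }],
};

function summarize(c: ServerCatalog): string {
  return `${c.tools.length} tools, ${c.prompts.length} prompts, ${c.resources.length} resources`;
}

console.log(summarize(catalog));
```

Every tool definition in such a catalog is loaded into the model's context, which is why tools end up dominating both usage and token cost.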
Now, like pretty much every big SaaS company publishes an MCP.
So I saw Datadog published theirs last week, which is pretty cool.
And yeah, I think I'm pretty bullish on MCP's ability to be the protocol that agents use to access services in the future.
Help me understand this breakthrough that you made with just writing TypeScript versus all the context-filling tool calling, I guess.
I should call it, right?
That a lot of folks are kind of getting, I guess it's kind of wrong, but it's kind of
how it's designed, but maybe you're applying the application incorrectly.
Talk about the way that you've remodeled it to write TypeScript versus tool calling
and filling the context window.
Yeah, so when LLMs first came out, they just produced text, right?
And then at some point, I can't quite remember exactly when it was, but I'm pretty sure
it was after the big ChatGPT moment,
function calling became popular.
And I remember the first model that could do function calling really well was, I think, GPT-4.
And with GPT-4, you could ask it, like, what's the weather in London?
And it would reply, call weather function, param location equals London.
And you could intake that, that structured piece of information.
And you could plug that into some JavaScript, some Python, some code.
And you could call a third-party API, a weather API, with London
as the argument, the city argument,
and you could get a result.
And that result, whatever it was,
you would pass back into the model as the tool result
or as the function call result.
And then the model would continue generating
and make it all nice and pretty for you.
And that was like how LLMs performed actions
in the outside world.
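Editor's sketch: the function-calling loop described here can be reduced to a dispatcher. The model emits a structured call, the host runs the matching function, and the result goes back to the model. Everything below is hypothetical: `get_weather` is a made-up function name and the weather data is canned rather than fetched from a real API.

```typescript
// The structured output a model emits instead of plain text.
interface FunctionCall {
  name: string;
  arguments: Record<string, string>;
}

// Stand-in for a third-party weather API (hypothetical, returns canned data).
function getWeather(args: Record<string, string>): string {
  return `Weather in ${args.location}: 14C, light rain`;
}

// Registry of functions the host is willing to run on the model's behalf.
const functions: Record<string, (args: Record<string, string>) => string> = {
  get_weather: getWeather,
};

// Dispatch the model's call and produce the string passed back as the tool result.
function runFunctionCall(call: FunctionCall): string {
  const fn = functions[call.name];
  if (!fn) {
    return `Error: unknown function ${call.name}`;
  }
  return fn(call.arguments);
}

// What the model might emit for "what's the weather in London?"
const modelOutput: FunctionCall = {
  name: "get_weather",
  arguments: { location: "London" },
};

const toolResult = runFunctionCall(modelOutput);
console.log(toolResult);
```

The model never runs anything itself. It only names a function and its arguments, and the host decides whether and how to execute it.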
And we,
I guess, rightly or wrongly, assumed
that each individual function would be like
a hand that the model could use in the outside world.
And then they were renamed to tools after a while.
And that makes more sense.
Like, each function was a tool that the model could use in the outside world.
But there is like a problem where as you try to get these agents to do more and more things,
you add more and more tools.
And then at some point, you start filling the initial context window of the model.
So, for instance, like the GitHub MCP server is always one that's used.
And they've done loads of work on it.
So, like, I don't, not throwing any shade.
But initially, when it came out, it was like 15,000 tokens or something.
And now I think it's a little bit less.
And they do some stuff to dynamically add tools or not.
But, like, if you're filling the context window with sort of 20,000 tokens initially,
before you've even given the model your task, like the models of yesterday, like GPT-4,
they had much smaller context windows.
And so you were filling them very quickly.
And even now, the foundational models, the best ones,
even though they maybe have a million-token context window
or 200K to a million for the normal ones,
they do start losing power around the 50K mark, like all of them.
And this is quite well documented.
And so you really don't want to be filling the context window too much.
And you see in Claude Code, you'll have like a compaction step.
That's triggered because you've filled a context window.
So say you've added like 20 tools or whatever, you've like got a really chunky context window.
But now that's only 20 things the model can do or the agent can do.
Imagine you want to have like proper personal AI or something that can actually try and automate your job.
Like imagine how many individual functions you do in your job.
It's way more than 20.
It's probably more than 100.
It's probably near a few hundred.
So, like, you can't really automate anything unless you have the ability to add all of those functions into a model.
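Editor's sketch: a rough back-of-the-envelope version of this scaling problem. The 200 tokens per tool figure is an assumption chosen for illustration, not a measured number. Only the roughly 50K mark and the tool counts come from the conversation.

```typescript
// Assumed average size of one tool definition in tokens (hypothetical figure).
const TOKENS_PER_TOOL = 200;

// Mark mentioned in the conversation where models start losing power.
const DEGRADATION_MARK = 50_000;

function toolContextCost(toolCount: number, tokensPerTool: number = TOKENS_PER_TOOL): number {
  return toolCount * tokensPerTool;
}

const twentyTools = toolContextCost(20); // 20 * 200 = 4000
const fewHundredTools = toolContextCost(300); // 300 * 200 = 60000

console.log(`20 tools: ${twentyTools} tokens`);
console.log(`300 tools: ${fewHundredTools} tokens`);
console.log(`300 tools past the mark: ${fewHundredTools > DEGRADATION_MARK}`);
```

Under that assumption, a few hundred one-to-one tools would pass the 50K mark before the model has been given any task at all.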
So in the summer, Kenton and Sunil, who are colleagues of mine, worked out that you could, or rather they wrote a really great blog post, the Code Mode blog post, the original one, with an amazing name from Kenton, like, really, really stunning.
It's like, I think it caught the zeitgeist.
It's like, you could just use code mode.
And the idea was falling back to an idea that had been around for a while.
Like, I'm pretty sure Hugging Face did a research paper on CodeAct a while ago.
But the idea was that models should just write code.
Like AI has been trained on so much code, we should just write code.
And if we write code, the code can interact with the functions that we want to use.
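Editor's sketch: one way to picture code interacting with functions is to render tool definitions as a typed TypeScript declaration and hand that to the model, which then writes ordinary code against it. The `ToolDef` shape, the `tools` object and both tool names are hypothetical. This is not the generator Cloudflare ships.

```typescript
// Hypothetical, simplified tool definition.
interface ToolDef {
  name: string;
  description: string;
  params: Record<string, "string" | "number" | "boolean">;
}

// Render the definitions as a TypeScript declaration the model can code against.
function renderDeclarations(tools: ToolDef[]): string {
  const methods = tools.map((t) => {
    const fields = Object.keys(t.params)
      .map((key) => `${key}: ${t.params[key]}`)
      .join("; ");
    return `  /** ${t.description} */\n  ${t.name}(input: { ${fields} }): Promise<unknown>;`;
  });
  return `declare const tools: {\n${methods.join("\n")}\n};`;
}

const toolDefs: ToolDef[] = [
  { name: "listIssues", description: "List issues in a repository", params: { repo: "string" } },
  { name: "createIssue", description: "Create an issue", params: { repo: "string", title: "string" } },
];

const declarations = renderDeclarations(toolDefs);
console.log(declarations);

// The model would then reply with code rather than a single tool call, e.g.:
//   const issues = await tools.listIssues({ repo: "acme/site" });
//   await tools.createIssue({ repo: "acme/site", title: "Follow-up" });
```

Composing several calls in one piece of code means intermediate results stay out of the context window.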
And so code mode generated a TypeScript API, or a TypeScript SDK really, for the
underlying functions the model could call. And then the model just wrote code to compose
those SDK calls. And I guess the good innovation for this, the reason why it's,
I think, slowly taking off and we're going to see much more adoption this year, is the advent
of something called a dynamic worker loader, which is a Cloudflare primitive, but other people
have started building similar things, if not the same. But this is a primitive that allows
you to execute a sandboxed worker from a string.
So from a string, from some code that's a string,
you can just be like, eval this, new function this,
except it runs on a separate host
in a fully sandboxed environment in a V8 isolate.
And what does this mean?
Traditionally, people got very, very scared
when you say, give me code and I'm going to execute it on my machine
because there are so many different ways that you can mess someone up by doing that.
You can, like, out of memory them,
You can access env variables.
You can do loads of stuff that's really hard to protect against.
And this sandbox is a very particular sandbox that's not a full VM.
It's just a V8 isolate.
It allows you to spin up like billions of these little scripts almost instantaneously if you wanted to.
Like at Cloudflare scale, at global scale.
And just run these pieces of code.
Like very, very safely and securely.
You can even like restrict the outgoing fetch.
It's called a global outbound.
and you can say, I only want the outgoing fetch to be able to access example.com or mysaas.com.
Or I don't want it to access anything. Just run code that's entirely constrained in this host.
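Editor's sketch: the outbound restriction can be pictured as an allowlist check wrapped around whatever function performs the request. This is a plain TypeScript illustration of the idea, not the dynamic worker loader's actual configuration. `isAllowed` and `restrict` are hypothetical names.

```typescript
// Returns true only when the URL's hostname is exactly one of the allowed hosts.
function isAllowed(url: string, allowedHosts: string[]): boolean {
  const host = new URL(url).hostname;
  return allowedHosts.some((allowed) => allowed === host);
}

type Fetcher<T> = (url: string) => Promise<T>;

// Wrap any fetch-like function so that only allowlisted hosts are reachable.
function restrict<T>(inner: Fetcher<T>, allowedHosts: string[]): Fetcher<T> {
  return async (url: string) => {
    if (!isAllowed(url, allowedHosts)) {
      throw new Error(`Outbound request to ${url} blocked`);
    }
    return inner(url);
  };
}

// A fake fetcher so the sketch runs without network access.
const fakeFetch: Fetcher<string> = async (url) => `response from ${url}`;

// An empty allowlist corresponds to "I don't want it to access anything".
const onlyExample = restrict(fakeFetch, ["example.com"]);

onlyExample("https://example.com/data").then((body) => console.log(body));
onlyExample("https://evil.test/steal").catch((err: Error) => console.log(err.message));
```

The check compares whole hostnames, so a lookalike such as `example.com.evil.test` is rejected too.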
And this is really cool because it allows this like code act idea, this code mode idea to like really take off.
Because the model can just write code. It doesn't matter if it's prompt injected or if it's like trying to be, I don't know, trying to be adversarial.
like the code is running in a super safe environment and that's all good.
And that meant the original code mode blog post had this code execution happening in the agent.
And that was like the smart thing to do, right?
The proper thing to do is like you make the agent do code execution.
And then you become like a code agent.
And that's what happens.
But this relies on every or on the agent that wants to do it, actually shipping code execution in the agent.
And like, it turns out that that is also quite tough.
And although for the last like six months we've been shouting, like, if you have problems with the context window, just get the model to write code,
not as many people did it as we thought would.
And so I just took what we'd done previously and moved it into our MCP server and basically just said, like, what does this enable now?
we have that massively reduced context window allowance,
and we can use it to enable our MCP server
to access the whole of the Cloudflare API.
All of the possible endpoints
that you wanted to call on the Cloudflare API
can now be accessible via MCP.
And before, like, you could come to this,
I guess what I'm trying to say,
you could come to the same conclusion in the summer
by putting code mode inside the coding agent.
You could have connected as many MCP
servers as you wanted, but put code mode inside the coding agent. But fundamentally, that's quite
hard to do. So how can we show the value of it? Well, we put it inside the MCP server, and that's
what we did. And we built basically a one-of-a-kind MCP server. No one had really seen
this type of thing before, where I can have one coding agent that uses a thousand tokens of
context to access our whole Cloudflare platform. I think that was pretty cool. Sorry, that was a bit
rambling, but that was the point.
It was a good deep dive. I like that. I got some
questions about that. So, what
is fundamentally different about the
MCP server that's uniquely different
than every other to enable
this many APIs and that
reduction in the context window?
Yeah, so most MCP
servers would map like
one API, one endpoint
to one tool,
and they'd be like, oh, we
want to get all
issues or
post an issue or like delete an issue or yeah like one to one mapping and sometimes you might do
like a one to many mapping a little bit if you had like a particular workflow that was very common
on your website you might like create a tool for that workflow if there was a very common workflow
that customers do but you're kind of restricted to around 10 to 20 tools like maybe up to 25
I know Cursor has a 40-tool limit max for supplied MCP servers.
So theoretically you could fill that, but like it's getting pretty large.
It's getting pretty large.
And you're still not covering anywhere near.
You're still cherry picking.
Like for big platforms, like the Cloudflare platform, like GitHub, for instance, I use GitHub
and Cloudflare, but like you can imagine any large API.
You're not covering anywhere near the full
breadth of the API.
But when you put code mode in front of it,
you're now just asking the model to write code over that API.
And so the Cloudflare MCP server exports just two tools,
a search tool and an execute tool.
So this is combining a similar idea to tool search,
which is present inside Claude Code,
and also present inside Cursor, I think,
where they will search for the right tool on demand,
and then they'll load the right tool,
depending on the user intent.
So we do that, but we do that on the server side.
So there's search and execute.
But the critical thing that no one else does is for search, we let the model write code to search over the Cloudflare OpenAPI spec.
Code.
There's no like search function or anything.
The model just goes like spec dot paths and then filters, like super like naively.
And it works.
And then for execute, we say: model, or
agent, here is a Cloudflare fetch client.
Call cloudflare.request to make a request to the Cloudflare API.
And we just let the model go for it.
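Editor's sketch: the two-tool pattern, with a tiny stand-in for an OpenAPI document. The `searchSpec` function shows the kind of naive filter over `spec.paths` a model might write for the search tool. The `cloudflare` object is a hypothetical stub that records requests instead of calling the real API, and its option names are assumptions. In the setup described here the model-written code runs inside a sandbox, which this sketch does not attempt to reproduce.

```typescript
// A tiny stand-in for an OpenAPI document; the entries are illustrative only.
interface Operation {
  summary: string;
}
interface Spec {
  paths: Record<string, Record<string, Operation>>;
}

const spec: Spec = {
  paths: {
    "/accounts/{account_id}/workers/scripts": { get: { summary: "List Workers scripts" } },
    "/accounts/{account_id}/workers/scripts/{script_name}": { put: { summary: "Upload a Worker script" } },
    "/zones/{zone_id}/dns_records": { get: { summary: "List DNS records" } },
  },
};

// "search": a plain filter over spec.paths, with no dedicated search engine.
function searchSpec(s: Spec, keyword: string): string[] {
  const results: string[] = [];
  for (const path of Object.keys(s.paths)) {
    for (const method of Object.keys(s.paths[path])) {
      const entry = `${method.toUpperCase()} ${path}`;
      if (entry.toLowerCase().includes(keyword.toLowerCase())) {
        results.push(entry);
      }
    }
  }
  return results;
}

// "execute": a hypothetical client exposing a single request method.
interface RequestOptions {
  method: string;
  path: string;
  body?: unknown;
}
const sent: RequestOptions[] = [];
const cloudflare = {
  async request(options: RequestOptions): Promise<{ ok: boolean }> {
    sent.push(options); // a real client would call the API here
    return { ok: true };
  },
};

const matches = searchSpec(spec, "workers/scripts");
console.log(matches);

cloudflare
  .request({ method: "GET", path: "/accounts/abc123/workers/scripts" })
  .then((res) => console.log(res.ok));
```

Because both tools take code as input, the server's footprint in the context window stays fixed no matter how many endpoints the spec contains.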
And so you end up with something that's super flexible.
If you say something like, build me a Worker that hosts a Next.js website that can do
this, this, this, this, and this, you would hope that the agent can just write the code,
maybe using vinext, a new fun thing that we just built
to write Next.js and deploy it on Cloudflare.
Yeah.
They write the code, and then the model can look,
like search through workers' scripts, deployment APIs, whatever.
It finds it pretty quickly, normally.
And then it can just call the endpoints,
worker script deployment, and it will just deploy it to the cloud.
Like, super, super fun.
And you can have these, like, insane demos
where none of the code ever gets saved on your machine,
it only exists in the context of your chat with the coding agent and in the cloud.
And that was the first demo I ever did of this.
And I think it went pretty well.
And it went so well, they were like, we have to ship this.
Really?
Wow.
What was that like that demo?
Can you take us into that day?
And who was there?
Where was it?
Was it online only?
Yeah, yeah.
So it was the Friday demos.
The Friday demos are pretty legendary.
At Cloudflare, like, so I'm part of a team called, like, developer platform. So I work on Workers.
I work on, yeah, on anything built on top of the developer platform. And yeah, we have like our own
demo sections on Fridays. And yeah, like the whole most, most of this org like turns up and we probably
have five to seven like awesome demos. Like some of the coolest stuff like you've seen come out of Cloudflare was
probably demoed at one of these platform sessions, like often only a short time before
it was released to the public, which I think is pretty cool. Yeah, it was good craic. You get a
couple of minutes, you know? Yeah. Yeah. Look at this thing I built and then 50% of the time it breaks,
but then 50% of the time it's awesome. I mean, a lot of these things happen, I would imagine,
like, maybe late nights, maybe just late in, like, maybe late in your ability to
keep thinking where you're sort of knee-deep in innovation. You're you're iterating over something.
Can you take us into how you stumbled upon this, you know, write some TypeScript against an
SDK? How did you stumble into this? Was it a thought? Were you in the shower? Were you on a run?
You know, how did this iteration come to be? Well, I've been putting it
off for ages. We knew it was something that we wanted to do. So I don't, I don't know if the,
like the idea had been around for ages.
Like Sunil, my tech lead,
I don't know if it's infamously now,
but he built the agents SDK,
which is the thing I work on,
the first iteration he built over a weekend.
And then he went to his product leads
and was like, guys, can we ship this?
And they were like,
ooh, maybe let's like hold off a week.
And then he shipped it a week later.
And the first version was literally just like,
durable object, export agent as durable object.
and it was like clean.
It was like one liner.
And that was the first version of the Agents SDK.
But that was like his,
that was like a really cool innovation there
to use agents as durable objects.
And I guess for this,
and that was a weekend.
I guess for this,
like we knew it was something that we wanted to do.
Sunil had been badgering me for ages
to like get on and do it and work it out
and work out the tool thing.
Why can't we just have loads of tools?
Like surely this is possible.
and yeah I'd put it off for a couple of weeks and then I think it was a Tuesday just sat down and was like right I'm going to do it
had a chat with Claude, I reckon, about how we might want to do it.
I'd been working a lot on MCPs, and worked out very quickly that the best use case for this would be: how can we enable every MCP server to host an unlimited number of tools? And that seemed quite possible
with the search and execute paradigm.
And, yeah, like the model, I previously, I've, I don't know,
I've been working on this for quite a long time.
In my previous jobs and stuff, I'd always had problems with search functions.
So whenever you give the model a search function, now you need an eval.
Like 100% you need an eval because you need to work out that if you change the parameters
of your search, like, does the search get better or worse for your task?
And I was very drawn to the idea of code mode, that I would never have to do like an eval in that way again, because the model can write code.
And as the models get better at writing code, my thing would get better.
Like, I keep everything in distribution.
And so I was really, really, really drawn to that idea.
Like, I knew these models, they're just going to get better at writing code.
Let's lean into their strengths and just, yeah.
Let the model get, yeah, keep the model in the distribution it was trained in, rather than like some hacky search function.
It's never going to be in that distribution.
And so I was like, right, let's go.
And that sort of brought all together, like the two pieces.
Do you have an unlimited context window?
Like, do you have my context window when you're working with Claude?
Give me an example of.
Do I have something special?
That's kind of a tongue-in-cheek request or ask, I guess, but what I'm trying to get to is less that real response.
I'm happy to take it.
but more so be more clear with what you're like this back and forth,
are you dropping files,
or you sort of,
you know,
sort of microcontexting where you sort of pull some out of the context,
you take it to a file.
Like what does the actual interaction look like to go back and forth with Claude
to innovate for the future like this?
Yeah,
yeah.
So my workflow with Claude has actually changed quite a lot over the past.
I'm sure everyone's has.
It's a huge amount of the past.
Yeah, I would say.
A year and a half ago I was big into Cursor. I really liked it, and I was using it mostly for the tab model. And then January last year I was like, now the agent model is the future. I need to work out how I can just prompt things into existence. Like, what guardrails do I have to put on things? How can I make the feedback loop? How can I have tests that have good patterns, and all of the same
sort of good stuff, to build basically my codebases for agents. And this was about January
last year. And then when Claude Code came out, I was like, right, now this is definitely the
future. Now I'm just going to have an IDE open when I want to actually visually look at code.
When I want to review, I'll open an IDE, otherwise just straight in the terminal. Let's just chat.
Let's build something. And I tend to run everything on dangerously skip permissions,
like 100% of the time. I have some, like,
sandboxing on my laptop. It's a custom thing that I
don't super want to talk about. But I am curious what you want to talk about.
Yeah, yeah, I'm not going to talk about it hugely. Okay, so I have my own
version of Git that runs on my machine, and it's just
aliased, and it just stops the model doing stuff that I don't want it to do.
So it stops it force pushing to branches and overwriting
stuff, because I have admin permissions on a lot of repos.
So to me, this was like the base level of thing that I didn't want to happen.
I didn't want anyone to be, I didn't want any agent running on my laptop to be able to
overwrite a remote repo.
That's like base level.
Yeah.
My own laptop OS, like I don't mind breaking stuff.
Like I don't mind breaking anything locally, but I don't want to break anything externally.
And so I have a few aliases like that where I've just completely overwritten Git
internally. And my Git wrapper's called Zaggy. It's public on GitHub. I built it in Zig, actually. It was super
fun to build, using libgit2. And it just has a bunch of these protections out of the box. So I really
don't mind running stuff in dangerously skip permissions. I tend to do it all the time.
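Editor's sketch: the kind of guard a Git wrapper could apply before passing a command through. It is written in TypeScript for illustration only and is not the wrapper's actual implementation, which the speaker says is written in Zig. `checkGitCommand` is a hypothetical name and the rule set is deliberately incomplete.

```typescript
interface Verdict {
  allowed: boolean;
  reason?: string;
}

// Inspect git arguments (everything after "git") and refuse force pushes.
function checkGitCommand(args: string[]): Verdict {
  const [subcommand, ...rest] = args;
  if (subcommand !== "push") {
    return { allowed: true };
  }
  const forced = rest.some(
    (arg) =>
      arg === "-f" || // short force flag
      arg.startsWith("--force") || // --force, --force-with-lease, ...
      arg.startsWith("+"), // a "+refspec" also forces the update
  );
  if (forced) {
    return { allowed: false, reason: "force push blocked by wrapper" };
  }
  return { allowed: true };
}

console.log(checkGitCommand(["push", "origin", "main"]));
console.log(checkGitCommand(["push", "--force", "origin", "main"]));
console.log(checkGitCommand(["status"]));
```

A wrapper along these lines would sit in front of the real `git` binary via an alias, so an agent running with permissions skipped still cannot overwrite a remote branch. Combined short flags such as `-fu` are not covered here.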
I chat with the model about things that I want to do. But I tend to come to
sit down at my laptop with a preconceived notion of what I want. I think a lot of, I think a lot of
the time when I sit down at my laptop without any idea of what I want, I almost feel like I'm
scrolling on Instagram or Twitter or something. Like even chatting with the model, it's just
feeding a dopamine rush that is not real. Like when I know what I want, I can, I feel like I
can evaluate stuff really well. So it took that Tuesday for me to sit down and be like, right,
I know what I want. Let's just do it. And in that case, I think I can be hugely effective.
I think most people can, just speaking straight English to the model and not...
Nothing fancy.
Yeah, just go for it.
So a lot like a chat, like a real chat.
You're not doing some act as Cloudflare master worker slash developer.
You know, like these sort of like hacking things.
That used to be a thing.
That used to be a thing.
Yeah.
Yeah.
I'm not doing any of that.
I'm not acting as like every once in a while I might put it in a role.
But I mean, just it's like one out of a hundred if that.
It's usually just here's the problem.
Here's where I'm trying to go.
Here's where I'm at.
Here's what's in between.
us, let's just riff kind of thing.
You know, what's here, what's there?
And I sort of trust the model of those senses.
So I'm not trying to wield it and force it into a mode.
I kind of just kind of give it the trust it needs to do its great job.
Anytime you're fighting it, I feel like personally for me,
anytime I'm not getting good results or I'm fighting it,
I'm trying to push it into an area where it's just not that good,
or we're just in uncharted territory, or something like that.
And I feel like the more I just talk like I would a normal engineer next to me or a colleague, that's where I get my best results.
Yeah, you want to always keep it in distribution.
If you're doing something too wacky, then you're probably going to have erratic results, because the model never saw anything like that in pre-training or in its post-training, like, RL stuff.
Whereas if you're working on a common programming language's codebase,
and you're speaking to it like maybe you'd see in a GitHub issue, with language that is, like,
intelligible, then I think you're pretty good.
I don't think I do anything special with Claude.
I would say that I tend to only use Opus now since Opus 4.6 came out or 4.5 or whichever one it was before Christmas.
I think that was a big step change in being able to do things more autonomously.
And so what I tend to do now actually, because I work on a lot of different repos,
I just open Claude Code, or actually OpenCode,
which we use a lot at Cloudflare.
I just open that in, like, my code folder on my laptop.
I basically open it, always just open it in my code folder.
And then I direct to the repo that we're working on.
Sometimes I'm like, right, make a new worktree.
But everything lives in the top level code folder because I work a lot on libraries and on products.
And the products use the libraries.
And so it's nice if it's all, like I'm working constantly at that top level.
And then I can move between stuff more fluidly.
Really?
And that's actually one of the reasons why I don't use Cursor as much, or any of the IDEs as much now.
Because just like having the ability to like open an agent in that top level folder, super nice.
Yeah, I guess if you constantly have
context across projects, or libraries are connected, it makes a lot of sense.
But if you have disparate projects where that is totally its own thing or this is its own thing,
you kind of want to have a directory as a silo, which is kind of how your code directory is.
It's like this is a silo of all Cloudflare work.
So therefore it's cool to just open that one directory.
Is that what you're saying?
Yeah, like I would even have like even like personal work.
I don't think it really.
Yeah.
I don't ever see the model, like, going into other directories that it's not meant to.
And I do actually watch quite a lot.
Like, I don't, maybe, maybe this is a mega security thing.
But I think it's pretty good.
I have a working directory, like, I have a directory of working code that I'm currently working on on my machine.
And I just open, I just open it in that.
And I tend to reuse patterns a lot.
So I mostly work on open source work, right?
So for instance, how the development worked with the Cloudflare MCP server was I built it as a POC.
I published it on my personal GitHub.
It was published.
And then once I got enough buy-in, once people thought it was good, once the quality was there, then we moved it over to a Cloudflare legit one and we did a big release post.
And I do that with quite a lot of stuff.
Like, there's procedure and things to making a new repo on the Cloudflare org.
and like it has to meet a certain quality bar.
So for just POCs and testing and stuff,
I still want version control.
So I just use my personal GitHub and it's fine.
Well, I use my, yeah, I just use my own org.
Well, you're clearly allowed to do that without having any
real issues.
I know that, I mean,
it's so sensitive whenever you're,
I mean,
whenever you're in your position as a brand as a company,
you do have to have,
you know,
locks on the doors.
You know what I mean?
And that's not so much a lock on the door,
but that's cool that you have that.
kind of autonomy to, one, explore, and two, not get any backlash for publishing to your
personal GitHub, where you could be seen as, like, I'm trying to take, and you're
not, obviously, I'm trying to take the Cloudflare thunder. No, in fact, I'm going to innovate,
and I'm just trying not to bother our main org and our brand integrity with my little toy here
until it becomes not a toy. We have private internal version control as well, that we also
use. But for things where I want to share it and I even want to see if other people are
interested in it and things like that, it just makes sense. Like, you want to get it out there.
I think I'm in a very special situation in the company where I work like predominantly on open
source. And so there is more freedom allowed there, because anything that I share
is public by its very nature and is going to become public. If I'm working on the Agents
SDK, like, we have to be really... it's hard to have even a proper release, because everyone
sees what you're doing as you're doing it. And so to even do a bit of
experimentation, it's quite tough not to get found out. And you still want
to be able to do a proper release even as an open source library.
Well, friends, you know
I'm a big fan of Tailscale. And you know what? I could not do anything, I'm serious,
anything, without my tailnet. I'm here with my good friend, Alex Kretschmar, from Tailscale.
Alex, how do you describe Tailscale versus a VPN? How do you describe Tailscale to someone who's not in the know?
Well, the biggest difference between Tailscale and a traditional VPN is how the traffic flows.
When you look at a traditional VPN, the traffic flows through a central hub and then out to your client devices on the back end.
With Tailscale, every device makes a connection directly to every other device.
And that means effectively you're cutting out the middleman and you get much better performance as a consequence.
And so that mesh network that you've built, you've got to have a way to control how the data flows between different devices.
Because you don't have that central choke point anymore, we have a thing called access policies, which allow you to granularly define, using ACLs and grant policies,
which nodes are allowed to talk specifically to which other nodes, on which protocols, on which ports, and which users are allowed to even connect to different things, all over the Tailscale encrypted tunnels, which underneath use the WireGuard technology.
Yeah, it's not my LAN, it's my TAN, my Tailscale area network.
But your word for it is tailnet, right?
The tailnet is the word that we invented to call the logical grouping of devices that form your Tailscale network.
Much like you might have a LAN of devices or something like that at home.
Effectively, the tailnet, we call it something different because those devices can transcend physical locations.
So you can have a server in the cloud, talking to your phone on the bus, talking to your server
in, I don't know, the basement of your mom's house across the other side of the ocean.
And that tailnet is a flat network that only you can connect to and access.
So that's why we call it a different name from anything else is because it's location
independent and you can connect to it anywhere.
Well, friends, check out Tailscale at tailscale.com.
Totally free for your home lab.
And, of course, paid for your teams, pro and enterprise.
But literally, I could not do anything I'm doing in my home lab, in my dev lab,
without tailscale connectivity.
I'm out and about.
I'm here.
I'm there.
I'm everywhere.
And I've got to access my home lab,
my dev lab resources,
and Tailscale is how I do it personally.
And you should too.
Once again, check it out,
tailscale.com.
The reason why I ask you about how you actually work with Claude is
because, you know,
I think that's the curiosity of everybody.
Like, a lot of us are to some degree working in silos,
even if we're working together.
Because there's even speculation of
how large of a team can you actually work on in this new era because of how much you can get done in one slip versus as a team where you'd have to collaborate a lot more on a major feature.
I'm sure you may have some inbound conversations in Slack or, you know, that kind of thing, maybe a pull request review or something like that, or in your case a POC as an actual repository.
But I feel like in this world, what I'm hearing a lot of is, like, it's actually kind of hard
to work at this level of ability and collaborate at the same time.
I think it depends how you like to work.
I'm not saying my way is the way and that I'm like six months in front of all you guys.
You should work like me because I don't think so.
I thought Dax from SST, Anomaly, OpenCode shared a really interesting post on Twitter the last
couple of days where he was talking about how everyone sounds like they have it all put
together. But really, he doesn't think so. And he knows that they don't have it all worked out.
Like, they're still working stuff out. They think they're faster with coding agents than without,
but not entirely sure. And it was really like a push to be like, can we just leave everything
better than we found it? Like that whole thing of like coding agents, yeah, sure, they let you work
very quickly in the short term. But let's, let's go through our code base.
Let's build everything the right way, the way that we're proud of, not the way that Claude told us to do it the first time round.
I thought it was really pretty.
I think everyone should remember that, yeah, sure, we're there to do a job, but we're also there to ensure that when the next person comes to have a look at the job we did, they can actually have a clue what's going on.
And it does work properly, and it is tested.
And there's a lot of slop being thrown around, a lot of slop PRs.
I think on the agents SDK, we actually closed like PRs from external collaborators for the time being.
Not to say we won't open them again, but it was just getting too much.
The way that open source works has to change.
But us as a team is quite kind of interesting.
We support essentially three products on our team.
And our team was five until very recently.
We support the agents SDK that I've talked quite a lot about, how we build agents on Cloudflare.
We support MCP.
So how people build MCP servers, MCP clients, how we build those on Cloudflare and also the Cloudflare-supported MCP servers.
So the new one we just built.
And also we support all the ones that Cloudflare published last year as well, the external ones.
So we support those two avenues.
And we also support sandboxes.
So the whole sandbox product in Cloudflare comes from my team as well.
And so there were five of us working across these three very distinct parts
of building agents.
And so there's a lot of surface area.
So I think we're, as a team, we're pretty well versed
in like having our own domain and like building out
what we think should be built out in our own domain.
It's very hard to get under each other's feet
because there's so much space.
Are you six now or are you four?
I think we might be six now, seven very soon and eight very soon after.
It's going good.
How do you, are you autonomous in terms of like which product you focus on at any given time?
I'm sure there's missions, of course, and there's directives.
Yeah.
Whenever you think about, okay, like you said, that Tuesday, when you sat down and innovated in this way
to give us these 2,500-plus APIs in a thousand tokens or less, kind of thing.
How do you sit down, or how do you even think about your work, when you're split across your products? Are
you shiny-objected, or is it pre-directed, or are you totally autonomous? How does your wind blow when it
comes to that? Our team is, it is kind of special in Cloudflare and it's changing a lot. So if we
have this conversation in six months, it might be a very different situation. But our team is
very new. So we, I think we were launched as a team under, under a year ago. And I joined in
October. And people have been joining every couple of months, basically, for the last year.
So I focus on MCP.
I also do a bunch on the agents SDK to help support MCP, to help support people building agents.
And I'm focusing on memory a lot at the moment and how we can build out a story for that.
And just support developers building on Cloudflare there.
Other people on the team have different specialties.
We basically all contribute to the Agents SDK. And then, like, Naresh on our team,
he focuses very much on sandboxes.
Like, sandboxes is his baby and he's built it from the ground up, and
now he's getting some support on sandboxes.
But we were always all contributing to agents SDK, even if we were doing our other stuff.
Because it all ties back in.
Like, we need to have like this cohesive story and be one cohesive team.
And we're all, when I build an agent in my spare time, I use all of our products.
Like all of the SDKs we produce, I use.
So there is like, I think the main worry for our team is like how do we, how do we not end up with like domain specialists too much?
And we have a nice tracker about who's submitted PRs to different repos.
Because I think there is a worry there that like I haven't committed to sandboxes.
I have no idea what's going on there.
Like how can I answer a question when someone comes up to me at an event or something and talks about sandboxes?
Or when I like I'm developing on it myself and I find a bug, like it'd be really nice if I could fix it.
Like just very basic stuff like that.
I mean, that is the worry the way we do our team.
But I think everyone is just so interested in building
agents, and all of these are critical parts of it, that
we float across all the surface quite well.
Yeah, I'd be a little worried about that too, especially when things move so fast.
I mean, like it didn't move this fast before.
And I guess a yearish ago, it was a little easier to, to have that disposition where you say,
you know what, I'm focused on agents and MCP, but if I don't contribute to sandboxes
quite that often, it's okay because it's not moving at the speed of agents, you know, which
was the case beforehand, but now it does. And so I would personally, if I were in your position
or any, or on that team, I would feel like a little anxious. I'm not keeping up. And I don't know,
I guess this is how I feel about most things, really, but especially if I had, you know,
my particular sliver that I'm focused on totally like agents and MCP. And I'm really
curious what you're talking about with memory, what you're doing there. But I would have some
anxiety about, my gosh, how do I even maintain any version of context
around sandboxes when, if I step away for a week or I don't pay attention to some of the
side chatter, how far back do I go when it comes to progress?
Yeah.
I mean, we've got to be as forward-looking as possible.
I think our team attracts a lot of dreamers, I would say.
Yeah.
The guy that started our team is like an absolute dreamer.
Like, he's thinking so far in advance.
I really respect how he can think like that.
And, yeah, learn from that as much as possible.
The aim for us is to be ahead of the org.
What the org wants, we should already have, like, ready for them to use.
And I would say a year ago, we were quite a long time ahead of the org.
And now everything is going faster.
Like, there are some people building insanely cool agents at Cloudflare.
And yeah, that's where the memory thing is coming out of it.
Like how can I best support them?
How can I support the developers building on our, on our platform?
Yeah.
Yeah, I think we're all stressed about like falling behind.
That's why you'll find that a lot of Cloudflare people are like permanently online,
maybe a little bit too much.
Yeah.
If you want to throw shade at us for anything, it won't be
because we're not receptive to feedback online.
Just curious on that note.
And you can, you can blur the line if you want to or not give the exact number.
How many hours you think you work a day?
And don't just say in front of the terminal because when you're making your coffee or you're on your back patio or you're walking your dog and you're thinking about work that's still kind of work in a way.
How much time do you truly separate from the problem set that you're dealing with or working on?
And how much of that turns into like Matt's life?
I don't know.
I think I'm thinking about this stuff all the time.
Like 23/7?
22/7.
I like my sleep, you know.
Like, I like eight hours minimum.
Well, I'll admit the moment I wake up, I'm going to sleep thinking about a problem.
I'm waking up thinking about that problem.
It's a sign of a good problem.
Zero am I even throwing shade at you.
And I think the reason why I ask this question is more of a reality check to our listening audience
because I know there's a lot of folks feeling like either they're not dipping their toe in
and they're abrasive to the situation,
and they're kind of late in a way, but still early,
which is kind of funny to think about.
Or they're just like you and I and others where they're like,
I mean, the race is on.
I just can't stop thinking about the things I want to change or do,
and there isn't enough time of the day.
Now, I'm not eking into my personal life
where I can't live my life by any means,
but I'm definitely thinking about the problems I'm trying to solve
far more than I ever have before agents came into my life, as a reality check.
Yeah, I don't know if it changed.
Since before agents, like, I started my career writing code by hand, as most people listening to
this probably did.
I love how you said that.
That's so awesome.
Yeah, well, I mean, you've got to, you got to preface this.
It's all organic.
All organic code, you know, organic.
written by me. That's right. The OG code. Yeah. Yeah, definitely worse than some of the
code that Claude spits out, definitely. I think I could always get very engrossed by a problem.
Like my girlfriend, she gets so mad at me sometimes. I'm just like, I get super sidetracked by
stuff, like incredibly attached to a problem and a solution. Well, more of the problem than the
solution. But so I don't think that has changed at all. I think what has changed is how I work. So I spend
much more time dreaming about like a future world and like a future things that I'd like to build and
or like thinking who might be best to build them. And when code is cheap, like you can build more
stuff, but you also still have a limited amount of time. You can't build everything. And like the hard
things are the things that, like the cool things are the hard things and those are the things that
take time. And there are like, there aren't that many quick wins there. You need to put in the
hours every day to like work out what you want to do and how you want to do it. And I guess now I'm
spending less time coding, like manually coding. I mean, I don't actually do that that much anymore.
And I'm spending much more time thinking about like what, what I'd like to build. But I'm always
thinking about the problems, I guess. Now, much more scatterbrained. So previously in the past,
I could sit down for eight hours and just code for eight hours. And like, that was great. I was
stayed in a terminal or I stayed in an IDE. I like never left it. I like knew, had enough knowledge
about the domain expertise and what I was building. I could just, like, smash it out. Whereas now,
because I sit at things, I feel like the coding agents sit in between me and the code. So I am much more
in the back seat, or at least like in the bird's eye view, over multiple different things,
not normally just one, because I can, right? But it does, there is a compromise there that
you do feel much more scatterbrained. You're here, you're there, you have to
dive into this, you have to dive into that. And traditionally, I'm not super good at that, I'm not going to
lie, hugely bad at multitasking. So, like, getting that compromise right, I envy people who
feel like they can productively prompt like six versions of Claude Code or Open Code or whatever,
six coding agents at once.
I just don't see how that is humanly possible.
For me, I reckon I got three in me max, because I think I can only do three problems in my head at
once and still have a meaningful output to each of them.
And definitely over two, well,
over three, 100%, my capability
to do something hard
massively reduces.
I feel like for me,
three to six a couple times a week
is where I'll catch myself there,
not like I intentionally go there.
Yeah.
But I rather enjoy a one-to-one problem
except for when I'm waiting for it to like do the thing.
I find it slow.
I find it slow to one-to-one.
I can't do it.
So I kind of have to do one thing, but multiple things on that one thing, I suppose, is the way to describe it, where I guess that's still kind of three.
But it kind of depends, right?
It's the traditional "it depends" scenario there.
Because when you're waiting, what are you doing?
Like, you know, maybe even your own spaghetti in your brain is getting unraveled where you think you have the context.
You're sort of planning things.
So I kind of feel like my zone is, like, two to three, because one is too slow.
And it's not too slow for me. It's too slow because it's doing its thing and it's doing
dramatic stuff. It's doing a week's worth of things in that 30 minutes I'm waiting or whatever
or that three minutes or four minutes I'm waiting. So I find like I have to be in the three zone
almost always. But then even multiple projects that are uniquely different but similar,
I find that's a couple times a week.
And if I do that more than a couple times a week,
I can get in that zone for hours, three, four hours, really,
where I'm working on like three or four different projects
and like three or four things per project.
That's wild.
And I'm not like prompting.
It is kind of wild to do that kind of stuff.
Really it is.
And I haven't sat back and said,
how well are you doing?
But what I can see is the Git commits.
I can see the progress.
I can see the improvements.
And I see the real thing deployed and usable,
not just this fake thing that maybe, you know,
this agent psychosis kind of scenario where,
you're like,
I think I'm making progress.
You know what I mean?
I see that.
I actually like on that note,
if I'm doing one-on-one,
I actually often find myself sabotaging the model
because I'm thinking faster than it is writing.
And so I start writing stuff.
to correct the trajectory.
And I think that's really bad.
Yeah.
I think it meant because during the planning phase,
I got bored,
it meant that during the execution,
I'm just constantly fighting its trajectory.
And so,
I do use plan mode quite a lot.
I also get a plan and then get it reviewed by another model.
I have a skill for that.
It's really, really good.
I think I nicked it from someone on Twitter.
Honestly
amazing, would recommend.
But when I do two or three,
then building the plan is better because I can set off one to build a plan and then get reviews.
And that might take like 10 minutes.
And then I can set off another one to do it.
And then the third one.
And then by the time I'm like done three, I'm like, can take a breather.
And I can be like, right, let's go into the first one and see what it's come up with.
And like, it's right on this plan.
And that's so much better than being like just one on one.
Oh, wait five minutes.
And now sabotage.
Like, because I want to, I just want to implement now.
I think giving it the time is nice.
So the cycle repeats itself for you.
This is my cycle.
I don't use plan mode a lot, but I do a different version of planning.
So I wrote a Go CLI and this flow I created called Agent Flow.
And it's a lot of, I guess, context dumping in a way, but I'm making plans.
and those plans are called PEPs.
It's stolen from the Python world where it's not a Python improvement proposal.
It's a project improvement.
I guess it's a, what's the E stand for again?
Enhancement, that's right.
I'm like improvement, enhancement.
Project enhancement proposal versus Python enhancement proposal.
And so what I find is I will either make a true spec based on RFC
2119's protocol for, like, MUST, SHOULD, things like that, and those are for bigger things,
you know, like the way an API should function or what kind of error codes we should respond with
and things like that, like what the API surface is. So I'm speccing an API or different things,
not literally every possible thing is getting a spec. But here's what I'm trying to get to:
what I often do, and maybe this is how it works in plan mode for you, is I just trust the model
and I say after they present the plan to me,
I ask it to review that plan for lack of clarity and blind spots.
Just that one prompt response back to it.
Like nothing else.
Not here's what I think is wrong with it.
Like I told it what I wanted to do.
I'm telling it where I'm at, what the gap is, and where we're trying to go.
And so the problem is there.
And I'm trusting the model to kind of get us there.
And the plan is the iterative process.
And so once it presents this plan to me, in my case as a PEP,
I just say, review that PEP for lack of clarity and blind spots, and it will go and review it, and it comes back and it's like, well, we're missing this here and that's not right there.
What do you think the next thing is that I ask it after it presents all these challenges from high to low?
What do you think I tell it?
Fix the plan.
No, no, I don't.
No.
What do you tell it?
Kind of yes, but no.
I give it one more little nudge because I want to trust the model.
I say, what are your suggestions for each?
That's literally all I say.
What are your suggestions for each?
It goes and it iterates through each issue that it gave back to me, all the problems.
It's like, here's how I'd solve it.
Here's how I'd solve it.
And I'm like, what do you think I respond back with after that?
What do you think
my next prompt is?
Fix the plan.
Do it.
Literally the words, do it.
Okay, so let's make the plan, present the plan.
What clarity and blind spots are missing from this thing.
Present it back to me, a big old list.
What do you suggest for each?
It goes and does this thing, presents a plan back to me, do it.
That is literally what I do.
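The loop just described (present the plan, review it for lack of clarity and blind spots, ask for suggestions for each issue, then "do it") can be written down as a fixed sequence of follow-up prompts. The sketch below is only an illustration of that sequence, not the Agent Flow CLI itself. `send` is a hypothetical stand-in for whatever agent or chat client is in use: it is assumed to take one user message, return the model's reply, and keep the conversation state on its side.

```python
from typing import Callable

# The three follow-up prompts from the workflow above, in order.
REVIEW_PROMPTS = [
    "Review that plan for lack of clarity and blind spots.",
    "What are your suggestions for each?",
    "Do it.",
]


def run_plan_review(task: str, send: Callable[[str], str]) -> list[tuple[str, str]]:
    """Send the task, then each follow-up prompt, within one conversation.

    Returns the (prompt, reply) pairs so the plan, the list of issues, and
    the suggestions can be read back before or after the final step.
    """
    transcript = []
    for prompt in [task, *REVIEW_PROMPTS]:
        reply = send(prompt)  # hypothetical client call, see lead-in
        transcript.append((prompt, reply))
    return transcript
```

In practice a person would likely read the suggestions before sending the last prompt, so the final "Do it." step could be held back behind a confirmation rather than sent automatically.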
This is essentially, if you think about what you're doing in terms of, like, 2023 prompting techniques, it's like you're doing reflection,
by asking the model to look back at itself and see whether it's done anything silly.
And then by asking for suggestions, you're doing chain-of-thought prompting, because you're getting a new train
of thought to go back on the original one.
Yeah, so it's this reflection plus chain of thought.
It's like, yeah, it's just really funny how all of these prompting techniques come back
around.
And what's even cooler, I think, is that the likelihood of those prompting techniques being
reflected in the underlying training data is, I think, super high.
So for Opus 4.6, yeah, for Opus 4.6, I find it often, it like, it will do like, wait
at the end, here are suggestions for each. So I do find it often does that step for you. Like,
it doesn't need to be told. So it does. And I kind of feel bad about asking it for more.
But all it did, it presented a bunch and then it kind of gave me three. So it may have given me a
list of, let's just say, six to 12 issues in the plan, right? And it comes down with like three
or, it always gives me some version of suggestions. But that's not the real suggestions, man. I mean,
go back in the list. You know, what are your suggestions for each?
Each is more of a, you know, for-i loop through all the things.
You know what I mean?
Like, that's all I'm really hesitant to do.
And I get such great results with that.
And then I kind of feel bad with my final prompt being like, do it.
Yeah, do the thing.
Because it feels so not smart on my part.
Do it.
Yeah.
None of this is that smart on our part.
So I think we have to accept that.
Like, I think the smart thing is knowing when you sit down to the computer,
what do you want to make?
Like, like, like, what?
The intent.
Where are we going?
we do. Yeah. And I think the suggestion side of things, like, I actually review those quite
heavily on each plan iteration because I do want to make sure we're following a trajectory.
Maybe that's my like, own nervousness around the model, but I do want to make sure we're
following the right trajectory that I have in mind. Do you ever use voice to get longer prompts?
Just recently started to do it. Actually, there's a cool thing called
Handy. Handy.computer. Just mentioned that this week in Changelog News. It is an open source
voice to text. It's all done on your machine. It's free and open source. So I mean,
you know, a lot of safety there in terms of like what you're putting out there. I've tried it a
few times. I like it. It goes in any text box you give it. But it's kind of hard to always
default to that because some things are technical and you can't speak a command very well,
or syntax or a file path or things like that.
So I find that I've just learned to type faster and more clearly.
And it keeps my brain in it more than I think it out loud.
Because if I talk out loud, I will talk a lot more to my podcast.
Whereas if I type, I'm more terse and more clear.
Whereas if I speak, I'm more ambiguous and thought provoking and meandering, so to speak.
you know, like I just would say the word, uh, and it's like, what are you talking about here?
Whereas I never type the word, uh, as I'm trying to speak, because that's not
what happens when you, when you write.
Yeah.
That's an interesting avenue.
I know the Amp team have some thoughts around this where they do.
Yeah.
Yeah, they specifically made Enter just make a new line in Amp originally, like in the
sidebar version of Amp, and Command-Enter or Control-Enter or Shift-Enter
or whatever it was actually executed the prompt.
And the thought process behind that was like,
we want to encourage users to make longer prompts,
to make larger expressions of intent to like fully scope the problem at hand.
And if we make enter a new line,
then they might have some inspiration to write more stuff.
I finally got around to writing longer prompts.
And I'm very excited that Claude Code
just added voice support where you can hold space bar and have a speech-to-text model.
So I can speak for five minutes or for 30 seconds or however long it is.
Then I can dump.
I have a clipboard.
So I just Command-V, Command-V, Command-V all of my context in below.
And then I think I end up with quite a nice, quite a nice prompt by doing that.
I'm very excited by that flow.
And I trust Opus 4.6 way more to like execute.
for a longer period of time.
I think in the past, using Cursor, and using Cursor with,
I don't know what model it would have been at the time,
but probably Sonnet 3 or up to Sonnet 3.5,
using Cursor with those models, I would, like, send something.
And then I'd be like, oh, no, I meant this thing.
I need to add this more information.
And then I would like cancel the original prompt.
Yes.
And then like compress it.
And by the time I finally got a prompt that I was happy with,
I'd actually sent it like six times.
and then like canceled it and brought it back and all of that flow I thought was awful.
So I'm really consciously trying to make that prompt better, more well-scoped.
Yeah.
So we got there by talking about the things you're working on, how you focus on agents, the open source stuff you're doing there, MCP.
And you mentioned that you're starting to think about memory.
Yeah.
Can you take me into what you mean by
that, what are your thoughts on that, what's attracting you to that? How far in are you? Do you feel
in over your head? What wisdom do you have? Do you have any wisdom at all? Where are you at?
I have felt in over my head for the past. Oh, my whole career, I'd say. Good for you.
It was crazy. In a crazy world we live in. I think before when you were saying about feeling left
behind, I think so many people feel slightly left behind or a lot left behind with this version of, like,
I think if you went and spoke to my friend, a lot of my friends in London, I actually recently
moved to Portugal, but a lot of my friends in London, the software engineers that I knew that I lived
with, I went to university with, the amount of them still not using AI at all, it's like, wow.
And then you realize that you're in this tiny little microcosm of people who are just obsessed
with this slot machine in a terminal.
It's freaking wild.
I don't know.
What was the question again?
See, there you go.
There you go.
I'll bring it back.
Don't you worry.
Memory.
It was about memory, really.
I think it's cool that you're, I mean, I'm fine to even step back in there a little bit.
I mean, I do want to talk about memory and what you're working on there because I'm
curious about how I've never played with the memory side of things at all.
And so I'm super curious.
But, yeah, I can talk very briefly about memory.
I even have friends, too, that are zero.
Like, I just, here's an interesting, somewhat of a tangent in a way,
but I think it may play into what you're talking about because it's totally right of developer.
I was visiting with my, my newest doctor.
And I live in a small town outside of Austin called Dripping Springs.
And the doctor I go to, oddly enough, I'm fortunate
enough to live in a town where we have a concierge doctor.
And so I don't go there with insurance.
I go there as a concierge patient.
I pay out of pocket.
I won't explain it all.
I can use my agency against it.
But the point is,
they're a concierge-style doctor where you can be a part of a subscription and you
can go there as often as you want to and they're all about your health.
And it's not about giving you a medicine or a pill.
It's about root cause issue in your life from therapy to exercise to meals to bowel
movements, oddly enough even.
And I'm sitting down with this person and she's a well-trained physician, you know,
well-trained doctor and she's got this new practice.
Now that I'm telling the story, I'm realizing how much of a tangent this is, but follow
me.
And I'm sitting down there with her and I'm talking to her about her business because
I just naturally I'm an entrepreneur and I think business and I think in code and all the
things.
I'm very right-brain business and very left-brain developer,
which is a fantastic place to be in life, I think right now, especially now.
And I'm sitting down with her and we're going through this data that I have.
And it's in this PDF.
And it's on her screen.
And I'm like, how will I get this later?
And she's like, yeah, you'll get the PDF later.
And I'm like, but you're a concierge doctor, don't you think you should have like a, this is my brain,
don't you think you should have like a formalized patient record in your business, you know, and this kind of thing?
And like, here's this woman who's just really well off
and doing well.
But she's not thinking about the data problem that people like you and I think about.
And I'm thinking, gosh, I mean, the thing that scanned me earlier probably has an API.
You could probably pull that data into my record.
And then you could do that for everyone in your practice.
And you could truly live up to being a concierge doctor.
And I guess the reason why I tell you that is that you got these people out there,
these folks out there who are super intelligent, but they're not thinking about AI at all.
And she was telling me how she's, she's really good at what she does, but she feels a little overwhelmed about the business side of her business because she's not really a business person.
She's not designed to be a business person.
And my response to her was, just use Claude.
Do you know what she said, Matt?
What did she say?
Back to me.
What do you think she said?
No idea.
What is Claude?
Yeah, nice.
You know what I'm trying to say?
Like, gosh.
And so I had a brain dump on her.
I'm like, okay, there's an API behind this thing here.
Here's how you can pull your data over there.
You need a Postgres database here.
I was like, okay, Adam, you're going too nerdy.
And when I got done explaining it to her, she's like, whatever that is, I need that.
Can you do that for me?
I'm like, yeah, I could probably help you with that.
So now I have another job, by the way.
That's wild.
Helping my doctor, you know, formalize her practice on the future of AI.
And so all this to say is that you've got your friends who are developers that are not using AI.
We've got folks that are super intelligent like doctors that are not really fluent in AI.
And it's 2026.
It's March 2026.
And I'm a little nerve-wracked by these folks just like being so delayed.
You know, even developers, you know, there's going to be some people who listen to this thinking like, Adam, stop drinking the AI
Kool-Aid.
I'm an AI maximalist.
It's not going away.
The more you lean in, the better off you are.
And you can probably attest to that, Matt, with what you're doing.
But I feel like the folks that are just delaying it or feeling behind, I don't want them to feel behind.
But at the same time, like, it's not going to go away and leverage it.
I said, hey, if you don't know how to run your business or you need more help running your business, put all your problems into Claude.
And it will help you at least make a system to solve them, not actually give you the solution, but help you get to a solution.
And no one's getting that.
It's like cheating on your homework.
It is a cheat.
It is a cheat.
All right.
Unless you have anything to say about that,
let's end that tangent and go back to memory and stuff like that.
What do you think?
Yeah.
Slightly on this point, I recently got my dad using Granola,
and he's a doctor,
and he's kind of fed up with how he writes so many notes,
and, like, Granola has completely changed his whole workflow.
Oh, I bet.
And he sees people on Zoom all the time,
and he sees people in person as well.
Now he just starts Granola
and it writes up all of his meeting notes.
And it's been, like, transformational for him,
just that basic summarization.
Like, Granola is a great product.
Don't get me wrong.
It's a stunning product.
But it is not a complicated workflow, and it's completely changed
how long he spends doing his consultations.
So yeah, I guess like shout out there.
Like there are there are some small things you can try.
I love Granola, by the way.
I'm a fan.
I'm actually a paying user of Granola.
So yeah.
Yeah.
Oh wow.
They should pay me.
Come on, Granola.
Pay me.
Yeah.
I love Granola.
It's amazing.
And I'm with you on that too.
I think I've even DM'd the designer.
I can't remember his name in the moment, but I think he's named Sam, if I recall correctly.
Yeah, Sam.
Sam's one of the founders.
Yeah.
So answer your DMs.
But I'm a big fan of Granola.
I think that's revolutionary.
Same thing.
My wife introduced somebody to it who's also a doctor, and she sits down with folks.
And she would spend three hours of literally every evening cramming all of her notes.
Yeah.
Well, in this new world, you don't have to do that.
Now, you do have things like HIPAA compliance here in the United States.
You've got different healthcare concerns where you have privacy and stuff.
I totally get that.
You should abide by all those things.
And if we don't have systems that support them,
then we should.
But imagine the unlock in your life where you're a teacher or a doctor or someone like
that where now you don't have to like arduously plan and think about your note process.
Now you can sort of have a lot of it formalized for you.
And you don't have to do all of that work to even report back to folks or summarize
this 45-minute session with a patient or a friend or a colleague or whatever.
You can have it do it for you.
That should just be the way.
Anyways, I think we could probably go on that front for sure.
I actually use Granola on my personal stuff for if I have a really fun idea and I want to write something up about it.
Because I'm actually horrific at writing.
Like I would call myself a critique of writing or a critic of writing rather than a writer.
Like I love reading and I've read a lot since I was very young.
But I really struggled with putting my words to
paper in a way that flows and makes sense and is cohesive and has a start, a middle,
and an end and all of the good stuff that you need for writing. So someone was like,
dude, just start Granola and chat to it, go for a walk and chat to it. And then come back.
And then, yeah, and then make a really good prompt that's like, this is what I want to
achieve from this. And that was the first iteration of the code mode blog post was something,
yeah, it was something similar to that. I thought it was
really, really good because, like, I got the points that I wanted to get in because I just
spouted to the AI. The AI listened to me. The AI didn't quite summarize, but picked out
the key bits of information because I really hate summaries. I think that they're rubbish that
one person's summary is another person's, I don't know, mud or something. It's like really, really,
really hard to get something that summarizes something well while maintaining the full information.
But with things like Granola, you can export a nice blog post if you know exactly what you want.
And I tend to know exactly what I want.
And I think AI is like an unlock for very opinionated people because you don't have to do the thing.
You just have to be very good at critiquing the thing.
Absolutely.
Absolutely.
I actually like that idea a lot.
I'm glad you mentioned the personal use of Granola because I have not considered Granola
in that way where it's my personal note taker.
Because it's great at that.
Hey friends, I'm here with Dan Mangus,
co-founder and CEO of RWX.
Dan, what makes RWX and the way you're doing CI
so different and interesting to our audience?
Obviously, we're talking to you
because we want to promote what we're doing.
We want more engineers to become aware
of what we're doing at RWX.
But I think the thing that's interesting to me
is that RWX is really kind of the first major evolution
in CI
and the approach for CI.
And this is just highly relevant with agentic-driven coding.
You know, CI has largely been the same since the advent of the practice.
But these platforms were created when being able to run code in the cloud was really valuable.
The fact that you could spin up virtual machines that would run some automation on a Git push
was, you know, really impactful for engineering teams trying to like build
good developer processes and tools.
But that's kind of the extent.
What we've done at RWX is we've taken state-of-the-art techniques,
used in build systems at organizations like Google and Meta.
You know, Google has their internal build system Blaze,
which inspired the open source Bazel tool.
But every engineering team I've talked to that wants to adopt Bazel
has just found it extraordinarily difficult to use and configure.
You have to have a dedicated engineering team to build and maintain the rules.
It's hard to extend it to work with different types of languages and frameworks
that engineering teams are looking to adopt.
So it's been, you know, too prohibitive to actually adopt, you know, those technologies.
But the ideas behind Bazel are really impactful.
They're similar to a lot of the ideas behind Nix.
I would say Nix is kind of very similar,
you know, in the difficulty to adopt.
And effectively what we've done at RWX
is we've taken those techniques.
We've made it very easy for engineers or agents
to actually adopt and utilize those,
which namely are the automatic content-based caching
and the graph-based task execution,
which means that RWX eliminates all redundancy.
You know, whereas other platforms are having to run
the same setup steps, on the same jobs,
in every virtual machine that's spinning up.
RWX can run the setup once on one machine
and then fan out accordingly based on just your dependency graph.
So effectively with RWX,
you never have to think about parallelization at all.
On other platforms, it's always like,
well, do I add this onto the existing job?
Do I make a new job for it?
But I have to duplicate all that setup.
With RWX, you just define the tasks
that you want to run and the dependencies between them.
And we will run it with maximum parallelization,
based on your dependency graph.
Well, friends, a good next step is to go to RWX.com.
Learn more.
Check out CI in a whole new way.
Once again, RWX.com.
Let's talk about memory.
So we're going back into the deets.
We're off of our personal soapboxes about how AI has changed our life
and how it's taken some away, how we can't stop thinking about it,
and how we prompt, et cetera, et cetera.
But take me into the world of, I guess, next few.
agents, MCP, where does memory fit in?
Yeah, so memory, it's like such a loaded term.
It's such a loaded term.
So it is quite hard to know where to start.
Essentially, I want a way for my agents to remember a conversation that we're having right now
and be able to refer back to context that I gave previously in the chat,
but also to like remember conversations over time.
and also to be like very programmable.
So I work on SDKs,
like developers are going to program with my SDKs.
It's like how do we,
how do we build something that's mega customizable
to like the next new,
the next new trend?
Like for instance, skills,
like skills are just a markdown file
that's loaded into a context on demand by an agent.
How can we support that in a memory system
that can also support compaction of sessions,
can also support continual
learning, can support like the migration of a session to long-term storage so an agent can
like search over it over time. I guess I'm just trying to work out the shape of those APIs right
now. There's some really good examples, like maybe not examples, but there's some really good
inspiration in the TypeScript world at the moment. Like letter is very, very cool. Letter
just to name like a couple,
they all have some,
some cool memory stuff.
And I know there are,
there are some really cool memory startups
that are actually like doing managed memory,
like Supermemory.
Like I just like shout those guys out as like really good inspiration
with what we're trying to do.
What we're trying to do is not trying to replace anything like that.
But it's like how,
like Cloudflare has some really cool storage primitives.
How can we let developers best use those storage primitives
in the,
in the function of making a better agent?
And I realize they're all questions rather than answers.
And I don't have a huge amount of answers.
So I'll probably keep my powder dry on that one.
While you were sharing your ideas there, I was jotting down an idea I had.
Now, this may be totally wrong.
But this is how I'm currently thinking about if I was in your shoes.
Go on.
All chat captured to Markdown or just plain text in some way, shape, or form.
So all of it, before it compresses and goes away, is captured,
and you could probably use an AI gateway for that.
Then you run sentiment analysis across all that history.
Then you vectorize that into a database.
Then you put an SDK in front of that with two calls:
search and, what was it, execute?
Wasn't that what you do?
That's what you do right there.
And you just treat your vector database on the sentiment analysis
that you've been capturing as plain text,
just like you do your APIs.
That's how you do it.
Yeah. So is that wrong or is that not even close to right? How would you approach it?
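As a rough illustration of the capture-then-search idea Adam describes, here is a minimal Python sketch. It is not how Cloudflare or the Agents SDK does it. `ChatLog`, `capture`, and `search` are hypothetical names, the word-count "embedding" is a toy stand-in for a real embedding model and vector database, and the sentiment analysis and `execute` steps are left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: a real system would call a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ChatLog:
    """Hypothetical store: every turn is kept as plain text plus its vector."""
    entries: list = field(default_factory=list)

    def capture(self, role: str, text: str) -> None:
        # Capture the turn before any compaction can throw it away.
        line = f"{role}: {text}"
        self.entries.append((line, embed(line)))

    def search(self, query: str, k: int = 3) -> list:
        # Rank captured turns by similarity and drop ones with no overlap.
        q = embed(query)
        scored = [(cosine(q, vec), line) for line, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [line for score, line in scored[:k] if score > 0]
```

With three captured turns, `log.search("dns server", k=1)` returns only the turn that mentions the DNS server, and a query with no overlapping words returns an empty list.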
No, I think you're close. I think you're close. So there's a few things that I can't do with that that maybe consumers of my SDK would want to do. So I can't be that opinionated on where the data is stored. Like some people might want to store it in a Durable Object in SQLite. Some people might want to store it in PlanetScale. Some people might want to store it, like, their data is going to live somewhere. And people are normally very
opinionated about that. So I can't be like, here is a vector store you must use.
Although I have to have the ability for people to use like Vectorize if they want to.
So I need to go with more of a provider-based model, I think, in terms of like API design.
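One way to picture a provider-based design is a small interface that any storage backend can satisfy. The sketch below is in Python with made-up names (`MemoryProvider`, `InMemoryProvider`, `SQLiteProvider`, `AgentMemory`). It is not the Agents SDK API, and the local `sqlite3` database only stands in for whatever store a developer actually picks.

```python
import sqlite3
from typing import Protocol


class MemoryProvider(Protocol):
    """Hypothetical interface: the SDK talks to this, not to a specific store."""

    def append(self, session_id: str, role: str, content: str) -> None: ...

    def history(self, session_id: str) -> list: ...


class InMemoryProvider:
    """Keeps messages in a dict; useful for tests."""

    def __init__(self) -> None:
        self._rows = {}

    def append(self, session_id: str, role: str, content: str) -> None:
        self._rows.setdefault(session_id, []).append((role, content))

    def history(self, session_id: str) -> list:
        return list(self._rows.get(session_id, []))


class SQLiteProvider:
    """Keeps messages in SQLite (assumption: stands in for any SQL backend)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
        )

    def append(self, session_id: str, role: str, content: str) -> None:
        self._db.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        self._db.commit()

    def history(self, session_id: str) -> list:
        cur = self._db.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
        return [(role, content) for role, content in cur.fetchall()]


class AgentMemory:
    """The agent-facing API stays the same whichever provider is plugged in."""

    def __init__(self, provider: MemoryProvider) -> None:
        self.provider = provider

    def remember(self, session_id: str, role: str, content: str) -> None:
        self.provider.append(session_id, role, content)

    def recall(self, session_id: str) -> list:
        return self.provider.history(session_id)
```

Swapping `AgentMemory(InMemoryProvider())` for `AgentMemory(SQLiteProvider())` changes where the data lives without touching the calling code, which is the point of the provider approach.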
And then the next thing about search and execute being a thing, yes, yes, definitely it's a thing.
You've already made it, right? I mean, that's the model.
Just leverage it.
For longer-term memory, I think, and for search, like, for things that need to be loaded on
demand, yes. But there are cases where you would want to programmatically load context into a
session. So the easiest one is, like, if you think of a system prompt with some
direction, like in OpenClaw, I think they call it SOUL.md, like what is the agent? Like,
who does it respond to? Like, what is its personality, all of this little stuff? This
would need to be loaded at the start of every session. So this is slightly different.
And then the next one maybe is, like, a to-do list as some sort of working context, you know, like Claude Code had a to-do list.
I don't even know if it still does anymore.
But it kept the agent on track for a while.
Maybe they RLed this out.
But at some point, people wanted a to-do list that the agent could fill and modify over time.
Like, that enables you to do really cool stuff like create Ralph loops as well, which maybe we can talk about some other time.
But I need a way to be able to store all of this context in a way that's super flexible
and also have that ability to do continual learning and this extraction of facts
and also have the ability to like for the agent to be able to pull in stuff like skills.
So it's multifaceted and I'm still trying to work out like in my head like what I want to focus on
because I don't think I can get all of these things right in the first time.
I just need to make something flexible enough that when the new things do come,
we can add them in without breaking changes.
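The three kinds of context Matt separates here (a personality prompt loaded at the start of every session, a to-do list the agent edits as it works, and skills pulled in only when asked for) could be modelled roughly as below. This is a Python sketch under stated assumptions: `SessionContext` and its methods are hypothetical names, and the rendering format is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    """Hypothetical container for the context that goes into one session."""

    soul: str                                          # always loaded at session start
    skills: dict = field(default_factory=dict)         # name -> markdown, available on demand
    todos: list = field(default_factory=list)          # working context: [item, done] pairs
    loaded_skills: list = field(default_factory=list)  # skills the agent has pulled in so far

    def add_todo(self, item: str) -> None:
        self.todos.append([item, False])

    def complete_todo(self, item: str) -> None:
        for todo in self.todos:
            if todo[0] == item:
                todo[1] = True

    def load_skill(self, name: str) -> None:
        # Only known skills are loaded, and each at most once.
        if name in self.skills and name not in self.loaded_skills:
            self.loaded_skills.append(name)

    def render(self) -> str:
        """Build the text that would be placed in the model's context."""
        parts = [self.soul]
        if self.todos:
            lines = [f"[{'x' if done else ' '}] {item}" for item, done in self.todos]
            parts.append("TODO:\n" + "\n".join(lines))
        for name in self.loaded_skills:
            parts.append(f"SKILL {name}:\n{self.skills[name]}")
        return "\n\n".join(parts)
```

A fresh session renders only the personality prompt. The to-do list and any loaded skills show up in later renders, so new kinds of context can be added as extra fields without changing how existing ones behave.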
And when you're speaking of memory, you're speaking of it as part of one of the Cloudflare
products you work on, not so much.
I mean, I'm sure you have personal curiosities and how you can leverage it personally,
but you're talking about how you can bake it into agents, for example.
Yeah, I think at some point this might be a separate SDK, but yeah, like the Agents SDK will be
where it lives initially.
Yeah.
So people building agents on Cloudflare Durable Objects.
But like theoretically there is nothing to say, like if you're building something on
ECS somewhere, like a container somewhere, if you're building it on like a Lambda function
somewhere or you have your Next.js routes on Vercel.
Like it should be pretty cross-compatible for all of these things.
Like there shouldn't be anything runtime specific.
I think the provider model there will help because, yeah, sure, we can use the Durable Object SQLite, but also if someone wants to use Neon or PlanetScale, they should be able to do that as well.
Yeah, for sure.
Would not want to dictate where you can store it, maybe even one-to-many stores.
I don't know how hard that would be, but, you know, that's where I would start. I mean, that's what this is, right?
It's all exploratory.
It's like, that's the basis of how I would initially approach it.
And I might hit two brick walls and hurt real bad and learn something new and read
a book. You know, I've become a real big fan of EPUB books. I've got an ETL that takes a book
from EPUB to, you know, really good Markdown and then sentiment analysis on that and then
vectorizing things across it and just searching it with DuckDB and Parquet. So reading a book
now is way different than it was before. So thankful for open-format EPUBs out there
because that's the way to do it. And like between DuckDB and Parquet and this, I mean, that's super
fast. Those few things there would really lean into what you're talking about with memory and that
that lookup process. It's super fast. Yeah. No, definitely. And like, I was chatting to some of the
more data engineering people in Cloudflare. They were like, yeah, so how can I use ClickHouse?
How can I use ClickHouse? And I was like, ah, ah, shit. Sorry, bleep that one. But like,
like, how, how, yeah, I don't know. I don't know. What's special,
like, what's special about ClickHouse that you don't think you can get from Postgres?
And then he kind of rolled his eyes at me.
And so that was how the conversation went.
A million more rows, so much faster, but really hard to set up.
You could do it on your own.
You can on-prem it yourself.
But it's definitely a ceremony.
I mean, it's a lot to run.
I mean, but at Cloudflare scale, you've got all of that, right?
I mean, I would run ClickHouse if I was on your team.
Definitely, definitely.
But DuckDB and Parquet, you can run right on your Mac.
I mean, you can just run it right there.
and it's super fast.
And you can have a ton of usage just in one context.
But as a product, you may think about it differently.
But DuckDB and Parquet files is like, it's the way to go.
I have some telemetry for an open source code review project I did a few years ago
that just dumps everything in DuckDB.
It's quite good, actually.
I really like it.
Yeah.
I mean, it's really interesting too because the agent knows how to talk to it really well.
And so rather than you having to learn how to retype, you know, queries into it,
the agent can query it for you.
And I'm like, make me a justfile command for that.
And so when we sort of centralize on a query or on a style of query,
just turn that into a justfile command.
And I throw it a few parameters.
And it's like a just-in-time CLI in a way on a large dataset that's super fast.
That's awesome.
To query that database any other way
is just stupid.
Like, why would you do it the hard way?
That's the easy way.
You know, that is the way.
Maybe I'll do that with my, with my claw.
That sounds pretty fun.
Yeah, I've been thinking about,
there are some things that don't work in the situation I'm in,
and there are some things where I can really take inspiration from, like,
home labs.
So I've been, like, building my, my claw and, like, playing with,
I really like Pi and the Pi agent from,
Oh, yeah, I heard about that.
I haven't played with it.
I heard about it.
You should play with that.
There are many like it, but this one's mine.
There are many like it, but this one's mine.
Yeah, yeah, exactly.
pi.dev is what you're talking about?
Yeah, it's really well, really well built.
Like some of the best, some of the best TypeScript I've seen.
It's like really nice.
Such a cool domain name too, pi.dev.
Yeah.
It says there are many coding agents, but this one is mine.
This one's cool.
It's cool.
I haven't played with it, but I saw it.
I was like, yeah, that's a good nod right there.
Okay.
So the provider model that they have and like the lower level primitives, so not necessarily
the agent.
I don't tend to use the agent, but when I'm, if I'm building an agent, then their primitives
are pretty cool.
And I think I'm still in the speccing phase of like working out how exactly I want to run
like my like personal AI.
Yeah, just like finding nice product avenues from different products I like.
I really like Poke from Interaction.
I don't know.
Was this like, did we talk about this?
No, we didn't talk about this yet.
That's my last call.
Yeah, Poke from Interaction,
really, really nice,
like how they do like the stateful workflows in the background.
I take a lot of inspiration from other products like that.
Yeah, you've got to be a consumer.
I mean, consume everything, everyone's creating around AI,
all the new innovations,
even if they seem silly and toy-like,
there's some little thing that's going
on there that is inspiration elsewhere.
I mean, I've been a homelabber for a very long time now.
I would just say I feel so thankful to be more knee-deep in Linux
than I ever was in my life because, you know, it's such a, it's a superpower right now.
So to the right of me, I have a Proxmox box with just way too much RAM and storage and CPU available.
And so I essentially have my own cloud here.
So I can just like unleash my agents.
I can build something, deploy it to that, and battle test it in almost real time on my own hardware.
And I don't have to send it to the cloud and deal with keys and deal with payments and just whatever comes with that.
I can like skunk works whatever I want right here.
And it's too easy.
And shout out to my buddy,
as a matter of fact, on the pod recently, named Adam Jacob,
if you know Adam Jacob
from Chef.
But Swamp.combe has changed my life, y'all, okay?
Matt, you got to check this out.
Swamp.combe.
Okay.
Waste a whole day.
It's not a waste.
Spend a whole day on swamp.combe.
And learn what you can automate,
especially if you have like a little actual Raspberry Pi.
It is software automation like you've never seen before.
I'm just telling you that much, man.
It's insane.
Okay, I'll look it up.
Swamp.com.
I'm serious.
I'm enamored by this stuff.
I love Adam.
He's a good friend of mine.
He's super big in open source.
System Initiative,
you know,
automating infrastructure,
et cetera,
but it's amazing.
So I've been doing that with my Proxmox.
So Proxmox,
if you're not familiar,
is a hypervisor.
So you can host VMs,
LXC containers on there.
And so it's like a mini cloud,
basically, for you.
And so,
standing up a new VM on Proxmox is a lot of clicking in a GUI.
Old days, right?
Who's doing that?
Well, with Swamp, you just tell Swamp, hey, this is the IP of my Proxmox server.
Automate all the things.
And I'm compressing all that down to that one phrase.
It's not exactly that, but it feels like it.
And so I had this Go CLI that I was writing that did everything that Swamp did for me in minutes.
And I wrote that with AI too, you know,
but Swamp automated so much stuff on my Proxmox server.
Like making a new VM,
hardening that thing to be a DNS server,
adding Tailscale to it with my auth key, with my secrets,
standing up 1Password on there for my secrets distribution.
I mean, like, amazing stuff.
It automates so quickly.
That's cool.
Much like you,
much like code mode,
it actually writes code.
It doesn't,
it creates it via writing TypeScript,
workflows and modules and
models and workflows.
And it's just so wild,
what he's done there. It's a lot like
what you're doing with code mode where you're like,
rather than calling all these tools,
you write the code that calls the tools.
Kind of the same thing in a way, but check it out.
Yeah, no, definitely, definitely. It looks like it.
It's super cool. If you're not homelabbing,
though, you've got to be homelabbing.
And what I mean by homelabbing is literally standing up
your own VM, literally standing up your own
Linux: Ubuntu,
Fedora, pick your distro, Debian,
go wherever you want and just play.
Don't drink the Cloudflare Kool-Aid too long, man.
Get your own VM.
Get your own Linux.
Play with your own keys,
with your own rules,
with your own sudo.
And feel the metal, man.
Feel the metal.
I have a couple of Raspberry Pis looking at me
from the corner of my room
that I need to do something with.
Plug 'em in, man.
Ethernet those things.
Yeah, let's go.
Get them in there, man.
I have a great friend
and he's one of my colleagues now, actually.
since I joined Cloudflare.
And he's been telling me for ages that, like, it's K3s, right?
He's running K3s on his Raspberry Pis.
He has like a whole long cluster of them.
He keeps on adding another one every now and again.
And he's got his agent deploying, yeah, like deploying apps, running apps, like on different pods.
It's like, it's kind of wild.
It is wild, man.
Yeah.
I think I've got a bunch to learn about Kubernetes.
I mean, even the stuff you're talking about here, too, I mean, you could, I mean, now you do have the, you know, the Cloudflare account.
And so you, you have the world's oyster in front of you, so to speak, in terms of compute and power.
So, I mean, I'm not saying you shouldn't use that.
But there's something that changes when you go on-prem, home lab, feel the true metal of the actual physical hardware,
install an actual operating system onto it, whether that's Debian or Proxmox, which is actually built on top of Debian.
You can actually install Debian and then install Proxmox on top if you wanted to.
Or you could just use the Proxmox installer and just install from the start, you know, from a bootable USB.
Point being is like literal metal, choosing your RAM, choosing your CPU, choosing your disk storage,
NVMe, of course.
Like there's something to that where you take parts and you make it and then you put the thing on it,
which is Linux, of course, and then you build on top of that.
Like just something about that in this world of AI that, especially now, right?
Like you may feel a little lost or inadequate with Linux.
Maybe, maybe not.
Well, Claude is not.
I have a question to ask you.
Claude is not.
So are you running any local models?
No.
And the reason why is because I've not enough time and I'm too lazy, I suppose.
When the world's best models are available to me with the credit card swipe, I have
more of that ability than time
to, I even have a
GPU and I'm just not even using it
because all of my
interests, like nothing has to be private
to that point. So I'm just like, why would I do
that? Claude's right here.
Codex is right here. So I'm
primarily lately on Codex,
GPT, GPT-5, I guess, 5.4.
Usually on high, not medium,
because medium is not cool. High is cool.
Extra high is super
cool, of course, but
it takes about a year on extra high.
It does, but you get some really deep thoughts.
You know, for the good stuff I go there, but not for most things.
I'm just hanging out in high.
No, not a lot with models because I just find
that all my problems don't require local models
and I'm not trying to be private about any of this stuff
in the way that I feel fearful to be private.
You know, it's not like I'm talking about this goiter I've gotten.
It's a medical problem, I mean, you know, I don't know.
But I'm not talking about
anything that's embarrassing, I suppose, and not doing anything nefarious. So a local model is not
needed for me right now. Do I plan to? A hundred percent, Matt, I would love to. I would love
to have more time to play with local models. I just don't. So I had a good experience at the last
startup I was at where we were building like basically a glorified PDF parsing pipeline.
And I got to play with some local models there, which was really good fun, because we ended up
hosting our own on H100s because there was no need to go to, like, the top of the range,
like GPT-5, in that moment. It would have been way too expensive. And so we needed to cut some
costs and these didn't have a huge amount of usage. So it was really good to use H100s
and play with it and do a little bit of tweaking about which model, like, have a couple
of evals. Oh my God, this model couldn't do it. This model couldn't do it. This model couldn't do it.
Oh my God, this model managed it, right? Can we use this? Can we use this size, but
can we go a little bit smaller with this brand of model, this version? Can we go
a little bit smaller, a little bit more quantized? Does it still manage our evals? Like, that was really fun.
That was a lot of tweaking. It was a lot of fun. So I have some pull to want to
play with some of the new open source models. I mean, if you're feeling left behind,
those open source models, they make you feel left behind every three weeks.
Every three weeks, you're like a new version came out of like the model.
A new version.
Some of the new.
Yeah.
A new leapfrog.
It's a tough game, that.
A tough game.
You know, the one thing I will say this here, and you might enjoy this as a fellow
homelabber up and coming, maybe.
Definitely.
Is I've written a DNS server in Rust.
It's called DNS hole.
And I've been teasing my audience about this for a while.
I'm sorry about that, but I am getting really close to releasing it.
I just did some really cool code review on it.
It was super dope, but, you know, I'm just nervous, I suppose, about releasing it to the world.
But I'm using it.
Right now, it's my DNS server as we speak right here, right now.
And it's a replacement for Pi-hole.
So since you have a Raspberry Pi, you may hear that one of the first things you tend to do with a Pi-hole,
or sorry, with a Raspberry Pi, is install a Pi-hole or stand up a Pi-hole in your home lab or on your LAN.
And so I've written this DNS server,
but I have this idea for kind of like a,
I want an AI that constantly sniffs my traffic.
So rather than me build my block list based upon the nefarious actors,
I want the agent, the AI, I suppose, to sit at my network
and pay attention to all the real time, the hot path traffic that my DNS is resolving.
And I want it to, to,
I suppose with intelligence, with AI, add to my block list because it knows.
I don't want to have to manage my block list, Matt, you know?
And so I want to have an add-on that calls the API and paste it into the traffic and it's got that hot path.
And it wouldn't be the primary DNS server because that would be stupid.
Let's put it to the sidecar of that.
But one place where I wanted to play with a local model was in that: I wanted to have the hot path of the DNS being
resolved and then an AI that's localized right there, but a very small parameter count, like a 1.5 or a
3 billion parameter kind of thing, just enough to be intelligent about that kind of traffic.
And if it's something that it doesn't really know about, it just sort of files it as, like,
this needs deeper investigation.
But for the most part, it can sort of classify most traffic as good or not good.
But it's going to manage my block list for me rather than, and the cool thing about that is you
may say, people say, go get this block list or that block list.
Well, that, that block list is not based on my traffic.
You know, and so it's this massive list that is contextually not really true to my network.
And so my idea is like, let's let's add an agent in the loop there.
And let's make a local model.
And let's let that thing determine my block list based on the actual traffic coming into my network.
And so the plan I have in place, which I don't have time to build
yet, is around 5 to 10 seconds after the DNS gets called, the first resolution of it,
this agent will be able to infer it, check it, and add it to the block list within 10 seconds
of it entering my network.
Now, I don't know how you feel about that with security, but that's about as close to instant as
you can get, right?
It's not days later.
It's not somebody else's block list that I'm syncing once a day.
It's literally based on my traffic, almost in
real time. And the moment it's seen, it's evaluated and added to the block list. And how would
I know? Like, what is a, what is like a key indicator of nefarious traffic? Man, I'm glad you
asked this. I mean, subdomains is a big one. So a lot of these, a lot of weird characters,
man, I wish I had my notes in front of me. There is a really cool, let me see if I can get my
notes in front of me. It's essentially, I'll figure out the name of it, but I'll paraphrase what
it is, because I can tell you that part, but I can't tell you the name of it in the moment. But it's
essentially like saying, okay, you have the name Matt, right? M-A-T-T. Check. No problem with Matt.
But now if you do M-Z-A-1-T-T, that's a weird character sequence. So it essentially watches. It's a name for
a thing that knows what the proper character sequence in English or any language should be.
And so when the characters are off, it flags it.
And that happens a lot with the nefarious actors.
So like Google.com is a pretty, it's an easy way to spell out a domain or even p.i.d.v to
pull it back to our friends of pi.
Dot, right?
That's normal.
That passes the test.
So just based on the domain alone, which is DNS, it's like, well, is this a really weird
subdomain with weird characters flag that immediately.
And so when you look down all these block lists, it's a lot of that.
And so just on that alone, you can, at the DNS level, which is like network lookups,
that's the most secure you can be when it comes to stopping something in a network.
Just based on this one algorithm alone, you can stop 99.9% of bad traffic that should
not be on your network.
So just that alone.
And that's not even intelligence.
That's before the AI.
So just based on that algorithm, I will check all those and block based on that.
Or flag it for the AI to go and do deeper analysis.
And the AI will take care of the 0.5% or 0.1% that that can't catch or that doesn't catch.
That is truly nefarious and needs a little bit more sniffing.
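A rough sketch of that two-stage idea, in Python. The algorithm is never named in the conversation, so this swaps in a common stand-in: character-level Shannon entropy plus a digit ratio on each subdomain label. The function names, the thresholds, and the three outcomes ("allow", "block", "escalate") are assumptions made for illustration, not what DNS hole actually does.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Bits per character, based on the label's own character frequencies."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def triage(domain: str, entropy_limit: float = 3.5, min_len: int = 12) -> str:
    """Return 'allow', 'block', or 'escalate' for a queried domain.

    Thresholds are illustrative guesses, not tuned values.
    """
    labels = domain.lower().rstrip(".").split(".")
    # Naive assumption: the last two labels are the registered domain.
    subdomains = labels[:-2]
    for label in subdomains:
        digits = sum(ch.isdigit() for ch in label)
        if len(label) >= min_len and shannon_entropy(label) >= entropy_limit:
            return "block"  # long and random-looking: cheap rule, no model needed
        if digits / len(label) > 0.3:
            return "escalate"  # odd but not conclusive: hand to the small model
    return "allow"
```

Here "escalate" stands for the queue the small local model would work through, while "block" is the cheap rule that runs before any AI is involved. A real version would need a public-suffix list rather than the last-two-labels shortcut.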
That's what I'm building.
Cool.
It's dope, man.
It's dope.
It's dope.
It's dope, man.
That's cool.
That's why you need a home lab, man, because then you have
these kinds of ideas, man.
You start worrying about your DNS
and you start worrying about how
you can block the nefarious actors.
Because, like, you know, all those block lists out there,
they don't do you any justice when it's not your network
and not your traffic.
Like, you're doing real time.
10 seconds later, after the first lookup.
Yeah, that'd be cool.
That'd be cool.
Kind of off track.
Happy to rant about my DNS hole, which is super cool.
But I do want to bring it home for one more thing
before we tail off.
is I would like for you to give the audience a takeaway in some way, shape, or form.
If folks are like, you know what, man, this code mode is so cool, how do you use code mode?
Like, what's the first step?
Give us the first step to using code mode.
And how do you actually build day to day with code mode?
Yeah, of course.
So I guess, like, the first thing would be to go and have a look at the blog post or dump it in your coding agent.
So it's, like, blog.cloudflare.com,
I think, forward slash code-mode-mcp, and we'll hopefully pop it in the show notes.
I think, like, if you dump that in your coding agent or just, like, have a read, I think it's a
decent read, gives you a lot of insight.
But the main thing is try not to let your AI do, like, individual discrete actions.
Try to write an SDK, write a CLI.
Like a CLI is also code mode in some way.
because the model is writing, if it's writing bash, as far as I'm concerned, it's writing code.
You know, like, I prefer to write typescript or Python, but if it has to write bash, then go to town.
So, like, let the model write code and get out of its way.
Just, like, let it roll.
It'll be fine.
Let it roll.
Yeah.
And then if you want it to be, like, secure and you want to deploy it properly, then have a look at deployment options for building,
or using like a sandboxed interpreter or some type of sandbox, whether it's like a VM,
whether it's, I know Pydantic released something pretty cool called Monty, which is like a Python
interpreter built entirely for code mode.
That's pretty cool.
Or dynamic worker loaders, like the Cloudflare option for running JavaScript.
Like really, like, yeah, sure, it's fun.
You can run code locally, just eval it.
But if you want it to be safe, secure, and,
I don't know, deployed properly, then have a look at some of those options.
But really, yeah, the main takeaway is just let the model write code, dude.
Let the model write code.
Yeah, I assume you prefer TypeScript over Bash.
I don't mind Bash.
Bash is the lingua franca of agents these days.
TypeScript is the second best, I think, for the agents, but they're really fun with Bash.
I just let it rip.
Yeah.
I like Bash.
I just think there's an easier permissions model
if you're generating a TypeScript SDK.
Because, first thing, the disclosure of features is easier.
It's much harder to generate,
you can't generate types for a CLI,
so the model has to go through each individual,
you have to have a skill or something to tell the model
to call the CLI to begin with,
and then the model has to go through
and look at each of the options,
call help on each of the options,
to find the one it wants,
whereas if you can generate types up front, you give some more information,
some more, like, concise information.
So I like that.
And secondly, yeah, the permissions model I think is better.
If you run JavaScript or Python,
you're running it in, like, some type of sandboxed interpreter.
You can take the fetch requests that are trying to leave that sandbox; in terms of us,
it's our dynamic worker loader.
Like, you can take the fetch requests that are trying to leave the isolate
and you can, like, inspect them, have a look at them.
Like what's the model trying to call?
Does it meet your permissions set?
Does it need special permissions?
Do you need to add authentication tokens?
Like, what was it trying to do?
Like, basically you can have this, like, anti-corruption layer,
this ACL that sits around the model, like, the code execution.
And I think that's a better permission
layer than just, like, YOLOing into a terminal.
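A minimal sketch of that kind of gate, in Python. This is not the Cloudflare dynamic worker loader API; it is a stand-in that assumes the sandbox hands every outbound request to a host-side function before it leaves. The names OutboundRequest, Policy, and gate, and the example host, are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class OutboundRequest:
    method: str
    url: str
    headers: dict[str, str] = field(default_factory=dict)


@dataclass
class Policy:
    allowed_hosts: set[str]
    allowed_methods: set[str]
    api_token: str  # held by the host; model-written code never sees it


def gate(request: OutboundRequest, policy: Policy) -> OutboundRequest:
    """Inspect a request leaving the sandbox: reject it or attach credentials."""
    host = urlsplit(request.url).hostname or ""
    method = request.method.upper()
    if host not in policy.allowed_hosts:
        raise PermissionError(f"host not allowed: {host}")
    if method not in policy.allowed_methods:
        raise PermissionError(f"method not allowed: {method}")
    # Authentication is added here, outside the sandbox.
    headers = {**request.headers, "Authorization": f"Bearer {policy.api_token}"}
    return OutboundRequest(method, request.url, headers)
```

The design point is that the token is attached outside the sandbox, so the generated code can call the API without ever holding the credential, and anything outside the policy fails before it reaches the network.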
I like that.
I'm going to give you a nugget.
I think you might like this since you're speaking like that.
Go on.
Hit me.
I have this thing I've been building into all my CLIs lately.
Well, I'd say lately is like the last three months, maybe more, is dash dash agent.
Nice.
You've got to add this flag.
So I mean, we humans, we love dash dash help, right?
But agents don't have that.
And our help is not their help because they parse things differently.
They like markdown.
So my dash-dash agent essentially is tell the agent what this thing is and how to use it in markdown.
And that's what it does.
It responds with Markdown on standard out, you know, to the prompt.
And so that's my gift to you and everyone else.
I've been doing this and getting great results, but you can throw it on any command: dash-dash agent.
So, you know, cloudflared login dash-dash agent.
Like, what is this login kind of thing?
and explains it to the agent.
That's a pretty easy one though,
but something maybe more difficult
might be like cloudflared tunnel new
or something like that,
dash-dash agent.
And it explains it in Markdown to the agent
how to use it.
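One way the dash-dash agent idea could look, sketched in Python with argparse. The tool name mytool, its target argument, and the guide text are all made up for illustration. No existing CLI, cloudflared included, is claimed to ship this flag.

```python
import argparse

# Markdown aimed at a coding agent: what the tool is and how to call it.
AGENT_GUIDE = """# mytool

## What it does
Operates on a single named target.

## Usage
- `mytool <target>`: run against the target
- `mytool --agent`: print this guide
"""


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mytool", description="Hypothetical example CLI.")
    parser.add_argument("--agent", action="store_true",
                        help="print a Markdown usage guide for coding agents")
    parser.add_argument("target", nargs="?", help="thing to operate on")
    return parser


def run(argv: list[str]) -> str:
    """Return what the CLI would print to standard out for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.agent:
        return AGENT_GUIDE
    if args.target is None:
        return "nothing to do; try --help or --agent"
    return f"operating on {args.target}"
```

Calling run(["--agent"]) returns the Markdown guide, while the usual --help output stays untouched for humans.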
And so when we have code mode,
I suppose, in a CLI,
you can give it the same next best tool.
So it can parse all the commands
and figure it out,
but you can give it one more easier nudge
by doing dash-dash agent versus dash-dash help.
That's cool.
It's cool.
I like the nugget.
I like the nugget.
Have you not thought about just doing dash-dash help,
but detecting whether it's in an interactive environment?
And if it's not interactive, then printing Markdown?
I suppose you could do both, really.
I mean, I would alias at that point.
I didn't think about that.
That's a good point.
My thought was just really like,
I want something special just for the agent.
And that sounds cooler than dash-dash help.
But you can certainly alias it, where the first-class citizen is the dash-dash agent,
and then if you're doing dash-dash help
and you didn't tell it to do that and it's discovering it,
determining if it's in an interactive terminal,
just give it the same thing and just alias it.
That's a good point.
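The detection half of that suggestion can be sketched with the standard isatty() check in Python. The help strings and the help_text function are hypothetical; only the isatty() call is a real standard-library API.

```python
from typing import TextIO

# Hypothetical help texts for an imaginary tool.
HUMAN_HELP = "usage: mytool [--help] [--agent] <target>"
AGENT_HELP = "# mytool\n\n## Usage\n\n`mytool <target>`\n"


def help_text(stream: TextIO, force_agent: bool = False) -> str:
    """Pick help output: Markdown when asked for or when output is not a terminal."""
    if force_agent or not stream.isatty():
        return AGENT_HELP
    return HUMAN_HELP
```

In a real CLI the stream would be sys.stdout. One caveat with this approach: a non-interactive stream also covers pipes and CI logs, not just agents, which is one argument for keeping an explicit flag as the first-class path and using detection only as a fallback.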
I think there's a lot of trying to make CLIs.
Like, some CLIs, everyone's been talking about how cool CLIs are,
but some of them are not natively useful for agents.
Like, some of them rely on interactive processes.
The one that really bugs me, and we need to finish sometime soon,
but the one that really bugs me is,
have you ever used Changesets?
No?
Well, the Changesets CLI is entirely interactive.
It's like a package version manager thing.
It deploys new versions when you do a changeset, and then, like, you put your change notes in there and it collates them all together when you do a release and stuff.
It's like management for open source packages, or for packages.
It's really good.
I would recommend.
It's really nice, but they don't have a non-interactive version of their CLI.
If a model tries to do
npx changeset,
it just, like,
it freaks out
because nothing works.
So it's so annoying.
All I want is
npx changeset,
the package name,
and then the changelog.
And I might make a PR for this
because,
like,
it would save my life.
Yeah.
I think the more we can get
out of the agent's way
with interactivity around that.
Like,
I don't mind a CLI
being designed for a human,
but also agent-aware,
or agent-native.
Because I still use CLIs myself,
or at least I want to in some cases.
But in most cases, I'm just like telling the agent to do that stuff for me.
Yeah.
Because why would I do that anymore when I don't have to?
I can just like let that one thing over there spin and do this thing here and here and here.
I mean, that's the better world in dramatic cases, really.
So there you go.
Yeah.
We have gone long and I appreciate it.
We went deeper than I thought on some, some cool stuff.
though, but I enjoyed it, though, very much so, Matt.
I'll link up obviously both your blog post as well as the original code mode blog post that we
talked about in the show.
I have my robots treasure trove, this entire transcript for all the cool stuff and the bits
and bobs.
It's all in there in the show notes.
So it'll all be in there as best it can be.
And if not, our show notes are open source on GitHub.
So if you missed something or you want to add something that is contextually true,
then send a PR, I guess, or have your agent send a PR or, I don't know, something like that.
Awesome.
Matt, thank you so much for all you do, man.
It's fun talking to you.
Thank you.
Lovely to meet you.
Well, friends, this show is done.
Thank you for tuning in.
I hope you enjoyed this conversation I had with Matt Carey from Cloudflare.
Wow.
I mean, like, seriously, some cool stuff happening in and around this agent space, MCP, APIs,
what Cloudflare is doing.
They're doing some really incredible stuff.
I got to tell you, during the podcast, maybe you can tell.
I got a little FOMO.
I kind of wanted to work at Cloudflare about midway through this podcast.
You know, I feel like I can make a dent there.
I don't know about you, but I feel like there's just so much to do,
so much we can build in this very moment.
I'm having some fun building my own stuff.
But hey, that's it.
This show's done.
Thank you for tuning in.
We'll see you again.
So soon.
So soon.
Thank you.
