The Changelog: Software Development, Open Source - MCP on Code Mode (Interview)

Starting point is 00:00:00 What's up, friends? Adam here. This is The Changelog. I've got an awesome show for you: Matt Carey from Cloudflare, who works on agents and MCP. Lots of fun things are obviously happening at Cloudflare, tons of releases, tons of momentum, and as Matt will tell you in this podcast, it's moving fast and he's trying to keep up.

Starting point is 00:00:22 If you've been curious about Cloudflare's platform and how they exposed all of their APIs via their MCP server without destroying your token count, well, this show is for you. A massive thank you to our friends and partners at fly.io, the home of changelog.com. Learn more at fly.io. Okay, let's do this. Well, friends, this episode is brought to you by our friends at coder.com: secure environments

Starting point is 00:00:57 where developers and agents work in parallel. And I'm joined by Nikki Pike, Field CTO for Coder. Nikki, what is a field CTO? I get that question a lot, and, you know, half the people understand it and half don't. I describe a field CTO very simply: we're DevRel for the C-suite. We provide a bridge that carries the customer's voice, from the C-suite, the managers, and the leadership teams of our customers, back into our product. Then we help enable our teams to deliver the same message and make sure that message is correct, so that we're building something people actually want, not just something we think they want.

Starting point is 00:01:32 Okay. So we're taking the laptop away from the developer? Not really, though. We're putting them in a cloud development environment, a secure environment where they can work with their agents in parallel. These are blessed environments. What's wrong with the laptop? The laptop is the trap here. Not only because it could be stolen, or you could lose it, or it breaks and you're out of work while you wait for a new one, but there's also the consistency problem.

Starting point is 00:01:54 We all know developers. Developers are going to be looking for the latest and greatest, and if you're not really controlling how they get it, that's where you get "it works on my machine." It doesn't work in production. It doesn't work anywhere else, because you don't have that consistency. You don't have the ability to really standardize what that environment looks like. And this is a problem not only for new people coming in. You know, the onboarding stat, on

Starting point is 00:02:14 average, I think, is like four to five weeks for a new employee to really get their local laptop set up and ready to write their first line of code. And, you know, time to first commit is a metric almost everybody knows. The reason they can't get there is that there's a lot of tribal knowledge out there. They've got to go talk to other developers. What are we using? Where do we get our dependencies? Are we getting them from public repositories or private ones? But there's also the security and supply chain aspect of this. When you have

Starting point is 00:02:39 local machines out there, look at Shai-Hulud, you know, that worm that went out not long ago. That was a compromise of packages on the public npm registry. People downloaded them, npm did what it does, and next thing you know, you're compromised. But with something like what we're doing with cloud development environments, you can mandate restrictions that say, hey, you can only get your packages from our private repo. Those packages are expected to have been thoroughly vetted; we know they're clean. Now, does this stop everything like Shai-Hulud?

Starting point is 00:03:09 No. If that compromised package gets into your private repo, you can still be hit, but it really reduces the attack surface. It also reduces the blast radius should a compromise happen. If your laptop gets compromised and you have to wipe it, that's weeks out of work while you either fix it or get a new laptop in. A cloud development environment lets you kill it, start back up fresh, and be up and running again in five minutes.

Starting point is 00:03:35 You don't have to wait all that time. Well, friends, the first step is to go to coder.com and install Coder: self-hosted environments for your teams to enjoy and standardize around, and it's open source, so you can try it out today. Once again, coder.com. Matt Carey, good to see you on the pod. Thanks for taking my invite. I saw Code Mode out there.

Starting point is 00:04:16 And I was like, you know what? Let's talk about Code Mode. So what do you think? Yeah, well, thanks for having me. Do you podcast often? Yeah, I actually have one with a friend of mine, but we don't do it super often, a couple of times a month.

Starting point is 00:04:28 A couple of times a month. What do you talk about? What's the show called? It's called You've Been a Bad Agent, and we just talk absolute rubbish about agents. That sounds like fun. How long is the show? Well, we just start rolling.

Starting point is 00:04:43 He's in San Francisco and I'm in Europe, so I don't speak to him that much. We started having chats every couple of weeks just to catch up, and then we were like, we should record this. That's how it became a podcast. It's literally just a chat between the two of us.

Starting point is 00:05:00 Sometimes it's 20 minutes, sometimes it's an hour and a half. We don't really edit it; it just gets dumped online. So I'm probably going to get sued one day for something I say on it. Don't do that, man. That'll get you sued. We don't want to get sued. You know, one thing I'm really curious about, and why I wanted to talk to you

Starting point is 00:05:18 is because, you know, obviously Code Mode is cool, and there's a misconception, I'd say, and you'd probably agree, in how folks are thinking about MCP. I think I've been in that camp too. I think we're all just navigating this new world and trying to figure out how these tools work. And there's a race, and obviously Cloudflare is involved in that race.

Starting point is 00:05:40 You know, you've got blood in the water on X between you and Vercel. I mean, you've got stuff happening, you know. And I just think about the sheer size and weight of Cloudflare. Maybe you can or can't speak to how you personally feel about these outages; maybe some you can help and some you can't. But I just think about the state of AI and how it's being accepted and deployed. The acceptance is one thing, but the deployment of it is another in a large organization like yours.

Starting point is 00:06:14 And you're in charge of agents. You're in charge of MCP. You can share with the audience what you really do there. That's kind of what I want to cover: the bigger landscape of AI deployment and acceptance, the misconception about MCP, that kind of stuff. What do you think? Yeah, definitely. Let's go. I can chat a little bit about me, for instance. I work on agents, specifically the Agents SDK at Cloudflare, so I work on the open source stuff. You'll have seen a bunch of my colleagues on X, or Twitter, if you're on Twitter. I also

Starting point is 00:06:50 work on some of the open source stuff for MCP, the Model Context Protocol, and how we can support it at Cloudflare. It's kind of why I joined Cloudflare. I was working a bunch on MCP, and I thought they had a really good avenue there to build the best agents with Durable Objects and give them the best tools via MCP, and I was like, that looks super cool. So that's why I really wanted to join this team. And yeah, I've been here since October.

Starting point is 00:07:20 We released Code Mode in the summer of last year to do programmatic tool execution, basically: you write code over your tools rather than calling tools. Anthropic followed up with a bunch of cool stuff after that. And then just a week or so ago, we released server-side Code Mode, which runs Code Mode inside an MCP server. So the agent doesn't need to call tools. The agent can just write code that acts upon the tools we have in our back end. And then all of that code is executed

Starting point is 00:07:57 super safely and securely on dynamic Workers on our server side. So your agent can just write code, and it all gets executed on the server. It meant that we could put our whole Cloudflare API, all 2,500-odd endpoints (there was an exact number, but it keeps changing, so every time I remember it, it's out of date), behind one MCP server that actually works and takes up about a thousand tokens of context. There was a lot going around about, oh, fixing MCP. I don't think we're fixing MCP; we're just using MCP to the best of its capabilities. It was a really well-designed protocol, I believe, and I think it continues

Starting point is 00:08:42 to be well iterated on. I think maybe it just wasn't used as well as it could have been initially. Yeah. Could you explain MCP a little bit for us? I mean, I've dipped my toes in the water of MCP, of course. But for the uninitiated, or less initiated, what exactly is it, the Model Context Protocol? MCP.

Starting point is 00:09:07 What exactly is that? And maybe, what are the myths about it that are incorrect, and what are the things you like most about it? Yeah. So it came out in November 2024, from two guys at Anthropic, David and Justin. The whole idea was, how can we let Claude, in this case Claude Desktop, actually do things on my computer? How can I let it access my Apple Notes? How can I let it access my web browser,

Starting point is 00:09:40 maybe Figma, maybe whatever was on my computer? How can I let it do that? How can I let it read my code directly? That would be pretty cool. Now we have Claude Code and all of that stuff, but it wasn't around yet. So they came up with a protocol that consisted of tools, prompts, and resources. Tools are the things everyone talks about. Tools are like functions, as in function calling.

Starting point is 00:10:06 Prompts are instructions that the server holds and that the client might want to request and use at some point. I think of them as instructions, kind of like directions. They're almost like skills; there's a debate at the moment about whether skills are a prompt or a resource, so we can get into that. And resources are like documents held on the server that the client might want to use. Tools have far and away the most usage. But initially it was, how can we define some tools that can be used by an agent I don't own, and vice versa: how can an agent be given access to tools it doesn't own when it's being built?
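To make "tools" concrete: MCP messages are JSON-RPC 2.0, and a server advertises each tool with a name, a description, and a JSON Schema for its input. The sketch below shows that shape and the `tools/call` request a client would send, plus stdio's one-JSON-message-per-line framing. It is a minimal illustration, not an SDK: the `read_note` tool and the `toolsCallRequest` helper are hypothetical, and prompts and resources use their own methods that are not shown here.

```typescript
// Shape of a tool as an MCP server advertises it in a `tools/list` response.
interface McpTool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// A JSON-RPC 2.0 request: the message format MCP uses on every transport.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Hypothetical tool in the spirit of "let Claude read my Apple Notes".
const readNoteTool: McpTool = {
  name: "read_note",
  description: "Read a note by its title",
  inputSchema: {
    type: "object",
    properties: { title: { type: "string", description: "Title of the note" } },
    required: ["title"],
  },
};

// Build the `tools/call` request a client sends to invoke a tool,
// rejecting calls that are missing a required argument.
function toolsCallRequest(
  id: number,
  tool: McpTool,
  args: Record<string, unknown>,
): JsonRpcRequest {
  for (const key of tool.inputSchema.required ?? []) {
    if (!(key in args)) throw new Error(`missing required argument: ${key}`);
  }
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool.name, arguments: args } };
}

const req = toolsCallRequest(1, readNoteTool, { title: "Groceries" });

// On the stdio transport, each message is serialized as a single line of JSON.
const wireLine = JSON.stringify(req) + "\n";
console.log(wireLine.trimEnd());
```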

Starting point is 00:10:52 And for that, you need some sort of standardized protocol. When these guys made it, it was local-first, communicating via standard I/O (stdio). Then, once it got some usage, they introduced a remote transport, and that remote transport has changed a couple of times. Now pretty much every big SaaS company publishes an MCP server; I saw Datadog published theirs last week, which is pretty cool. And yeah, I'm pretty bullish on MCP's ability to be the protocol agents use to access services in the future. Help me understand this breakthrough you made with just writing TypeScript versus all the context-filling tool calling, I guess.

Starting point is 00:11:41 I should call it, right? A lot of folks are kind of getting it wrong, I guess, though it's kind of how it was designed; maybe they're just applying it incorrectly. Talk about the way you've remodeled it to write TypeScript instead of tool calling and filling the context window. Yeah, so when LLMs first came out, they just produced text, right? And then at some point, I can't quite remember exactly when, but I'm pretty sure it was after the big ChatGPT moment.

Starting point is 00:12:11 Function calling became popular. I remember the first model that could do function calling really well was, I think, GPT-4. With GPT-4, you could ask it, what's the weather in London? And it would reply: call the weather function, with param location equals London. You could take in that structured piece of information, plug it into some JavaScript, some Python, some code, and call a third-party API, a weather API, with London as the argument, the city argument,

Starting point is 00:12:44 and you could get a result. That result, whatever it was, you'd pass back into the model as the tool result, or function call result. Then the model would continue generating and make it all nice and pretty for you. That was how LLMs performed actions in the outside world.
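The request, execute, feed-back loop Matt describes can be sketched in a few lines. This is a self-contained illustration, not any provider's API: `fakeModel` is a hypothetical stand-in for a real LLM, and `get_weather` returns canned data rather than calling a weather service.

```typescript
// One turn from the model: either a request to call a tool, or final text.
type ToolCall = { name: string; args: Record<string, string> };
type ModelTurn = { kind: "tool_call"; call: ToolCall } | { kind: "text"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Hypothetical local implementation of the tool; returns canned data, no real API.
const tools: Record<string, (args: Record<string, string>) => string> = {
  get_weather: ({ location }) => JSON.stringify({ location, tempC: 14, sky: "cloudy" }),
};

// Stand-in for a real LLM: first asks for the weather tool,
// then answers using the tool result found in the history.
function fakeModel(history: Message[]): ModelTurn {
  const toolMsg = history.find((m) => m.role === "tool");
  if (!toolMsg) {
    return { kind: "tool_call", call: { name: "get_weather", args: { location: "London" } } };
  }
  const data = JSON.parse(toolMsg.content) as { location: string; tempC: number; sky: string };
  return { kind: "text", text: `It's ${data.tempC}°C and ${data.sky} in ${data.location}.` };
}

// The loop: ask the model, run any requested tool, feed the result back, repeat.
function runAgent(prompt: string, maxSteps = 5): string {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = fakeModel(history);
    if (turn.kind === "text") return turn.text;
    const fn = tools[turn.call.name];
    if (!fn) throw new Error(`unknown tool: ${turn.call.name}`);
    history.push({ role: "assistant", content: JSON.stringify(turn.call) });
    history.push({ role: "tool", content: fn(turn.call.args) });
  }
  throw new Error("agent did not finish within maxSteps");
}

console.log(runAgent("What's the weather in London?")); // prints: It's 14°C and cloudy in London.
```

Every tool result re-enters the history, which is exactly why each extra tool round trip, and each extra tool definition, costs context.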

Starting point is 00:13:02 And we, I guess rightly or wrongly, assumed that each individual function would be like a hand the model could use in the outside world. Then they were renamed to tools after a while, which makes more sense: each function was a tool the model could use in the outside world.

Starting point is 00:13:22 But there's a problem: as you try to get these agents to do more and more things, you add more and more tools, and at some point you start filling the model's initial context window. For instance, the GitHub MCP server is always the one used as an example. They've done loads of work on it, so I'm not throwing any shade. But initially, when it came out, it was like 15,000 tokens or something, and now I think it's a bit less.

Starting point is 00:13:51 And they do some stuff to add tools dynamically. But if you're filling the context window with 20,000 tokens or so before you've even given the model your task... The models of yesterday, like GPT-4, had much smaller context windows, so you were filling them very quickly. And even now, the best foundation models, even though they have maybe a million-token context window, or 200K to a million for the normal ones,

Starting point is 00:14:20 they all start losing power around the 50K mark, and this is quite well documented. So you really don't want to be filling the context window too much. In Claude Code you'll see a compaction step that's triggered because you've filled the context window. So say you've added 20 tools or whatever: you've got a really chunky context window, but that's still only 20 things the agent can do. Imagine you want a proper personal AI that can actually try to automate your job.

Starting point is 00:14:58 Imagine how many individual functions you do in your job. It's way more than 20, probably more than 100, probably a few hundred. So you can't really automate anything unless you can give a model all of those functions. So in the summer, Kenton and Sunil, colleagues of mine, worked this out and wrote a really great blog post, the original Code Mode blog post. Amazing name from Kenton, really stunning; I think it caught the zeitgeist. The idea was, you could just use Code Mode.

Starting point is 00:15:32 It fell back on an idea that had been around for a while; I'm pretty sure Hugging Face did a research paper on CodeAct a while ago. The idea was that models should just write code. AI has been trained on so much code that it should just write code, and if it writes code, the code can interact with the functions we want to use. So Code Mode generated a TypeScript API, really a TypeScript SDK, for the underlying functions the model could call. Then the model just wrote code to compose those SDK calls. And I guess the good innovation here, the reason why it's,

Starting point is 00:16:13 I think, slowly taking off, and why I think we're going to see much more adoption this year, is the advent of something called a dynamic worker loader. It's a Cloudflare primitive, but other people have started building similar things, if not the same. It's a primitive that lets you execute a sandboxed worker from a string. So from some code that's a string, you can just say, eval this, new Function this, except it runs on a separate host in a fully sandboxed environment, in a V8 isolate.
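The two Code Mode steps described here, generating a TypeScript API from tool definitions and then running model-written code that composes calls to it, can be sketched as follows. Everything is hypothetical: the tool names and implementations are invented, and `new Function` stands in for the dynamic worker loader purely to show "run code from a string". It is not a sandbox and should never run untrusted code.

```typescript
// A simplified tool definition, as a server might list it.
interface ToolDef {
  name: string;
  description: string;
  params: Record<string, "string" | "number">;
}

// Hypothetical tools for illustration.
const toolDefs: ToolDef[] = [
  { name: "listIssues", description: "List open issues in a repo", params: { repo: "string" } },
  { name: "addLabel", description: "Add a label to an issue", params: { repo: "string", issue: "number", label: "string" } },
];

// Step 1: turn tool definitions into a TypeScript declaration the model reads
// instead of a list of tool schemas.
function toTypeScriptApi(defs: ToolDef[]): string {
  const methods = defs.map((d) => {
    const args = Object.entries(d.params).map(([k, t]) => `${k}: ${t}`).join("; ");
    return `  /** ${d.description} */\n  ${d.name}(input: { ${args} }): Promise<unknown>;`;
  });
  return `declare const api: {\n${methods.join("\n")}\n};`;
}

// Step 2: hypothetical implementations backing each tool.
const impl = {
  listIssues: async (_: { repo: string }) => [{ id: 1, title: "crash on start" }, { id: 2, title: "typo in docs" }],
  addLabel: async (input: { repo: string; issue: number; label: string }) => ({ ok: true, ...input }),
};

// Step 3: run model-written code from a string with `api` in scope.
// NOT a sandbox: it stands in for an isolated runtime such as a V8 isolate.
async function runModelCode(code: string, api: typeof impl): Promise<unknown> {
  const fn = new Function("api", `return (async () => { ${code} })();`);
  return fn(api);
}

// What a model might write: several calls composed in one program,
// with only the final value returned to the context window.
const modelCode = `
  const issues = await api.listIssues({ repo: "acme/app" });
  const bugs = issues.filter((i) => i.title.includes("crash"));
  for (const b of bugs) await api.addLabel({ repo: "acme/app", issue: b.id, label: "bug" });
  return bugs.map((b) => b.id);
`;

console.log(toTypeScriptApi(toolDefs));
runModelCode(modelCode, impl).then((r) => console.log(r)); // prints [ 1 ]
```

The intermediate issue list never has to pass through the model; only the returned IDs do, which is where the context savings come from.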

Starting point is 00:16:46 And what does this mean? Traditionally, people got very, very scared when you said, give me code and I'll execute it on my machine, because there are so many ways you can mess someone up by doing that. You can run them out of memory. You can access env variables. You can do loads of stuff that's really hard to protect against. This sandbox is a very particular sandbox that's not a full VM.

Starting point is 00:17:10 It's just a V8 isolate. It lets you spin up billions of these little scripts almost instantaneously if you wanted to, at Cloudflare scale, at global scale, and run these pieces of code very safely and securely. You can even restrict the outgoing fetch; it's called a global outbound. You can say, I only want outgoing fetch to be able to access example.com or mysaas.com.

Starting point is 00:17:37 Or, I don't want it to access anything; just run code that's entirely constrained to this host. This is really cool because it lets this CodeAct idea, this Code Mode idea, really take off, because the model can just write code. It doesn't matter if it's been prompt-injected or is trying to be adversarial; the code is running in a super safe environment, so that's all good. The original Code Mode blog post had this code execution happening in the agent. And that was the smart thing to do, right? The proper thing to do is make the agent do code execution, and then you become a code agent.
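The outbound restriction described above can be modeled as a fetch wrapper with a host allowlist, where an empty list blocks everything. This is a plain-TypeScript sketch of the policy only, not Cloudflare's global outbound API. `restrictOutbound` and the stub network are hypothetical, and the check matches exact hostnames, so subdomains are not allowed implicitly.

```typescript
type FetchLike = (url: string) => Promise<string>;

// Wrap a fetch so it only reaches allowlisted hosts.
// An empty allowlist blocks all outbound traffic.
function restrictOutbound(allowedHosts: string[], inner: FetchLike): FetchLike {
  const allowed = new Set(allowedHosts.map((h) => h.toLowerCase()));
  return async (url: string) => {
    const host = new URL(url).hostname.toLowerCase(); // exact match, no wildcard subdomains
    if (!allowed.has(host)) {
      throw new Error(`outbound request to ${host} blocked`);
    }
    return inner(url);
  };
}

// Stand-in for the real network.
const network: FetchLike = async (url) => `response from ${url}`;

// Sandboxed code would only ever see these wrapped functions.
const sandboxFetch = restrictOutbound(["example.com"], network);
const noNetwork = restrictOutbound([], network);

sandboxFetch("https://example.com/data").then(console.log); // prints: response from https://example.com/data
sandboxFetch("https://attacker.test/steal").catch((e: Error) => console.log(e.message)); // prints: outbound request to attacker.test blocked
```

Because the policy lives outside the model-written code, a prompt-injected script can at worst talk to the hosts you already trust.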

Starting point is 00:18:18 And that's what happens. But this relies on every agent that wants to do it actually shipping code execution, and it turns out that's quite tough. Although for the last six months we'd been shouting, if you have problems with your context window, just get the model to write code, not as many people did it as we thought would. So I took what we'd done previously, moved it into our MCP server, and basically asked, what does this enable? Now we have that massively reduced context window usage, and we can use it to enable our MCP server

Starting point is 00:19:03 to access the whole of the Cloudflare API. All of the possible endpoints you might want to call on the Cloudflare API are now accessible via MCP. I guess what I'm trying to say is, you could have come to the same conclusion in the summer by putting Code Mode inside the coding agent.

Starting point is 00:19:22 You could have connected as many MCP servers as you wanted, with Code Mode inside the coding agent. But fundamentally, that's quite hard to do. So how could we show the value of it? We put it inside the MCP server. That's what we did, and it made basically a one-of-a-kind MCP server. No one had really seen this type of thing before, where one coding agent, using a thousand tokens of context, can access our whole Cloudflare platform. I think that was pretty cool. Sorry, that was a bit rambling, but that was the point. It was a good deep dive. I like that. I've got some

Starting point is 00:19:56 questions about that. What is fundamentally different about this MCP server, compared with every other one, that enables this many APIs and that reduction in context window usage? Yeah, so most MCP servers would map

Starting point is 00:20:11 one API endpoint to one tool. They'd be like, oh, we want to get all issues, or post an issue, or delete an issue: a one-to-one mapping. Sometimes you might do a bit of one-to-many mapping. If you had a particular workflow that was very common on your website, you might create a tool for that workflow, if there was a very common workflow

Starting point is 00:20:42 that customers do. But you're kind of restricted to around 10 to 20 tools, maybe up to 25. I know Cursor has a 40-tool limit for supplied MCP servers, so theoretically you could fill that, but it's getting pretty large. And you're still not covering anywhere near everything; you're still cherry-picking. Think of big platforms like the Cloudflare platform or GitHub, for instance. I use GitHub and Cloudflare as examples, but you can imagine any large API.

Starting point is 00:21:15 You're not covering anywhere near the full breadth of the API. But when you put Code Mode in front of it, you're just asking the model to write code over that API. So the Cloudflare MCP server exports just two tools, a search tool and an execute tool. This combines a similar idea to tool search, which is present inside Claude Code,

Starting point is 00:21:39 and also inside Cursor, I think, where they search for the right tool on demand and then load it, depending on the user's intent. We do that, but on the server side. So there's search and execute. But the critical thing that no one else does is that for search, we let the model write code to search over the Cloudflare OpenAPI spec. Code.

Starting point is 00:22:06 There's no search function or anything. The model just goes spec.paths and then filters, super naively, and it works. Then for execute, we say, model, agent, here is a Cloudflare fetch client: call cloudflare.request to make a request to the Cloudflare API. And we just let the model go for it. So you end up with something that's super flexible.
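The two-tool pattern can be sketched like this: "search" runs model-written code with the OpenAPI spec in scope, and "execute" runs model-written code with an API client in scope. The spec entries are illustrative and far smaller than the real spec, the `cloudflare.request` name mirrors what Matt says but its signature here is an assumption, and the client is a stub that echoes requests. As before, `new Function` only stands in for an isolated runtime and is not a sandbox.

```typescript
// Illustrative slice of an OpenAPI document; the real spec has thousands of paths.
interface OpenApiSpec {
  paths: Record<string, Record<string, { summary: string }>>;
}

const spec: OpenApiSpec = {
  paths: {
    "/accounts/{account_id}/workers/scripts": { get: { summary: "List Worker scripts" } },
    "/accounts/{account_id}/workers/scripts/{script_name}": { put: { summary: "Upload a Worker script" } },
    "/zones/{zone_id}/dns_records": { get: { summary: "List DNS records" } },
  },
};

// Tool 1, "search": run model-written code with `spec` in scope and return its result.
function search(modelCode: string): unknown {
  return new Function("spec", modelCode)(spec);
}

// Hypothetical client shape handed to the model by the "execute" tool.
interface ApiClient {
  request(opts: { method: string; path: string; body?: unknown }): Promise<unknown>;
}

// Stub client: echoes the request instead of calling the real API.
const cloudflare: ApiClient = {
  async request(opts) {
    return { success: true, echo: opts };
  },
};

// Tool 2, "execute": run model-written async code with the client in scope.
function execute(modelCode: string): Promise<unknown> {
  return new Function("cloudflare", `return (async () => { ${modelCode} })();`)(cloudflare);
}

// Code a model might send to "search": plain filtering over spec.paths.
const searchCode = `
  return Object.entries(spec.paths)
    .flatMap(([path, ops]) => Object.entries(ops).map(([method, op]) =>
      ({ method: method.toUpperCase(), path, summary: op.summary })))
    .filter((e) => e.path.includes("workers/scripts"));
`;

// Code a model might send to "execute" once it knows which endpoint to call.
const executeCode = `
  return cloudflare.request({ method: "GET", path: "/accounts/ACCOUNT_ID/workers/scripts" });
`;

console.log(search(searchCode));
execute(executeCode).then((r) => console.log(r));
```

Only the two tool descriptions have to sit in the context window; the spec itself stays on the server and is queried with code.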

Starting point is 00:22:34 If you say something like, build me a Worker that hosts a Next.js website that can do this, this, and this, you'd hope the agent can just write the code, maybe using V-Next, a fun new thing we just built to write Next.js and deploy it on Cloudflare. Yeah. It writes the code, and then the model can search through the Workers scripts and deployment APIs, whatever. It normally finds them pretty quickly.

Starting point is 00:23:06 Then it can just call the Worker script deployment endpoint, and it deploys to the cloud. Super, super fun. You can have these insane demos where none of the code ever gets saved on your machine; it only exists in the context of your chat with the coding agent and in the cloud. That was the first demo I ever did of this, and I think it went pretty well.

Starting point is 00:23:29 It went so well, they were like, we have to ship this. Really? Wow. What was that demo like? Can you take us into that day? Who was there? Where was it? Was it online only?

Starting point is 00:23:41 Yeah, yeah. It was the Friday demos. The Friday demos at Cloudflare are pretty legendary. I'm part of a team called Developer Platform, so I work on Workers, on anything built on top of the developer platform. We have our own demo sessions on Fridays. Most of the org turns up, and we probably have five to seven awesome demos. Some of the coolest stuff you've seen come out of Cloudflare was probably demoed at one of these platform sessions, often only a short time before

Starting point is 00:24:21 it was released to the public, which I think is pretty cool. Yeah, it's good craic. You get a couple of minutes, you know? Yeah. Yeah. Look at this thing I built, and 50% of the time it breaks, but 50% of the time it's awesome. I mean, a lot of these things happen, I'd imagine, late at night, maybe late in your ability to keep thinking, when you're knee-deep in innovation and iterating over something. Can you take us into how you stumbled upon this, you know, writing some TypeScript against an SDK? How did you stumble into this? Was it a thought? Were you in the shower? On a run? How did this iteration come to be? Well, I'd been putting it

Starting point is 00:25:08 off for ages. We knew it was something we wanted to do; the idea had been around for ages. Sunil, my tech lead, built the first iteration of the Agents SDK, which is the thing I work on, over a weekend. I don't know if that's infamous now. And then he went to his product leads

Starting point is 00:25:29 and was like, guys, can we ship this? And they were like, ooh, maybe let's hold off a week. Then he shipped it a week later. The first version was literally just: export the agent as a Durable Object. It was clean, like a one-liner.

Starting point is 00:25:46 And that was the first version of the Agents SDK. Using Durable Objects as agents was a really cool innovation, and that was a weekend. I guess for this, we knew it was something we wanted to do.

Starting point is 00:26:03 Sunil had been badgering me for ages to get on with it and work out the tool thing. Why can't we just have loads of tools? Surely this is possible. And yeah, I'd put it off for a couple of weeks, and then, I think it was a Tuesday, I just sat down and was like, right, I'm going to do it. I had a chat with Claude, I reckon, about how we might want to do it. I'd been working a lot on MCP, so I worked out very quickly that the best use case for this would be enabling every MCP server to host an unlimited number of tools, and that this seemed quite possible

Starting point is 00:26:35 with the search and execute paradigm. I've been working on this kind of thing for quite a long time, and in my previous jobs I'd always had problems with search functions. Whenever you give the model a search function, you need an eval. You 100% need an eval, because you need to work out whether, when you change the parameters of your search, the search gets better or worse for your task. I was very drawn to the idea that with Code Mode I would never have to do an eval in that way again, because the model can write code.
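To make the eval point concrete: with a fixed search function, every tweak to its parameters means re-running a labelled set of queries and comparing a score. The sketch below computes hit rate at k for a naive keyword search. The corpus, queries, and function names are all hypothetical, and hit rate is just one simple choice of metric.

```typescript
// A labelled case: a user query and the endpoint a good search should surface.
interface EvalCase {
  query: string;
  expected: string;
}

type SearchFn = (query: string, k: number) => string[];

// Hit rate at k: fraction of cases whose expected result appears in the top k.
function hitRateAtK(search: SearchFn, cases: EvalCase[], k: number): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => search(c.query, k).includes(c.expected)).length;
  return hits / cases.length;
}

// Hypothetical corpus of endpoint paths.
const paths = [
  "/zones/{zone_id}/dns_records",
  "/accounts/{account_id}/workers/scripts",
  "/accounts/{account_id}/r2/buckets",
];

// Naive keyword search: score each path by how many query words it contains.
function keywordSearch(query: string, k: number): string[] {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  return paths
    .map((p) => ({ p, score: words.filter((w) => p.includes(w)).length }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.p);
}

// Hypothetical labelled queries; the last uses synonyms keyword search can't match.
const cases: EvalCase[] = [
  { query: "list dns records", expected: "/zones/{zone_id}/dns_records" },
  { query: "deploy workers script", expected: "/accounts/{account_id}/workers/scripts" },
  { query: "create storage bucket", expected: "/accounts/{account_id}/r2/buckets" },
  { query: "serverless functions", expected: "/accounts/{account_id}/workers/scripts" },
];

// Re-run after every change to the search and compare the number.
console.log(hitRateAtK(keywordSearch, cases, 1)); // prints 0.75
```

This is the maintenance burden Matt wanted to avoid: with Code Mode, the "search" is whatever code the model writes, so it improves as the model does.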

Starting point is 00:27:15 And as the models get better at writing code, my thing gets better, because I keep everything in distribution. I was really drawn to that idea. I knew these models were just going to get better at writing code, so let's lean into their strengths and keep the model in the distribution it was trained on, rather than giving it some hacky search function that's never going to be in that distribution. So I was like, right, let's go.

Starting point is 00:27:41 And that sort of brought the two pieces together. Do you have an unlimited context window? Do you have a bigger context window than the rest of us when you're working with Claude? Do you have something special? That's kind of a tongue-in-cheek ask, I guess. I'm happy to take a real response, but what I'm really trying to get at is more clarity on this back-and-forth:

Starting point is 00:28:08 are you dropping in files, or are you sort of micro-contexting, where you pull some things out of the context and put them in a file? What does the actual back-and-forth with Claude look like when you're innovating like this? Yeah,

Starting point is 00:28:22 yeah. So my workflow with Claude has actually changed quite a lot over the past year or so. I'm sure everyone's has, a huge amount. I'd say a year and a half ago I was big into Cursor. I really liked it, and I was using it mostly for the Tab model. Then in January last year I was like, the agent model is the future. I need to work out how I can just prompt things into existence. What guardrails do I have to put on things? How can I make the feedback loop? How can I have tests with good patterns, and all that same sort of good stuff, to basically build my codebases for agents? That was about January last year. And then when Claude Code came out, I was like, right, now this is definitely the

Starting point is 00:29:10 future. Now I'm only going to have an IDE open when I want to visually look at code. When I want to review, I'll open an IDE; otherwise, it's straight into the terminal. Let's just chat, let's build something. And I tend to run everything with --dangerously-skip-permissions, like 100% of the time. I have some sandboxing on my laptop. It's a custom thing that I don't super want to talk about. But I am curious. What do you want to talk about? Yeah, yeah, I'm not going to talk about it hugely. Okay, so I have my own version of Git that runs on my machine, and it's just

Starting point is 00:29:52 an alias, and it just stops the model from doing stuff I don't want it to do. It stops it force-pushing to branches and overwriting stuff, which matters because I have admin permissions on a lot of repos. To me, that was the base-level thing I didn't want to happen: I didn't want any agent running on my laptop to be able to overwrite a remote repo. That's base level. Yeah.
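Matt's actual wrapper is written in Zig on libgit2 and isn't shown here. This TypeScript sketch only illustrates the kind of guard he describes: inspecting a `git push` invocation and refusing anything that could overwrite or delete remote history. The function name and the exact set of blocked flags are assumptions. The sketch also assumes the subcommand is the first argument, so something like `git -C dir push -f` would slip past it.

```typescript
// Decide whether a git invocation should be blocked before it reaches the real git.
// Returns a reason when blocked, or null when the command may proceed.
// Assumes the subcommand is the first argument (global options like -C are not parsed).
function checkGitArgs(args: string[]): string | null {
  const [subcommand, ...rest] = args;
  if (subcommand !== "push") return null; // local-only commands are allowed

  for (const arg of rest) {
    // -f, --force, --force-with-lease, --force-if-includes can all rewrite remote history.
    if (arg === "-f" || arg.startsWith("--force")) return `blocked: ${arg} can overwrite remote history`;
    // Bundled short flags such as -fu also contain -f.
    if (/^-[a-zA-Z]+$/.test(arg) && arg.includes("f")) return `blocked: ${arg} includes -f`;
    // A leading "+" on a refspec forces that single ref update.
    if (arg.startsWith("+")) return `blocked: forced refspec ${arg}`;
    // Deleting remote branches is destructive too.
    if (arg === "-d" || arg === "--delete") return "blocked: deleting remote refs";
    if (arg.startsWith(":") && arg.length > 1) return `blocked: deleting remote ref via ${arg}`;
  }
  return null;
}

console.log(checkGitArgs(["push", "origin", "main"])); // prints null
console.log(checkGitArgs(["push", "--force", "origin", "main"])); // prints: blocked: --force can overwrite remote history
```

In a real wrapper, a script named git placed earlier on PATH would run this check and hand off to the real git binary only when it returns null.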

Starting point is 00:30:18 On my own laptop OS, I don't mind breaking stuff. I don't mind breaking anything locally, but I don't want to break anything externally. So I have a few aliases like that, where I've completely overridden Git internally. My Git wrapper is called Zaggy. It's public on GitHub. I built it in Zig, actually, using libgit2, which was super fun, and it has a bunch of these protections out of the box. So I really don't mind running stuff with --dangerously-skip-permissions; I tend to do it all the time. I chat with the model about things I want to do, but I tend to sit down at my laptop with a preconceived notion of what I want. I think a lot of

Starting point is 00:31:01 the time, when I sit down at my laptop without any idea of what I want, I almost feel like I'm scrolling Instagram or Twitter or something. Even chatting with the model, it's just feeding a dopamine rush that isn't real. When I know what I want, I feel like I can evaluate stuff really well. So it took that Tuesday for me to sit down and be like, right, I know what I want, let's just do it. And in that case, I think I can be hugely effective. I think most people can, just speaking straight English to the model. Nothing fancy. Yeah, just go for it.

Starting point is 00:31:34 So a lot like a chat, like a real chat. You're not doing some "act as a Cloudflare master worker/developer" thing. You know, these sort of prompt hacks. That used to be a thing. That used to be a thing. Yeah. Yeah. I'm not doing any of that.

Starting point is 00:31:47 I'm not acting as... Well, every once in a while I might put it in a role. But it's, like, one out of a hundred, if that. It's usually just: here's the problem, here's where I'm trying to go, here's where I'm at, here's what's in between us. Let's just riff, kind of thing. You know, what's here, what's there?

Starting point is 00:32:07 And I sort of trust the model in that sense. So I'm not trying to wield it and force it into a mode. I just give it the trust it needs to do its great job. Personally, anytime I'm not getting good results or I'm fighting it, I'm trying to push it into an area where it's just not good, where we're in uncharted territory or something like that. And I feel like the more I just talk like I would to a normal engineer next to me, or a colleague, that's where I get my best results.

Starting point is 00:32:39 Yeah, you always want to keep it in distribution. If you're doing something too wacky, then you're probably going to have erratic results, because the model never saw anything like that in pre-training or in its post-training, like the RL stuff. Whereas if you're working on a codebase in a common programming language, and you're speaking to it like you'd see in a GitHub issue, with language that is intelligible, then I think you're pretty good. I don't think I do anything special with Claude. I would say that I tend to only use Opus now, since Opus 4.6 came out, or 4.5, or whichever one it was before Christmas. I think that was a big step change in being able to do things more autonomously.

Starting point is 00:33:31 And so what I tend to do now, actually, because I work on a lot of different repos, is I just open Claude Code, or actually OpenCode (we use that a lot at Cloudflare), in my code folder on my laptop. I always just open it in my code folder, and then I direct it to the repo that we're working on. Sometimes I'm like, right, make a new worktree. But everything lives in the top-level code folder, because I work a lot on libraries and on products.

Starting point is 00:34:02 And the products use the libraries. And so it's nice if I'm working constantly at that top level. Then I can move between stuff more fluidly. Really? And that's actually one of the reasons why I don't use Cursor, or any of the IDEs, as much now, because having the ability to open an agent in that top-level folder is super nice. Yeah, I guess if you're constantly carrying context across projects, or the libraries are connected, it makes a lot of sense.

Starting point is 00:34:30 But if you have disparate projects, where this is totally its own thing and that is its own thing, you kind of want each directory to be a silo. So your code directory is a silo of all Cloudflare work, and therefore it's cool to just open that one directory. Is that what you're saying? Yeah, I would even have personal work in there. I don't think it really... Yeah.

Starting point is 00:34:54 I don't ever see the model going into other directories that it's not meant to. And I do actually watch quite a lot. Maybe this is a mega security thing, but I think it's pretty good. I have a directory of working code that I'm currently working on, on my machine, and I just open it in that. And I tend to reuse patterns a lot. So I mostly work on open source, right?

Starting point is 00:35:24 So, for instance, how the development worked with the Cloudflare MCP server was: I built it as a POC and published it on my personal GitHub. And then once I got enough buy-in, once people thought it was good, once the quality was there, we moved it over to a legit Cloudflare one and we did a big release post. And I do that with quite a lot of stuff. There's procedure to making a new repo on the Cloudflare org, and it has to meet a certain quality bar. So for just POCs and testing and stuff,

Starting point is 00:35:58 I still want version control. So I just use my personal GitHub, and it's fine. Well, I use my own org. Well, that's clearly allowed, to do that without any real issues. I know that, I mean, it's so sensitive whenever you're, I mean,

Starting point is 00:36:12 whenever you're in your position as a brand, as a company, you do have to have, you know, locks on the doors. You know what I mean? And that's not so much a lock on the door, but it's cool that you have that kind of autonomy to, one, explore and, two, not get any backlash for publishing to your

Starting point is 00:36:31 personal GitHub, where you could be seen as, and obviously you're not, trying to take the Cloudflare thunder. No, in fact, I'm going to innovate, and I'm just trying not to bother our main org and our brand integrity with my little toy here until it becomes not a toy. We have private internal version control as well, which we also use. But for things where I want to share it and see if other people are interested in it, it just makes sense. You want to get it out there. I think I'm in a very special situation in the company, where I work predominantly on open source. And so there's more freedom allowed there, because anything that I share

Starting point is 00:37:15 is public by its very nature and is going to become public. If I'm working on the Agents SDK, it's hard to even have a proper release, because everyone sees what you're doing as you're doing it. And so even doing a bit of experimentation is quite tough without getting found out. And you still want to be able to do a proper release, even as an open source library. Well, friends, you know, I'm a big fan of Tailscale. And you know what? I could not do anything, I'm serious, anything, without my tailnet. I'm here with my good friend, Alex Kretschmar, from Tailscale. Alex, how do you describe Tailscale versus a VPN? How do you describe Tailscale to someone who's not in the know?

Starting point is 00:38:02 Well, the biggest difference between Tailscale and a traditional VPN is how the traffic flows. With a traditional VPN, the traffic flows through a central hub and then out to your client devices on the back end. With Tailscale, every device makes a connection directly to every other device. That means you're effectively cutting out the middleman, and you get much better performance as a consequence. And with that mesh network you've built, you've got to have a way to control how the data flows between different devices. Because you don't have that central choke point anymore, we have a thing called access policies, which allow you to granularly define, using ACLs and grant policies, which nodes are allowed to talk specifically to which other nodes, on which protocols, on which ports, and which users are allowed to even connect to different things, all over the Tailscale encrypted tunnels, which underneath use WireGuard. Yeah, it's not my LAN, it's my TAN, my Tailscale area network.

Starting point is 00:38:58 But your word for it is tailnet, right? The tailnet is the word that we invented for the logical grouping of devices that form your Tailscale network, much like you might have a LAN of devices at home. We call it something different because those devices can transcend physical locations. So you can have a server in the cloud talking to your phone on the bus, talking to your server running in, I don't know, the basement of your mom's house on the other side of the ocean. And that tailnet is a flat network that only you can connect to and access. So we call it a different name from anything else because it's location

Starting point is 00:39:36 independent, and you can connect to it from anywhere. Well, friends, check out Tailscale at tailscale.com. Totally free for your home lab and, of course, paid for your teams, pro and enterprise. But literally, I could not do anything I'm doing in my home lab, in my dev lab, without Tailscale connectivity. I'm out and about. I'm here.

Starting point is 00:39:57 I'm there. I'm everywhere. And I've got to access my home lab and dev lab resources, and Tailscale is how I do it personally. And you should too. Once again, check it out: tailscale.com.

Starting point is 00:40:16 The reason why I ask you about how you actually work with Claude is because, you know, I think that's everybody's curiosity. A lot of us are, to some degree, working in silos, even if we're working together. There's even speculation about how large a team you can actually work in during this new era, because of how much you can get done solo versus as a team, where you'd have to collaborate a lot more on a major feature. I'm sure you have some inbound conversations in Slack, or a pull request review or something like that, or, in your case, a POC as an actual repository.

Starting point is 00:40:55 But I feel like in this world, what I'm hearing a lot is that it's actually kind of hard to work at this level of ability and collaborate at the same time. I think it depends how you like to work. I'm not saying my way is the way, that I'm six months in front of all you guys and you should work like me, because I don't think so. I thought Dax from SST/Anomaly/OpenCode shared a really interesting post on Twitter in the last couple of days, where he was talking about how everyone sounds like they have it all put together. But really, he doesn't think so, and he knows that they don't have it all worked out.

Starting point is 00:41:36 They're still working stuff out. They think they're faster with coding agents than without, but they're not entirely sure. And it was really a push to be like, can we just leave everything better than we found it? That whole thing of, like, coding agents, yeah, sure, they let you work very quickly in the short term. But let's go through our codebase. Let's build everything the right way, the way that we're proud of, not the way that Claude told us to do it the first time round. I thought it was really good. I think everyone should remember that, yeah, sure, we're there to do a job, but we're also there to ensure that when the next person comes to look at the job we did, they can actually have a clue what's going on, and that it does work properly and it is tested.

Starting point is 00:42:23 And there's a lot of slop being thrown around, a lot of slop PRs. I think on the Agents SDK, we actually closed PRs from external collaborators for the time being. Not to say we won't open them again, but it was just getting too much. The way that open source works has to change. But us as a team is quite interesting. We support essentially three products on our team, and our team was five until very recently. We support the Agents SDK that I've talked quite a lot about, which is how we build agents on Cloudflare.

Starting point is 00:42:54 We support MCP: how people build MCP servers and MCP clients, how we build those on Cloudflare, and also the Cloudflare-supported MCP servers. So the new one we just built, and also all the external ones that Cloudflare published last year. So we support those two avenues. And we also support sandboxes. The whole sandbox product in Cloudflare comes from my team as well. And so there were five of us working across these three very distinct parts

Starting point is 00:43:26 of building agents. So there's a lot of surface area. I think as a team, we're pretty well versed in having our own domain and building out what we think should be built out in that domain. It's very hard to get under each other's feet because there's so much space. Are you six now or are you four?

Starting point is 00:43:51 I think we might be six now, seven very soon, and eight very soon after. It's going good. Are you autonomous in terms of which product you focus on at any given time? I'm sure there are missions, of course, and there are directives. Yeah. Like you said, that Tuesday when you sat down and innovated in this way to give us these 2,500-plus APIs in a thousand tokens or less, kind of thing. How do you sit down, or how do you even think about your work, when you're split across your products?

Starting point is 00:44:26 Are you shiny-objected, or is it pre-directed, or are you totally autonomous? How does your wind blow when it comes to that? Our team is kind of special in Cloudflare, and it's changing a lot. So if we have this conversation in six months, it might be a very different situation. But our team is very new. I think we were launched as a team under a year ago. I joined in October, and people have been joining every couple of months, basically, for the last year. So I focus on MCP. I also do a bunch on the Agents SDK to help support MCP, to help support people building agents. And I'm focusing on memory a lot at the moment and how we can build out a story for that.

Starting point is 00:45:09 And just supporting developers building on Cloudflare there. Other people on the team have different specialties. We basically all contribute to the Agents SDK, and then, like, nourish on our team, he focuses very much on sandboxes. Sandboxes is his baby, and he's built it from the ground up. Now he's getting some support on sandboxes. But we were always all contributing to the Agents SDK, even if we were doing our other stuff, because it all ties back in.

Starting point is 00:45:36 We need to have this cohesive story and be one cohesive team. When I build an agent in my spare time, I use all of our products, all of the SDKs we produce. So I think the main worry for our team is: how do we not end up with domain specialists too much? And we have a nice tracker of who's submitted PRs to different repos. Because I think there is a worry there that, like, I haven't committed to sandboxes, I have no idea what's going on there. How can I answer a question when someone comes up to me at an event or something and asks about sandboxes?

Starting point is 00:46:13 Or when I'm developing on it myself and I find a bug, it'd be really nice if I could fix it. Just very basic stuff like that. I mean, that is the worry with the way we do our team. But I think everyone is just so interested in building agents, and all of these are critical parts of it, that we float across all the surface area quite well. Yeah, I'd be a little worried about that too, especially when things move so fast. I mean, it didn't move this fast before.

Starting point is 00:46:41 And I guess a year-ish ago, it was a little easier to have that disposition where you say, you know what, I'm focused on agents and MCP, and if I don't contribute to sandboxes that often, it's okay, because it's not moving at the speed of agents, which was the case beforehand. But now it does. And so personally, if I were in your position or on that team, I would feel a little anxious that I'm not keeping up. And I don't know, I guess this is how I feel about most things, really, but especially if I had my particular sliver that I'm focused on, like agents and MCP. And I'm really curious what you're talking about with memory, what you're doing there. But I would have some

Starting point is 00:47:20 anxiety about, my gosh, how do I even maintain any version of context around sandboxes if I step away for a week, or I don't pay attention to some of the side chatter? How far back do I go when it comes to progress? Yeah. I mean, we've got to be as forward-looking as possible. I think our team attracts a lot of dreamers, I would say. Yeah. The guy that started our team is an absolute dreamer.

Starting point is 00:47:51 He's thinking so far in advance. I really respect how he can think like that, and, yeah, I learn from that as much as possible. The aim for us is to be ahead of the org. What the org wants, we should already have ready for them to use. And I would say a year ago, we were quite a long way ahead of the org. And now everything is going faster. There are some people building insanely cool agents at Cloudflare.

Starting point is 00:48:22 And yeah, that's where the memory thing is coming from. How can I best support them? How can I support the developers building on our platform? Yeah. Yeah, I think we're all stressed about falling behind. That's why you'll find that a lot of Cloudflare people are permanently online, maybe a little bit too much. Yeah.

Starting point is 00:48:41 If you want to throw shade at us for anything, it won't be because we're not receptive to feedback online. Just curious on that note, and you can blur the line if you want to, or not give the exact number: how many hours do you think you work a day? And don't just say in front of the terminal, because when you're making your coffee, or you're on your back patio, or you're walking your dog and you're thinking about work, that's still kind of work in a way. How much time do you truly separate from the problem set that you're dealing with or working on? And how much of that turns into Matt's life?

Starting point is 00:49:22 I don't know. I think I'm thinking about this stuff all the time. Like 23-7? 227. I like my sleep, you know. I like eight hours minimum. Well, I'll admit, if I'm going to sleep thinking about a problem, I'm waking up thinking about that problem.

Starting point is 00:49:40 It's a sign of a good problem. I'm not throwing shade at you at all. I think the reason why I ask this question is more as a reality check for our listening audience, because I know there are a lot of folks who are either not dipping their toe in, averse to the situation, and they're kind of late in a way, but still early, which is kind of funny to think about. Or they're just like you and I and others, where they're like,

Starting point is 00:50:06 I mean, the race is on. I just can't stop thinking about the things I want to change or do, and there isn't enough time in the day. Now, it's not eking into my personal life to where I can't live my life, by any means, but I'm definitely thinking about the problems I'm trying to solve far more than I ever did before agents came into my life. Just as a reality check. Yeah, I don't know if it changed.

Starting point is 00:50:32 Since before agents... I started my career writing code by hand, as most people listening to this probably did. I love how you said that. That's so awesome. Yeah, well, I mean, you've got to preface this. It's all organic. All organic code, you know, organic, written by me. That's right. The OG code. Yeah. Yeah, definitely worse than some of the

Starting point is 00:50:58 code that Claude spits out, definitely. I think I could always get very engrossed in a problem. My girlfriend gets so mad at me sometimes. I get super sidetracked by stuff, incredibly attached to a problem and a solution. Well, more the problem than the solution. So I don't think that has changed at all. I think what has changed is how I work. I spend much more time dreaming about a future world and future things that I'd like to build, or thinking about who might be best to build them. And when code is cheap, you can build more stuff, but you also still have a limited amount of time. You can't build everything. And the cool things are the hard things, and those are the things that

Starting point is 00:51:52 take time. And there aren't that many quick wins there. You need to put in the hours every day to work out what you want to do and how you want to do it. And I guess now I'm spending less time coding, like manually coding. I don't actually do that much anymore. And I'm spending much more time thinking about what I'd like to build. But I'm always thinking about the problems, I guess. Now I'm much more scatterbrained. Previously, I could sit down for eight hours and just code for eight hours, and that was great. I stayed in a terminal or I stayed in an IDE. I never left it. I had enough domain knowledge about what I was building that I could just smash it out. Whereas now,

Starting point is 00:52:39 I feel like the coding agents sit in between me and the code. So I'm much more in the back seat, or at least in the bird's-eye view, over multiple different things, not normally just one, because I can, right? But there is a compromise there: you do feel much more scatterbrained. You're here, you're there, you have to dive into this, you have to dive into that. And traditionally, I'm not super good at that. I'm not going to lie, I'm hugely bad at multitasking. So getting that compromise right... I envy people who feel like they can productively prompt six versions of Claude Code or OpenCode or whatever, six coding agents at once.

Starting point is 00:53:26 I just don't see how that is humanly possible. For me, I reckon I've got three in me, max, because I think I can only hold three problems in my head at once and still have a meaningful output for each of them. And definitely over three, 100%, my capability to do something hard massively reduces. I feel like for me,

Starting point is 00:53:56 three to six a couple times a week is where I'll catch myself, not that I intentionally go there. Yeah. But I rather enjoy a one-to-one problem, except for when I'm waiting for it to do the thing. I find it slow. I find one-to-one slow.

Starting point is 00:54:15 I can't do it. So I kind of have to do one thing, but multiple things on that one thing, I suppose, is the way to describe it, where I guess that's still kind of three. But it kind of depends, right? It's the traditional "it depends" area there. Because when you're waiting, what are you doing? Maybe the spaghetti in your own brain is getting unraveled while you think you have the context. You're sort of planning things. So I kind of feel like my zone is two to three, because one is too...

Starting point is 00:54:46 And it's not that it's too slow for me. It's slow because it's doing its thing, and it's doing dramatic stuff. It's doing a week's worth of things in that 30 minutes I'm waiting, or whatever, or those three or four minutes I'm waiting. So I find I have to be in the three zone almost always. But then multiple projects that are uniquely different but similar, I find that's a couple times a week. And if I do that more than a couple times a week, I can get in that zone for hours, three, four hours, really, where I'm working on three or four different projects

Starting point is 00:55:24 and three or four things per project. That's wild. And I'm not, like, prompting... It is kind of wild to do that kind of stuff, really it is. And I haven't sat back and asked, how well am I doing? But what I can see is the Git commits.

Starting point is 00:55:41 I can see the progress. I can see the improvements. And I see the real thing deployed and usable, not just some fake thing, you know, this agent psychosis kind of scenario where you're like, I think I'm making progress. You know what I mean?

Starting point is 00:55:56 I see that. Actually, on that note, if I'm doing one-on-one, I often find myself sabotaging the model, because I'm thinking faster than it is writing. And so I start writing stuff to correct the trajectory. And I think that's really bad.

Starting point is 00:56:15 Yeah. I think it's because I got bored during the planning phase, so during execution I'm just constantly fighting its trajectory. And I do use plan mode quite a lot. I also get a plan and then get it reviewed by another model.

Starting point is 00:56:32 I have a skill for that. It's really, really good. I think I nicked it from someone on Twitter. Honestly, it's amazing, would recommend. But when I do two or three, building the plan is better, because I can set off one to build a plan and then get it reviewed. And that might take like 10 minutes.

Starting point is 00:56:49 And then I can set off another one to do it, and then the third one. And by the time I'm done with three, I can take a breather. And I can be like, right, let's go into the first one and see what it's come up with, and iterate on this plan. And that's so much better than being one-on-one. Oh, wait five minutes. And now sabotage.

Starting point is 00:57:09 Because I just want to implement now. I think giving it the time is nice. So the cycle repeats itself for you. This is my cycle. I don't use plan mode a lot, but I do a different version of planning. I wrote a Go CLI and this flow I created called Agent Flow. It's a lot of context dumping, in a way, but I'm making plans, and those plans are called PEPs.

Starting point is 00:57:42 It's stolen from the Python world, except it's not a Python improvement proposal, it's a project improvement... I guess it's a, what's the E stand for again? Enhancement, that's right. Improvement, enhancement. Project enhancement proposal versus Python enhancement proposal. And so what I find is I will either make a true spec based on RFC 2119's key words, like MUST and SHOULD, things like that, and those are for bigger things,

Starting point is 00:58:17 you know like the way an API should function or what kind of error codes we should respond with and things like that like what the API surfaces so I'm speccing an API or different things not literally every possible thing is getting a spec but here's what I'm trying to get to is what I often do and maybe this is how it works in plan with for you is I just trust the model and I say after they present the plan to me, I ask it to review that plan for lack of clarity and blind spots. Just that one prompt response back to it. Like nothing else.

Starting point is 00:58:50 Not "here's what I think is wrong with it." I told it what I wanted to do. I'm telling it where I'm at, what the gap is, and where we're trying to go. So the problem is there, and I'm trusting the model to kind of get us there, and the plan is the iterative process. So once it presents this plan to me, in my case as a PEP, I just say, review that PEP for lack of clarity and blind spots. And it goes and reviews it, and it comes back like, well, we're missing this here, and that's not right there.

Starting point is 00:59:17 What do you think the next thing is that I ask it, after it presents all these challenges from high to low? What do you think I tell it? Fix the plan. No, no, I don't. No. What do you tell it? Kind of yes, but no. I give it one more little nudge, because I want to trust the model.

Starting point is 00:59:33 I say, what are your suggestions for each? That's literally all I say. What are your suggestions for each? And it goes through each of the problems it gave back to me: here's how I'd solve it, here's how I'd solve it. And then, what do you think I respond with after that? What do you think?

Starting point is 00:59:49 My next prompt is... Fix the plan? Do it. Literally the words, do it. Okay, so: make the plan, present the plan. What clarity issues and blind spots are in this thing? Present them back to me, a big old list. What do you suggest for each?

Starting point is 01:00:05 It goes and does this thing, presents a plan back to me. Do it. That is literally what I do. If you think about what you're doing in terms of 2023 prompting techniques, this is essentially reflection: asking the model to look back at itself and see whether it's done anything silly. And then by asking for suggestions, you're doing chain-of-thought prompting, because you're getting a new train of thought to go back over the original one. So it's reflection plus chain of thought. Yeah, it's just really funny how all of these prompting techniques come back

Starting point is 01:00:43 around. And what's even cooler, I think, is that the likelihood of those prompting techniques being reflected in the underlying training data is super high. So for Opus 4.6, I find it often will do, at the end, "here are suggestions for each." So I do find it often does that step for you. It doesn't need to be told. So it does. And I kind of feel bad about asking it for more. But all it did was present a bunch and then kind of give me three. So it may have given me a list of, let's just say, six to 12 issues in the plan, right? And it comes down with like three

Starting point is 01:01:24 or... it always gives me some version of suggestions. But those aren't the real suggestions, man. I mean, go back through the list. You know, what are your suggestions for each? "Each" is more, you know, a for loop through all the things. You know what I mean? That's all I'm really hesitant to do. And I get such great results with that. And then I kind of feel bad with my final prompt being like, do it. Yeah, do the thing.
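The loop Adam describes, which Matt identifies as reflection followed by chain-of-thought prompting, is a fixed series of follow-up messages after the initial plan. A minimal sketch, assuming a hypothetical `ask` callable that sends one message into a single ongoing agent session and returns the reply; no particular SDK is implied, and the wording of the first planning prompt is invented for illustration. The three follow-ups are the ones quoted in the conversation.

```python
from typing import Callable

# Follow-up prompts quoted in the conversation, sent after the initial planning prompt.
FOLLOW_UPS = [
    "Review that PEP for lack of clarity and blind spots.",  # reflection pass
    "What are your suggestions for each?",                   # one suggestion per issue found
    "Do it.",                                                # execute the revised plan
]


def plan_review_loop(ask: Callable[[str], str], goal: str) -> list[str]:
    """Send the planning prompt, then each fixed follow-up; return every reply in order.

    `ask` is a hypothetical stand-in for whatever sends a message within one
    ongoing agent session and returns the model's reply.
    """
    # Hypothetical wording for the first prompt; in practice this is the problem,
    # where you are, where you're trying to go, and what's in between.
    replies = [ask(f"Write a PEP for: {goal}")]
    for prompt in FOLLOW_UPS:
        replies.append(ask(prompt))
    return replies
```

None of the follow-ups tells the model what is wrong; the design choice is to leave the review and the fixes to the model and keep the human's input to the goal and the final go-ahead. In practice you would read the replies between steps, as Matt says he does with the suggestions, rather than firing all four messages blindly.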

Starting point is 01:01:46 Because it feels so not smart on my part. Do it. Yeah. None of this is that smart on our part, so I think we have to accept that. I think the smart thing is knowing, when you sit down at the computer, what you want to make. Like, what?

Starting point is 01:02:00 The intent. Where are we going? What do we do? Yeah. And on the suggestion side of things, I actually review those quite heavily on each plan iteration, because I do want to make sure we're following a trajectory. Maybe that's my own nervousness around the model, but I do want to make sure we're following the right trajectory, the one I have in mind. Do you ever use voice to get longer prompts? I just recently started to do it. Actually, there's a cool thing called Handy, handy.computer. We just mentioned it this week in Changelog News. It is an open source

Starting point is 01:02:39 voice-to-text tool. It's all done on your machine. It's free and open source. So, you know, a lot of safety there in terms of what you're putting out there. I've tried it a few times. I like it. It goes in any text box you give it. But it's kind of hard to always default to that, because some things are technical: you can't speak a command very well, or syntax, or a file path, things like that. So I find that I've just learned to type faster and more clearly. And it keeps my brain in it more than thinking out loud does, because if I talk out loud, I will talk a lot more, like on my podcast.

Starting point is 01:03:20 Whereas if I type, I'm more terse and more clear. Whereas if I speak, I'm more ambiguous and thought-provoking and meandering, so to speak. You know, like, I would just say the word "uh," and it's like, what are you talking about here? Whereas I never type the word "uh," because that's not what happens when you write. Yeah. That's an interesting avenue. I know the Amp team have some thoughts around this, where they do...

Starting point is 01:03:47 Yeah. Yeah, they specifically made Enter just make a new line in Amp originally, like in the sidebar version of Amp, and Command-Enter or Control-Enter or Shift-Enter or whatever it was actually executed the prompt. And the thought process behind that was like, we want to encourage users to make longer prompts, to make larger expressions of intent, to, like, fully scope the problem at hand. And if we make Enter a new line,

Starting point is 01:04:18 then they might have some inspiration to write more stuff. I think with, I finally got around to writing longer prompts. And I'm very excited that Claude Code just added voice support, where you can hold the space bar and have a speech-to-text model. So I can, like, speak for five minutes or for 30 seconds or however long it is. Then I can dump. I have, like, a clipboard, so I just, like, Command-V, Command-V, Command-V, all of my context in below.

Starting point is 01:04:48 And then I think I end up with quite a nice prompt by doing that. I'm very excited by that flow. And I trust Opus 4.6 way more to, like, execute for a longer period of time. I think in the past, using Cursor with, I don't know what model it would have been at the time, but probably Sonnet 3 or up to Sonnet 3.5, using Cursor with those models, I would, like, send something.

Starting point is 01:05:17 And then I'd be like, oh, no, I meant this thing. I need to add more information. And then I would, like, cancel the original prompt. Yes. And then, like, compress it. And by the time I finally got a prompt that I was happy with, I'd actually sent it, like, six times, and then, like, canceled it and brought it back, and all of that flow I thought was awful.

Starting point is 01:05:35 So I'm really consciously trying to make that prompt better, more well-scoped. Yeah. So we got there by talking about the things you're working on, how you focus on agents, the open source stuff you're doing there, MCP. And you mentioned that you're starting to think about memory. Yeah. Can you take me into what you mean by that? What are your thoughts on that? What's attracting you to that? How far in are you? Do you feel in over your head? What wisdom do you have? Do you have any wisdom at all? Where are you at?

Starting point is 01:06:09 I have felt in over my head for the past... oh, my whole career, I'd say. Good for you. It's crazy. It's a crazy world we live in. I think before, when you were saying about feeling left behind, I think so many people feel slightly left behind or a lot left behind with this version of, like... I think if you went and spoke to a lot of my friends in London, I actually recently moved to Portugal, but a lot of my friends in London, the software engineers that I knew, that I lived with, that I went to university with, the amount of them still not using AI at all, it's like, wow. And then you realize that you're in this tiny little microcosm of people who are just obsessed with this slot machine in a terminal.

Starting point is 01:07:01 It's freaking wild. I don't know. What was the question again? See, there you go. There you go. I'll bring it back. Don't you worry. Memory.

Starting point is 01:07:12 It was about memory, really. I think it's cool that you're, I mean, I'm fine to even step back in there a little bit. I mean, I do want to talk about memory and what you're working on there, because I've never played with the memory side of things at all. And so I'm super curious. But, yeah, I can talk very briefly about memory. I even have friends, too, that are at zero. Like, I just, here's an interesting, somewhat of a tangent in a way,

Starting point is 01:07:38 but I think it may play into what you're talking about, because it's totally outside of developers. I was visiting with my newest doctor. And I live in a small town outside of Austin called Dripping Springs. And the doctor I go to, oddly enough, I'm fortunate enough to live in a town where we have a concierge doctor. And so I don't go there with insurance. I go there as a concierge patient. I pay out of pocket.

Starting point is 01:08:03 I won't explain it all. I can use my HSA against it. But the point is, they're a concierge-style doctor where you can be a part of a subscription and you can go there as often as you want to, and they're all about your health. And it's not about giving you a medicine or a pill. It's about root-cause issues in your life, from therapy to exercise to meals to bowel movements, oddly enough, even.

Starting point is 01:08:26 And I'm sitting down with this person, and she's a well-trained physician, you know, a well-trained doctor, and she's got this new practice. Now that I'm telling the story, I'm realizing how much of a tangent this is, but follow me. And I'm sitting down there with her and I'm talking to her about her business, because I'm just naturally an entrepreneur, and I think business and I think in code and all the things. I'm very right-brain business and very left-brain developer.

Starting point is 01:08:54 which is a fantastic place to be in life, I think, right now, especially now. And I'm sitting down with her and we're going through this data that I have. And it's in this PDF. And it's on her screen. And I'm like, how will I get this later? And she's like, yeah, you'll get the PDF later. And I'm like, but you're a concierge doctor. Don't you think you should have, like, a... this is my brain talking. Don't you think you should have, like, a formalized patient record in your business, you know, and this kind of thing?

Starting point is 01:09:22 And like, here's this woman who's just really well off and doing well. But she's not thinking about the data problem that people like you and I think about. And I'm thinking, gosh, I mean, the thing that scanned me earlier probably has an API. You could probably pull that data into my record. And then you could do that for everyone in your practice. And you could truly live up to being a concierge doctor. And I guess the reason why I tell you that is that you've got these people out there,

Starting point is 01:09:48 these folks out there who are super intelligent, but they're not thinking about AI at all. And she was telling me how she's, she's really good at what she does, but she feels a little overwhelmed about the business side of her business because she's not really a business person. She's not designed to be a business person. And my response to her was, just use Claude. Do you know what she said, Matt? What did she say? Back to me. What do you think she said?

Starting point is 01:10:13 No idea. What is Claude? Yeah, nice. You know what I'm trying to say? Like, gosh. And so I had a brain dump on her. I'm like, okay, there's an API behind this thing here. Here's how you can pull your data over there.

Starting point is 01:10:27 You need a Postgres database here. I was like, okay, Adam, you're going full nerd. Then I explained it, and when I got to explain it to her, she's like, whatever that is, I need that. Can you do that for me? I'm like, yeah, I could probably help you with that. So now I have another job, by the way. That's wild. Helping my doctor, you know, formalize her practice for the future of AI.

Starting point is 01:10:50 And so all this to say is that you've got your friends who are developers that are not using AI. We've got folks that are super intelligent, like doctors, that are not really fluent in AI. And it's 2026. It's March 2026. And I'm a little nerve-wracked by these folks just, like, being so delayed. You know, even developers, you know, there's going to be some people who listen to this thinking, like, Adam, stop drinking the AI. Cool it. I'm an AI maximalist.

Starting point is 01:11:19 It's not going away. The more you lean in, the better off you are. And you can probably attest to that, Matt, with what you're doing. But I feel like the folks that are just delaying it or feeling behind, I don't want them to feel behind. But at the same time, like, it's not going to go away, so leverage it. I said, hey, if you don't know how to run your business or you need more help running your business, put all your problems into Claude. And it will help you at least make a system to solve them, not actually give you the solution, but help you get to a solution. And no one's getting that.

Starting point is 01:11:49 It's like cheating on your homework. It is a cheat. It is a cheat. All right. Unless you have anything to say about that, let's end that tangent and go back to memory and stuff like that. What do you think? Yeah.

Starting point is 01:12:02 Slightly on this point, I recently got my dad using Granola, and he's a doctor, and he's kind of fed up with how he writes so many notes, and, like, Granola has completely changed his whole workflow. Oh, I bet. And he, like, sees people on Zoom all the time.

Starting point is 01:12:24 Now he just starts Grinola, writes up all of his meeting notes. And it's like, it's just, it's been like transformational for him. Just that like basic summarization. Like Grinolol is a great product. Don't get me wrong.

Starting point is 01:12:37 It's a stunning product. But, like, it is not a complicated workflow, and it's, like, completely changed his quality, like how long he spends doing his consultations. So yeah, I guess, like, shout-out there. Like, there are some small things you can try. I love Granola, by the way. I'm a fan.

Starting point is 01:12:54 I'm actually a paying user of Granola. So yeah. Yeah. Oh wow. They should pay me. Come on, Granola. Pay me. Yeah.

Starting point is 01:13:01 I love Granola. It's amazing. And I'm with you on that too. I think I've even DM'd the designer. I can't remember his name in the moment, but I think he's named Sam, if I recall correctly. Yeah, Sam. Sam's one of the founders. Yeah.

Starting point is 01:13:15 So answer your DMs. But I'm a big fan of Granola. I think that's revolutionary. Same thing: my wife introduced it to somebody who's also a doctor, and she sits down with folks. And she would spend three hours literally every evening cramming all of her notes. Yeah. Well, in this new world, you don't have to do that.

Starting point is 01:13:37 Now, you do have things like HIPAA compliance here in the United States. You've got different healthcare concerns where you have privacy and stuff. I totally get that. You should abide by all those things. And if we don't have systems that support them, then we should. But imagine the unlock in your life where you're a teacher or a doctor or someone like that, where now you don't have to, like, arduously plan and think about your note process.

Starting point is 01:13:59 Now you can sort of have a lot of it formalized for you. And you don't have to do all of that work to even report back to folks or summarize this 45-minute session with a patient or a friend or a colleague or whatever. You can have it do it for you. That should just be the way. Anyways, I think we could probably go on that front for sure. I actually use Granola for my personal stuff, for when I have a really fun idea and I want to write something up about it. Because I'm actually horrific at writing.

Starting point is 01:14:32 Like, I would call myself a critic of writing rather than a writer. Like, I love reading, and I've read a lot since I was very young. But I really struggled with putting my words to paper in a way that, like, flows and makes sense and is cohesive and has a start, a middle, and an end and all of the good stuff that you need for writing. So someone was like, dude, just start Granola and chat to it, go for a walk and chat to it. And then come back. And then, yeah, and then make a really good prompt that's like, this is what I want to achieve from this. And that was the first iteration of the code mode blog post, was something,

Starting point is 01:15:12 yeah, it was something similar to that. I thought it was really, really good because, like, I got the points that I wanted to get in, because I just spouted to the AI. The AI listened to me. The AI didn't quite summarize, but picked out the key bits of information, because I really hate summaries. I think that they're rubbish, that one person's summary is another person's, I don't know, mud or something. It's, like, really, really, really hard to get something that summarizes something well while maintaining the full information. But with things like Granola, you can export a nice blog post if you know exactly what you want. And I tend to know exactly what I want.

Starting point is 01:15:52 And I think AI is, like, an unlock for very opinionated people, because you don't have to do the thing. You just have to be very good at critiquing the thing. Absolutely. Absolutely. I actually like that idea a lot. I'm glad you mentioned the personal use of Granola, because I have not considered Granola in that way, where it's my personal note taker. Because it's great at that.

Starting point is 01:16:19 Hey friends, I'm here with Dan Manges, co-founder and CEO of RWX. Dan, what makes RWX and the way you're doing CI so different and interesting to our audience? Obviously, we're talking to you because we want to promote what we're doing. We want more engineers to become aware of what we're doing at RWX.

Starting point is 01:16:35 But I think the thing that's interesting to me is that RWX is really kind of the first major evolution in CI and the approach for CI. And this is just highly relevant with agentic-driven coding. You know, CI has largely been the same since the advent of the practice. But these platforms were created when being able to run code in the cloud was really valuable. The fact that you could spin up virtual machines that would run some automation on a Git push

Starting point is 01:16:59 was, you know, really impactful for engineering teams trying to, like, build good developer processes and tools. But that's kind of the extent. What we've done at RWX is we've taken state-of-the-art techniques used in build systems at organizations like Google and Meta. You know, Google has their internal build system, Blaze, which inspired the open source Bazel tool. But every engineering team I've talked to that wants to adopt Bazel

Starting point is 01:17:25 has just found it extraordinarily difficult to use and configure. You have to have a dedicated engineering team to build and maintain the rules. It's hard to extend it to work with different types of languages and frameworks that engineering teams are looking to adopt. So it's been, you know, too prohibitive to actually adopt, you know, those technologies. But the ideas behind Bazel are really impactful. They're similar to a lot of the ideas behind Nix. I would say Nix is kind of very similar,

Starting point is 01:17:48 you know, in the difficulty to adopt. And effectively what we've done at RWX is we've taken those techniques. We've made it very easy for engineers or agents to actually adopt and utilize those, which namely are the automatic content-based caching and the graph-based task execution, which means that RWX eliminates all redundancy.
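The two techniques named here, content-based caching and graph-based task execution, can be illustrated generically. This is a minimal sketch, not RWX's implementation or configuration format: the `Task` shape, `cacheKey`, and `runGraph` are hypothetical names. It assumes each task's inputs can be summarized as a string, and hashes that together with its dependencies' keys, so an upstream change invalidates everything downstream while unchanged tasks are skipped.

```typescript
import { createHash } from "node:crypto";

// Hypothetical task model: a name, the tasks it depends on, a string summarizing its
// inputs (command, file hashes, env), and a function producing its output.
interface Task {
  name: string;
  deps: string[];
  content: string;
  run: (depOutputs: string[]) => string;
}

// Content-based cache key: hash of this task's inputs plus its dependencies' keys,
// so any upstream change produces a new key for every downstream task.
function cacheKey(task: Task, depKeys: string[]): string {
  return createHash("sha256")
    .update(task.content)
    .update("\0")
    .update(depKeys.join(","))
    .digest("hex");
}

// Visits tasks in dependency (topological) order and only runs tasks whose key is not cached.
function runGraph(
  tasks: Task[],
  cache: Map<string, string>,
): { outputs: Map<string, string>; executed: string[] } {
  const byName = new Map<string, Task>();
  for (const t of tasks) byName.set(t.name, t);
  const keys = new Map<string, string>();
  const outputs = new Map<string, string>();
  const executed: string[] = [];
  const inProgress = new Set<string>();

  const visit = (name: string): void => {
    if (keys.has(name)) return; // already handled in this run
    if (inProgress.has(name)) throw new Error(`dependency cycle at ${name}`);
    const task = byName.get(name);
    if (!task) throw new Error(`unknown task ${name}`);
    inProgress.add(name);
    for (const d of task.deps) visit(d); // dependencies first
    inProgress.delete(name);

    const key = cacheKey(task, task.deps.map((d) => keys.get(d)!));
    let out = cache.get(key);
    if (out === undefined) {
      out = task.run(task.deps.map((d) => outputs.get(d)!));
      cache.set(key, out);
      executed.push(name);
    }
    keys.set(name, key);
    outputs.set(name, out);
  };

  for (const t of tasks) visit(t.name);
  return { outputs, executed };
}

// Example graph: one shared setup step fanning out to two independent tasks.
const exampleTasks: Task[] = [
  { name: "setup", deps: [], content: "install deps, lockfile v1", run: () => "node_modules" },
  { name: "lint", deps: ["setup"], content: "lint src/", run: ([s]) => `linted using ${s}` },
  { name: "test", deps: ["setup"], content: "run tests", run: ([s]) => `tested using ${s}` },
];
```

This version runs tasks one at a time; a CI system would also run tasks whose dependencies are already satisfied (here `lint` and `test`, once `setup` is done) in parallel, which is where the fan-out described below comes from.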

Starting point is 01:18:09 You know, whereas other platforms are having to run the same setup steps, on the same jobs, in every virtual machine that's spinning up. RWX can run the setup once on one machine and then fan out accordingly based on just your dependency graph. So effectively with RWX, you never have to think about parallelization at all. On other platforms, it's always like,

Starting point is 01:18:29 well, do I add this onto the existing job? Do I make a new job for it? But I have to duplicate all that setup. With RWX, you just define the tasks that you want to run and the dependencies between them. And we will run it with maximum parallelization based on your dependency graph. Well, friends, a good next step is to go to RWX.com.

Starting point is 01:18:46 Learn more. Check out CI in a whole new way. Once again, RWX.com. Let's talk about memory. So we're going back into the deets. We're off of our personal soapboxes about how AI has changed our lives, and how it's taken some away, how we can't stop thinking about it, and how we prompt, et cetera, et cetera.

Starting point is 01:19:10 But take me into the world of, I guess, next-gen agents and MCP. Where does memory fit in? Yeah, so memory, it's like such a loaded term. It's such a loaded term. So it is quite hard to know where to start. Essentially, I want a way for my agents to remember a conversation that we're having right now and be able to refer back to context that I gave previously in the chat, but also to, like, remember conversations over time,

Starting point is 01:19:41 and also to be, like, very programmable. So I work on SDKs; developers are going to program with my SDKs. It's like, how do we build something that's mega customizable to, like, the next new trend? Like, for instance, skills,

Starting point is 01:19:59 like, skills are just a Markdown file that's loaded into context on demand by an agent. How can we support that in a memory system that can also support compaction of sessions, can also support continual learning, can support, like, the migration of a session to long-term storage so an agent can, like, search over it over time? I guess I'm just trying to work out the shape of those APIs right now. There's some really good examples, like maybe not examples, but there's some really good
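Two of the requirements listed here, compaction of a session and migration of older material to long-term storage, can be sketched together. This is a minimal illustration, not the Agents SDK API: `ChatMessage`, `compact`, and the character-count budget are assumptions (a real system would count tokens and would likely summarize the archived span instead of just leaving a marker). Messages that no longer fit are moved into an archive array standing in for searchable long-term storage.

```typescript
// Hypothetical message shape; not taken from any SDK.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  text: string;
}

// Rough size proxy. A real implementation would count tokens.
const sizeOf = (m: ChatMessage): number => m.text.length;

// Keeps the newest messages that fit within `budget` characters, appends the older
// ones to `archive` (a stand-in for long-term storage the agent can search later),
// and puts a short marker in their place. The marker itself is not counted in the budget.
function compact(
  session: ChatMessage[],
  budget: number,
  archive: ChatMessage[],
): { active: ChatMessage[]; archived: ChatMessage[] } {
  let used = 0;
  let cut = session.length; // index of the oldest message that stays active
  while (cut > 0 && used + sizeOf(session[cut - 1]) <= budget) {
    used += sizeOf(session[cut - 1]);
    cut--;
  }
  const archived = session.slice(0, cut);
  archive.push(...archived);
  const active: ChatMessage[] =
    archived.length === 0
      ? session.slice()
      : [
          { role: "system", text: `[${archived.length} earlier message(s) archived; search memory to recall them]` },
          ...session.slice(cut),
        ];
  return { active, archived };
}
```

Keeping the archive behind a plain array makes the storage choice pluggable: the same function works whether the archive is later written to SQLite, Postgres, or a vector index.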

Starting point is 01:20:27 inspiration in the TypeScript world at the moment. Like, Letta is very, very cool. Letta, just to name, like, a couple, they all have some cool memory stuff. And I know there are some really cool memory startups that are actually, like, doing managed memory, like Supermemory.

Starting point is 01:20:47 Like, I'd just shout those guys out as, like, really good inspiration for what we're trying to do. What we're trying to do is not trying to replace anything like that. But it's like, Cloudflare has some really cool storage primitives. How can we let developers best use those storage primitives in the service of making a better agent?

Starting point is 01:21:05 And I realize they're all questions rather than answers. And I don't have a huge amount of answers. So I'll probably keep my powder dry on that one. While you were sharing your ideas there, I was jotting down an idea I had. Now, this may be totally wrong. But this is how I'm currently thinking about it, if I was in your shoes. Go on. All chat captured to Markdown or just plain text in some way, shape, or form.

Starting point is 01:21:32 So before it all compresses and goes away, it's captured. And you could probably use an AI gateway for that. Then you run sentiment analysis across all that history. Then you vectorize that into a database. Then you put an SDK in front of that with two calls: search and, what was it, execute? Is that what you do? That's what you do right there.

Starting point is 01:21:54 And you just treat your vector database of the sentiment analysis that you've been capturing as plain text, just like you do your APIs. That's how you do it. Yeah. So is that wrong, or is that not even close to right? How would you approach it? No, I think you're close. I think you're close. So there are a few things that I can't do with that that maybe consumers of my SDK would want to do. So I can't be that opinionated on where the data is stored. Like, some people might want to store it in a Durable Object in SQLite. Some people might want to store it in PlanetScale. Some people might want to store it... Like, their data is going to live somewhere. And people are normally very opinionated about that. So I can't be like, here is a vector store you must use. Although I have to have the ability for people to use, like, Vectorize if they want to.
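The pipeline proposed here (capture conversations as text, vectorize them, expose a search call), combined with the point about not dictating the storage backend, maps naturally onto a provider interface. Everything below is a hypothetical sketch: `MemoryProvider` and `InMemoryProvider` are illustrative names, and the bag-of-words `embed` function stands in for a real embedding model. A backend built on Durable Object SQLite, PlanetScale, or Vectorize would implement the same two methods.

```typescript
// One search result: which stored record matched, and how well.
interface SearchHit {
  id: string;
  score: number;
}

// Hypothetical provider contract. The SDK would only talk to this interface, and each
// storage backend would ship its own implementation.
interface MemoryProvider {
  save(id: string, text: string): Promise<void>;
  search(query: string, limit: number): Promise<SearchHit[]>;
}

// Toy "embedding": lowercase word counts. Stands in for a real embedding model.
function embed(text: string): Map<string, number> {
  const vec = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    vec.set(word, (vec.get(word) ?? 0) + 1);
  }
  return vec;
}

// Cosine similarity between two sparse vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, x] of a) {
    normA += x * x;
    dot += x * (b.get(word) ?? 0);
  }
  for (const x of b.values()) normB += x * x;
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// In-process backend, handy for tests; nothing is persisted.
class InMemoryProvider implements MemoryProvider {
  private vectors = new Map<string, Map<string, number>>();

  async save(id: string, text: string): Promise<void> {
    this.vectors.set(id, embed(text));
  }

  async search(query: string, limit: number): Promise<SearchHit[]> {
    const q = embed(query);
    return [...this.vectors]
      .map(([id, vec]) => ({ id, score: cosine(q, vec) }))
      .filter((hit) => hit.score > 0)
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
}
```

Because callers only see `MemoryProvider`, swapping the in-memory class for a database-backed one would not change any code that saves or searches memories.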

Starting point is 01:22:43 So I need to go with more of a provider-based model, I think, in terms of, like, API design. And then the next thing, about search and execute being a thing? Yes, yes, definitely it's a thing. You've already made it, right? I mean, that's the model. Just leverage it. For longer-term memory, I think, and for search, like, for things that need to be loaded on demand, yes. But there are cases where you would want to programmatically load context into a session. So the easiest one is, like, if you think of, like, a system prompt with some direction, like in OpenClaw, I think they call it SOUL.md: like, what is the agent? Like,

Starting point is 01:23:24 who does it respond to? Like, what is its personality, all of this little stuff? This would need to be loaded at the start of every session. So this is, like, slightly different. And then the next one, maybe, is, like, a to-do list, some sort of working context. You know, like, Claude Code had a to-do list. I don't even know if it still does anymore. But it kept the agent on track for a while. Maybe they RLed this out. But at some point, people wanted a to-do list that the agent could fill and modify over time. Like, that enables you to do really cool stuff, like create Ralph loops as well, which maybe we can talk about some other time.
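The two kinds of programmatically loaded context described here, a persona file loaded at the start of every session and a to-do list the agent edits as it works, could be assembled like this. It is a hypothetical sketch: `SessionState`, `addTodo`, `completeTodo`, and `buildSystemPrompt` are invented names, not OpenClaw's or Claude Code's actual mechanism, and the persona string stands in for the contents of a SOUL.md-style file.

```typescript
// Hypothetical shapes; none of these names come from a real SDK.
interface Todo {
  text: string;
  done: boolean;
}

interface SessionState {
  persona: string; // e.g. contents of a SOUL.md-style file, loaded every session
  todos: Todo[]; // working context the agent can add to and tick off
}

// Operations the agent could be given as tools to manage its own to-do list.
function addTodo(state: SessionState, text: string): void {
  state.todos.push({ text, done: false });
}

function completeTodo(state: SessionState, index: number): void {
  const todo = state.todos[index];
  if (!todo) throw new Error(`no todo at index ${index}`);
  todo.done = true;
}

// Rebuilt at the start of each session (and could be rebuilt after each turn), so the
// persona is always present and the to-do list reflects the latest state.
function buildSystemPrompt(state: SessionState): string {
  const todoLines = state.todos.map((t, i) => `${i}. [${t.done ? "x" : " "}] ${t.text}`);
  return [
    state.persona.trim(),
    "",
    "## Current to-do list",
    todoLines.length > 0 ? todoLines.join("\n") : "(empty)",
  ].join("\n");
}
```

The persona is always injected while the to-do list changes between turns; on-demand material such as skills or archived history would instead go through a search call.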

Starting point is 01:24:02 But I need a way to be able to store all of this context in a way that's super flexible, and also have that ability to do continual learning and extraction of facts, and also have the ability, like, for the agent to be able to pull in stuff like skills. So it's multifaceted, and I'm still trying to work out, like, in my head, what I want to focus on, because I don't think I can get all of these things right the first time. I just need to make something flexible enough that when the new things do come, we can add them in without breaking changes. And when you're speaking of memory, you're speaking of it as part of one of the Cloudflare

Starting point is 01:24:44 products you work on, not so much... I mean, I'm sure you have personal curiosities about how you can leverage it personally, but you're talking about how you can bake it into agents, for example. Yeah, I think at some point this might be a separate SDK, but yeah, like, the Agents SDK will be where it lives initially. Yeah. So people building agents on Cloudflare Durable Objects. But, like, theoretically there is nothing to say, like, if you're building something on

Starting point is 01:25:12 ECS somewhere, like a container somewhere, or if you're building it on, like, a Lambda function somewhere, or, like, you have your Next.js routes on Vercel. Like, it should be pretty cross-compatible for all of these things. Like, there shouldn't be anything runtime-specific. I think the provider model there will help because, yeah, sure, we can use the Durable Object SQLite, but also if someone wants to use Neon or PlanetScale, they should be able to do that as well. Yeah, for sure. I would not want to dictate where you can store it, maybe even one-to-many stores. I don't know how hard that would be, but, you know, that's where I would start to, I mean, that's what this is, right?

Starting point is 01:25:53 It's all exploratory. It's like, that's the basis of how I would initially approach it. And I might hit two brick walls and hurt real bad and learn something new and read a book. You know, I've become a real big fan of EPUB books. I've got an ETL that takes a book from EPUB to, you know, really good Markdown, then runs sentiment analysis on that, then vectorizes things across it, and I just search it with DuckDB and Parquet. So reading a book now is way different than it was before. So thankful for open-format EPUBs out there, because that's the way to do it. And, like, between DuckDB and Parquet and this, I mean, that's super
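A pipeline like the EPUB-to-Markdown ETL described here needs a chunking step before anything can be embedded and searched. A minimal sketch, assuming the book has already been converted to Markdown: `chunkMarkdown` and the `Chunk` shape are hypothetical, and splitting on ATX headings is just one simple strategy. It does not handle `#` lines inside fenced code blocks, and embedding the chunks and writing them out (for example to Parquet for DuckDB) would be separate steps.

```typescript
// One searchable unit of the book: a heading and the text under it.
interface Chunk {
  heading: string;
  text: string;
}

// Splits Markdown at ATX headings (# through ######), keeping each heading with the
// body that follows it. Text before the first heading gets an empty heading.
function chunkMarkdown(md: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: "", text: "" };
  for (const line of md.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      // Close the previous chunk unless it is completely empty.
      if (current.heading || current.text.trim()) chunks.push(current);
      current = { heading: match[1].trim(), text: "" };
    } else {
      current.text += line + "\n";
    }
  }
  if (current.heading || current.text.trim()) chunks.push(current);
  return chunks.map((c) => ({ heading: c.heading, text: c.text.trim() }));
}
```

Heading-based chunks keep chapter and section context attached to each passage, which tends to make search results easier to interpret than fixed-size windows.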

Starting point is 01:26:32 fast. Those few things there would really lean into what you're talking about with memory and that lookup process. It's super fast. Yeah. No, definitely. And, like, I was chatting to some of the more data engineering people in Cloudflare. They were like, yeah, so how can I use ClickHouse? How can I use ClickHouse? And I was like, ah, shit. Sorry, you'll have to bleep that one. But, like, how, yeah, I don't know. What's special about ClickHouse that you don't think you can get from Postgres? And then he kind of rolled his eyes at me. And so that was how the conversation went.

Starting point is 01:27:12 A million more rows, so much faster, but really hard to set up. You could do it on your own. You can on-prem it yourself. But it's definitely a ceremony. I mean, it's a lot to run. I mean, but at Cloudflare scale, you've got all of that, right? I mean, I would run ClickHouse if I was on your team. Definitely, definitely.

Starting point is 01:27:29 But DuckDB and Parquet, you can run right on your Mac. I mean, you can just run it right there, and it's super fast. And you can have a ton of usage just in one context. But as a product, you may think about it differently. But DuckDB and Parquet files, it's the way to go. I have some telemetry for an open source code review project I did a few years ago that just dumps everything in DuckDB.

Starting point is 01:27:53 It's quite good, actually. I really like it. Yeah. I mean, it's really interesting too, because the agent knows how to talk to it really well. And so rather than you having to learn how to write, you know, queries for it, the agent can query it for you. And I'm like, make me a justfile command for that. And so when we sort of, like, centralize on a query or on a style of query,

Starting point is 01:28:17 just turn that into a justfile command. And I throw it a few parameters. And it's like a just-in-time CLI, in a way, on a large dataset that's super fast. That's awesome. To query that database any other way is just stupid. Like, why would you do it the hard way? That's the easy way.

Starting point is 01:28:35 You know, that is the way. Maybe I'll do that with my, with my claw. That sounds pretty fun. Yeah, I've been thinking about, there are some things that don't work in the situation I'm in, and there are some things where I can really take inspiration from, like, home labs. So I've been, like, building my, my claw and, like, playing with,

Starting point is 01:28:55 I really like Pi and the Pi agent from... Oh, yeah, I heard about that. I haven't played with it. I heard about it. You should play with that. There are many like it, but this one's mine. There are many like it, but this one's mine. Yeah, yeah, exactly.

Starting point is 01:29:08 Pi.dev is what you're talking about? Yeah, it's really well, really well built. Like, some of the best TypeScript I've seen. It's, like, really nice. Such a cool domain name too, pi.dev. Yeah. It says there are many coding agents, but this one is mine.

Starting point is 01:29:24 This one's cool. It's cool. I haven't played with it, but I saw it. I was like, yeah, that's a good nod right there. Okay. So the provider model that they have and like the lower level primitives, so not necessarily the agent. I don't tend to use the agent, but when I'm, if I'm building an agent, then their primitives

Starting point is 01:29:43 are pretty cool. And I think I'm still in the speccing phase of, like, working out how exactly I want to run, like, my personal AI. Yeah, just, like, finding nice product avenues from different products I like. I really like Poke from Interaction. I don't know. Was this like, did we talk about this? No, we didn't talk about this yet.

Starting point is 01:30:06 That's my last call. Yeah, Poke from Interaction, really, really nice, like how they do, like, the stateful workflows in the background. I take a lot of inspiration from other products like that. Yeah, you've got to be a consumer. I mean, consume everything everyone's creating around AI, all the new innovations,

Starting point is 01:30:26 even if they seem silly and toy-like; there's some little thing that's going on there that is inspiration elsewhere. I mean, I've been a home labber for a very long time now. I would just say I feel so thankful to be more knee-deep in Linux than I ever was in my life, because, you know, it's a superpower right now. So to the right of me, I have a Proxmox box with just way too much RAM and storage and CPU available. And so I essentially have my own cloud here.

Starting point is 01:31:00 So I can just, like, unleash my agents. I can build something, deploy it to that, and battle-test it in almost real time on my own hardware. And I don't have to send it to the cloud and deal with keys and deal with payments and just whatever comes with that. I can, like, skunkworks whatever I want right here. And it's too easy. And shout-out to my buddy, who, as a matter of fact, was on the pod recently: Adam Jacob. If you know Adam Jacob

Starting point is 01:31:28 from Chef. But Swamp.combe has changed my life, y'all, okay? Matt, you've got to check this out. Swamp.combe. Okay. Waste a whole day. It's not a waste. Spend a whole day on Swamp.combe.

Starting point is 01:31:45 And learn what you can automate, especially if you have, like, a little actual Raspberry Pi. It is software automation like you've never seen before. I'm just telling you that much, man. It's insane. Okay, I'll look it up. Swamp.com. I'm serious.

Starting point is 01:32:00 I'm enamored by this stuff. I love Adam. He's a good friend of mine. He's super big in open source. System Initiative, you know, automating infrastructure, et cetera,

Starting point is 01:32:10 but it's amazing. So I've been doing that with my Proxmox. So Proxmox, if you're not familiar, is a hypervisor. So you can host VMs and LXC containers on there. And so it's like a mini cloud,

Starting point is 01:32:22 basically, for you. And so standing up a new VM on Proxmox is a lot of clicking in a GUI. Old days, right? Who's doing that? Well, with Swamp, you just tell Swamp, hey, this is the IP of my Proxmox server. Automate all the things. And I'm compressing all that down to that one phrase.

Starting point is 01:32:41 It's not exactly that, but it feels like it. And so I had this Go CLI that I was writing that did everything that Swamp did for me in minutes. And I wrote that with AI too, you know. But Swamp automated so much stuff on my Proxmox server: spinning up a new VM, hardening that thing to be a DNS server, adding Tailscale to it with my auth key, with my secrets, standing up 1Password on there for my secrets distribution.

Starting point is 01:33:11 I mean, like, amazing stuff. It automates so quickly. That's cool. Much like you, much like code mode, it actually writes code. It creates it via writing TypeScript,

Starting point is 01:33:23 workflows and modules and models and workflows, and it's just so wild what he's done there. So it's a lot like what you're doing with code mode, where, rather than calling all these tools, you write the code that calls the tools. Kind of the same thing in a way. But check it out.
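The code mode idea mentioned here, writing code that calls the tools instead of issuing one tool call per step, can be shown in miniature. This is a hypothetical illustration, not Cloudflare's Code Mode or Swamp: `makeTools`, `listVms`, `startVm`, and the `Vm` shape are invented, and the "model-written" script is hard-coded. In a real system that script would come from the model and run in a sandbox with only the tool bindings exposed.

```typescript
// Hypothetical VM record and tool API, presented to the model as typed functions
// rather than as individual tool-call schemas.
interface Vm {
  id: number;
  name: string;
  running: boolean;
}

function makeTools(vms: Vm[]) {
  return {
    // Returns copies so the script cannot mutate state except through tools.
    listVms: async (): Promise<Vm[]> => vms.map((v) => ({ ...v })),
    startVm: async (id: number): Promise<void> => {
      const vm = vms.find((v) => v.id === id);
      if (!vm) throw new Error(`no VM ${id}`);
      vm.running = true;
    },
  };
}
type Tools = ReturnType<typeof makeTools>;

// What the model might write: one script that filters and loops over tool calls.
// With one-call-per-step tool use, each call and each intermediate result would pass
// through the model's context; here only the returned summary does.
async function modelWrittenScript(tools: Tools): Promise<string> {
  const stopped = (await tools.listVms()).filter((v) => !v.running);
  for (const vm of stopped) await tools.startVm(vm.id);
  return `started ${stopped.map((v) => v.name).join(", ") || "nothing"}`;
}
```

The design payoff is that loops, filtering, and branching happen in code, so the model spends one round trip instead of one per VM.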

Starting point is 01:33:39 Yeah, no, definitely, definitely. It looks super cool. If you're not home labbing, though, you've got to home lab. And what I mean by home lab is, like, literally standing up your own VM, literally standing up your own Linux: Ubuntu, Fedora, Debian, pick your distro, go wherever you want, and just play.

Starting point is 01:33:57 Don't drink the Cloudflare Kool-Aid too long, man. Get your own VM. Get your own Linux. Play with your own keys, with your own rules, with your own sudo. And feel the metal, man. Feel the metal.

Starting point is 01:34:10 I have a couple of Raspberry Pis looking at me from the corner of my room that I need to do something with. Plug them in, man. Ethernet those things. Yeah, let's go. Get them in there, man. I have a great friend

Starting point is 01:34:20 and he's one of my colleagues now, actually, since I joined Cloudflare. And he's been telling me for ages about K3s, right? He's running K3s on his Raspberry Pis. He has a whole cluster of them. He keeps on adding another one every now and again. And he's got his agent deploying apps, running apps on different pods. It's kind of wild.

Starting point is 01:34:47 It is wild, man. Yeah. I think I've got a bunch to learn about Kubernetes. I mean, even with the stuff you're talking about here, now you do have the Cloudflare account. And so the world is your oyster, so to speak, in terms of compute and power. So I'm not saying you shouldn't use that. But there's something that changes when you go on-prem, home lab, feel the true metal of the actual physical hardware, and install an actual operating system onto it, whether that's Debian or Proxmox, which is actually built on top of Debian.

Starting point is 01:35:21 You can actually install Debian and then install Proxmox on top if you wanted to. Or you could just use the Proxmox installer and install it from the start, you know, from a bootable USB. Point being, it's literal metal: choosing your RAM, choosing your CPU, choosing your disk storage, NVMe, of course. There's something to that, where you take parts and you make it and then you put the thing on it, which is Linux, of course, and then you build on top of that. There's just something about that in this world of AI, especially now, right? Like, you may feel a little lost or inadequate with Linux.

Starting point is 01:35:58 Maybe, maybe not. Well, Claude is not. I have a question to ask you. So are you running any local models? No. And the reason why is because I don't have enough time and I'm too lazy, I suppose. When the world's best models are available to me with a credit card swipe, I have

Starting point is 01:36:18 more of that ability than time. I even have a GPU and I'm not even using it, because with all of my interests, nothing has to be private to that point. So I'm just like, why would I do that? Claude's right here. Codex is right here. So I'm

Starting point is 01:36:34 primarily a Codex GPT user lately: GPT-5, I guess, 5.4. Usually on high, not medium, because medium is not cool. High is cool. Extra high is super cool, of course, but it takes about a year on extra high. It does. But you get some really deep thoughts,

Starting point is 01:36:54 you know, for the good stuff I go there, but not for most things. I'm just hanging out in high. No, not a lot with local models, because I just find that none of my problems require local models, and I'm not trying to keep any of this stuff private in a way where I'd feel fearful. You know, it's not like I'm talking about this goiter I've gotten, a medical problem. I mean, you know, I don't know,

Starting point is 01:37:15 but I'm not talking about anything that's embarrassing, I suppose, and I'm not doing anything nefarious. So a local model is not needed for me right now. Do I plan to? A hundred percent, Matt, I would love to. I would love to have more time to play with local models. I just don't. So I had a good experience at the last startup I was at, where we were building basically a glorified PDF parsing pipeline. And I got to play with some local models there, which was really good fun, because we ended up hosting our own on H100s, since there was no need to go to the top of the range, like GPT-5, at that moment. It would have been way too expensive. And so we needed to cut some

Starting point is 01:38:01 costs, and these didn't have a huge amount of usage. So it was really good to use H100s and play with it, and do a little bit of tweaking about which model, and have a couple of evals. Oh my God, this model couldn't do it. This model couldn't do it. This model couldn't do it. Oh my God, this model managed it, right? Can we use this? Can we use this size, or can we go a little bit smaller with this brand of model, this version? Can we go a little bit smaller, a little bit more quantized? Does it still pass our evals? That was really fun. That was a lot of tweaking; it was a lot of fun. But, um, so I have some pull to want to play with some of the new open source models. I mean, if you're feeling a bit left behind,

Starting point is 01:38:43 those open source models, they make you feel left behind every three weeks. Every three weeks, you're like, a new version of the model came out. A new version. Yeah. A new leapfrog. It's a tough game, that. A tough game.
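
The eval loop Matt describes (try a smaller or more quantized variant, rerun the evals, keep it if it still passes) can be sketched in Python. This is a minimal illustration under stated assumptions, not the setup used at that startup: the `Variant` fields, the parameters-times-bits cost proxy, and the `run_evals` callback are all hypothetical stand-ins for real model variants and a real eval harness.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Variant:
    name: str        # hypothetical label for a model build
    params_b: float  # parameter count, in billions
    bits: int        # bits per weight (16 = unquantized fp16, 4 = 4-bit quantized)


def cost_key(v: Variant) -> float:
    # Rough serving-cost proxy (an assumption): parameters times bits per weight.
    return v.params_b * v.bits


def smallest_passing(
    variants: Sequence[Variant],
    run_evals: Callable[[Variant], float],
    threshold: float,
) -> Optional[Variant]:
    """Try variants from cheapest to most expensive and return the first whose
    eval pass rate meets the threshold, or None if none of them do."""
    for v in sorted(variants, key=cost_key):
        if run_evals(v) >= threshold:
            return v
    return None
```

Since each eval run is expensive, ordering by cost means the loop can stop at the first pass, and that first pass is the cheapest variant that still meets the bar.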

Starting point is 01:38:56 You know, the one thing I will say here, and you might enjoy this as a fellow homelabber, up and coming, maybe. Definitely. It's that I've written a DNS server in Rust. It's called DNS hole. And I've been teasing my audience about this for a while. I'm sorry about that, but I am getting really close to releasing it. I just did some really cool code review on it.

Starting point is 01:39:18 It was super dope, but, you know, I'm just nervous, I suppose, about releasing it to the world. But I'm using it. It's my DNS server as we speak, right here, right now. And it's a replacement for Pi-hole. So since you have a Raspberry Pi, you may have heard that one of the first things you tend to do with a Raspberry Pi is install a Pi-hole, or stand up a Pi-hole in your home lab or on your LAN. And so I've written this DNS server,

Starting point is 01:39:46 but I have this idea for, kind of like, an AI that constantly sniffs my traffic. So rather than me building my block list based upon the various actors, I want the agent, the AI, I suppose, to sit on my network and pay attention to all the real-time, hot-path traffic that my DNS is resolving. And I want it to, I suppose with intelligence, with AI, add to my block list, because it knows. I don't want to have to manage my block list, Matt, you know?

Starting point is 01:40:23 And so I want to have an add-on that calls the API and taps into the traffic, so it's got that hot path. And it wouldn't be the primary DNS server, because that would be stupid. Let's put it in a sidecar to that. But one place where I wanted to play with a local model was there: I wanted to have the hot path of the DNS being resolved, and then an AI that's localized right there, but a very small one, like a 1.5- or 3-billion-parameter kind of thing, just enough to be intelligent about that kind of traffic. And if it's something that it doesn't really know about, it just files it as needing deeper investigation.

Starting point is 01:40:57 But for the most part, it can classify most traffic as good or not good. And it's going to manage my block list for me. And the cool thing about that is, people say, go get this block list or that block list. Well, that block list is not based on my traffic, you know? And so it's this massive list that is contextually not really true to my network. And so my idea is, let's add an agent in the loop there. And let's use a local model. And let's let that thing determine my block list based on the actual traffic coming into my network.

Starting point is 01:41:37 And so the plan I have in place, which I don't have time to build yet, is that around 5 to 10 seconds after the first resolution of a name, this agent will be able to infer it, check it, and add it to the block list, within 10 seconds of it entering my network. Now, I don't know how you feel about that with security, but that's about as close to instant as you can get, right? It's not days later. It's not somebody else's block list that I'm syncing once a day.

Starting point is 01:42:07 It's literally based on my traffic, almost in real time. And the moment it's seen, it's evaluated and added to the block list. And how would I know? Like, what is a key indicator of nefarious traffic? Man, I'm glad you asked this. I mean, subdomains is a big one. A lot of weird characters. Man, I wish I had my notes in front of me. There is a really cool... let me see if I can get my notes in front of me. I'll figure out the name of it, but I'll paraphrase what it is, because I can tell you that part, but I can't tell you the name of it in the moment. It's essentially like saying, okay, you have the name Matt, right? M-A-T-T. Check. No problem with Matt.

Starting point is 01:42:53 But now if you do M-Z-A-1-T-T, that's a weird character sequence. So it essentially watches... it's a name for a thing that knows what the proper sequence of characters in English, or any language, should be. And so when the characters are off, it flags it. And that happens a lot with nefarious actors. So Google.com is an easy, normal way to spell out a domain, or even pi.dev, to pull it back to our friends at Pi, right? That's normal.

Starting point is 01:43:26 That passes the test. So just based on the domain alone, which is DNS, it's like, well, is this a really weird subdomain with weird characters? Flag that immediately. And when you look down all these block lists, it's a lot of that. And so just on that alone, at the DNS level, which is network lookups, that's the most secure you can be when it comes to stopping something on a network. Just based on this one algorithm alone, you can stop 99.9% of the bad traffic that should not be on your network.
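
The episode doesn't name the technique Adam is describing. One common way to score how "word-like" a domain label is uses a character-bigram model: learn how often each character follows another in known-good names, then flag labels whose transitions are improbable. The Python sketch below illustrates only that idea, under stated assumptions: the tiny `GOOD` training list, add-one smoothing, skipping the TLD, and the threshold you pass in are all illustrative choices, and a real filter would train on a large list of known-good domains.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"
BOUNDARY = "^"  # marks the start and end of a label
SYMBOLS = ALPHABET + BOUNDARY

Model = Dict[Tuple[str, str], float]


def train_bigrams(words: Iterable[str]) -> Model:
    """Add-one-smoothed log P(next char | previous char) over SYMBOLS."""
    pair_counts: Counter = Counter()
    prev_counts: Counter = Counter()
    for word in words:
        s = BOUNDARY + word.lower() + BOUNDARY
        for a, b in zip(s, s[1:]):
            pair_counts[(a, b)] += 1
            prev_counts[a] += 1
    return {
        (a, b): math.log((pair_counts[(a, b)] + 1) / (prev_counts[a] + len(SYMBOLS)))
        for a in SYMBOLS
        for b in SYMBOLS
    }


def label_score(label: str, model: Model) -> float:
    """Mean log-probability per character transition; lower means less word-like."""
    floor = min(model.values())  # fallback for characters outside SYMBOLS
    s = BOUNDARY + label.lower() + BOUNDARY
    pairs = list(zip(s, s[1:]))
    return sum(model.get(p, floor) for p in pairs) / len(pairs)


def domain_score(domain: str, model: Model) -> float:
    """Score every label except the TLD and keep the worst (lowest) one."""
    labels = domain.lower().rstrip(".").split(".")
    candidates = labels[:-1] or labels
    return min(label_score(label, model) for label in candidates)


def is_suspicious(domain: str, model: Model, threshold: float) -> bool:
    return domain_score(domain, model) < threshold


# Tiny illustrative training list (an assumption); a real one would be far larger.
GOOD = ["google", "mail", "matt", "cloud", "flare", "change", "log", "news",
        "music", "video", "store", "shop", "home", "search", "maps", "photos",
        "docs", "drive", "apple", "github", "weather", "travel", "bank",
        "login", "support", "status"]
MODEL = train_bigrams(GOOD)
```

Adam's M-Z-A-1-T-T example would be scored the same way: the digit and the unusual z-to-a and a-to-1 transitions are exactly what lowers a label's score. In practice the threshold would be tuned against real, labeled traffic rather than picked by hand.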

Starting point is 01:44:00 So just that alone. And that's not even intelligence; that's before the AI. So just based on that algorithm, I will check all those and block based on that, or flag it for the AI to go and do deeper analysis. And the AI will take care of the 0.5% or 0.1% that the algorithm can't catch or doesn't catch, the stuff that is truly nefarious and needs a little bit more sniffing. That's what I'm building.
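
The two-stage pipeline described here (a cheap heuristic first, a small local model only for the uncertain middle, and a verdict recorded the first time a name is seen) could be wired up roughly as follows. This is a hypothetical sketch, not DNS hole's code. `score` stands in for any cheap heuristic where higher means more normal, `classify` stands in for a call to a small local model, and both thresholds are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    INVESTIGATE = "investigate"  # the model isn't sure; needs deeper analysis


@dataclass
class Triage:
    score: Callable[[str], float]       # cheap heuristic; higher = more normal
    classify: Callable[[str], Verdict]  # hypothetical call into a small local model
    block_below: float                  # heuristic scores below this are blocked outright
    ask_model_below: float              # scores in [block_below, ask_model_below) go to the model
    seen: Dict[str, Verdict] = field(default_factory=dict)
    blocklist: List[str] = field(default_factory=list)

    def on_query(self, domain: str) -> Verdict:
        """Called from a sidecar for each resolved name; only first sightings do work."""
        if domain in self.seen:
            return self.seen[domain]
        s = self.score(domain)
        if s < self.block_below:
            verdict = Verdict.BLOCK
        elif s < self.ask_model_below:
            verdict = self.classify(domain)
        else:
            verdict = Verdict.ALLOW
        self.seen[domain] = verdict
        if verdict is Verdict.BLOCK:
            self.blocklist.append(domain)
        return verdict
```

Because only first sightings reach `classify`, repeated lookups of the same name cost a dictionary hit. In a sidecar like the one described, `on_query` would run off the resolver's hot path (for example, fed from a queue), so a slow model call never delays resolution itself.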

Starting point is 01:44:23 Cool. It's dope, man. It's dope. That's cool. That's why you need a home lab, man, because then you have these kinds of ideas, man.

Starting point is 01:44:33 You start worrying about your DNS and you start worrying about how you can block the nefarious actors. Because, like, you know, all those block lists out there, they don't do you any justice when it's not your network and not your traffic. Like, you're doing real time. 10 seconds later, after the first lookup.

Starting point is 01:44:50 Yeah, that'd be cool. That'd be cool. Kind of off track. Happy to rant about my DNS hole, which is super cool. But I do want to bring it home with one more thing before we tail off, which is I would like for you to give the audience a takeaway in some way, shape, or form. If folks are like, you know what, man, this code mode is so cool, how do you use code mode?

Starting point is 01:45:11 Like, what's the first step? Give us the first step to using code mode. And how do you actually build day to day with code mode? Yeah, of course. So I guess the first thing would be to go and have a look at the blog post, or dump it in your coding agent. So it's blog.com, I think, forward slash code dash mode dash MCP, and we'll hopefully pop it in the show notes. If you dump that in your coding agent or just have a read, I think it's a

Starting point is 01:45:41 decent read, and it'll give you a lot of insight. But the main thing is: try not to let your AI do individual, discrete actions. Try to write an SDK, write a CLI. A CLI is also code mode in some way, because if the model is writing Bash, as far as I'm concerned, it's writing code. You know, I prefer it to write TypeScript or Python, but if it has to write Bash, then go to town. So let the model write code and get out of its way. Just, like, let it roll.

Starting point is 01:46:18 It'll be fine. Let it roll. Yeah. And then if you want it to be secure and you want to deploy it properly, then have a look at deployment options, like using a sandboxed interpreter or some type of sandbox, whether it's a VM or... I know Pydantic released something pretty cool called Monty, which is a Python interpreter built entirely for code mode. That's pretty cool.

Starting point is 01:46:45 Or dynamic worker loaders, the Cloudflare option for running JavaScript. Like, yeah, sure, it's fun. You can run code locally, just eval it. But if you want it to be safe, secure and, I don't know, deployed properly, then have a look at some of those options. But really, yeah, the main takeaway is just let the model write code, dude. Let the model write code. Yeah, I assume you prefer TypeScript over Bash.
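
Here is a toy illustration of the "let the model write code" pattern: instead of issuing one tool call per step, the model emits a short script against an SDK, and the host runs it once. Everything here is hypothetical. `list_zones` and `list_dns_records` are stand-ins, not a Cloudflare SDK. Python's `exec` with a trimmed `__builtins__` is not a security boundary, which is exactly why the episode points to sandboxes such as Monty or dynamic worker loaders for anything real.

```python
from typing import Any, Dict, List


# Hypothetical "SDK" functions; real ones would wrap API calls.
def list_zones() -> List[Dict[str, str]]:
    return [{"id": "z1", "name": "example.com"}, {"id": "z2", "name": "example.org"}]


def list_dns_records(zone_id: str) -> List[Dict[str, str]]:
    records: Dict[str, List[Dict[str, str]]] = {
        "z1": [{"type": "A", "name": "example.com"}],
        "z2": [],
    }
    return records.get(zone_id, [])


SDK = {"list_zones": list_zones, "list_dns_records": list_dns_records}


def run_model_code(source: str) -> Any:
    """Run model-written code that sees only the SDK plus len(), and must assign
    its answer to `result`. exec() is NOT a sandbox; this only shows the pattern."""
    namespace: Dict[str, Any] = {"__builtins__": {"len": len}, **SDK}
    exec(source, namespace)
    return namespace.get("result")


# What a model might write: one script instead of one tool call per zone.
GENERATED = """
result = {z["name"]: len(list_dns_records(z["id"])) for z in list_zones()}
"""
```

Running `run_model_code(GENERATED)` makes one `list_zones` call and one `list_dns_records` call per zone in a single execution. With discrete tool calls, each of those would be a separate round trip through the model's context.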

Starting point is 01:47:11 I don't mind Bash. Bash is the lingua franca of agents these days. TypeScript is second best for agents, I think, but they're really fun with Bash. I just let it rip. Yeah. I like Bash. I just think there's an easier permissions model if you're generating a TypeScript SDK

Starting point is 01:47:32 because, first, the disclosure of features is easier. You can't generate types for a CLI, so the model has to go through each individual... you have to have a skill or something to tell the model to call the CLI to begin with, and then the model has to go through and look at each of the options,

Starting point is 01:47:54 call help on each of the options to find the one it wants, whereas if you can generate types up front, you give it more information, more concise information. So I like that. And secondly, yeah, the permissions model I think is better. If you're running JavaScript or Python, you're running it in some type of sandbox interpreter.
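
Matt's point about generating types up front, instead of having the model call help on each subcommand, can be illustrated by rendering tool schemas into compact declarations that go into the prompt once. This Python sketch emits TypeScript-style declarations. The schema shape (`name`, `description`, JSON Schema `parameters`) and the small type mapping are assumptions for illustration, not the format of any particular MCP server or SDK generator.

```python
from typing import Any, Dict, List

# Minimal JSON Schema -> TypeScript type mapping (an assumption; real generators cover far more).
JSON_TO_TS = {"string": "string", "integer": "number", "number": "number", "boolean": "boolean"}


def to_ts_declaration(tool: Dict[str, Any]) -> str:
    """Render one tool schema as a TypeScript function declaration the model can read."""
    props = tool["parameters"].get("properties", {})
    required = set(tool["parameters"].get("required", []))
    fields = []
    for pname, spec in props.items():
        ts_type = JSON_TO_TS.get(spec.get("type", ""), "unknown")
        optional = "" if pname in required else "?"
        fields.append(f"{pname}{optional}: {ts_type}")
    args = ("{ " + "; ".join(fields) + " }") if fields else "{}"
    return f"/** {tool['description']} */\ndeclare function {tool['name']}(args: {args}): Promise<unknown>;"


def render_sdk(tools: List[Dict[str, Any]]) -> str:
    """Concatenate declarations for every tool into one prompt-ready block."""
    return "\n\n".join(to_ts_declaration(t) for t in tools)
```

A declaration like this costs a few dozen tokens and shows the model every option at once, which is the contrast Matt draws with crawling a CLI's help output one subcommand at a time.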

Starting point is 01:48:17 You can take the fetch requests that are trying to leave that sandbox, which in our case is our dynamic worker loader. You can take the fetch requests that are trying to leave the isolate and inspect them, have a look at them. Like, what's the model trying to call? Does it meet your permission set? Does it need special permissions? Do you need to add authentication tokens?

Starting point is 01:48:41 Do you need to add... like, what was it trying to do? Basically, you can have this anti-corruption layer, this ACL, that sits around the model's code execution. And I think that's a better permission layer than just yoloing into a terminal. I like that. I'm going to give you a nugget. I think you might like this, since you're speaking like that.
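
The "anti-corruption layer" idea, inspecting every request that tries to leave the sandbox before it goes out, can be sketched independently of any runtime. This is not the Workers or dynamic worker loader API. `EgressGate`, the host-to-methods allowlist, and the token injection are hypothetical, chosen to show two of the checks Matt lists: does the call fit the permission set, and does it need credentials added from outside the sandbox.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple
from urllib.parse import urlsplit


@dataclass
class OutboundRequest:
    method: str
    url: str
    headers: Dict[str, str]


class EgressGate:
    """Hypothetical policy layer that sees every request leaving the sandbox."""

    def __init__(self, allowed: Dict[str, Set[str]], tokens: Dict[str, str]):
        self.allowed = allowed  # host -> HTTP methods the sandboxed code may use
        self.tokens = tokens    # host -> credential added by the gate, never visible to the code

    def check(self, req: OutboundRequest) -> Tuple[bool, Optional[OutboundRequest]]:
        """Return (allowed, request to forward); the forwarded copy carries injected auth."""
        host = urlsplit(req.url).hostname or ""
        methods = self.allowed.get(host)
        if methods is None or req.method.upper() not in methods:
            return False, None
        headers = dict(req.headers)
        token = self.tokens.get(host)
        if token is not None:
            headers["Authorization"] = f"Bearer {token}"
        return True, OutboundRequest(req.method.upper(), req.url, headers)
```

Because the gate adds the `Authorization` header itself, the model-written code never holds the token. It can only ask for requests that the policy allows.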

Starting point is 01:49:07 Go on. Hit me. I have this thing I've been building into all my CLIs lately. Well, I'd say lately is the last three months, maybe more: --agent. Nice. You've got to add this flag. So, I mean, we humans, we love --help, right? But agents don't have that.

Starting point is 01:49:22 And our help is not their help, because they parse things differently. They like Markdown. So my --agent essentially tells the agent what this thing is and how to use it, in Markdown. And that's what it does. It responds with Markdown on standard out, you know, to the prompt. And so that's my gift to you and everyone else. I've been doing this and getting great results, and you can throw it on any command: --agent. So, you know, cloudflared login --agent.

Starting point is 01:49:53 Like, what is this login kind of thing? And it explains it to the agent. That's a pretty easy one, though, but something more difficult might be cloudflared tunnel new or something like that, --agent. And it explains it in Markdown to the agent

Starting point is 01:50:06 how to use it. And so when we have code mode, I suppose, in a CLI, you can give it the same next best tool. It can parse all the commands and figure it out, but you can give it one more, easier nudge by doing --agent versus --help.
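
Adam's `--agent` convention is easy to add to an existing CLI. Here is a minimal Python sketch using `argparse`. The `mytool` name, the `--name` option, and the Markdown text are placeholders, and this is one way to do it, not how any of his tools implement it.

```python
import argparse
import sys
from typing import List

# Hypothetical Markdown notes written for an AI agent rather than a human.
AGENT_DOC = """# mytool tunnel new

Creates a tunnel (placeholder description).

## Usage

`mytool --name <NAME>`

## Options

- `--name` (required unless `--agent` is given): name of the tunnel.
- `--agent`: print these notes and exit.
"""


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--agent", action="store_true",
                        help="print Markdown usage notes for AI agents and exit")
    # Deliberately not required=True, so `mytool --agent` works on its own.
    parser.add_argument("--name", help="name of the tunnel")
    return parser


def run(argv: List[str]) -> str:
    """Return the text the CLI would write to standard out."""
    args = build_parser().parse_args(argv)
    if args.agent:
        return AGENT_DOC
    if not args.name:
        return "error: --name is required (try --agent for usage notes)\n"
    return f"created tunnel {args.name}\n"


if __name__ == "__main__":
    sys.stdout.write(run(sys.argv[1:]))
```

Leaving `--name` optional at the parser level is the key design choice. With `required=True`, argparse would exit with a usage error before the `--agent` branch ever ran.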

Starting point is 01:50:22 That's cool. I like the nugget. I like the nugget. Have you not thought about just doing --help, but detecting whether it's in an interactive environment, and if it's non-interactive, printing Markdown? I suppose you could do both, really.

Starting point is 01:50:35 I mean, I would alias it at that point. I didn't think about that. That's a good point. My thought was just, I want something special just for the agent. And that sounds cooler than --help. But you can certainly alias it, where the first-class citizen is --agent, and then if you're doing --help and you're,

Starting point is 01:50:55 and you didn't tell it to do that and it's discovering it, determine if it's in an interactive terminal, give it the same thing, and just alias it. That's a good point. I think there's a lot of effort going into CLIs. Everyone's been talking about how cool CLIs are, but some of them are not natively useful for agents. Some of them rely on interactive processes.
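
Matt's alternative is to serve Markdown from plain `--help` when output isn't going to a terminal, with `--agent` as the explicit alias. That might look like the sketch below. `sys.stdout.isatty()` is a standard way to detect an interactive terminal. The help strings and the `mytool` name are placeholders.

```python
import sys

HUMAN_HELP = "usage: mytool [--help] [--agent]\n\n  --agent  print Markdown notes for AI agents\n"
AGENT_HELP = "# mytool\n\nPlaceholder Markdown notes describing commands and options for an agent.\n"


def help_text(is_tty: bool, agent_flag: bool = False) -> str:
    """--agent always gets Markdown; plain --help gets Markdown only when standard
    out is not a terminal (for example, when an agent's shell tool captures it)."""
    if agent_flag or not is_tty:
        return AGENT_HELP
    return HUMAN_HELP


if __name__ == "__main__":
    argv = sys.argv[1:]
    if "--agent" in argv or "--help" in argv:
        sys.stdout.write(help_text(sys.stdout.isatty(), agent_flag="--agent" in argv))
```

A non-terminal standard out is only a heuristic. A human piping `--help` into `less` or `grep` would also get the Markdown version, which is one reason to keep `--agent` as the explicit switch.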

Starting point is 01:51:18 The one that really bugs me, and we need to finish sometime soon, but the one that really bugs me is... have you ever used Changesets? No? Well, the Changesets CLI is entirely interactive. It's like a package version manager thing. It deploys new versions: you add a changeset, you put your change notes in there, and it collates them all together when you do a release and stuff. It's like management for open source packages, or for packages in general.

Starting point is 01:51:48 It's really good. I would recommend it. It's really nice, but they don't have a non-interactive version of their CLI. If a model tries to do npx changeset, it just freaks out, because nothing works.
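
Given the interactivity Matt describes, one workaround an agent can use is to write the changeset file directly. This sketch assumes the standard Changesets layout: a Markdown file in `.changeset/` whose YAML front matter maps a package name to `major`, `minor`, or `patch`, followed by the summary. Check it against the Changesets documentation for your version. The repository is assumed to be initialized already, and the `agent-` file-name prefix is an arbitrary choice.

```python
import secrets
from pathlib import Path

BUMPS = {"major", "minor", "patch"}


def changeset_markdown(package: str, bump: str, summary: str) -> str:
    """Build the file body: front matter with the package bump, then the summary."""
    if bump not in BUMPS:
        raise ValueError(f"bump must be one of {sorted(BUMPS)}")
    return f'---\n"{package}": {bump}\n---\n\n{summary.strip()}\n'


def write_changeset(repo_root: Path, package: str, bump: str, summary: str) -> Path:
    """Write .changeset/agent-<random>.md without any prompts; returns the new path."""
    path = repo_root / ".changeset" / f"agent-{secrets.token_hex(4)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(changeset_markdown(package, bump, summary), encoding="utf-8")
    return path
```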

Starting point is 01:52:03 So it's so annoying. All I want is npx changeset, the package name, and then the changelog. And I might make a PR for this because, like,

Starting point is 01:52:14 it would save my life. Yeah. I think the more we can get out of the agent's way with interactivity around that, the better. Like, I don't mind a CLI being designed for a human,

Starting point is 01:52:23 but also agent-aware, or agent-native. Because I still use CLIs myself, or at least I want to in some cases. But in most cases, I'm just telling the agent to do that stuff for me. Yeah. Because why would I do that anymore when I don't have to? I can just let that one thing over there spin and do this thing here and here and here.

Starting point is 01:52:43 I mean, that's the better world in dramatic cases, really. So there you go. Yeah. We have gone long and I appreciate it. We went deeper than I thought on some cool stuff, but I enjoyed it very much, Matt. I'll link up, obviously, both your blog post as well as the original code mode blog post that we talked about in the show.

Starting point is 01:53:06 I have my robots' treasure trove, this entire transcript, for all the cool stuff and the bits and bobs. It'll all be in there in the show notes, as best it can be. And if not, our show notes are open source on GitHub. So if you missed something or you want to add something that is contextually true, then send a PR, I guess, or have your agents send a PR, or I don't know, something like that. Awesome.

Starting point is 01:53:32 Matt, thank you so much for all you do, man. It's fun talking to you. Thank you. Lovely to meet you. Well, friends, this show is done. Thank you for tuning in. I hope you enjoyed this conversation I had with Matt Carey from Cloudflare. Wow.

Starting point is 01:53:47 I mean, like, seriously, some cool stuff is happening in and around this agent space, MCP, APIs, and what Cloudflare is doing. They're doing some really incredible stuff. I've got to tell you, during the podcast, maybe you can tell, I got a little FOMO. I kind of wanted to work at Cloudflare about midway through this podcast. You know, I feel like I could make a dent there. I don't know about you, but I feel like there's just so much to do,

Starting point is 01:54:12 so much we can build in this very moment. I'm having some fun building my own stuff. But hey, that's it. This show's done. Thank you for tuning in. We'll see you again. So soon. So soon.

Starting point is 01:54:21 Thank you.
