The Changelog: Software Development, Open Source - Flowing with agents (Interview)
Episode Date: September 17, 2025
Everything is changing. Adam is joined by his good friend Beyang Liu from Sourcegraph — this time, talking about Amp (ampcode.com). Amp is one of the many, and one of Adam's favorite, agentic coding tools to use. What makes it different is how they've engineered it to maximize what's possible with today's frontier models: autonomous reasoning, access to the oracle, comprehensive code editing, and complex task execution. That's nearly verbatim from their homepage, but it's also exactly what Adam has experienced. They talk through all things agents, how Adam might have been holding Amp wrong, and they even talk through Adam's idea called "Agent Flow". If you're babysitting agents, this episode is for you.
Transcript
Welcome back, friends. It is your favorite podcast. That's the Changelog. We speak to the hackers, the leaders, and the innovators living at the speed of AI. And if you don't know, everything is changing.
Now, I'm joined today by my good friend, Beyang Liu from Sourcegraph, this time talking about Amp. Check it out at ampcode.com. Amp is one of the many, and one of my favorite, agentic coding tools to use.
What makes it different is how they've engineered it to maximize what's possible with frontier models, autonomous reasoning, access to the Oracle, super cool, comprehensive code editing, and complex task execution.
That's nearly verbatim from their website, I admit, but it's also exactly what I've experienced.
We talk through all things agents, how I might have been holding it wrong, and we even talk through my idea called Agent Flow.
If you're babysitting agents, this episode's for you.
A big, massive thank you to our friends and our partners over at fly.io.
That is the home of changelog.com. Learn more at fly.io.
Okay, let's agent.
Well, friends, the news is out.
Our friends over at CodeRabbit, coderabbit.ai.
They've raised a massive Series B, and they've launched their CLI review tool.
It is now out there.
I've been playing with it. It's cool. The bottleneck is not code; the bottleneck is code review.
With so much code happening, so many people coding now, so much code being generated, and so many things competing for developers' time and attention, maximizing code review still remains a bottleneck. But not anymore.
CodeRabbit CLI code reviews. Code reviews in your pull requests, code reviews in your VS Code, and more. Teams now have a true answer to what it means to code review at scale, code review at the speed of AI, and CodeRabbit is right there for you. Go learn more at coderabbit.ai. I will link up their latest blog announcing their Series B and their announcement of their CLI review tool. Again, coderabbit.ai.
Beyang Liu, welcome back to the Changelog. Let's go as deep as humanly possible
on AMP. Not Sourcegraph, necessarily, but AMP. What do you think? Cool. Yeah, it sounds good to me,
and thanks for having me back on the show, Adam. What is AMP? Yeah, I don't think that it takes too much
explanation. AMP is a coding agent. So most people, I would imagine your audience, know what that is at this
point. But, you know, if you've been living under a rock for the past six months, a coding agent is
essentially an AI-powered program that takes natural language instructions and then goes and
gathers context and then modifies the code following your instruction.
And so it's much more, kind of, high-level. You describe what you want,
you figure out how to instruct it, and then it figures
out how to do most of the editing for you, you know, runs the tests, runs the compilers,
it can correct itself.
And the idea is that we want to enable programmers to operate at sort of like a higher level.
So, you know, dictating what the architecture should be, what the key interfaces should be,
what the UI should look like, and not as much have to be in the weeds of every single line of code that is written.
And then AMP in particular, among the landscape of coding agents, is distinguished by the fact that, I think, we're the only coding agent that I've come across that has this approach,
which is: we're multi-model. We use multiple LLMs, which is not distinctive. There's a lot of coding
agents that use a variety of underlying large language models and small language models now.
But I think our approach is different in that we don't have like a model selector. It's not like
a model harness. So an AMP user doesn't think in terms of like, oh, let me use, you know,
Claude Sonnet 4 today, or let me use GPT-5 today. We view that as basically an implementation detail.
So, like, we'll figure out what the models are good at, and the contract with the end user is an agentic experience that hopefully just works well for your use cases.
So there's no toggling on different specific models.
It's more like, oh, you know, for this particular feature or sub capability of AMP, we're going to use this model because it has the right characteristics in terms of latency, intelligence, and, like, competency around, you know, one particular task.
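To make that concrete, here's a hypothetical sketch in Go of what "the model is an implementation detail" could look like: the harness routes each sub-capability to a model, and the user never sees the table. The capability and model names are illustrative, not Amp's internals.

```go
package routing

// Capability is a task the harness needs a model for.
type Capability string

const (
	MainAgent Capability = "main-agent" // drives the agentic loop
	FastEdit  Capability = "fast-edit"  // low latency matters most
	Oracle    Capability = "oracle"     // deep reasoning on hard questions
)

// modelFor maps a capability to whichever model currently has the right
// latency/intelligence/competency trade-off; this table can change
// without any user-facing model selector.
func modelFor(c Capability) string {
	switch c {
	case FastEdit:
		return "small-fast-model" // placeholder model names
	case Oracle:
		return "deep-reasoning-model"
	default:
		return "frontier-agentic-model"
	}
}
```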
Maybe just a side tangent, which I didn't really plan to do until this very moment,
but I was thinking back to the last time we had a conversation.
And I think at the very end, I'm not even sure if it ended up on the air or not.
So this may have just been side conversation between you and I, and maybe just me giving feedback.
I think it was, how in the world do I sign up for Cody?
And so obviously we're talking about AMP.
I don't know where Cody went.
I've literally forgotten about Cody until this very moment.
and that feedback I gave to you, which was,
hey, I'd love to play with your stuff,
but I don't feel like I know how to do it.
I think I have a Sourcegraph account.
I've tried 16 ways,
and I keep hitting, you know, no progress.
So there you go.
Oh, you've run into, like, auth issues with AMP?
No, no, no.
This was Cody, back when we talked.
Oh, yeah.
I don't even know when we talked,
maybe eight months ago, 10 months ago or something like that.
Yeah.
So where's Cody?
So I would say...
Cody is still alive and well.
Okay.
In the enterprise.
Okay.
But we've sort of moved away from it for non-enterprise use cases.
So there's still plenty of enterprise customers that use it quite heavily for, I would say,
like non-agentic AI coding assistance.
And that's still a common feature among a lot of large companies.
There's some companies out there that even have like a legal prohibition on anything
agentic.
However, you want to define that.
You can't say "agent" inside
some orgs.
What's the alternative word to agent if you can't say agent?
Oh, I think sometimes people say, like, workflow, and that means... you know,
there's no single source of authority on what agent or workflow or any
of these terms mean, but there is a different connotation or zone of what that definition
means. And so, like, I think some more regulatory or, like, legally sensitive orgs, they're
okay with workflows, but not anything that's, quote, unquote, like, fully agentic. And I think
it's just hard to pin these down. And at some point, it's not worthwhile
discussing it in too much depth, because at the end of the day, it's like, you know, a certain
customer has this requirement, and, you know, we're fine meeting that. They determine
they need that. And, you know, we're happy to serve them where we can.
But long story short is, Cody is still alive and well in the enterprise.
It's used for non-agentic workflows.
But outside of the enterprise, the reason why we built AMP outside of Cody
was really, we felt that there was this very big technological shift that was happening.
If anything, it's like, you know, the whole, like, gen AI phenomenon:
there's multiple waves of technology that have actually gotten lumped into one umbrella term
that people call gen AI.
But we view the agentic, we view agents as like fundamentally a different technology and a different application architecture than the pre-agent, more like chat-based AI.
And so that, to us, was: we really needed to think from first principles in terms of how we built this thing in order to fully unlock the potential of agentic models.
And so we didn't want to be hampered by the design constraints and the way that we built Cody originally, because that was very much built
for the, like, chat-LLM world.
And if anything, a lot of the best practices
for building with chat-oriented models,
it's like the inverse with agentic LLMs.
Big shift, big waves changing as we have this gen AI world.
And we've been, what, three-ish years deep
into this phenomenon, as you have said, right?
Just under three years?
Yeah, just under three.
It feels like a decade.
I don't know about you.
I feel like it's been a decade.
Yeah, it's wild.
I'm excited. Are you excited about this change?
I mean, I feel like I've had like peak hype and then a drop down to reality, then
peak hype, then drop back down to reality.
And I feel like now I'm just sort of like peak hype all the time.
Yeah.
I mean, definitely excited about the technology.
I think it's like got huge potential.
I would say, like, I was never excited to the point where I believed in the sort of, like,
AGI myth, or believed
that it would eliminate all need for work or, you know, kill us all. I thought that was always
like a fairy tale told by people with overactive imaginations. But definitely very excited about the
potential for this to eliminate toil for all kinds of knowledge work, not just developers,
but especially developers, because that's who we sort of build for. Yeah. I feel like it's
a, it's obviously a force multiplier. I still feel like it's this weird genie in a bottle where you have to conjure it in certain ways. You have to hold it delicately. I feel like, even in my own experience, it's smart in some cases and then really dumb in some cases. I have some questions for you about that, to sort of demystify when I feel like, gosh, we've done this before and you've been really good, and now suddenly you have no idea how to commit to my Git repository. You know, like,
that's the most basic function that you could possibly ask for. It's like git commit -m blah. Like, you know,
that's pretty easy. But I feel like I've had moments where I'm like, you know how to commit
to a Git repository, right? And I'm not speaking to it that way, but in my brain I'm like, wow, this thing
was really smart, and now it's not really smart. So I feel like there's waves of that, even in my own
personal usage.
Yeah, I feel like, so, the committing to Git, I think that's, I mean, at least for, like,
AMP, that's largely solved now. It's been a while
since I've seen that class of error.
I'm not blaming somebody else.
Okay, gotcha.
Like, I think those sorts of things,
I have very high confidence
that it can just do now.
But I get what you're saying.
There is, like, the spectrum
where some things that you, a human,
would consider difficult,
it can just, you know,
almost one shot.
Whereas other things where a human,
you know, it'd be like a two-line change
or whatnot.
And it will just,
we would call it, like, doom looping
where it's just like iterating
over and over again.
It can't figure it out.
even though you're saying like, okay, now try it like this.
Try it. You're trying to like nudge it and it just like just won't get it.
There's certainly that phenomenon that's still at play.
And it's almost at the point where a lot of this reduces to the underlying model intelligence.
And so I think the proper way to view this is: part of it is, okay,
there's some improvements that we need to make to the product in terms of how we present the answer to the user
and help people prompt in a way that gets you to the sweet spot
of the model. But there's also, like,
users have to view this as
like a skill set they need to acquire
as well. So like the more you
use agent decoding
tools, the more you develop an
intuition for
what it can do and
can't do. And so
you can almost
like tell from talking to a person
how much they've used these tools
in terms of like what they express
their frustrations as. Like a lot
of newbie users, especially the ones that are like
from the onset skeptical, for whatever reason.
They will say, like, oh, you know, I asked it to do this thing
that ought to be very easy, but it fell flat on its face.
And then you ask to see what their prompt looked like,
and it's like five words.
And it's almost like, at that point, it's like, you know,
what do you expect it to do?
Like, it's not a mind reader.
There's almost like an information, like,
you put a certain amount of bits in,
and from that, the model has to tease out what your intention was.
And, like, it can certainly accept, you know,
instructions that are, like, five
words long, but if it's only five words, it's going to be very much, like, you know,
based on what its prior behavior wants to do. So, like, if you have, like, a five-word
prompt to create, like, a simple React app or a simple game or whatnot, it could probably
do that, but it's not going to be probably what you intended, unless what you intended is, like,
the median React app or the median JavaScript game that's represented in its training set.
Whereas the more expert users, they still have complaints, but it's oftentimes around very
specific things. It's like, okay, I know what it's good at. I'm not complaining about the
fact that it can't do these things anymore. I can still do those things. It's fine. It's more
around like, hey, I would love to be getting more out of this tool, but there's certain bottlenecks
that are kind of constraining how much I can use it. I think one very important bottleneck that
still remains is code review. So, like, I think most expert users of coding agents realize that
like, you know, they're very powerful and they want to be using them heavily, but you can't trust them fully.
Like, you still got to read through the code that they emit and understand it because otherwise the slop will creep in.
And they'll make subtle changes that, if you don't catch them and are not wary of them, will add complexity and, like, nuanced bugs, subtle bugs, to the code.
And so this process of the human understanding the code that's been generated, and ensuring that it doesn't do anything very incorrect
or not according to their intentions,
that's an important part of the process.
And it's quickly becoming, I think,
like one of the key bottlenecks
in this process among the folks
that are really, really trying to push
the frontier of what they could do with these tools.
Yeah. My case in particular,
I'll call out the tool. It was not AMP.
Just so you know. It was Claude.
And it was around 11:30 p.m., 11:45 p.m. Central Standard
Time, literally last night. And the example was, I'm obviously, you know, using Claude Code.
Well, not obviously. I'm using Claude Code on the command line, so I'm using their CLI tool. So
in my directory, claude opens up their CLI tool, and then there you go. In my case, I think it may
have been a recent context window clearing. Either way, I just feel like simple tasks were
getting harder and harder for it to do.
Like, it was almost like it just got unsmart.
I don't want to say the opposite word, because it's not cool to do that.
But it had become unsmart, basically.
And this specific example was SSH into this known machine we're working with.
Like it has a name in my hierarchy.
Like it's clear.
There's context for what this machine is, how we've been interacting with this machine
and how it would work.
And I said, SSH into this machine and just check on
the memory state, because this is something we had been doing.
And it suddenly says, I can't SSH into machines.
I'm not able to do things like that.
And I'm like, no, no, no.
You're Claude Code. That's your first job as Claude Code: to be able to traverse anywhere
I can go in my terminal sessions.
Yeah.
And you should be able to SSH in.
Because I've got, you know, an SSH key, and this is my own machine.
And so this is a known thing.
And I had to remind it, and it was like, oh, you're right.
Right. Oh, you're right. I can do this. And so that's the very specific example, versus just... I feel like sometimes there's drift in its ability. And that could be a Claude thing. It could be, maybe it was a certain time and usage was peaked, and maybe these models get less smart whenever there's peak usage, because there's maybe less memory to go around. I have no idea. I feel like that part of the world, the usage, is so black box. And maybe it's black box,
honestly, to you too. But I feel like,
in that case, I was like, you know how to SSH,
and yes, you can. And it's like, oh, you're right, I can.
Yeah. Yeah, it's interesting.
I think there was like, I don't know when this is going out,
but there was a recent report that Anthropic posted
where they were, apparently they had rolled out
like a quantized version of the model
over the past couple of days,
which actually did yield degraded quality.
And so this kind of confirmed the conspiracy theories
in people's minds, where it's like,
you know, are they nerfing the model quality?
And it turns out they had rolled out some changes
that were, you know, aimed at improving efficiency
but did actually have a tangible impact on model quality.
So I wonder if, like, you know, that happened in your case.
In our case, you know, we do use the Claude family models
heavily underneath the hood,
but we have a couple levers that we can pull
that can help address this issue.
You know, one is we actually use multiple inference providers
that provide Claude.
And there's actually periods of time
where, like, one provider
will have different,
like, uptime characteristics.
So, like, if the model is completely down
from one provider,
we can switch over to the other
and stay online
while other people, using tools
that are tied to just one provider,
can't use that tool.
And that also goes to the model quality, right?
So, like, if we notice quality degradation
from one provider,
we can cut over to the other,
and still be consuming the model at sort of like full fidelity and full quality.
And then the second lever that's just now coming online is we're starting to play around with
more and more additional families of models.
So we already make use of a variety of models for specific use cases and capabilities within AMP.
And we're also constantly trying out different models as the main agentic driver.
And I could see it like a future where if we start to notice, you know, degraded quality,
or higher error rates from one model family,
or maybe it just goes completely offline,
we could cut over to another model family
as a fallback, provided it's good enough
at the core capability of driving
the agentic workflow that we need.
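As a rough illustration of those two levers, here's a minimal Go sketch: try each inference provider for the preferred model family, then fall back to another family that's good enough to drive the agentic loop. The types and names are assumptions for illustration, not Amp's routing code.

```go
package routing

import "errors"

// Provider serves a model family and reports whether it currently looks
// healthy (uptime plus no observed quality degradation).
type Provider struct {
	Name    string
	Healthy func() bool
}

// pick returns the first healthy provider, walking model families in
// preference order, so one provider's outage or a degraded rollout
// doesn't take the agent offline.
func pick(families map[string][]Provider, order []string) (string, Provider, error) {
	for _, family := range order {
		for _, p := range families[family] {
			if p.Healthy() {
				return family, p, nil
			}
		}
	}
	return "", Provider{}, errors.New("no healthy provider in any model family")
}
```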
Let's talk about, I guess, how it works, if you don't mind.
Can you...
You're CTO of Sourcegraph, so you should know these things.
At least the read-me version,
I'm assuming you're super deep.
I don't want to assume a lot of stuff,
but I figured your position gives you the ability to go
probably as deep as anyone might be able to,
aside from maybe the core team and literal, you know,
implementers who are working on this day to day.
But give it to me as best you can for the world, you know, for the public:
how AMP works, from the architecture at, like, a hosted-service level.
You talked about being able to determine degradation.
Give me, as best you can, the lay of the land
that can be public.
Yeah, so I'd say there's no weird rocket science here.
I'd say like at the core, at a very high level,
AMP operates the same way that every other agent does,
which, the way I'd describe it, is a for-loop wrapping an agentic LLM.
So you take an LLM like Claude or GPT-5,
or any of the other, like, agentic LLMs that are now online,
you put that in a for-loop.
And what the loop looks like is: you take the user input, you feed it into the model, and the model will come back with some combination of tool calls and, like, written response, like, you know, thinking things out or responding to the user query.
And the tool calls are just text.
They describe like, hey, you told me that you had access to these tools.
You know, maybe it's grep, maybe it's a read file tool, and maybe it's a tool to edit a file.
I want to invoke this tool with these arguments.
and then our application logic goes and executes the tool call.
So, like, given the spec that we get from the model,
we execute the tool, we get the response,
and then that response gets fed back into the next iteration of the loop.
And that just keeps looping until there are no more tool calls,
at which point the model generates a final response.
And that's usually either, like, the answer to the user's question,
or it says, I made these edits to the code in these files,
and these are the changes that I made,
and gives you a summary of those changes.
So at a high level, every single agent in the world
follows that architecture.
It's like the agentic equivalent
of the ChatGPT wrapper architecture
that was so prevalent, you know, a year, 18 months ago.
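To make that loop concrete, here's a minimal sketch in Go. The Model interface and tool names are stand-ins for illustration; this is the generic architecture Beyang describes, not Amp's actual code.

```go
package agent

import "fmt"

// ToolCall is the model asking the harness to run a tool with arguments.
type ToolCall struct {
	Name string // e.g. "grep", "read_file", "edit_file"
	Args map[string]string
}

// Turn is one model response: some text plus zero or more tool calls.
type Turn struct {
	Text      string
	ToolCalls []ToolCall
}

// Model abstracts the underlying agentic LLM (Claude, GPT-5, ...).
type Model interface {
	Step(transcript []string) Turn
}

// Run is the for-loop: feed the transcript to the model, execute its tool
// calls, append the results, and stop when no more tools are requested.
func Run(m Model, userInput string) string {
	transcript := []string{"user: " + userInput}
	for {
		turn := m.Step(transcript)
		transcript = append(transcript, "assistant: "+turn.Text)
		if len(turn.ToolCalls) == 0 {
			return turn.Text // final answer, or a summary of the edits made
		}
		for _, tc := range turn.ToolCalls {
			result := executeTool(tc)
			transcript = append(transcript, fmt.Sprintf("tool(%s): %s", tc.Name, result))
		}
	}
}

// executeTool is where application logic actually runs the requested tool.
func executeTool(tc ToolCall) string {
	return fmt.Sprintf("ran %s with %v", tc.Name, tc.Args) // stub for illustration
}
```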
Like beyond that layer, you know, there's nuances.
There's nuances in terms of, like, what tools we provide
the models, the different prompts that we use
for the main agent, and then there's also sub-agents.
So sub-agents are a special
class of tools where you call a tool, but underneath the hood, it's just its own, like, nested
for loop. So it's its own, like, agentic loop doing a more targeted task. So you could have
just like an agent calling itself in sort of like a recursive fashion or might call a domain
specific agent. Like we have an agent that's dedicated to code-based search, which has been
optimized for finding and locating relevant context within the repository. And so there's a lot of
tuning that we do to make sure that all those pieces operate well together, that we can
kind of eliminate the dumb class of errors like, you know, oh, I can't SSH into this thing,
or I can't read this file. Smooth all those out because those are big disruptors to the user
experience. And then we find pieces to fill in the gaps for, you know, what blocks the agent
from getting further or what helps it correct, course correct itself when it doesn't immediately
do the right thing off the bat.
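Continuing the earlier sketch, a sub-agent is just a tool whose handler runs its own nested loop. The "codebase_search" name is illustrative, echoing the search-focused sub-agent described here; the wiring is an assumption, not Amp's implementation.

```go
// executeToolWithSubAgents treats certain tools as nested agent loops.
func executeToolWithSubAgents(m Model, tc ToolCall) string {
	if tc.Name == "codebase_search" {
		// A nested for-loop: a targeted agent tuned to locate relevant
		// context in the repository, which reports back to the caller.
		return Run(m, "find code relevant to: "+tc.Args["query"])
	}
	return executeTool(tc) // ordinary tools run directly
}
```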
And then there's sort of this client
server architecture that we have where
part of it always has to be on the server
because the best
models are still server bound right now
because local models have not yet gotten to the point
where they're fast enough
to run locally. But in addition
to that, AMP also has a
server-side store for, like,
all the threads. So we call
the agentic interactions that people
have with AMP "threads."
So every kind of like session you have with AMP to accomplish a task is a thread.
And all those threads get synced to the server.
And the benefit of that is that you can go on AMP and see the threads of all your teammates
and how they're using AMP, which is really helpful because coding agents, using them effectively, in our view, is a high ceiling skill.
Like you said before, it's not trivial, right?
Like there's certain things, especially if you're a newbie, where you expect it to be able to do something, you know, especially if you've come in consuming a lot of the hype around AI. You're like, oh, you know, it's magic. And then it doesn't do the thing. It doesn't read your mind. And so it's really helpful to have this team-wide view of how other people are using this tool to great effect, because then you can go and learn what their best practices are. So that's another big component of the AMP architecture: we do have the server-side component that stores thread information,
so you can share with your teammates
and also you can go back
and revisit previous threads
in case you're like, oh, like
what was that thing that I was learning about
like a couple of days back? Let me go back
and look at that thread and see
what it answered or what it actually did there.
Is the thread simply
kind of a chat history?
It's not really context.
It's just more what was printed back and forth kind of thing.
It includes every single message
from the assistant or the user, and all the
tool calls and all the tool results. So it's basically the entire interaction. It's like, here's what
you asked it to do. Yeah, it's like a transcript. Here's what you asked it to do, here are the tools that
it called, here's the results of those tool calls. Like, I read this file, read that file, it listed
this directory, it made this edit. And then you asked it to do this. And then it's just like
the entire message history.
Interesting.
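Based on that description, a stored thread is roughly the full transcript: every user and assistant message plus each tool call and its result. Here's a sketch of what such a record could look like in Go; the field names are assumptions, not Amp's actual schema.

```go
package threads

import "time"

// ToolRecord is one tool invocation and what it returned.
type ToolRecord struct {
	Name   string // e.g. "read_file", "edit_file"
	Args   map[string]string
	Result string
}

// Message is one entry in the transcript.
type Message struct {
	Role      string       // "user" or "assistant"
	Text      string
	ToolCalls []ToolRecord // calls the assistant made on this turn
	At        time.Time
}

// Thread is one session with the agent, synced server-side so you and
// your teammates can revisit it later.
type Thread struct {
	ID       string
	Messages []Message
}
```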
So you've got client, server; you obviously have your own
CLI, so you can install it.
I think, can you install it via brew
on a Mac? I can't recall if I did it via brew
or if it was via npm.
The preferred way is via
npm for now. Gotcha. Okay.
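For reference, the npm route looks like this; the package name here is from memory of the docs, so treat it as an assumption and check ampcode.com if it has changed:

```
npm install -g @sourcegraph/amp
amp   # launches the CLI in the current directory
```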
So you've got that client
architecture, that is your own CLI, which
by the way is just stunning. It's beautiful.
I love the way y'all did that. Thank you. Yeah.
Really love the nod
to NeoFetch.
At least I think it was a nod to
NeoFetch, the opening splash screen.
I could be wrong.
The orb?
Well, you know, when you launch, you know how you launch Amp.
So when you first, for the users who may not, or listeners who may not, I'm calling them users.
Y'all are listeners still yet.
You're not users yet.
But, you know, when you launch Amp, if you've ever used NeoFetch on Linux, when you NeoFetch on any given box or machine you've SSH'd into, it essentially gives you this sort of bigger logo on the left and a list of details on the right-hand side.
That Amp splash page reminds me of NeoFetch.
I'm not sure if there was a nod there or homage there.
To my knowledge, no, but I'm also not the one who created that splash page.
So I don't know if that was an inspiration.
I think we're just, like, part of what we're trying to do,
I think like command line coding agents are all the rage right now for a good reason.
I think, you know, it's super versatile.
And I think now, in part because
of AI, people are really pushing the boundaries of what you can do inside a terminal-based
UI. And so now there's all these, like, great new libraries and frameworks that are doing
things that are visually, like, stunning inside the terminal. Whereas, like, I don't know,
two years ago, you asked me to describe, like, the prettiest command line tool, and they all kind
of look the same, right? It's all mostly text.
Htop, maybe.
Yeah, yeah, exactly. But now, I think.
because the terminal is so versatile, right,
like you can deploy a coding agent in a huge number of places
if it's in a terminal form, right?
Because you don't have to install, like, a graphical interface.
You don't have to, like, go through any of the loopholes you have to do.
Or not the loopholes, but you don't have to jump through any of the
hurdles to set up the coding agent
if it's based on the command line.
And so I think we, along with everyone else,
are trying to make AMP as pretty as possible in the terminal.
Because today you want to, yeah, you want to use something that's delightful, right?
You know, there's a lot, I can gush about a lot, but I think that just the animation of that orb
initially was really cool.
And I think it's changed over time.
I can't tell because I never snapped a perfect memory shot of it in my brain.
But I think it's changed over time and it's gotten different colors and maybe even the shape
has changed.
But it's cool.
It's gotten better.
And I do want to call out that, so we just shipped a new terminal UI like yesterday as of this recording.
And it's actually using a new terminal UI framework that was built in-house specifically for AMP.
And one of the things that's immediately obvious if you use it is that there's no flicker.
So like we're using ink beforehand, which is very popular.
And, you know, frankly, like very, very great framework.
But one of the things that people notice with ink is that it flickers a lot.
And the Flickr gets worse in some terminals over others.
So, like, if you're using it inside T-Mux, as I often do,
like, the Flickr is very noticeable.
And that's something that we're able to eliminate
as a very non-trivial technical task.
And special shout-out to Tim Culverhouse on our team
because he basically just joined the team.
He's, like, a terminal UI expert.
He did a lot of work on Ghostty, that terminal,
working with Mitchell Hashimoto and folks there.
And he's just been, he basically rewrote the entire terminal UI and it just launched.
And it's really solid, I think.
This date, just so everybody knows, we're recording on September 3rd.
So yesterday it was September 2nd, if you can't do calendar math.
I did it for you there.
I'm not sure if I noticed that the UI
has changed much, but I have noticed the flicker. And I thought it was because, like, anybody
doing this... I don't think I've shut my machine down in, like, a month or so.
I'm just afraid to shut it down.
I'm kidding.
I've shut it down since then, because I've learned.
But, like, for a while there, I was like, because of threads, like, going back to this
thread and the transcript, I was, you know, because the things you're putting into it is, like,
as you said, it's a skill set that you learn.
And so I feel like as you get better and better,
you want to understand what you did a week ago or a session ago or two days ago
or how you frame something so that you can keep doing a version of that or build upon it.
And so this history to me is really important.
And I've been like collecting little text files basically because I didn't realize or, you know,
I don't think everybody has your gumption where, hey, these threads,
these transcripts are kind of important to Spalunk.
If you're a week or so past or what the, the true conversation was or what, you know, the user asked or the, I think you called them, not an agent.
The agent is the thing, but what did you call the person?
You had a name for the person.
Oh, I usually just say the user or the user.
The user.
Okay, cool.
You know, me as the user, I was like, you know, I've been collecting little prompts and little things like that.
Like, nothing major, but just things I'm doing frequently.
Like, you know, I've come to this, you know, my usage pattern has gotten to this point where I feel like I need to give, I need to define roles similar to how you build out a team.
I have a Linux platform engineer or a, you know, a Go style enforcer who understands really good Go, idiomatic Go code.
And then a style guide for my Go code.
And then I have a, you know, a Go style reviewer, because you were talking about code review before.
And so I will give them different roles, and I feel like that works for me.
But as part of getting there, it was condensing more and more of what I would kind of keep as this massive prompt.
And I would just sort of move that larger prompt for me into a history file of sorts where this is a role, assume this role, here's your task, and then, you know, go forth with that new role.
And I feel like that gives it some really good context versus.
some magical prompt I've been cargo-culting again and again and again.
So I've learned a skill set, if you can call it that, to better be an agent babysitter.
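As a concrete illustration of the role idea, a role file might look something like this. The file name and format are a sketch of Adam's own convention as described, not anything AMP prescribes:

```
# admin/roles/go-style-reviewer.md (illustrative)
You are a senior Go reviewer. Enforce the style guide in
admin/styleguide-go.md. Review only the diff you are given.
For each issue: cite the guideline, quote the offending lines,
and propose an idiomatic fix. Do not rewrite unrelated code.
```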
Yeah, I think, you know, I talked to a lot of users and a lot of people have some variant of that
where it's almost like they've built up a library of prompts for specific, you know, task
types. I think a lot of people have built up some sort of template for listing out. It's like a
combination of like here's how to structure your plan. First of all, like generate a plan at
all. If you want to do like a longer running task, generate a plan, here's a couple
pointers about how to structure that. And then maybe there's a couple pointers about the codebase
as well. We have introduced this thing called agents.md.
I think it's like pretty standard now
across many coding agents
and now there's finally a standard.
I think more and more people are hopping on to this
using the agents dot MD file
as a standard for context.
So, you know, Amp will consume that,
but a lot of people add additional context
beyond that that are, you know,
maybe not specific to the code base,
but specific to their like personal,
like the things that they personally feel
they want the agent to do.
And so they'll copy and paste that in
to sort of like steer the agent to do
and behave the way they want it to behave,
especially over these longer running tasks.
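For readers who haven't seen one, an agents.md is just a Markdown file of standing instructions the agent reads before working. A minimal, made-up example:

```
# AGENTS.md (illustrative)
- Build with `go build ./...`; test with `go test ./...`.
- Run gofmt and go vet before declaring a task done.
- Commits: imperative subject line; the body explains why.
- Never modify files under admin/ without asking first.
```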
How do you feel about the user experience,
the way a user interacts?
So you said you interact with a lot of different users
who have different ways and all that good stuff.
I've been working on a,
since I've been building my skills to be an agent babysitter.
I've kind of come to this idea.
I'm calling it Agent Flow.
It's an internal name to my own brain.
I'm a team of one.
You know, I've got a team around here,
Changelog.
But the things I'm working on are just sort of like fun tools that solve my own itch and my own problems.
And so my interactions is not team-based.
You know, so I'm not leveraging your threads and sharing those threads with other people.
It's simply just a team of one doing it.
And so I've come up with this idea of agent flow.
I would say the protocol is called document-driven development.
And the implementation is called agent flow.
How do I get agents to understand how I want to flow with it?
You know, we vibe and we flow.
And so I'm like, okay, agent flow sounds kind of cool.
I don't know if that's trademarked or not, but I like it, okay?
I've kind of given you a glimpse of it where I define roles.
I obviously leverage your agent file.
I think Claude has its own version of it.
And so, you know, maybe you can centralize it to just simply AGENTS.md versus CLAUDE.md.
But essentially, how do I want my agent to know about my codebase,
to know about the things, the tasks I'm giving it?
Yeah.
And, you know, obviously, which role to take on, which hat to put on, so to speak,
and maybe a style guide of sorts that represents how do we write go code or how to write rust code?
Like, what is the blessed idiomatic way of doing that?
And then I think the additive to that that really sort of solidifies agent flow for me,
and you sort of mentioned it was this, how do you give it its list?
How do you give it its direction?
And I just borrowed it from the great language of Python.
They have PEPs, Python enhancement proposals.
And they have a whole system around drafting PEPs.
And so I took the same acronym PEP, and I just said that this is a project enhancement proposal.
And so for every new thing, every new idea I have, it begins as riffing or vibing as you would.
But it's all about developing this document called a PEP.
And the PEPs have numbers, so it could be PEP-0031. Maybe I'll need 10,000 at some point, but for now I feel like,
you know, four numerals there is just fine. But that's the essence, for the most part. It's like:
draft PEPs so that I understand full context. Let's dream and build and think. And that thing
has a status. It has all the artifacts. That PEP lives in its own directory. The agent can add more
things like maybe it completed something.
So it's a completion report.
We complete this PEP, but don't just update the PEP document.
Give me a whole separate document that is about the completion of it.
And how did it work?
So I can go back and learn and read later on.
You know, you even have things like knowledge base.
So I have it draft knowledge-base articles.
So once we're done with a PEP and we've done X, Y, or Z, not just the completion
report inside that PEP, but more like, what did we learn
that is now institutional knowledge for you as an agent, or any other agents, or any of the team members
I bring on later on? Even internal blogs, or what I'm calling builder logs. It's still blog. It's
B-L-O-G-S, plural. But I'm calling them builder logs. So this is more of an internal free-form thing. But just go
and dream about what happened here. And I'll go back and read those with my agent-babysitter hat on. And I learn
with it how to direct it. And so this entire
flow that I've sort of learned as a skill set, I've been calling agent flow. What do you think?
Yeah. I think that fits with the way that a lot of people use AMP actually. I think the more
sophisticated intentional users have a very intentional design process that often starts with
the human generating a lot of tokens describing what they want. And for the
length and complexity of the tasks that you
want it to handle, there's kind of like a linear correlation between the complexity of the task
and how much direction you want to give it.
I want and I want the agent to get as far as it can on its own, then I will do things that are
very similar to what you described. Like, hey, I want you to generate a plan. It's got to have
these properties. Make sure that each step is annotated with the relevant
context files that you'll want to address when you're actually implementing this plan.
And maybe I do a couple iterations on the plan itself, you know, if there's any course
corrections, like, oh, you know, use this technology, not that technology, or use this library,
not that library. And then once that plan is generated, then you have another thread
where it goes and says, like, okay, go and execute this plan, or do, like, steps one through two
of this plan and then let me review it. And so that's a very interesting
potential workflow. There's a certain set of our users that are all about trying to do as
much as possible in a fully automated fashion. And the ones that are trying to do that are
essentially doing exactly what you're doing. They're trying to build this workflow around
the human being the product manager,
describing at a high level what they want, and the agent essentially filling in all the details.
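A plan-first prompt of the kind described here might look like the following; the wording is a sketch, not a template either of them uses verbatim:

```
Generate a plan for <feature>. Requirements:
- Number each step and keep steps independently reviewable.
- Annotate every step with the files/context it touches.
- Note risks and the tests that will validate each step.
Stop after the plan; do not implement until I approve it.
```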
I will say that that is an important set of use cases
and I find myself doing that from time to time,
but there are a lot of other workloads.
I would say, like, me personally,
I run threads using the workflow
that you described relatively frequently,
but not necessarily every day,
because oftentimes, especially now,
a lot of the work I do is more exploratory.
There is a different workflow that ends up being, like, you get less out of each thread,
but you probably end up doing it more frequently, because it ends up being
less mental energy up front, and also a better fit for use cases where you're in more,
like, exploration mode, or you're trying to learn alongside of what the agent is doing.
So, like, there's tons of development tasks where I myself only have a half-baked idea
of what I'm looking for, right?
It's like, oh, you know, maybe something, maybe a sub-agent that, you know,
uses a different search tool would be interesting.
But, like, do I have, like, a clear, exact vision for what that should look like?
No.
Like, I want you to first do, like, a spike.
And so in that case, it's much more informal.
Like, I'd say if you look at most of my threads, most of them don't involve a specific, like,
plan generation step.
It's more like, hey, where's the part of the code base that pertains to this?
and can you give me an explanation
of how it currently works?
And then I have this idea.
Help me think through a couple possibilities
of how I would go about implementing this
and what frameworks I'd use.
Okay, great.
Can you do a quick spike?
Don't implement everything yet,
but just these endpoints
because I want to play around with it.
And so it's kind of like ping pong back and forth
between me and the agent
where part of the side effect
of getting it to do stuff
is that I'm kind of like learning
and acquiring domain knowledge
alongside of it.
So, yeah, like most of my workflows
are probably somewhere between
those two points along the spectrum. It's like either very like, you know, casual, you know,
still thoughtful prompts, but, you know, not following a specific structure all the way to,
like, I know exactly what I want because this is very similar to something I've done before
or, like, the feature is very clear in my mind, in which case I try to, like, front-load more
context and see how far I can get it without any intervention or correspondence from me.
What's up, friends?
I'm here with Kyle Galbraith, co-founder and CEO of Depot.
Depot is the only build platform looking to make your builds as fast as possible.
But Kyle, this is an issue because GitHub Actions is the number one CI provider out there.
But not everyone's a fan.
Explain that.
I think when you're thinking about GitHub Actions,
it's really quite jarring how you can have such a wildly popular CI provider,
and yet it's lacking some of the basic functionality or tools
that you need to actually be able to debug your builds or deployments.
And so back in June, we essentially took a stab at that problem in particular
with Depot's GitHub Actions runners.
What we've observed over time is, effectively, GitHub Actions,
when it comes to actually debugging a build, is pretty much useless.
The job logs in GitHub Actions UI is pretty much where your dreams go to die.
Like, they're collapsed by default.
They have no resource metrics.
When jobs fail, you're essentially left playing detective, like clicking each little drop-down
on each step in your job to figure out like, okay, where did this actually go wrong?
And so what we set out to do with our own GitHub Actions observability is essentially
to build a real observability solution around GitHub Actions.
Okay, so how does it work?
All of the logs, by default, for a job that runs on a
Depot GitHub Actions runner, they're uncollapsed.
You can search them, you can detect if there's been out of memory errors.
You can see all of the resource contention that was happening on the runner.
So you can see your CPU metrics, your memory metrics, not just at the top level runner level,
but all the way down to the individual processes running on the machine.
And so for us, this is our take on the first step forward of actually building a real
observability solution around GitHub Actions, so that developers have real
debugging tools to figure out what's going on in their builds.
Okay, friends, you can learn more at depot.dev. Get a free trial, test it out instantly, make your
builds faster. So cool. Again, depot.dev.
I think what you just described is the exact way I work as well. And the way I would change it, or maybe even
suggest to you, and maybe you do this, is what I call riffing. I'll
riff for a little bit with it, you know, I'll explore, you know. I might ask a question. I might
ask it to help me think through some things. And there's some back and forth
on thinking that I just call riffing. And everybody calls that riffing, right? Yeah.
But at some point, there's some clarity that begins to form. Sometimes it even begins to do some
of the work. I'm like, okay, I kind of like that. But hang on, pause. Don't do that. Now that
we have a little bit of clarity, draft this pep. That way now all this riffing has a place
to put things. So it has a drawer or a folder to put stuff into. Sometimes, even if I have a
half-baked idea, I will still, like, if I'm in flow and I'm thinking
through something and it's working and it comes back, I might just say, quick idea, throw this
into a pep, but let's continue.
That way, it's sort of like a to-do list.
And I'll come back and I review
what I'm calling my peps.
And it becomes like this place where
eventually it has a home
and these peps have statuses.
So I literally just borrowed Python's
statuses. You have final, you've got
superseded, you've got
all these different flavors of statuses.
And I'm like, I've already thought through this all.
Let's just borrow their ways, right?
I love that, yeah.
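A PEP document in this scheme might carry front matter like the following; the fields are one reading of Adam's description, with statuses borrowed from Python's PEP process:

```
# PEP-0031: <title>
Status: Draft | Accepted | Final | Superseded
Context: what we were riffing on, and why
Proposal: the plan, broken into phases
Artifacts: completion-report.md, kb-article.md, builder-log.md
```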
And so I just kind of like keep
these PEPs as, like, this constant...
I don't want to call it a junk drawer,
necessarily, but it's a place to
put something. I'll riff for a bit, I'll explore for a bit,
but the moment there's clarity or any
sort of visibility, I want the
model to kind of put all the
context we've gathered into this thing.
I don't even go back and read it right away,
because I just hope,
I suppose, hope, and maybe this is where I'm
rubbing the
lamp a little bit, hoping the genie does it,
that it puts this idea there with enough
fruition to
put the right things there.
So when I come back to it, I can go deeper.
And we can turn this into a real deliverable or a real spec.
I treat it as exactly that.
Yeah.
Do you commit those to the Git repository?
Are they in some like temporary folder?
I mean, right now it's not open source.
I've wondered about that.
Like should all this be there?
And I feel like maybe, as one of the earliest humans living in this
three-and-a-half, you know, almost three-year-old world where this
new phenomenon is on my CLI, on my command line,
I almost feel a requirement to future humanity to commit this to memory in a way, you know?
So I'm not really sure.
I'm in the blurred line there whether it should or shouldn't.
But I think from my use case, it makes sense to have it all in the same repository,
to have it all in the same place.
My agent flow, the way I describe it at least, puts all this stuff, not in the root, but in a folder called
admin. One, I wanted it to be at the top of the stack when I look at my... I still use Zed. I still use
a code editor, but it's mostly a code viewer for me. I'm editing some things. I'm not doing a lot
of coding in there, but I still do some stuff. It's largely, you know, pushed to the
model, to the agent. I put all that in an admin folder. So you've got admin slash peps, admin
slash knowledge, admin slash builder logs, you know, all these different things there, or even
the role files. These all live in the admin
folder, which is sort of like me
as the
product owner, CEO of this
directory that is now a project,
that's how I look at it. Like, this is all
admin stuff. And I put it there.
Yep.
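Reconstructed from that description, the layout is roughly the following; the exact folder names are Adam's own convention, not a standard:

```
admin/
  peps/PEP-0031/   # proposal, completion report, other artifacts
  knowledge/       # KB articles: how and why we use things
  builder-logs/    # free-form journey notes per task
  roles/           # role prompts, e.g. go-style-reviewer.md
```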
Makes sense. I like that.
I like having kind of like a, it's almost like an audit log
for how the code was generated.
Absolutely. I feel like
that might, it might be an area
of untapped potential there
because I think one of the things
that would be helpful, especially if you're going back and reading it later, or if someone
needs to review that code, is to see the kind of audit trail of, like, oh, okay, this was the
plan that generated this code, and this is what the high-level intention was. If anything,
that's almost, you know, if you could snap your fingers and have that included as part of
every, like, pull request, you almost want to do that, because it gives you just...
I feel like a lot of time in code review is often spent trying to piece together what the high-level intention was, especially if the person on the other end, you know, underspecified what they're trying to do.
And so you almost have to play like archaeologist a little bit and trying to like piece together what the high level intent was and where to start.
And if you get access to the plan, then it kind of gives you a pointer right into what parts of the code you should read first.
Well, I was moving so fast, I was like, I've got to capture this exhaust.
I don't want to slow down or read it, but I want it to get created.
So when I can slow down and think about some of these things, it's there.
And so maybe the way I looked at these builder logs, essentially, was that they were the what and why.
You know, the long-term knowledge, the institutional knowledge, went into a knowledge-base thing.
And they may be connected, but the intention is different.
In a builder log, I kind of want the agent, the developer, to tell me what they encountered along this journey.
What thorns did they get stuck on?
Where did they get cut?
Where do they apply a Band-Aid?
How do they apply whatever?
And what are they trying to do?
What were the blockers?
What were the learnings in there?
And some of that stems back into maybe a knowledge-base article.
The knowledge-base article is more like how to use it, why we use it this way, kind of thing.
Whereas the builder log is more like the journey of the developer during
implementation: what are all the things that you encountered there? And honestly, it's been
paramount, I think. I don't want people to think that I'm sitting here, like,
literally glued to my screen, watching this thing constantly. This flow lets me become a
babysitter. It lets me hang with my kids while doing this in the background. My computer's
over there. I go check on it every 10 or 15 minutes. Push yes, push no. You know,
commit that to memory, kind of thing. You know, I can be making dinner out in the
backyard, barbecuing, whatever, you know, and still babysit this thing with this kind of flow. So it's
sort of allowed me the ability, when I want to, to keep making progress, because I feel
like this stuff is just so, I don't want to call it addictive, but tantalizing. Like, I've never been
able to do this kind of thing at this level, a 10x, 20x productivity level, ever in my
life, right?
Yeah, it's absolutely incredible. And I think one of the first, like, wow
moments, or points of realization, for me with AMP too, this was very early on,
back in probably April or something. I had one of these weekend
experiences where I was taking care of our kid. We have a two-year-old at home, so he's
running around. He's an agent of chaos. And so it's very difficult to be in the same
room with him and do anything for longer than, like, two minutes at a time. And I was literally
running back and forth between playing with him and trying to get the spike of this
feature up and running. And I ended up getting to a point where I was like,
oh, this is actually good. And I'm going to push this, like, over the course of a Sunday.
And it's like, without the benefit of this tool, it would not have happened.
Because, like, in order to write anything non-trivial, you need kind of like that mental space to get into flow and to, like, think through the things and, like, page in all the context.
And every interaction, like, resets that.
Whereas with the agent, if the thing that you're doing is, like, within the capability of the agent and you're able to remain at that high level, you can absolutely do this, like, async thing where it's like, okay, like, let me tell you what I want you to do.
Let me give you some detail.
let me maybe give you some pointers for how to validate it
and verify that you're doing the right thing
and then I'll fire it
and then I'll go and like do something else
and then, like, AMP has a built-in notification sound,
it kind of, like, pings you.
It's like a sonar sound.
And it's almost like Pavlovian now
where like I will hear it and I'm like
oh like it's time to go and like check on
what the status of that thread was
and we'll go see what the agent hath wrought
and see if it's good
or if it needs more adjustment.
I found that AMP is the one that I can most easily set loose on a task.
I told you I've described my pep ways, you know,
and so everything that is long running is not me prompting it necessarily.
It is, you know, this riffing to a pep that has clarity,
it's got, you know, conviction, it's got pros and cons.
It's got whatever context is necessary.
Even if my team was formed with humans, it would be, you know, human-readable. It's
English. It's not some random language that is hard to understand. It's
just a normal way to do it. I find that that methodology lets me not have to keep the
spaghetti together in my brain. Yeah, it all kind of falls in there, and it doesn't really
matter what it is; it's just in that flow, and it just sort of works.
Yeah.
I like that.
Yeah, I appreciate you saying that.
I think, you know, I don't know if we have, we definitely do not have like one weird
trick that makes it work better in certain cases than others.
All I can say is that we are very heavy dog fooders of our own product.
Everyone on the team is using AMP to write probably like north of 80 to 90% of the code
that they generate.
And as part of that, there's just a lot of iteration that we do on the prompts, on the tool definitions, on the way that subagents combine that I think has compounded over time, you know, every day and week of development usage, has led to an experience where it is able to get farther than some of the other coding agents out there on these long running tasks.
One thing I've done recently, and we can keep going deeper on how we use it if you'd like to, I'm having fun with this, is I told you about the roles.
And so, rather than just defining a good Go developer role, let's just call it, right?
Or an "I need a Linux platform engineer" kind of role.
Yeah.
I got into this place where I'm like, you know what, I'm just tired of repeating myself the same things.
It's kind of got a little boring.
So I was like, well, let me create a product manager.
And so now I've taught my product manager what I think I know, at least the way to prompt it.
And it came together, at least in a long-running PEP where we had a clear goal; it had multiple phases.
Yeah.
And again, I'm not trying to be glued to my screen.
I'm not trying to be stuck to this black mirror.
I'm trying to walk away and do my life or move to a different tab and keep doing my real job, which is not writing software at all.
Like, my job is not writing software, necessarily.
It is talking about software or, you know, defining relationships or nurturing relationships, or all the things I do, you know.
And so when I had defined this product manager role, especially in this long-running PEP, I would have the product manager review the output from the other agent.
So I had two agents going at once.
In this case, one of them was AMP and the other one was Claude Code because that one's cheaper for me.
You're really expensive, by the way.
Well, we do the usage-based pricing thing.
I know.
I know.
It's still, it's costly.
We can talk about that.
I think it's costly.
And that's okay.
I'm not fouling you for it.
But AMP is, I think, I'll pause here and just say,
I think AMP's one of the best, if not the best agent out there.
And I think your cost is justified.
I just don't like it.
So.
We wish we could offer intelligence
at a cheaper rate. I think certain things are just, you know, dependent on the, you know, the models,
where they are, what size of model you need for a certain level of intelligence and the cost
of GPU inference. One of the things that we haven't done that others have done is price
amp at below cost. And, you know, that's for two reasons. One is we are in this for the long
term, we've been building developer tools for more than the past decade. And so we want to
build a sustainable business around this. And with a lot of the other tools, I think
there's this notion of, like, hey, if we can sell a dollar's worth of tokens for, like, 70, 80 cents
to grab market share, and that somehow that will mean we can lock in users to our coding
agent. But I very much don't think that's the case. Coding agents have very little lock-in.
It's very easy to switch and try, like, a new tool.
It's so easy, yeah.
And so I don't think it's a good business decision for us.
And two, it also introduces a perverse incentive.
So you brought up like model quality degradation earlier in the conversation.
And one of the things that we never want to be in a position of having to do is nerf the models that we're using because we need to keep the costs under whatever flat-rate price
most of our users are paying.
So I think that, look, you're right.
Like, all things considered cheaper is better.
I definitely feel the same pain as a user.
But at the end of the day, I think we're making the right trade-off in terms of what's
sustainable and what incentivizes us to build the best quality.
And I think the real trade-off here is not, you know, whether you're paying X or Y dollars
for the coding agent; it's how much
time you're saving as a human, and what more you can build. And if that's your
barometer of comparison, then the difference in cost of agents is really like a rounding
error in terms of the time that you're saving and the additional value that whatever
your building can bring. Provided what you're building is something that you can put
an economic price tag on. Yeah, I think my frame of reference, and maybe even me justifying my comment: if I were a business that employed multiple people with multiple salaries, and I could augment them or enhance them, then yeah. In that case, I'm already paying large salaries, and so
this is a nominal additive to that existing metric. In my case, I'm just tinkering. So it's expensive for tinkering. The tools aren't open source yet, but I plan to open source the tooling. I'm just literally trying to play. I'm trying to explore, trying to learn.
I wholeheartedly believe what you said earlier, which is that playing with command-line-level agents like AMP, like Claude Code, like, you know, Codex, et cetera, is a skill set to be learned and flexed.
If you throw it five words because you think it knows everything, you will get a website that doesn't do what you think it should do.
It won't be using the language that you should be using.
It won't have the deployment target you think it should have.
you need to be specific just like you would
with any other team member
like, the context is still key
and I've actually learned more about
how I think I could be a better human
by how I interact with these machines
and I'm just going to call them machines, that's just the simplest term to use. Because you realize how much... I've always harped on the idea of expectation and clarity. I can't fault Beyang for not doing X if I haven't described X to you. I have to describe X to you so that you have clarity on what I expect from you. And if we have an agreement on that clarity, my expectation is for you to understand that agreement and come back with something that meets it. But if I haven't given you that, how can I expect you to come back with any version of what I want, if the clarity hasn't been met?
So that's how I've tried to be, I think, a better human in my life: having that understanding. But I think it's become even clearer to me how important that is, how important context is, because, just like with frame of reference here, I called you too expensive. Naturally, you got defensive. It's good. You should do that. You defended against my fact, which is a fact to me. But then I clarified that my frame of reference is that I'm not already paying developers, and so this is a cost center for me, because I'm just trying to play and tinker. So, just to give that context, I think it's taught me how important context is in all relationships, whether machine or not. In all aspects, how important is clarity? If something is not clear, how can you come back to me with frustration, anger, disappointment, if the expectation of what you wanted from me was not clear and we didn't have some baseline of an agreement? That's what you're asking these machines to do, right? You're asking them to agree to make you productive. Agree, I mean, theoretically agree, right? Me opening, you know, firing up AMP in your command line terminal or whatever, that's the agreement, right?
Yeah, I kind of pause there because I just feel like we went on a little bit. But the frame of reference really is how I leverage AMP, the way I think of AMP in my own workflows. I don't have the cash flow, or the cash, to make you my daily driver. But what I can do is, when I know I have a longer-running, really important thing that requires deep, very good execution, I will draft a very clear PEP, as I've already described to you, and I will set AMP loose on it for a little bit. Not the whole thing, or I'll define it very clearly. That's where I'm leveraging AMP personally, because of those reasons.
Yeah, well, first of all, I'm curious as to, like, how you're prompting AMP, because
so, you know, it's one thing if you're using AMP heavily and you're, like, coding, you
know, a full work day every day of the week.
In that case, like, if I look at my usage, I'm probably racking up maybe on the order
of, like, low hundreds of dollars per month.
But if I look at just my weekend side projects, for me at least, it probably comes in, like, sub-$100.
You know, in some cases, like, if it's something simple, it's like, you know, a couple dollars here and there.
It's like, for the price of a coffee, I could get a mini app that scratches an itch.
And so for me, it's like, yes, like the, if you use it heavily, at some point, the, the token cost will exceed whatever, like, you know, flat rate folks charge.
But in terms of, like, what you're getting out of it, I feel like I'm using it.
I use AMP on my personal credit card
for personal side projects
and it's not really like an issue for me
it's like for the price of eating out
that probably covers like
a month's worth of personal usage for me
but I do see like some users
I will say this like the way that people
prompt and the behavior patterns
around when people start new threads
is very different
and we almost want to put out, like, a guide for how to use tokens more efficiently.
Here's one thing I've noticed.
So if you're talking to, like, very senior engineers on our team, and you look at their threads, most of their threads err on the side of being very targeted and, like, short and sweet.
You know, not like trivially short, like not like one message and then you're done.
But it's like I want to do a very specific thing.
Go and do it.
Okay.
On the next one, new thread.
because it's a new task.
Or, you know, like you implemented step one of the plan,
now you're done, step two, new thread.
What I noticed among the, like, people coming from non-technical background,
so we have a ton of people on our go-to-market team, for instance,
who are building, like, side projects and, like, games with AMP.
I look at their threads, and it's like so-and-so,
you got, like, 200 messages.
It's like a thread that's, like, 200 messages long.
You fill up the context window.
You know, now the underlying model supports a million tokens of context. And yeah, we have prompt caching and all that, so it keeps the cost low, to a point.
But it's like if you filled the context up to the point
where you're occupying hundreds of thousands of tokens,
first of all, like each incremental request,
you're not going to get the highest quality
because there's a lot of stuff in the context we know
that could confuse the model.
And you're paying the cost of like consuming as input
the entire previous, you know, 100 messages
in the thread.
And at that point, you almost want to ping people and be like, hey, you know, I know that this may be the thing that feels most natural to you, but I would actually recommend you treat threads as these sort of one-and-done, like, rip-off notes, you know, rip them off frequently. You don't need to build a whole app inside a single thread. In fact, I would probably recommend against doing that, because you will get lower quality, higher latency, and more cost if you do that.
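To make that cost intuition concrete, here's a rough back-of-the-envelope sketch in Go. The numbers are illustrative assumptions, not AMP's actual pricing: roughly $3 per million input tokens, and each new message re-sending the accumulated thread as input.

```go
package main

import "fmt"

func main() {
	// Illustrative assumption, not AMP's actual pricing:
	// ~$3 per million input tokens.
	const dollarsPerMillionInput = 3.0

	// Each new message re-sends the accumulated thread as input tokens.
	inputCost := func(contextTokens int) float64 {
		return float64(contextTokens) / 1e6 * dollarsPerMillionInput
	}

	longThread := 200_000 // a thread that has nearly filled the window
	freshThread := 5_000  // a short, targeted thread

	fmt.Printf("long thread:  ~$%.3f of input per additional message\n", inputCost(longThread))
	fmt.Printf("fresh thread: ~$%.4f of input per additional message\n", inputCost(freshThread))
	// Prompt caching discounts re-read tokens, but the ~40x gap between
	// 200K and 5K of context carries through to every later message.
}
```

Under those assumptions, each extra message in the long thread costs about $0.60 of input versus about $0.015 in the fresh one, which is the whole argument for ripping threads off frequently.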
Yeah, I would concur emphatically.
Please teach people how to be more efficient because, you know, as you described before,
it's the skill set you learn.
So I'm learning.
And the whole even reason why I'm even playing with it is not because I'm trying to really build a bunch of software.
I'm just trying to tinker.
I've got, you know, one particular itch I'm scratching here. It changed.
We obviously produce video podcasts, right?
Like it's video first.
It's on YouTube.
The weight of media is very heavy.
When we were audio-only, it was maybe a four-to-six-gig project.
When we moved to video, it was easily 15, 20, 25 gigs for every episode as the normal.
And then, because we have a desire to keep the long life of these projects, we have a full system to do all this. Yeah, and this goes back, I mean, more than a decade. We've been doing a version, an iterative version, of this practice, of what we're doing today. Every episode has its own directory; it's got its own workflows and statuses, etc. But when the show's done, by and large it's done forever, for the most part, unless we go back to it to remaster it, or unless we go back to it and reference content within it, in which case we want to pull that original source and extract it as a tool would let us, like Adobe Premiere or Adobe Audition.
Those are the two tools we use in our kind of primary workflow.
So I use a technology called 7z. It's a pretty well-known archiving tool. It's an archiving format that I've been using, and it generally shrinks the things we do by around half. So if it's a 20, 25-gig thing, it's a 10-to-12-gig artifact long-term. And I will compress the entire thing.
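As a concrete illustration of the kind of smarter wrapper on top of 7z he goes on to describe, here is a minimal Go sketch. The 7-Zip flags (`a`, `-t7z`, `-mx=N`) are real, but the media heuristic and everything else is a hypothetical stand-in, not the actual seven-z-arch logic:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// chooseLevel is a hypothetical heuristic: already-compressed video and
// audio gain little from -mx=9, so fall back to a fast level for those.
func chooseLevel(dir string) string {
	var media, total int
	filepath.WalkDir(dir, func(path string, d os.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return nil
		}
		total++
		switch strings.ToLower(filepath.Ext(path)) {
		case ".mp4", ".mov", ".mkv", ".mp3", ".aac":
			media++
		}
		return nil
	})
	if total > 0 && media*2 > total {
		return "-mx=1" // fastest: the payload is mostly compressed already
	}
	return "-mx=9" // maximum compression for project files, WAVs, etc.
}

func main() {
	dir := os.Args[1]
	// "a" adds to an archive; -t7z selects the 7z format.
	cmd := exec.Command("7z", "a", "-t7z", chooseLevel(dir), dir+".7z", dir)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "7z failed:", err)
		os.Exit(1)
	}
}
```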
For a while there, this thing was just a simple bash script, meant to be a smarter thing on top of 7z, because I forgot how to use it and I wasn't going to keep going back to the 7z documentation on how to use it. So I just simplified it by writing a bash script, like any hacker, right? And then over time that bash script got more and more sophisticated. And then about maybe five weeks ago or so, we were on a podcast talking about Claude Code, and
I was like, you know what, I haven't played with that enough yet.
I've been largely in this hype cycle, not really playing with it much.
And I thought, okay, let me open up Claude Code right now and see where I can set it free real quick on this call.
And I said, just improve this bash script.
And in the moment, that's all I said, like four or five words.
It was not: make this amazing, here's the CLI I want you to build, here are all the flags I want you to implement. None of that. It was just: improve it. Show me how I can improve this thing.
And I was just flabbergasted with how good it was, how much improvement there was to it.
And I was like, oh, my gosh, I have missed, I've been missing out.
I have got to go as deep as possible on this thing because this is a crucial tool for me.
I use it on the daily, if not, you know, several times a day.
I'm archiving, moving things for, for us and whatnot.
And I don't like to spend a lot of money on hard drives.
So I want to compress and store forever.
And so we have a TrueNAS server here on our LAN.
All that goes over there.
We have this whole entire flow.
And so this bash script really has been my itch. And so I've now developed a tool called Seven's Arch, or Seven Z Arch, whichever pronunciation you like, and it is now my brains on top of 7z. It is sophisticated enough for me to point it at a directory, examine it, know its media, and set the compression algorithm to be the best fit for what's in there. Nothing like that existed in my bash script. And so because of these agents and what I'm doing, I'm able to now make that thing what it is. So those are the things I'm using it for. It's not like I'm an everyday software developer; it's those kinds of things. Back to "help me be more efficient": my gosh, yes. I've shared with you my exhaustive workflow. I think I am using way too many tokens. I'm definitely not treating threads, especially with AMP, like throwaways. Because,
in my experience, when the context window collapses, the thing forgets everything. And so I thought, as a user, I need to keep this context window as clear as what I'm working on, because if not, it forgets. It doesn't know how to be what I've asked it to be for this task. And I kind of get lost in it. It doesn't know what to do.
So shoot me some of your threads. I think one of the cool things about AMP is you can share your threads with other people. And so oftentimes we get users sharing their threads and saying, hey, you know, this is how I built this, or, hey, can you help me improve and understand what I could do better?
Yeah.
I would love to take a look at how you're using it and maybe give you, like, best practice
tips from what we've learned.
For sure.
I will say, like, my intuitions are almost the inverse of yours. If anything, starting fresh is great, because you know the context window is clean. The more things that you accrue in the context over time, the more possibility there is for confusing the model. And so, like, the quality... back when the context window limit for Claude Sonnet was around 200K, I think one of the things we noticed was you started to notice a little bit of degradation actually around like 70K, and then a much steeper degradation past like 120K tokens. And, you know, it was
always one of those things where it's like, it was vibe-based. So we never built that directly
into the product saying like, oh, don't go past 120 because, you know, sometimes, you know,
there's a legitimate use case that does involve, you know, take advantage of a lot of the previous
context. So it's a matter of user taste. And, you know, plus we're all still like learning in
real time, so we didn't want to be too prescriptive. But I think maybe, you know, now we're at the
point where we've seen enough where we will be offering some more visual indicators around like
how much of the context has been used and where the sweet spot starts to end for some of these
models. Now it's like you can go up to a million. And I think the quality has gotten better
beyond 70K now. But it's still to the point where, all things considered, I prefer the rip-off-note approach: let's start a fresh thing.
And it's almost like the context that you wanted to remember,
hopefully is captured within the files that it's modified.
So it's like, you know, make this edit to the code.
Now it does this thing.
Okay, now let's start a new thread and say like, okay, okay,
now it does the thing that I just asked it to do,
but now make this other modification.
It's almost like, you know, as a human, when you git commit, you're kind of saying, okay, I'm committing this, this is one atomic change, and now let me start from a clean slate to do the next incremental thing, where I don't have to worry about all the other stuff, all of the other changes that I made to do the previous thing. It's sort of like
mentally clear to have the separation. I think, you know, the rough analogy applies to agents
as well, where if you want to be like clear and precise with what you're doing, it's,
It's almost better to have this habit of starting a new thread for each sort of isolated or targeted task that you want to accomplish.
Yeah, this is definitely something that I think needs to be more clear, because obviously, for me, it was not.
And here's me, you know, calling you out on the podcast, telling you you're too expensive.
And here's me, the user who doesn't understand how to use your thing.
And so it's not really expensive.
I was using it in an expensive way.
I was using it inefficiently.
And I would say that, like, first of all, like, we're building the product.
We need to build a product experience that makes sense to users.
And so it's like never the user's fault, right?
At the same time, I think what...
I would say this is definitely your fault.
I was joking around.
Yeah, it's definitely our fault.
At the same time, the tension that we notice is that, again, like, coding with agents is a high-ceiling skill.
And so there are legitimate use cases that do involve filling up the context window.
And so what we don't want to do is we don't want to be overly prescriptive.
Because again, the persona that we have in mind when we're building AMP is really ourselves, as a proxy for a professional engineer whose job it is to code every day and use this stuff frequently, as the main way to generate the code they produce.
And there is this tension where the more things you build into the agent, the more things you build into the UI that are prescriptive, like "you should do it this way or that way," the more it runs in direct tension with the desire to create a power tool that lets people use it in the way that they want to use it.
And so it's kind of this constant struggle where like we do see a good amount of people
using it in ways that, you know, we don't use it ourselves internally. Some of them are like
weekend vibe coders, and it's, you know, they're getting value out of it in the way they're using
it. It's not the way that I would use it. And, you know, I think we'll put out some, like, blog
posts to say, like, here are the general best practices that we consider are good, but we also
don't want to overly constrain people because, one, like, there might be a power user out there
that's getting value out of it
in a way that we don't realize.
And two, this is all still something
that we're learning in real time, right?
Like, the era of coding agents,
I think we're like, what, like six, seven months into it?
And so the last thing we want to do is,
you know, like guardrails are good,
but guardrails can also yield blind spots
as the technology evolves.
And the last thing we want to do is
like obstruct the view of our user base,
or ourselves, from unlocking a usage pattern that is really powerful, that you don't see because the product is telling you to do something that is, like, my personal best practice, but may not, you know, generalize to all the different users and code bases out there.
Yeah. Well, I think, as a response to that, one thing I would suggest is: definitely don't change the way AMP works. I would say, if anything, add maybe a slash command that is not so much buried, but enabled if a user wants to go and read documentation and do an alternate version. So I would keep things the way they are. But I would definitely educate on how the context window works, because that is something, to me... I can assume, as a, you know, smart person, I can begin to assume, but I have no idea. And as we just described, we're all learning these things. And this is a new, burgeoning thing, so we're sort of all learning on the fly.
This is definitely a skill set to be developed, but I have no idea how the context window
works.
I don't know how to leverage it.
So when you just said, there are times whenever filling the context window makes sense,
I don't know when that makes sense.
I don't know when it makes sense to thread and rip it off and start new.
I don't even know how to create new threads.
I don't even know how, aside from maybe, you know, back at the literal command line inside of AMP. And Claude has this too, where you can resume and you can continue and stuff like this. So this is not an AMP-only thing. This is, you know, kind of an agent-level, you know, common thing. Maybe threads is something unique to you, and the way you can share them and stuff like that. But I'm not ripping off my threads and starting new. I don't even know how to create them. I don't know how to list them. I do know how to see the ID of them. I assume I have to go and copy and paste it, and continue it if I want to. So there's no CLI or TUI for perusing my threads and doing things, at least in, like, a command line.
So this is just early days for you.
I'm sure you'll build that eventually in there.
But that context window and how to leverage threads is black box to me personally.
Yeah.
No, I totally get that.
And it's something that we've been actively thinking about for a while now.
I think we're one of the first to ship the context window meter.
So just like showing you how much of the context window has been filled up.
That's like an important indicator.
I think now I've seen it pop up in a couple of other coding agents.
But that to us is like it's one signal that's useful to look at.
Because the higher that percentage goes, the more costly each additional message becomes.
You take a latency hit as well.
And you start to see quality degradation at some of these key falloff points.
I will say, for the folks listening: my personal best practice for using agents, the analogy that I draw when explaining this to people, is that the context window is sort of analogous to the human brain's working memory.
So, you know, there's the old adage that you can't think of more than, like, five or seven things; I forget what the exact number is. But there's only a certain number of distinct concepts that you can hold in your brain's, kind of, L1 cache, or working state, that are immediately, instantaneously accessible at a given time.
And the context window is essentially that but for LMs.
So the more you try to cram in there, at some point, it's got to pick and choose which pieces of that it's actually going to pay attention to.
And then it has to effectively ignore the rest.
The more you cram in there, the more chance that it will become confused, because the salient piece in the prior thread history is not actually being attended to when it responds to the next request from the user. And this is not something... it's not just long-running threads where this is an
issue. It's also an issue with things like MCP servers. So like MCP servers are, MCP is like very
hot right now. And I think it's great that this protocol, I mean, we partner with Anthropic
on the initial kind of versions of this. I think it's a great protocol. And it's great that it's
gotten basically everyone in the world thinking about, hey, how can I, how can I, you know, any service that
exists, how can I build an MCP server
to provide tools and context
to large language models? Like that
aspect of it is awesome.
But I think that there is this kind of like
learning curve you have when it comes
to MCP servers where a lot
of the MCP servers in existence
now weren't written intentionally
and they weren't written for a specific
application. As a consequence, a lot
of the tools that they inject are largely irrelevant to the task at hand. And each additional tool definition becomes a chunk of context that gets placed in the context window.
So the last thing you want to do is you don't want to turn on like a dozen MCP servers,
each of which injects a dozen or two tools, tool definitions, into the context window.
Now all of a sudden you're talking about like 100, 200 tool definitions that off the bat,
you're paying in terms of token cost, latency, and degraded quality if they're not relevant
to the thing that you're trying to do.
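To put rough numbers on that: an MCP tool definition is essentially a name, a description, and a JSON input schema, and all of it lands in the context window on every request. A hedged sketch in Go; the per-tool size and the counts are assumptions for illustration, not measurements:

```go
package main

import "fmt"

// ToolDef mirrors the general shape of an MCP tool definition. Each
// field is serialized into the model's context on every request.
type ToolDef struct {
	Name        string
	Description string
	InputSchema string // JSON schema; often the bulk of the definition
}

func main() {
	// Assumptions for illustration: ~150 tokens per tool definition,
	// a dozen servers, each injecting a dozen tools.
	const tokensPerTool = 150
	const servers, toolsPerServer = 12, 12

	tools := servers * toolsPerServer
	fmt.Printf("%d tool definitions ≈ %d tokens of context on every request,\n",
		tools, tools*tokensPerTool)
	fmt.Println("paid in cost, latency, and attention before you ask anything.")
}
```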
Yeah.
Babysitting is getting harder.
I'm telling you.
I mean, this all leans back to... I mean, it makes me rethink my agent flow model, not in terms of changing it drastically, but more like, okay, I have to be more efficient with prompts. I have to be more efficient with maybe a role definition, for whatever the role might be, because it seems like that's kind of token-heavy. And now it seems to be more clear. I literally had no idea how the context window worked, beyond knowing
what it does until this conversation.
And I didn't think about it like that,
that the more you have in your prefrontal cortex or that L1 cache, like you mentioned,
yeah, you're going to be confused.
They call it focus for a reason, right?
Take your to-do list, pull one off, focus on that.
So that's analogous to a thread, right? A new thread: focus, context, which role (in my case, or somebody else's case), which PEP. Execute. Done. Rip it off, new thread, clear context. To me, now that makes a lot more sense.
I would love it if you can educate the world more on this context window, when and how it works, because I've spent too much money with you.
Yeah, we'll put out more material. Actually, just as you were saying that, another analogy comes to mind, beyond the human brain analogy: there's an analogy that can be drawn between threads and functions.
So, like, if you're a developer, you're writing code.
You know it's an anti-pattern to have, like,
a thousand-line-long main function,
or like 10,000 line-long main function, right?
Because there's just too much to think about.
There's too many ways in which those pieces could interact.
And so what do you do? You decompose that long main function into various sub-functions, each of which, hopefully, is no more than a dozen or a couple dozen lines long.
And that helps you encapsulate things
so that you can reason about it.
I think the thread is essentially
the agentic analog to the function call.
And so, like, you can make agents do very powerful things
with relatively short threads
using the flow that you describe, right?
Like, if you have a planning thread
that generates a plan,
well, to generate the plan, you know,
often doesn't require filling up
the entire context window to generate a high-quality plan.
But once you have the plan,
you don't have to use the same thread
that generated the plan to go and implement the plan.
The plan is an artifact
that you can then feed in as an argument
to another thread to say, like,
okay, go implement step one or two or three of the plan.
And that thread doesn't have to know
how the plan was written.
All they need is the plan,
because the point of the plan is that,
you know, once you have the plan,
that's what you're following.
It stands alone, yeah.
Yeah, exactly.
So maybe that's like another analogy
that could resonate with a developer audience.
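One way to render that analogy as code: a hypothetical Go sketch (the function names are illustrative, not AMP's API) where the plan is the only value passed between threads, just as a return value is the only thing passed between functions:

```go
package main

import "fmt"

// runThread stands in for one agent thread: fresh context in, one
// artifact out. Purely illustrative -- not AMP's actual interface.
func runThread(prompt string) string {
	return "artifact produced for: " + prompt
}

func main() {
	// Thread 1: planning. A short thread whose output is the plan.
	plan := runThread("draft a step-by-step plan for feature X")

	// Threads 2..n: implementation. Each starts clean and receives
	// only the plan, not the planning thread's full history.
	for step := 1; step <= 3; step++ {
		fmt.Println(runThread(fmt.Sprintf("implement step %d of:\n%s", step, plan)))
	}
}
```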
For sure, for sure.
Well, I've enjoyed that part of the conversation.
I know that while I didn't plan to actually go this deep on how to actually babysit an agent,
maybe we could talk about what it takes to raise an agent.
I love this podcast y'all did.
I think, if I understand it, too, the way AMP came about was, I don't want to call it accidental, because I think you all are very, very deliberate with what you do. But it seems like, in my understanding, and you can clarify this, Quinn, your co-founder and CEO, had this idea, and I think Thorsten, Thorsten Ball, a friend of ours as well, over a weekend put it together. I'm loosely paraphrasing what I thought was the inception of AMP. What has it been like to go from there, if that part of the story is true, to raising an agent, building a podcast around it? How has this new journey been for you all? Because I feel like, for me, you know, something I've been preaching
to a lot of the different brands we work with is that, you know, I love it that you sponsor
a podcast. That's amazing. Don't stop doing that obviously because that's what sustains our
business. That's what makes this show even stay here right now. But I think that you all should
have your own channel. You should be creating your own content. And that's kind of what you've done
with Raising an Agent. Can you talk about the inception of AMP and Raising an Agent, what you all have been doing around media and creation and software and the core team, et cetera?
Yeah, so the inception of AMP is basically as you described it.
It was, you know, Thorsten rejoined the team, and he came in and was trying to figure out what to
work on.
And I think we'd been talking about wanting to do more agentic stuff for a while.
You know, I think like the first week he was back, he tried, like, basically I told him,
like, hey, can you go and take some of these principles and try to make Cody do more agentic things. And then he
did that for a week and basically
came to the conclusion, which, you know, I shared
at the very beginning, which is like there's so
many design constraints that
assume, you know, certain
things about the model that
it feels like you're going
against the grain of what everything
else in the application wanted to do.
So essentially he and Quinn
went off and
basically spun this up.
It was like, let's just, you know, do
a spike and see where it goes.
And that first iteration, I think he's got like insanely great taste and great intuition for how to build quality software that wraps new technology in a way that is tasteful, but, you know, unleashes the power of the underlying technology while keeping the user experience really good.
And then it just sort of like compounded from there.
It's like he built that and folks started trying it out internally.
I tried it out internally.
I was like, wow, this is really powerful.
like that first moment where it's like,
hey, go and change the background color
of this random component.
And then it goes and finds like the exact component
and makes it and you didn't do anything past that.
It was like one of those like holy crap moments.
And what we quickly realized is how many of the things we had to kind of rethink from first principles. Like, I think any time a new, amazing, disruptive technology comes online, you really have to force yourself to think from first principles, because a lot of the rules of thumb and best practices that you've imbibed over the years were based on assumptions that may or may not hold true any longer.
And so part of that was, okay, we've got to go back to first principles and learn what the models are capable of and what they can do within an application scaffold that we're building in real time. We're going to have to relearn this in real time.
How can we do that in a way that shares those learnings
with people using what we're building
and also engages people in a way
that gets them to share feedback back to us?
Because we don't have a monopoly on insights.
I think everyone recognizes that we're all kind of figuring out
what agents can do in real time.
And so we've learned a ton
from our user community
in terms of how people
are using it. You know, the workflow that you just described, the agent workflow, the agent flow
workflow, that's something that we, or I at least, heard from some of our users for the first
time. So, like, that's not something that I came up with on my own. It's like some of the users
who are using it heavily reached out and said, like, hey, we're using it in this fashion. And
then I was like, oh, maybe that's something I could learn from. But a lot of those people,
they reach out because they listen to this podcast and they see that we're kind of like openly
sharing
what we're figuring out in real time.
And it's a very raw, unadulterated podcast.
It's literally like two,
sometimes maybe three people
on the AMP Corps team
just like riffing about some of the topics
that we've been thinking about
recently, some of the challenges,
some of the idiosyncrasies,
some of the weird things that we've come across,
and some of the insights.
I think the most recent episode
was one we did on
evaluating different models and trying to plug them into both the main agent and various subagents and
things we noticed around that. And so it's not a polished production at all. I don't think we'll start our own channel around that, just because it's very ad hoc. There's no regular release schedule; we only put out an episode when we've shipped something or we've noticed something that's particularly interesting. Then we'll get together the relevant people.
And it's like, okay, let's riff for an hour on that and talk about it.
So it's not like a, there's no regularly scheduled programming.
It's more of just, hey, if we learn something cool, let's share it with others and use that as a way to get more smart people who are very on the forefront to figure out what agents can do, you know, talking to us because there's a ton to learn from out there.
This most recent episode you're mentioning was episode 8, if I'm correct.
Yeah.
13 days ago.
Yep.
You sat down with Camden and other team members on the AMP team.
Currently sitting at 674 views.
I would call that not enough, in my opinion.
And it's not your fault.
I don't think it's your fault.
I think if you keep doing this, you'll see the dividends get paid.
But, you know, I look at what you are doing, and I can't tell if it is or is not, you know, weekly, biweekly. I think, by and large, with what we've called a podcast, there's still this... what we do here, The Changelog, is a podcast.
There is a rhythm to it, Monday, Wednesday, Friday.
There's a rhythm to it.
But I think in a brand world like yours, what I call brand world, I don't think you have to be that way. I think YouTube changes the model for you; that can be your first home. This is kind of like some real-time advice, by the way.
Yeah, I know. I appreciate it. I think there are certainly things we're going to be doing better on the awareness front, for sure. Yeah. I think you're doing it right, though. So I don't think you
should change much aside from maybe give it more of a first-class citizen structure in terms of
what you're doing, but don't feel like you have to be there on the weekly. Don't feel like you
have to be there on some sort of ceremony. Also, don't take three months, but don't feel like you have
to abide by this weekly, must-show-up, must-podcast ceremony because the audience expects a release. I think there are a lot of things that you and others are doing inside of organizations that need some cycle, a content engine, but it doesn't have to be the Raising an Agent content engine. It can be Sourcegraph or AMP at large, things that you should be talking about as a leading brand on this frontier. And the last time I checked, YouTube is not charging you any money to publish, right?
So it's free.
Yes.
The only cost to you is your time, the literal cost to produce it. It is otherwise free to you to produce and publish content on YouTube.
Yeah.
I would say, if there's anyone listening to this that is good at just handling all the publishing stuff: it literally is basically just me or Thorsten or someone else on the AMP core team recording and editing and pushing these things out. And so I feel like if we had a halfway decent person who was just like, "okay, you guys just talk and we'll handle the publishing and the editing and all that," that might help us release on a cadence. But you're right. Like, you know, as a business, we should probably be more intentional about that. For us, it's just a fun way to engage users and people using coding agents at the moment, and, yeah, we could probably be getting more juice out of those conversations.
I will say that, like, you know, for us, the quality people who, like, listen to it and then reach out, and they're like, oh, hey, you know, I heard you guys talking about this.
Like, that in itself has been worthwhile so far, but yeah, yeah, I don't know.
We absolutely should be more intentional about getting it out to a bigger audience.
I don't think you should feel bad about that, really.
My advice is not to shame you by any means or to make you feel like, you're doing great.
Don't stop doing that.
The reason why you show up and it's fun is what makes it enjoyable to watch.
So don't change how you assemble the team to sit down and have the conversation.
Don't change the fun aspect of it.
I would just treat some of the production and orchestration and timing a little more carefully, because Sourcegraph deserves it. And I think if you have some sort of rhythm, it's a little easier to be whimsical when you show up, because you have some sort of cycle where the brand itself is publishing content, you know, somebody kind of in charge of that. And there is an easy button out there for you, and I'll share it with you after this podcast, but there is an easy button I'll help you press. I say keep doing it. I'm enjoying the pod. That's why I brought it up. Because I think calling it Raising an Agent only shines a light on the fact that this is burgeoning, that we're all exploring together, and that you're kind of learning in real time and sharing almost in real time too with the core team. And it began as sort of a skunkworks kind of podcast, just talking about what you've done. And I think you have an opportunity to blossom it into just a little bit more than that. So mainly I just encourage you to keep doing it, because I'm enjoying the content personally. Cool.
Appreciate you saying that.
Yeah, absolutely.
I think even calling it raising an agent is just, it's genius.
I don't know who came up with that, but it's genius.
I think Thorsten. Credit goes to him for that.
Something needs to be done with him because he is the brainchild behind this.
You might remember this: I emailed you a few months back, when the AMP code website was, I would say, in its very, very infancy, about the copy, the web copy, the copywriting on it. I think I misspoke at the time. For whatever reason, I thought I had seen a Slack message where Quinn said he wrote it, but it was actually Thorsten.
Yeah.
And he's an amazing writer.
He puts out like a weekly newsletter too
called Joy and Curiosity, where he just talks about
cool and delightful things that he's played with
both like technical and non-technical in the previous week
that's just, it's amazing.
Like, he has a rare combination of super sharp, great technical skills, but he's also a great communicator and, frankly, a great writer, which alone is too rare a skill these days. Very good quality writing from the heart that is clearly articulated, where, like, every word you can tell was thoughtful, intentional. In the age of, you know, LLM-generated text, it's increasingly rare.
And I think it's a very unique and it's a good skill to have.
Yeah, I'm a big fan of Thorsten.
We had him on the podcast Go Time a long time ago.
That's where I first met him.
He'd written a couple books since then.
And then whenever we were changing the cast members out to kind of invite more folks into the Go-Time podcast world, he was on my list.
And I really wanted him to be involved.
He ended up not being able to have as much time.
I think because he got employed by you all.
And he's like, now I'm totally focused.
This and that.
And I was like, oh, my gosh, source graph is amazing.
You should totally be milking that opportunity and doing all that.
And this is way before this world we're in.
Now, this is when it was code search, you know, code intelligence, and I don't mean that as a pejorative, but, like, wow, how much progress.
That's why I've been so enamored by you personally and Quinn and source graph.
I've seen your story arc since
2014 when we interviewed you
in a dark room at GopherCon
asking you five questions like that's
like I've seen you since then
and like the beyond who you are
is still the same person but like what you've been able to accomplish
on this thread pull
of, like, just helping developers be more intelligent
with their code bases have more
clarity and understanding be more efficient
with your processes understand a code base
more quickly you know get up to speed
and ship something sooner
like this iterative approach to
where you're at is why I'm just like so
so curious, I suppose, about what you're building
and why I care so much about what you,
how you showcase what you build, you know?
No, totally. I mean, we've been building dev tools
for, you know, the vast majority of our professional careers.
And I think it's just, I mean, both Quinn and I are
just insanely passionate about this area.
It's like, you know, if, if you just like let us code all day
in a cave, that would be a great life in my view.
And I think, like, you know, building DevTools for as long as we've been building
them, we've learned a lot both about how individual developers work, the diversity of
preferences and skills and tools out there, but also at the team level, like, how
organizations build software and all the challenges and bottlenecks that are in the
that process. And I think there's nothing I'd rather be doing with my life. Because like developer
tools, it's one of those very rare areas where it's fun and intellectually stimulating. It's also like
the economic impact is huge, given the importance, the increasing importance of like software
and driving the world. But, you know, like the theory behind computer science and now, you know,
machine learning and AI. I don't know. There's almost like a spiritual element to it where you
can see glimpses of maybe some of the fundamental laws around like knowledge and information
that govern the universe. And so for that reason, it's just it's insanely rewarding and fun
area to work in. Yeah. Man. You know, I really struggle with that idea of going into a cave and coding all day.
And actually I have a playlist on
on YouTube called Working Beats.
It's a glimpse into the Adam world.
And I just, as I hear cool beats on YouTube,
I store them away.
If I'm going to work to them,
I store them in my working beats playlist.
And I go back to it.
And there's lots of cool beats on YouTube
from all sorts of places.
So I don't subscribe to one channel.
I just make a playlist.
And I let the algorithm help me in discovery, right?
That's how you're supposed to do it.
And so I put this one in there. The background on a YouTube video, like, there's always a video, right? And so on this one, it's not just the music, it's this background. And it's this developer with these three large monitors, which is just amazing, code on all of them. I'm like, man, that's beautiful, right? But then this developer is sitting at this really cool desk, in this really minimalistic room, with these super huge windows. They've got the desk, the monitors, and these massive windows with this view, like a DHH view. Have you ever seen him?
And the view is this massive, beautiful mountain scape.
And I'm like, that is so weird.
I love it visually.
But then you think about it, right?
That developer is staring into the black window, the black mirror, they call it, right?
Hammering away code.
Now, I'm with you.
I'm like, yes, take me there.
I love that.
I get joy in that.
There's more life than that, of course.
Yeah, yeah.
But then I'm like, this is juxtaposed to this beautiful view.
Right?
And software is not making the mountain, right?
Software may make the company possible that makes the boots that allow a person to put them on and climb the mountain, but it's not making the mountain.
So it's like this really weird, you know, beautiful view, but juxtaposed against each other.
It's like, well, here's a, here's developer, three huge monitors, coding, loving life, while this beautiful mountain scape is, is off the view.
Yeah. I don't know.
But, you know, what I would say to that is, like, the way you experience the mountain, you know, the photons bouncing, you know, from the star to the mountain into your eyeballs, and then how they get translated into these signals that get assembled into what we call consciousness... there are certain, like, fundamental, I don't know, forces of nature at work there. And I think, you know, one of the cool things about the latest advances in AI is now you can actually
see some of these come into something that's like tangible to everyone, you know, like what is
intelligence, what is reasoning, what is, you know, thinking. You know, I said earlier, like,
I'm not AGI-pilled or anything like that. I'm not a doomer. I don't think it's Nirvana. I think these are very much tools. But I do think that we figured out, in the transformer architecture and some of the other emerging architectures that are coming out, how to capture what is essentially happening in some part of your brain.
There's some sub-region in your brain that I think is very, like, whatever it's doing is
very well pattern-matched by what the neurons in a transformer are doing when they're taking
input and producing output that is, you know, what we would call like semantically correct
so yeah
I don't know
I just think it's all insanely cool
yeah
I mean, if I could have that view, I would have that view. So I'm not knocking it by any means. I was just thinking, you know, as you said, "I would be happy to code in a cave all day," or be in a cave coding all day. I can agree with that, and I can empathize with it, and I can have desires around that. And then I'm like, well, there's also more to life than that. And I'm not saying that you don't think that, by any means, because we both have children. I'm sure we both feel the same way about our families and our wives and our children and our lives and stuff like that.
I think we're in a unique position in time and history,
which isn't that always the case for every human being to be in a unique position in time
and history?
It's always the case, right?
It's funny how history works.
Every year is unprecedented, right?
That's right.
Yeah.
I think, in particular to us, we've been on the struggle bus, I would call it, as developers: key by key, stroke by stroke, character by character, putting into the machine to get the dev tool out, or to get the thing out. And now we've gotten to a point where we can say, okay, now my one actually equals 20, you know? It's a force multiplier. Now you take one Adam, or one Beyang, and I'm able to contextually hold and produce more in various lanes because of it. It's a force multiplier. So we're in that unique world.
So I'm with you. I struggle with the desire, because I want to go deeper, and it is tantalizing, because I can produce way more than I've ever been able to before, or even explore caverns I've never explored before, right? I'm going into new regions I've personally never explored before. You may have, somebody else may have, but not me. And so I like that idea. Here's what I want to end on, because I know we're getting close to time. I want to call them skeptics. I kind of want to call them resistors.
They're not AI skeptics,
and maybe you can disagree or agree with this,
but I want you to give,
because I think you've got a unique,
you know, perspective,
a unique frame of reference
on where things are at
because you're building some of these tools
and where we may be going
and even tap by your own personal taste
and desires for where we can go.
Speak to what I would call the resistors, who don't know how to handle it. They're aware of it, obviously. They're not resisting it necessarily, but they're just not sure they want to let go of the old way. They're not sure how to conjure it. They're not sure how to get the magic out of the genie in the lamp. They're not sure how to rub it, or they're using the context window wrong.
They're holding it wrong.
And they're just not leaning in.
And they're not seeing it as a skill set to build or the skill set to build for the future
of where software development and program is going.
speak to that world as clear and as open as you like to.
Yeah, that is a great prompt. I'm thinking about, like...
It wasn't a question. It was open-ended. Like, I just gave you some points.
Two things.
One, I'll speak kind of like philosophically,
and then I'll speak on just like a practical level.
So, philosophically: I think a lot of the skeptics come from a place that a lot of developers come from, which is a default of skepticism. And especially in an area that's as hyped up as AI is, there's sort of a natural reaction: if there's this much noise around what it supposedly can do, that's often a contraindicator of the actual substance of the technology.
And there, I would agree with
a lot of them in saying that, like, the space is
very much overhyped.
Like, the people saying that
it's going to solve all our problems,
humans are going to be out of the job, or we won't
have to work, or, you know, it could go off
the rails and kill us all.
Like, that is just, it's like, those statements like that are, to me, they're like not even wrong.
They're like so far outside the band of like what we should even be talking about with respect to this technology that I can understand the frustration that like a lot of people have where they're like, look, you know, people are saying like, this is, this is going to replace humans.
And then I go and use it and can't even figure out how to SSH into, you know, my remote machine, right?
I don't know how to do that.
I can't do that.
Yes, you can.
Yeah.
And so, like, I would say philosophically, what the technology is: it's not consciousness, it's not a human replacement, it's not a god. What it is, is a universal pattern matcher. Like, if you go and watch Demis Hassabis's Nobel lecture, he's the head of DeepMind, he received a Nobel Prize for the work that his team did on protein folding. It's a very short lecture, like 30 minutes. But he says something very insightful there,
where it's like the transformer architecture,
what it really does is it allows you to fit any pattern
that is either observable in nature
or that you can synthetically generate.
So as long as you have a way to collect the data
from observing nature or generate that data in an environment
that closely enough approximates what you're getting after,
what the transformer model architecture does
is allow you to train a model that fits that pattern. They're great, like, curve fitters or pattern matchers.
And so that is
not, you know, Nirvana,
but it is a useful tool.
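For readers who want the "curve fitter" framing made concrete, the standard textbook formulation (nothing transformer-specific, just empirical risk minimization) is: training picks parameters $\theta$ that minimize the average loss over observed or synthetically generated examples,

```latex
\theta^{\star} \;=\; \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(f_{\theta}(x_i),\, y_i\bigr)
```

where $f_\theta$ is the network, the pairs $(x_i, y_i)$ come from observing nature or from a synthetic environment, and $\ell$ scores how far the model's output is from the pattern being fit.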
And so if I
hand you a universal pattern matcher, and I say,
I've trained it on all these different workflows,
you know, like coding, the coding
workflow, there's patterns
that emerge from that.
A lot of the tedious stuff is very, you know, patternful in the way that you do it; there's almost a rote that you learn when doing it.
You as a human understand
what these patterns are.
Now I'm handing you this technology
that can fit those patterns
as long as they're represented in text
and as long as they're represented
an environment where you can validate
what is a good pattern
or what is a batter pattern in that universe.
That's the mindset with which you should approach these technologies. You don't want to go into trying these tools with a mindset of automatic skepticism. I feel like for a lot of the more prominent voices out there, it's almost like they're going into trying these tools with the intent to show that it's all hot air.
That's the wrong mindset.
I think you should go in with a mindset of like,
hey, this is amazing new technology,
is a universal pattern matcher.
How can I explore what's possible with this?
Let me put myself in like explore, try new things mode.
And that segues into my practical advice,
which is like if you want to experience the wow,
in a way that's like tangible and delightful
and also practical, I think the first thing I would do
is pick an app or pick a domain
that's like somewhat outside of your wheelhouse
as a developer.
You know, maybe you're a hardcore systems engineer,
but you've never built an iPhone app.
But, you know, you have a three-year-old kid
and you want to build, you know, a game
that teaches him how to like spell basic words
or something like that.
Go and sit down, you know, it could be with AMP.
It could be with any of the other
like coding agents, for this particular task, I think most of them that you've heard of will do a pretty good job of building a basic app that's outside of your wheelhouse to do something that is simple to a lot of people out there, but hard for you. And if you do that, I think probably with like 98% confidence, you'll have a good experience with the proper mindset. And that will then kind of like motivate you to try the technology in various
settings and increasingly complex settings to see what it's capable of.
I like that.
I mean,
I think that's spot on with my own recent experience is that I've gone just a little outside of my wheelhouse.
I've built a couple of CLI tools.
I've learned about CLI patterns that are obvious, like dash-dash help, dash-dash version. Those are pretty common ones. But there are all these different ways to leverage CLIs. There are known patterns out there.
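For instance, a minimal Go sketch of those conventions, with the tool name and version as hypothetical placeholders. Go's standard flag package wires up -h/--help once flags are defined, and a --version flag is a few lines:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

const version = "0.1.0" // hypothetical version for illustration

func main() {
	// flag accepts both -version and --version, and generates
	// -h/--help usage text from these definitions automatically.
	showVersion := flag.Bool("version", false, "print version and exit")
	flag.Parse()

	if *showVersion {
		fmt.Println("mytool", version) // "mytool" is a placeholder name
		os.Exit(0)
	}
	fmt.Println("doing the actual work...")
}
```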
And I've done a version of what you've said.
I've had a personal itch.
And in my case, I've justified the spend because I'm like, well, I've got to learn.
I've got to do these things anyways.
And so it's cool to do that.
But I want to build a tool or something I can actually use day to day that maybe gets better.
And my hope is that eventually Seven's Arch or Seven Z Arch is shareable to the world.
I want it to be open source.
It is not an atom tool.
It's a world tool.
And hopefully, you know, if anybody likes the way I'm compressing and the way that we have to compress large media
directories, how many YouTubers
are out there, right? You will eventually use my tool, Beyang, your team will, because, hey, you produce this awesome podcast called Raising an Agent, or you do things on YouTube.
So you have maybe a desire to keep
these artifacts long term. And I
say, hey, compress them, save some
file size, you know, that makes sense.
But I've done just that. I've stepped
outside. It's a useful tool to me.
And because I have
the itch and because I'm
the user who understands how I want to work,
I'm able to more clearly
learn all about the Go world, all about CLI tooling, all about maybe even the TUI world and
the exploration there. I can do all those things, but I can also improve my own workflows,
my own tools, and then share those with the world. And for a while there, I was really,
Like, maybe about a month ago, I was actually really, really sad, because we've obviously built this podcast around software development, but specifically this burgeoning wave called open source. And back in 2009, when we first started this podcast, open source was moving so fast. GitHub was one year old, wasn't owned by Microsoft, you know, and it was moving fast.
And it was so hard to keep up.
And our tagline was open source moves fast, keep up.
We've let that go because it's sort of snarky, but it was kind of core to our original DNA, right?
But open source moves fast.
It's hard to keep up.
But my hope is that there's more open source.
But for a bit there, I was really, really bummed, thinking, wow, if we can just generate new code so quickly, does the value of the patterns
captured in open source at large, do they become less important?
Does it become less important to structure those patterns into projects, into communities,
into whatever, and that's how open source works.
Currently, does that change because now we can generate so quickly, does the pattern called
open source no longer become the same value?
And for a bit, I was really bummed.
And now I feel very hopeful that the future of open source is maybe even more brighter
because, one, you may have an influx of users to open source that's already out there.
So discovering good tools.
And so that's great for open source.
Hypothetically, you know, a maintainer may feel the pain because of the slop, but that's a whole different podcast and subject.
Yep, yep.
But then you also have all these new builders that can start to scratch their own itch and want to share the thing they made. I'm so hopeful for open source now, where before I thought that maybe, because you could generate it so easily, it would become less valuable.
Yeah.
I think it's so hard to predict
what the net effects of this will be.
Because there's a certain aspect of it
which makes libraries less necessary.
You know, when common pieces of functionality
are like auto-generatable.
There's a lot of cases where
even with AMP,
there's like one or two packages
that we built internally.
We built our own like TUI framework
for the latest,
the new TUI of AMP.
And it's like,
you know, would we have done that
prior to
coding agents being a thing?
It would have been a lot more expensive.
It would have taken a lot longer to ship that.
But now we can do that
because we're able to move much more quickly.
But at the same time,
I do think that there's,
there remains a use case
for having libraries,
I just think that the nature of which libraries are really popular
is probably going to change.
I think prior to agents and AI,
a lot of the most popular libraries in existence
were some form of middleware
or a piece of abstraction
that helped make using a particular API or technology
or a piece of physical hardware more accessible.
And so that was like a big problem
that libraries like open source packages and libraries solve for you.
I think now there's kind of like less demand for that form of package, but still a large amount of demand for libraries that, you know, are robust and well tested. You still need some amount of abstraction, but maybe fewer layers of abstraction over an underlying capability.
I don't know if that makes sense.
Like, I would say, you know, my strongly stated but maybe weakly held opinion, given how quickly things are changing, is that we'll see far fewer libraries that are purely just, like, hey, this is a neat way to read files in Node.js or something like that. And many more libraries like, hey, there's this new piece of hardware that got developed, or maybe there's some new biotech technology that is available now, and someone built an API for it that exposes all the key hooks. And because software is so cheap now, because code is cheaper to generate with agents than with humans manually hand-coding everything, we'll just see a lot more things expose software endpoints. And we'll see a much richer playground for people building software to be able to hit different things that do things, either, you know, in cyberspace or in the physical world.
Cyberspace.
That's cool, man.
I haven't heard that word in a while, man.
That's cool.
Cyberspace.
All right.
Well, ampcode.com.
I am a fan.
And to be clear, I use all agents.
I am agent agnostic.
I want to use everything.
And I know that's the cool thing with AMP
is I don't have to choose the model.
We barely touch on that.
But I love that I could just use it knowing that it's a high-quality-output kind of tool, that you're tuning it to help me not have to think about swapping models or limitations. It's always that. And I want you to help me learn how to use my context window better, which was maybe the news of the podcast for me. Yeah, because I'm probably inefficient and overspending as a result. So now I know. Now I know. Yeah. To those listening, like, my recommendation is you should
absolutely sample the field of coding agents. Like there's so many and I think you should find the one that
like fits the best with you. I think like we mentioned earlier in this conversation,
like the switching cost is so low. But the only way you're going to find the best one for you
is if you actually go and like try the different ones in existence. It's funny how much like
hype and like high level conversation there is about like this and that. And at the end of
the day, it's like it's never been easier just to like go try these things. So firsthand experience,
I think it heavily outweighs
whatever the latest Twitter influencer
is saying about the landscape.
So, you know, try AMP, let us know what you think,
but also try a bunch of other
agentic coding tools as well.
Yeah, try them all. Try them all.
Anything left in closing, anything over the horizon,
anything that is maybe a sneak peek or a tease
or anything you can share in closing.
I will say, when is this coming out?
Is it going to be like sometime in September?
Not next Wednesday, but the Wednesday after that.
So literally it's going to ship on September 17th.
Okay, cool.
I think by then some of this stuff will be out.
But I think right now an area of active exploration for us
is just experimenting with all the different new models
that have come online.
There's a lot of great models that are really good at tool use now
that occupy different places along the latency-intelligence Pareto curve.
And so one of the things that we're doing is we're playing around with all these different models
and seeing how well they function in the different sub-agents that AMP uses,
like the Oracle for thinking or the search sub-agent for discovering context
or just like the generic sub-agent, which conserves the context window of the main agent.
But I think, like, by the release date of this podcast, we'll have deployed some of these new models into different places in the application.
They should help speed things up.
And I think one of the things that's an active area of consideration for us
is like up until now,
part of the experience of using a coding agent
has been just like waiting,
waiting for it to get done.
Because the token throughput is at a certain level.
And I think we're seeing a lot of positive signs
that will allow us to bring down that latency by using different models, the best model that we can find for each task, in the next couple of weeks.
And so I think that will drastically improve the experience.
And it may also push the latency past some sort of inflection point: if we can get the latency below a certain level, it will change the nature of how it feels to use a coding agent.
Let's put it that way.
Yeah, absolutely.
I love the principles you built on.
We didn't touch on those.
But I love the principles you built on.
Keep doing what you're doing.
Don't change the thing.
Just do more of it, and share the whats and the whys in-house on that awesome Raising an Agent podcast.
Make it more frequent if you can.
I don't think it's necessary, but I think definitely elevate it to that first-class experience of intentionality and production quality. I think that's a pretty easy button to push. And it doesn't take a lot for you and the team involved to sit down and host and talk. You can employ people around you to carry that context instead of you.
There you go. All right, Beyang, thank you so much for your time here, man. It's always a pleasure.
I always appreciate talking with you. Thank you so much.
Thanks for having me on, Adam. Always a pleasure. And yeah, let's keep talking.
You know, I'm a firm believer in just sitting down for a deep conversation with a nerd.
And Beyang is a fellow nerd like me, like you, just holding tight to where we're at, this ever-changing, wow, how-did-we-get-here moment in software development. I don't know about you, but everything is changing, everything has changed, and everything will be changing. So lots of change. Change... Changelog. That sounds kind of familiar. But AMP is one of my favorite agentic tools. I use Augment Code, I've tried Codex, I use Claude Code pretty much regularly, that is my main driver, but I'm trying them all, and I think you should too. But one thing I love about AMP, and one thing I've experienced with it, is it seems to be a cut above the rest. It seems to be better. It seems to think differently,
think better, plan better, execute better. And I can just make a plan and set it loose and just
sit back. And that's kind of cool. It's not a one-shot. It is a very thorough, very deep process that takes me lots of time. Let's say I'll sit down for 40-ish minutes and just make a plan,
just planning, thinking through, orchestrating, thinking about the user experience, thinking about the
implementation details. And then once I feel good about the next step, I set it loose. And that's so cool.
Well, we just got back from hanging out with our friends at the HQ of Oxide. Wow. Oakland... Emeryville. Bryan Cantrill, Steve Tuck, and the rest of the team at Oxide, beyond, beyond impressive.
So stoked to have had a chance to go over to their internal conference, peel back the layers of oxide, their culture, their team, their mission, all the things.
I'll be sharing that with you next week.
Again, a big massive thank you to our friends at Fly.
They are awesome.
Our friends over at Depot, depot.dev. And of course, our friends at CodeRabbit, AI-driven code review.
Now in your CLI, man, it is wild.
It is so wild.
Check them out, coderabbit.ai.
And to the beat freak in residence himself, Breakmaster Cylinder, bringing those beats, banging, banging. Love you, BMC. Okay, this show's done. We'll see you on Friday.
Game on.