Latent Space: The AI Engineer Podcast - Cline: the open source coding agent that doesn't cut costs

Starting point is 00:00:05 Hey, everyone. Welcome to the Lidden Space Podcast. This is Celacio, partner, and CTOA Decibel. And I'm joined by my co-host, Wix, Fondero, Small A.I. Welcome, welcome. And today on the studio with a nice two guests from Klein, Pash, and Soud. That's right. Yes. You nailed it.

Starting point is 00:00:21 Let's go. I think that Klein has a decent fan base, but not everyone has heard of it. Maybe we should just get at, like, an upfront. Like, what is Klein maybe from you? And then, like, you can modify that as well. Yeah, Klein's an open source coding agent. It's a VS code extension right now, but it's coming to JetBrains and NeoVim and the CLI.

Starting point is 00:00:43 You give Klein a task, and he just goes off and does it. He can take over your terminal, your editor, your browser, connect to all sorts of MCP services, and essentially take over your entire developer workflow. And it becomes this point of contact for you to get your entire job done, essentially. Beautiful. Pash, what would you modify or, what's another way to look at Klein that you think is also valuable?

Starting point is 00:01:08 Yeah, I think Klein is the kind of infrastructure layer for agents, for all open source agents, people building on top of this like agentic infrastructure. Klein is a fully modular system. That's the way we envision it. We're trying to make it more modularized so that you can build any agents on top of it. So with the CLI and with the SDK that we're rolling out, you're going to be able to build fully agentic systems for any. thing, not just coding. Oh, okay. That is a different perspective on client that I had. So, okay,

Starting point is 00:01:41 let's talk about coding first, and then we'll talk about the broader stuff. You also are similar to Ader. I don't know who comes first in that you use the plan and act paradigm quite a bit. I'm not sure how well known this is. To me, I'm relatively up to speed on it. But again, like, maybe you guys want to explain why different models for different things. Yeah, I'm going to take the cred for coming up with plan act first. Okay. Clown was the first to sort of come up with this concept of having two modes for the developer to engage with.

Starting point is 00:02:13 So just in like talking to our users and seeing how they use Klein where it was really only an input field. We found a lot of them starting off working with the agent, coming up with a marked on file where they asked the agent to put together some kind of architecture or plan for the work that they want the agent to go on to do. And so we would find that that people would. just came up with this workflow for themselves just organically. And so we thought about how he might translate that into the product.

Starting point is 00:02:41 So it's a little bit more intuitive for new users who don't have to kind of pick up that pattern for themselves and can kind of direct and put in guard rails for the agent to hear to these different modes whenever the user switches between them. So, for example, in plan mode, the agent's directed to be more exploratory, read more files, get sort of understanding and fill up its context with any sort of relevant information to come up with a plan of attack for whatever the task is the user wants to accomplish.

Starting point is 00:03:10 And then when they switch to act mode, that's when the agent gets this directive to look at the plan and start executing on it, running commands, editing files. And it just makes working with agents a little bit easier, especially with something like Klein, where a lot of the times people's engagement with it is mostly in the plan mode, where there's a lot of back and forth, there's a lot of extracting context from the design. developer, you know, asking questions, you know, what do you want the theme to look like, what pages do you want on the website, just trying to extract any sort of information that the user might not have put into their initial prompt. Once the user feels like, okay, I'm ready to let the

Starting point is 00:03:50 agent go off and work on this. They switch to act mode, check auto-approved, and just kick their feet up and, you know, get coffee or whatever and let the agent get the job done. So, yeah, most of the engagement happens in the plan mode and then act mode. they kind of just have a peripheral vision into what's going on, mostly to course correct whenever it goes in the wrong direction. But for the most part, they can just rely on the model to get it done. And was this the first shape of the product, or did you get to the plan act iteratively?

Starting point is 00:04:21 And maybe was this the first idea of the company itself, or were you exploring other stuff? It was a lot of, especially in the early days of the client, it was a lot of experimenting and talking to our users and seeing what kind of workflows came up, that they found that were useful for them and translating them into the product. So plan and act was really a byproduct

Starting point is 00:04:39 of just talking to people in our Discord, just asking them what would be useful to them, what kind of prompt shortcuts we could add into the UI? I mean, that's really all plan and act mode is. It's essentially a shortcut for the user to save them, the trouble of having to type out, you know, I want you to ask me questions and put together a plan.

Starting point is 00:04:59 The way that you might have to and, you know, some of the other tools, you have to be explicit about, I want you to come up with a plan before, you know, acting on it or editing files. Incorporating that into the UI, just saves the user the trouble of having to type that out themselves. But you started right away as a coding product. And then this was part of, okay, how do we get better UX, basically?

Starting point is 00:05:21 Exactly. Yeah. What was the model evaluation at the time? So I'm sure part of like the, we need plan and act is like maybe the models are not able to do it end to end. And when you started working on that part of thing, where were the model limitations, what were the best models, and then how is that evolved over time? Yeah, when I first started working on Klein, this was, I think, 10 days after Cloud 3-5 Sonic came out, I was reading Anthropics model card addendum, and there was this section about agentic coding and how it was so much better at this step-by-step accomplishing tasks. And they talked about running this internal test where they let the model run in this loop where it could call tools. And it was obvious to me that, okay, they have some version, they have some application internally that's really different from how the other things at the other things like co-pilot and cursor and Ader.

Starting point is 00:06:11 They didn't do this for like step-by-step reasoning and accomplishing tasks. They were more suited for the Q&A and one-shot prompting paradigm. At the time, I think it was June 2024, Anthropic was doing a Bill with Cloud Hackathon. So I thought, okay, this is a really cool new capability that none of the models have really been capable of doing before. And I think being able to create something from the ground up and take advantage of kind of like the nuances of how much the models improved in that point in time. So for example, Cloud 35 was also really good at this test called Needle in a Haystack, where if it has a lot of context in its context window, for example, 90% of its 200K context window is filled up, it's really good at picking out granular details in that context, whereas before Cloud 3-5, it really pay a lot more attention to whatever was at the beginning or the end of the context.

Starting point is 00:07:09 So just taking advantage of kind of the nuances of it being better at understanding longer context and it being better at task-by-task, sorry, step-by-step accomplishing tasks and building a product from the ground up, just kind of let me create something that just felt a little bit different than anything else that was around at the time. And some of the core principles in building the first version of the product was just keep it really simple. Just let the developer feel like they can kind of use it however they want. So make it as general as possible and kind of let them come up with whatever workflows works well for them. People use it for all sorts of things outside of coding.

Starting point is 00:07:45 Our product marketing guy, Nick Baumann, he uses it to connect to a Reddit MCP server, scrape content connected to an ex-MCP server and post tweet. essentially, even though it's a VS code extension and a coding agent, MCP kind of lets it function as this everything agent where it can connect to whatever services and things like that. And that's really a side effect of having very general prompts just in the product and not sort of limiting it to just coding tasks. I was at a conference in Amsterdam and I built my whole presentation, my whole slide deck using this library.

Starting point is 00:08:20 It's like a JavaScript library called SlideDev. And I just asked Klein like, hey, like here's like my style guy. guidelines. I wrote like a big Klein rules document explaining like how I want to style the presentation in slide dev. I told Klein like the agenda I kind of recorded using this other app called Limitless like transcribed my voice in a text about like my thoughts just like stream of consciousness about what I was going to talk about for this conference for my talk and Klein just went in and built the whole deck for me. So you know, Klein really can do anything. In JavaScript. In JavaScript. In JavaScript.

Starting point is 00:08:56 In JavaScript, yeah. Yeah. So it's kind of a coding use case. It was kind of a coding use case, but then making a presentation out of it. But it can also run scripts, like, do like data analysis for you and then put that into a deck, you know, kind of combine things. And being a VS code extension is kind of this like, it gives you these interesting capabilities where you have access to the user's OS, you have access to the user's terminal, and, you know,

Starting point is 00:09:21 you can read and edit files. Being an extension, it reduces. a lot of the onboarding friction for a lot of developers or they don't have to, you know, install a whole new application or have to, you know, go through whatever internal jumping through hoops to try to get something approved to use within their organizations. So the marketplace gave us a ton of really great distribution and is sort of like the perfect conduit for something that needs access to files on your desktop or to be able to run things on your terminal, to be able to edit code and to take advantage of VS codes, really nice UI and show you like,

Starting point is 00:09:54 diff views, for example, before and after it makes changes to files. Wouldn't you tempted to fork VES code, though? I mean, you could be sitting on $3 billion right now. Well, no, I actually like pity anybody that has to fork Vs code because Microsoft makes it notoriously difficult to maintain these forks. So a lot of resources and efforts go into just maintaining, keeping your fork up to date with all the updates that VS code is making. I see.

Starting point is 00:10:23 Exactly, because they have a private repo and they just sync it. There's no, like... Exactly, exactly. And there's one of those kinds of open source projects. Right, and VS code's moving so quickly where I'm sure they run into all sorts of issues, not just in, you know, things like merge conflicts, but also in the back end. They're always making improvements and changes to, for example, their VS marketplace API and to have to like reverse engineer that and figure out kind of how to make sure that your users

Starting point is 00:10:48 don't run into issues using things like that is, I'm sure, like a huge headache for anybody that has to maintain a VS code fork. And it also, you know, being an extension also gives us a lot more distribution. It's not that you have to use us or somebody else. You can use Klein in cursor or in windsurf or in VS code. And I think Klein complements all these things really well in that, you know, we get the opportunity to kind of figure out and work really closely with our users to figure out what the best agentic experience is,

Starting point is 00:11:15 whereas, you know, cursor and windsurf and co-pilot have to think about the entire developer experience, the inline code edits, the Q&A, sort of all the other bells and whistles that go into writing code, we get to just focus on what I think is the future programming, which is this agentic paradigm. And as the models get better, people are going to find themselves using natural language, working with an agent more and more and less being in the weeds

Starting point is 00:11:40 and editing code and tab autocomplete. Yeah, just like imagine how many like resources you would have to spend maintaining a fork of VS code, where we can just kind of stay focused on the core agentic loop optimizing for different model families as they come out supporting them. You know, there's so much work that goes into all this that maintaining a fork on the side would just be such a massive distraction for us that I don't think is really worth it. I feel like when you talk, I hear this distinction between we want to be the best thing

Starting point is 00:12:12 for the future programming and then also, this is also great for non-programming. As this something that has been recent for you where like you're seeing more and more people use the MCP servers, especially to do less technical thing, and that's an interesting area, or do you feel like programming is the highest kind of economic value thing to be selling today? I'm curious if you can share more. In terms of economic value, programming is definitely the highest cost of benefit for language models right now. I think, you know, we're seeing a lot of, you know, model labs recognize that, opening-eyed, anthropic or taking coding a lot more seriously than I think they did a year ago.

Starting point is 00:12:50 What we've seen is while yes, the MCP ecosystem is growing and a lot of people are using for things outside of programming, the majority use case is mostly developer work. There was an article on Hacker News a couple of weeks ago about how a developer deployed a buggy cloudflare worker and used a sentry MCP server to pull a stack trace and ask Klein to sort of fix the bug using the stack trace information, connect to a GitHub MCP server to close. the issue and deploy the fix to Cloudflare all right within Klein using natural language, never having to leave VS code, and it sort of interacts with all these services, otherwise the developer would have had to have the cognitive overlap of having to, you know, figure out for himself and leave his developer environment to essentially do what the agent could have done just all in the background, just using natural language. So I think that's kind of like where things are headed is the application layer being connected to sort of all the

Starting point is 00:13:49 different services that you might have had to interact with before manually, and it being this sort of single point of contact for you to interact with using natural language. And you being less and less in the code and more and more a high-level understanding of what the agent's doing and being able to course correct. I think that's another part of what's important to us and what's allowed us to kind of cut through the noise in this incredibly noisy spaces. I think a lot of people have really grand ideas for where things are heading, but we've been really maniacal of about what's useful to people today. And a large part of that is understanding

Starting point is 00:14:23 the limitations of these models, what they're not so good up, and giving enough insight into those sorts of things to the end developers so that they know how to course correct, they know how to give feedback when things don't go right. So, for example, Klein is really good about, giving you a lot of insight into the prompts going into the model, into when there's an error,

Starting point is 00:14:45 why the error happened, into the tools that the model's calling, we try to give as much insight into what exactly the model is doing at each step in accomplishing a task. So when things don't go wrong, where it starts to go off in the wrong direction, you can give it feedback and course correct. I think the course correcting part is so incredibly important

Starting point is 00:15:03 in getting work done. I think much more quickly than if you were to kind of give a background agent work, you come back a couple hours later, and it's just like totally wrong, and it didn't do anything that you expected it to do, and you kind of have to retry a couple times for it gets it right. I think the

Starting point is 00:15:17 Sentry example is great because I feel in a way the MCPs are like cannibalizing the products themselves. Like I started using the Sentry MCP and then Century Release here, which is like their issue resolution agent, and it was free at the start. So I turned it

Starting point is 00:15:33 on in Centry. I was using it. It's great. And then they started charging money for it and I'm like, I can use the MCP for free. Put the data and my coding agent and it's going to fix the issue for free and send it back. I'm curious to see, especially in coding, where you can kind of have this closed loop, where, okay, artist MCP is going to become the paid AI offering so that then you can plug it in. And

Starting point is 00:15:56 this client going to have kind of like a MCP subscription where like you're kind of fractionalizing all these costs. To me, today feels like it doesn't make a lot of sense the way their structure. Well, yeah, we were like very early on. We like, we've been bullish on MCP from the very beginning. And were you a launch partner? A funnierceiver with MCP, I think. Sorry to interrupt. Yeah, no worries. I think when Anthropic first launched MCP and they made this big announcement about, you know, this new protocol that they've been working on and open sourcing it, nobody really understood what it meant.

Starting point is 00:16:27 And it took me some time really digging into their documentation about how it works and why this is important. I think they kind of took this bet on the open source community contributing to an ecosystem in order for it to really take off. And so I wanted to try to help with that. as much as possible. So for a long time, most of Klein's system prompt was, how does MCP work? Because it was so new at the time

Starting point is 00:16:49 that the models didn't know anything about it. And how to make MCP servers. So if the developer wanted to, you know, make something like that, it'd be really good at it. And I'd like to think that, you know, client had something to do with how much the MCP ecosystem

Starting point is 00:17:03 has grown since then and just getting developers more insight and sort of awareness about how it works under the hood, which I think is incredibly important in using it, let alone just developing these things. And so, yeah, when we launched MCP in Klein, I remember our Discord users just trying to wrap their heads around it.

Starting point is 00:17:22 And in seeing clients build MCP servers from the ground up, they're like, okay, they started to connect the dots. This is how it works under the hood. This is why it's useful. This is how agents connect to these tools and services and these APIs and sort of save me a lot of the trouble of having to do this sort of stuff myself. Those were like the early days of MCP, even people were still trying to wrap their heads around it. And there's like a big problem with discoverability. So back in like February, we launched the MCP marketplace where you could actually go through and have like this one click install process where Klein would actually go through looking at a read me. That's like linked to a GitHub, install the whole MCP server from scratch and just get it running immediately.

Starting point is 00:18:04 And that was like I think around that time, that's when MCP really started taking off with like the launch of the marketplace where people were able to discover MCPs, contribute. beat to the MCP marketplace. We've listed over like 150 MCP servers since then. And like the top MCPs in our marketplace have over, you know, hundreds of thousands of downloads, people using them. And, you know, there's like really notable examples where you mentioned like how are people, like, it's like kind of eating existing products, but at the same time we're starting to see like this ecosystem evolve where people are monetizing MCPs. Like a notable example of this is 21st dev magic MCP server where it injects some taste into this coding agent into the LLM where they have this library of beautiful components and they just inject relevant examples so that Klein can go in

Starting point is 00:18:57 and implement beautiful UIs. And the way they monetize that was like a standard API key. So we're starting to see developers really like take MCPs, build them in, have distribution platforms like the MCP marketplace. incline and monetize their whole business around that. So now it's almost like you're selling tools to agents, which is a really interesting topic. And you can do that because you're in VS code,

Starting point is 00:19:24 so you have the terminal, so you can do NPX run the different servers. Have you thought about doing remote MCP hosting, or do you feel like that's not something you should take over? Yeah, we haven't really hosted any ourselves. We think that's, we're looking into it. I think it's all very nascent right now. now, the remote MCPs, but we're definitely interested in supporting remote MCPs and

Starting point is 00:19:48 listing them on our marketplace. Another part, I think, what sort of local MCP servers and remote MCPs is most of the remote MCPs are only useful to connect to different APIs, but that's only a small use case for MCPs. A lot of MCPs help you connect to different applications on your computer. For example, there's like a Unity MCP server that helps you create, you know, 3D objects, and from within BS code, there's an Ableton MCP servers. You can make songs using something like Klein or whatever else uses MCPs. We won't see a world where these MCP servers are only hosted remotely. There will always be some mix of local MCP servers and remote MCP servers.

Starting point is 00:20:30 I think the remote MCP servers do make the installation process a little bit easier with something like an OAuth flow and just authenticating a little bit not as painful as having to manage API keys yourself. But for the most part, I think, think the MCP ecosystem is really in its earlier days. We're still trying to figure out this good balance of security, but also convenience for the end developer so that it's not a pain to have to set these things up. And I think we're still in this very much experimental phase about how useful it is to people. And I think now that it is seeing this level of market fit and people are coming out with these sorts of articles and workflows about how it's totally

Starting point is 00:21:08 changing their jobs, I think there's going to be a lot more of resources and efforts that go into the ecosystem and just building out the protocol, which I think there's a lot on Anthropics Roadmap. And I think the community in general just has a lot of ideas. And our marketplace in particular has given us insights into some ways that we could improve it, things that, you know, developers have asked from it that where we're kind of thinking about how do we, you know, what is the MSP marketplace of the future look like? And for us, that's, it's going to be a combination of, you know, well, there's a lot of our users are very security conscious. And there's a lot of ways that MCP servers can be pretty dangerous to use if you don't trust the end developer of these things.

Starting point is 00:21:50 And so we're trying to figure out, you know, what is a future look like where you have some level of confidence in the MCP servers you're installing? I think right now it's just, it's too early and there's a lot of trusting the community that I don't think a lot of, you know, enterprise developers or organizations are quite willing to do yet. So that's something that's top of mind for us. there's an interesting tension between the Anthropic and the community here. You basically kind of have a M-CP registry internally, right? Honestly, I think you should expose it. I was looking for it on your website and you don't have it. Like, the only way to access is to install client.

Starting point is 00:22:24 But there's others like Smithery and all the other guys, right? But then Anthropic has also said they'll launch a model registry at some point or MCP registry at some point. Some point. If Anthropic launched the official one, would they just win by the fault, right? because, like, would you just use them? I think so. I think the entire ecosystem will just converge around whatever they do. They just have such good distribution.

Starting point is 00:22:46 And they're, yeah, they came up with it. Yeah, exactly. Cool. And then I wanted to, I noticed that you had some, like, really downloaded MCPs. I was going by most installs. I'm just going to read it off. You can stop me any time to comment on them. So top of this file system MCP.

Starting point is 00:23:03 Makes sense. Browser tools from Agent Deskai. I don't know what that is. Sequential thinking. That one came out with the original MCP release. Context 7. I don't know that one. That's a big one.

Starting point is 00:23:13 What is it? Context 7 kind of helps you pull in documentation from anywhere, and it has, like, this big index of all of the popular libraries and documentation for them. Okay. And your agent can kind of submit, like, a natural language query and search for any documentation. It just says everyone's docs. Yes. And apparently Upstash did that, which is also unusual because Upstash is just.

Starting point is 00:23:37 normally Redis. Get tools. That one came out originally. Fetch. Browser use. Browser use, I imagine competes with browser tools, right? I guess. And then below that, playwright. So there's a lot of, like, let's automate the browser and let's do stuff. I assume for debugging. Firecrawl, Puppeteer, Figma. Here's one for your perplexity research. Is that yours? Well, yeah, I forked that one and listed it. But yeah, that's another very popular one where you can research anything. So people want to... People want to emulate the browser.

Starting point is 00:24:10 I'm just trying to learn lessons from what people are doing, right? They want to automate the browser. They want to access Git and file system. They want to access docs and search. Anything else that you think, like, is notable? There's all kinds of stuff where it's like, you know, there's like the Slack MCP

Starting point is 00:24:25 where you can send, you know, that's actually one workflow that I have set up where you can like automate repetitive tasks in Klein. So I tell a client, like, okay, pull down this PR, use the GH command line tool, which I already have installed, using the terminal to pull the PR, get the description of the PR, the discussion on it, and get the full diff as like a single command, non-interactive command, pull in all that context,

Starting point is 00:24:50 read the files around the diff, review it, ask a question, like, hey, do you want me to approve this or not with this comment? And if I say yes, approve it and then send a message in Slack to my team using the Slack MCP, for example. Oh, use it to write. Yes. I would only use it to read. Yeah, I know. It's, you know, people, like, I love it. You know, I love being able to just, like, send an automated message in Slack or whatever.

Starting point is 00:25:14 You can also, like, set it up, like, set up your workload however you want where it's like, okay, Klein, please ask me before doing anything, you know, just make sure you're asking me to, like, approve before you send a message or something like that. Yeah. Okay, just to close out MCP side, anything else interesting going on in MCP universe that we should talk about? MCP off was recently ratified? I think monetization is a big question right now for the MCP ecosystem. We've been talking a lot with Stripe. They're very bullish on MCP, and they're trying to figure out like a monetization layer for it. But it's all so early that it's kind of hard to really even envision where it's going to go.

Starting point is 00:25:55 Let me just put up a straw man and you can tell me what's wrong with it. Like, how is this different from API monetization, right? Like you sign up here, make an account, give you a token back, and then you use the token, they charge you against your usage. No, like, I think that's how it is right now. That's how like the magic MCP, the 24th Dev guys did it. But we're kind of envisioning a world where agents can pay themselves for these MCP tools that they're using and pay for each tool call. And you can't deal with like a million different API keys from different products and like signing up for all this. There needs to be like a unified kind of payment layer.

Starting point is 00:26:33 Some people talk about like stable coins, how like those are coming out now that agents can natively use those. Stripe is they're considering this like abstraction around the MCP protocol for payments. But like I said, it's kind of hard to really tell where it's going to, how that's going to manifest. I would say like we, I covered when they launched their agent toolkit last year a few months ago. It seemed like that was enough. Like it didn't seem to need stable coins except for the. fact that they take like 30 cents every transaction. Have you seen people use the X 402 thing by Coinbase to make?

Starting point is 00:27:09 It's basically like you can do a HGCP request that includes payment in it. What? Yeah, yeah. It's been around forever, the 402 error that's like payment not accepted or something, right? So yeah, we've seen some people talking about that, like more like natively building that in. But yeah, nothing. Yeah, no one's really doing that right now. Anything you're seeing on, like, are people, like, making MCP startups that are interesting?

Starting point is 00:27:39 Mostly around re-hosting local ones and do remote, and then basically do, instead of setting up 10 MCPs, you have, like, a canonical URL that you put in all of your tools and then expose all the tools from all the servers. Yeah. There's, like MCP that run, so on these tools. Yeah. But I think it kind of has the same issues of how do you incentivize people? to make better MCPs, you know, and will it be mostly first party or will it be third party? Yeah, exactly. Like your perplexity MCP was the foreto.

Starting point is 00:28:06 What was wrong with the perplexity one? With MCPs and installing them locally on your device, there's always a massive risk associated with that. And when an MCP is created by someone that we have no idea who they are, at any point, they might, you know, update the GitHub to, like, introduce some kind of malicious stuff. So even if you, like, verified it when you were listing it, you might change it. So I ended up having to fork a few of those to make sure that we lock that version down. Oh, okay. So this is just like you're just forking it so that you don't change it without, without an honest. Interesting.

Starting point is 00:28:40 These are all the problems of a registry, right? Like that you need to ensure security and all that. Cool. I'm happy to move on. I would say like the last thing that's kind of curious is like if Anthropic hasn't come come along and made MCP, what would have happened? What's the alternative history? Like, would you have come with MCP?

Starting point is 00:28:56 So we saw some of our competitors who have kind of working on their own version of plug-and-play tools into these agents. They kind of had to natively create these tools and integrations themselves directly into their product. And so I think anybody in the space would have had to just do the laborious work of having to recreate these tools and integrations for. So I think Anthropics just saved us all a lot of trouble and tapped into the power of open source and community-driven development. allowed individual contributors to make an MCP for anything people could think of, and really take advantage of people's imagination, and a way that I think is necessary right now for us to really tap into full potential of this sort of thing. We've had, I think, a dozen episodes with different coding products.

Starting point is 00:29:43 By the way, this episode came directly after he tweeted about Cloud the Code episode where they were sitting right where you're sitting. Thanks for sharing the time about rag. Can you give people maybe the main? matrix of the market of, you know, you have like fully agentic, no ID. You have agentic plus ID, which is kind of yours. You have IDE with some co-piloting. How should people think about the different tools and what you guys are best at or maybe what you don't think you're best at? I think what we're best at and like our ethos since the beginning is just meet the developers

Starting point is 00:30:15 where they're at today. I think there is a little bit of insight and handholding these models need right now. And the IDE is sort of the perfect conduit for something like that. You can see the edits it's making, you can see the commands that it's running, you can see the tools that it's calling. It gives you the perfect UX for you to have the level of insight and control and be able to course correct the way that you need to to work with limitations of these models today. But I think it's pretty obvious that as the models get better, you'll be doing less and less than less than that, less and less of that and more and more of the initial planning and prompting and sort of have the trust and confidence that, you know, the model will be able

Starting point is 00:30:52 to get the job done pretty much exactly how you want it to. I think there will always be a little bit of a gap in that these models will never be able to read our minds. So there will have to be a little bit of making sure that you give it the most comprehensive and sort of like all the details of what you want from it. So if you're a lazy prompter, you can expect a ton of friction and back and forth for you really get what you want. But I think we're all learning for ourselves as we work with these things, kind of the right way to prompt these things and to be explicit about what it is that we want and kind of how they hallucinate the gaps that they might need to fill to get to the end result and how we might want to avoid something like that. So what's interesting about CloudCode is

Starting point is 00:31:34 there isn't really a lot of insight into what the agent's doing. It kind of gives you this like checklist of what it's doing at holistically at a high level. I don't think that really would have worked well if the models weren't good enough to actually produce work that people were generally happy with. We're kind of there. And I think the space has to catch up to, okay, maybe people don't need as much insight into these sorts of things anymore. And they are okay with letting an agent kind of get the job done. And really all you need to see is sort of the end result and tweak it a little bit before it's really perfect. And I think there is going to be different tools for different jobs. I think something like totally autonomous agent that you don't

Starting point is 00:32:13 have a lot of insight into is great for maybe scaffolding new projects. But for, for kind of the serious, more complex sorts of things where, you know, you do need a certain level of insight or you do need to kind of have like more engagement. You might want to use something that does give you some more insight. So I think these sorts of tools complement each other. So for example, writing tests or spinning off 10 agents to try to fix the same bug, you know, might be useful for a tool that doesn't require too much engagement from you. Whereas something that requires a little bit more creativity or imagination or extracting content. from your brain requires a little bit more of insight into what the model's doing and a back and forth that I think client is little better super forward.

Starting point is 00:32:57 Visibility into what the agent is doing. That's like one axis. And then another is autonomy, like how automated it is. And we have a category of companies that are focusing more on the use case of people that don't even want to look at code, which is like, you know, the lovables, the replets, where it's like you go in, you build an app, you might not even be technical, and you're just happy with the result.

Starting point is 00:33:22 And then you have kind of stuff that's kind of like a hybrid where it's for engineers, it's built for engineers, but you don't really have a lot of visibility into what's going on in a hood. This is like for like the vibe coders where they're fully, you know, letting the AI take the wheel

Starting point is 00:33:38 and building stuff very rapidly. Lots of open source fans and, you know, people that are hobbyists enjoy, coding in this in this manner. It is really fun. And then you get to like serious engineering teams where they can't really give everything over to the AI, at least not yet. And they need to have high visibility into what's going on every step of the way and make sure that they actually understand what's happening with their code. You're kind of handing off your production code base to this not a terministic system and then hoping that you catch it in review if anything goes wrong.

Starting point is 00:34:15 Whereas personally, the way I use AI, the way I use Klein, is I like to be there every step of the way and kind of guide it in the right direction. So I know every step of the way, like as every file is being edited, I approve every single thing and make sure that things are going in the right direction. I have a good understanding as things are being developed where it's going. So like this kind of hybrid workflow really works for me personally. But, you know, sometimes if I want to go full yolo mode, I go ahead and just auto-approve everything and just. just step out for a cup of coffee and then come back and, you know, review the work. My issue with this as an engineer myself is that we all want to believe that we work on the complex things.

Starting point is 00:34:58 How have you guys seen the line of complex change over time? I mean, if we sat down having this discussion 12 months ago, complex was like much easier than today for the models. Do you feel like that's evolving quickly enough that like, you know, in 18 months, it's like you should probably just do phologic for like 75% of work, 80% of work? Or do you feel like it's not moving as quickly as you thought? I think what was complex a couple years ago

Starting point is 00:35:25 is totally different to what is complex today. Now I think what we need to be more intentional about are the architectural decisions we make really early on and how the model kind of builds on top of that. If you have kind of a clear direction of where things are headed and what you want, you kind of have a good idea to how you might want to lay the foundation for the code base that you're producing.

Starting point is 00:35:46 And I think what we might have considered complex a few years ago, algorithmic challenges, that's pretty trivial for models today and stuff that we don't really necessarily have to think too much about anymore. We kind of give it a certain expectation or unit tests about what we want, and it kind of goes off and puts together the perfect solution. So I think there's a lot more thought that has to go into tasteful architectural decisions that really comes down to you having experience with what works and what doesn't work, having a clear idea for the direction of where you want to take the project

Starting point is 00:36:17 and sort of your vision for the code base. Those are all decisions that I think is hard to rely on a model for because of its limited context and it's inability to kind of see your vision for things and really have a good understanding of what you're trying to accomplish without you, you know, putting together a massive prompt of, you know, everything that you want from it. I think what we were, you know, what we spent most of our time working on a couple years ago has totally changed and I think for the better. I think architectural decisions are a lot more fun to think about them, put into their algorithms. It kind of frees up the senior software engineers to think more architecturally. And then once they have a really good understanding of what's,

Starting point is 00:36:57 what the current state of the repository is, what the current state of the architecture is. And when they're introducing something new, they're really thinking at an architectural level. And they articulate that to Klein. And that's also, there's like some skill involved there. And some of that can be mitigated with like asking follow-up questions, being proactive about clarifying things on the agent side. But ultimately, you need to articulate this new architecture to the agent. And then the agent can go down and down into the minds and implement everything for you. And it is more fun working that way. Like personally, like I find it a lot more engagement and just think on a more architectural level. And for junior engineers, it's a really good paradigm to learn about the code base. It's

Starting point is 00:37:40 kind of like having a senior engineer in your back pocket where you're asking Klein like, hey, can you explain the repository for me if I wanted to implement something like this? What files would I look at? How does this work? It's great for that as well. If you're moving on from competition, I have one last question of competition. So there's Twitter beef with RU code. I just want to know what the backstory is. Because you tweeted yesterday, somebody asked RUCOD to add Gemini, CLA support, and then you guys responded, just copy it from us again. And they said, thank you.

Starting point is 00:38:11 We'll make sure to give credit. Is it a real beef? No, no. A friendly beef. I think we're all just having fun on the timeline. There's a lot of forks. It's like 6,000 forks. Yeah, there's like, if you search Klein in the VS Code marketplace, it's like the entire page,

Starting point is 00:38:28 just like forks of Klein. And there's like even forks of forks that, you know, came out and raised like a whole bunch of money. And it's, yeah, it's the top three apps on OpenRodder are all. Klein and then Klein fork, Klein fork. Yeah, it's funny. Yeah, billions of tokens getting sent through like all these forks. There's like there's like fork wars and 10,000 forks and all you need is a knife, you know.

Starting point is 00:38:53 No, it's exciting. I think they're all really cool people. We've got people in Europe forking us. We got people in China making like a little fork of us. I think Samsung recently came out with like a, was a Wall Street Journal article where they're using Klein, but they're using like their own little fork of flying. kind of isolated, you know, we encourage it. Do you have any regrets about being open source? Not at all. I think Klein started off as this like really good foundation for what a coding agent looks like, and people just had a lot of their own really interesting ideas and spin-offs and

Starting point is 00:39:25 concepts about, you know, what they thought, you know, that they wanted to build on top of it was. And just being able to see that and see the excitement around just in the space in general has just been, I think, inspirational and has helped us kind of glean insights into what works and what doesn't work and incorporate that into our own product. And for the most part, I think for the Samsung and all the organizations where there's a lot of friction and being able to use software like this on their code bases, it reduces that barrier to entry, which I think is incredibly important when you want to get your feet wet with this whole new, agentic coding paradigm that's going to completely upend the way that we've written software for decades. So in the grand scheme of things,

Starting point is 00:40:03 I think it's a net positive for the world and for the space. So no regrets. In a lot of ways, like, you know, it's us and the forks. We were kind of there originally when we were like the only ones with this like philosophy of keep it being simple, keeping things down to like the model, letting the model do everything, not cutting on, not trying to make money off of inference, going context heavy, reading files into context very aggressively. And kind of going back to Claude I was actually like, it was really nice to see that they came out and they validated our whole philosophy of like keeping things as simple as possible. And that kind of goes in with like the whole rag thing, which is like, rag was this early thing in like 2022. You started getting these

Starting point is 00:40:49 vector database companies. Context windows were very small. This was like a way of people called it like, oh, you can give your AI infinite memory. It's not really that, but that was like the marketing that was sold to the venture backers that were like investing in all these companies. And it became this narrative that really stuck around. And like even now, like we, we get like potential like, you know, enterprise perspective, like they're going through like the procurement process. And it's almost like they're going through like a checklist asking like, hey, do you guys do like indexing like of the code base and doing rag? And I'm like, well, why? Like, why are you like, why do you want to do this? I think Boris said it said it very well on this exact podcast where

Starting point is 00:41:32 we tried rag and it doesn't really work very well, especially for code. is like the way RAG works is you have to like chunk all these files across your entire repository and like chop them up in a small little pieces and then throw them into this hyperdimensional vector space and then pull out these random chugs when you're searching for relevant code snippets and it's like fundamentally it's like so schizo and like I think it actually distracts the model and you get worse performance than just doing what like a senior software engineer does when they first they're introduced to a new repository where it's like that you'll look at the folder structure, you look through the files, oh, this file imports from this other file, let's go take a look at that, and you kind of agentically explore the repository. That's like, we found that works so much better. And there's like similar things where it's like, like the simplicity always wins, like this bitter lesson where fast apply is another example. So cursor came out with this fast apply, like they called the instant apply back in July of

Starting point is 00:42:31 24, where the idea was models at the time were not very good at editing files. And the way editing files works in kind of the context of an agent is you have a search block and then a replace block where you have to match the search block exactly to what you're trying to replace. And then a replace block just swaps that out. And at the time, models were not very good. It was like, I forget like GPT they were using under the hood at the time wasn't very good at formulating these search blocks perfectly and it would fail oftentimes. So they came up with this clever work around. to fine-tune this fast-apply model where they let these frontier models at the time,

Starting point is 00:43:09 they let them be vague, they let them output those lazy code snippets that we're all very familiar with where it's like rest of the file here, like rest of the imports here, and then fed that into this fine-tuned fast-supply model that was like probably like a Quinn 7B or something quantized, very small, dinky little model.

Starting point is 00:43:28 And they fed this lazy code snippet into this smaller model and the smaller model we fine tuned to output the entire file with the code change is applied. And that, you know, one of the founders of Adder said this really well in like very early GitHub discussions

Starting point is 00:43:44 where he said like, well now instead of worrying about one model messing things up, now you have to worry about two models messing things up. And what's worse is the other model that you're giving, that you're handing your production code to, this like fast apply model, it's like, it's a

Starting point is 00:44:01 tiny model, its reasoning is not very good. It's maximum output tokens, you know, there might be 8,000 tokens, 16,000 tokens. Now they're training like 32,000 tokens, maybe. And a lot of the coding files, like we have a file in our repository that's like 42,000 tokens long. And that's longer than the maximum token output length of one of these smaller fast supply model. So what do you do then? Then you have to build workarounds around that. Then you have to build all this infrastructure to like pass things off. And then it's making mistakes. It's like very subtle. mistakes too where it's like it looks like it's working but it's not actually what the original frontier model suggested and it's like slightly different and it introduces like all of these subtle

Starting point is 00:44:41 bugs into your code and what we're starting to see is like as AI gets better the application layer is reducing you're not going to need all these clever workarounds you're not going to have to maintain these systems so it's really liberating to not be bogged down with rag or with fast apply and just focus on this core agentic loop and maximizing diff edit failures. Like in our own internal benchmarks, Claude Sonnet 4 recently hit a sub 5%, or like around actually 4%

Starting point is 00:45:12 diff edit failure rate. At the, like, when Fast Apply came out, that was way higher. That was like in the 20s and the 30s. Now we're down to 4%. Right. And in six months. How does it go to zero?

Starting point is 00:45:24 Well, it's going to zero. Like, as we speak, it's going to zero every day, you know? And I was actually talking, with the founders of some of these companies that do fast apply. They were trying to kind of work with us. Their whole bread and butter is fine-tuning these fast supply models and, you know, like relase and morph. And I had a like a very candid conversation with these guys where I was like, well, there's a window of time where fast supply was relevant. Curser started this window of time back in

Starting point is 00:45:49 July. How much time do you think we have left until they're no longer relevant? Do you think it's an infinite time window? They're like, no, it's definitely finite. Like this era of fast supply models, it's definitely coming to an end. And I was like, well, how long do you guys think? They were like maybe three months, maybe less. So I still think there's some cases where rag is useful. You know, if you have a lot of human readable documents, a large knowledge base of documents where you don't really care about like inherent logic within them.

Starting point is 00:46:18 Like, sure, index it, chunk it, do retrieval on it. Or fast applies. Like maybe if your organization, you're forced into using like a very small model, that's not very good at search and replace, like a deep seek or something, you know, maybe use a fast-apply model. I think RAG and fast apply were these just tools in a toolkit for when models weren't the greatest at large context or search and replace if editing. But now they are extra ingredients that could make things go wrong that you just don't need anymore. There was an interesting article from Cognition Labs about, you know, multi-agent orchestration. I'm getting right into it.

Starting point is 00:47:00 It's like you're on autopilot for us. That's cool. Yeah. It's a great article, by the way. Yeah, it's great article. They talked about how, you know, when you start working with different models, different agents, there's a lot that gets lost in the details. And, you know, the devil are in the details.

Starting point is 00:47:15 Those are the most important things. And making sure that it doesn't, you don't have the agents for, like, running in loops and running to the same issues again. And have sort of like all the right context. And so I think being close to the model, throwing all the context you need at it, not taking these cost-optimized approach to pulling in relevant contexts

Starting point is 00:47:35 to something like RAG or a cheaper model to apply edits to a file. I think ultimately, yes, it's more expensive, asking a model like Claude Sonet to do sort of all these sorts of things, to grep an entire code base and to fill up its entire context. But you kind of get what you pay for.

Starting point is 00:47:54 And I think that's been another benefit of being open source, is that our developers, they can peek under the commona. They can see where their requests are being sent, what prompts are going into these things. And that creates a certain level of trust where when they spend $10, $20, $100 a day, they know where their data is being sent,

Starting point is 00:48:12 when model is being sent to, what prompts are going into these things. And so they get comfortable with the idea of spending that much money, get the job done. It's like not making money off of inference. I think the incentives are so, they're so relevant in this discussion because, you know, if you're incentivized, you know, if you're charging, you know, $20 per month and you're trying to make money on that, you're going to be offloading all kinds of important work to smaller models or optimizing for costs with rag, like retrieval with rag, not reading the entire file, but maybe reading like a small snippet of it.

Starting point is 00:48:46 Whereas if you're not making money off inference and you're just going direct, you know, users can bring their own API keys, well, then all of a sudden you're not incentivized to cut down on. cost, you're actually incentivized just to build the best possible agent. We're starting to see this trend of the whole industry is moving in that direction, right? You're trying to see like everyone open up to pay as you go models or pay directly for inference. And I think that is the future. What's the client pricing business model? Right now, it's bring her an API key. Essentially just whatever pre-commitment you might have to whatever inference provider, whatever model you think works best for your type of work, you just plug in your Anthropic or OpenAI or OpenRouter, whatever it is, API key and decline, and it connects directly to whatever model you select.

Starting point is 00:49:37 And I think that level of transparency, that level of we're building the best product, we're not focused on sort of capturing margin on, you know, the price obfuscation and clever tricks and model orchestration to, you know, keep cost low for us and optimize for higher profits. I think that's put us in this unique position to really push these models to their full potential. And I think that's shown. I think that's you get what you pay for. Throw a task incline and it gets expensive. That's the cost of intelligence.

Starting point is 00:50:11 It's the cost of intelligence. So yeah, the business model right now is you get to choose kind of where it's open source. You can fork it. You can choose where your data gets. You can choose who you want to pay. A lot of organizations we've talked to get some. some, you know, a certain level of volume-based discounts with, with these providers. And so they can take advantage of that through client, which is helpful because client

Starting point is 00:50:31 you got pretty expensive. And, uh, yeah. Wait, so, I mean, I'm still not hearing how you make money. Like, you said you don't, huh? Why? Why make money? Yeah. Uh, because you have to pay your salaries. No, that's the, that's the, a lot of people ask us that and I always just throw the why at them, but it's, um, they just sound like the partyful guys. Partifle is like, the real answer is enterprise. So, that's the real answer is enterprise. Which we can say because you're, you know, we release this when you launch it. Yeah. Yeah.

Starting point is 00:50:58 So you want to talk about Enterprise? Yeah. I think being open source and bring an API key has given us a lot of easy adoption in these organizations where things like data privacy and control and security are top of mind. And it's hard to commit to sending their code and plain text to God knows what servers training their data to do, training their data on models that might, you know, output their IP to. random users, I think people are a lot more conscious about where their data is getting sent and what's being used to it. And so it's given us this opportunity to say, okay, nothing passes through our own servers. You have total control over the entire application where your data gets sent. And that's given organizations that, you know, we've been talking to over the course

Starting point is 00:51:45 of the last couple of months, this sort of like easy adoption. And I think this opportunity for us to work more closely with them and say, you know, what are all the things that we can do to help with adoption and the rest of your organization. Essentially, how can we pour gasoline on sort of the evangelism that, you know, people have for Klein in these organizations and spread the usage of agent decoding, I think, at an enterprise level. Well, yeah, what's crazy is, so we had, we open source Klein, people really liked it. Developers were using it within their organizations.

Starting point is 00:52:17 Their organizations were kind of like reluctantly okay with it because they saw, like, we're open source and we're not sending their data. anywhere. They could use their existing API keys. And then we launched, like, on our website, like a contact form for enterprise. Like, if you're interested in an enterprise offering hit us up. And we had no real enterprise product at the time. And it turned out, like, we just got this massive influx of big enterprises reaching out to us. And, you know, we had a Fortune 5 company come up to us and they were like, hey, we have hundreds of engineers using Klein within our organization. And this is a massive problem for us. This is like a fire that we need to put out because we have

Starting point is 00:52:58 no idea what API keys they're using, how much they're spending, where they're sending their data. Please just like let us give you money to make an enterprise product. So the product kind of just evolved out of that. Right. Right. I mean, it really just comes down to more of listening to our users. So right, after we put out this page, we just had a lot of demand. for sort of like the table stake enterprise features, the security guardrails and governance and insights that sort of like the admins in these organizations need to reliably use something like Klein. Yeah, we've gone a lot of people wanting us to sort of give them two things. Invoices just to help with like all the budgeting and spending the thousands of dollars. All the Europeans.

Starting point is 00:53:42 Yeah, just the other thing which I thought was a little bit surprising was some level of insight into the benefit that clients providing them. So it could be our SAGE or lines of code written because it allows these sort of like AI forward drivers for adopting these sorts of tools and organizations to take that as a proof point and go to the rest of their teams and say, this is how much clients helping me. You need to start adopting this so we can keep up with the rest of the industry. This is for like internal champions to prove their ROI.

Starting point is 00:54:10 Exactly. Okay. Use as sort of evidence for this, you know, to justify the spend. Yeah. but also to promote the product in these organizations. We can do this afterwards, but we'd like to talk to those and actually feature some of them, what they're saying to their bosses on the podcast so that we can get a sense.

Starting point is 00:54:27 Because oftentimes we hear, we only talk to founders and builders of the dev tool, but not the end consumer. And actually, we want to hear from them, right? Like about how they're thinking about it, what they need. Kind of cool. One thing I wanted to ask to double click on is the relationship between open router

Starting point is 00:54:44 and then like your enterprise offering, right? So my understanding is currently everything runs through open router. Not everything. So you can bring API keys to OpenAI, Anthropic, Bedrock. And then you have a direct connection there. The user has a direct connection there. But everything else would run through open router. And so basically the enterprise version of Klein would be you have your own open router

Starting point is 00:55:06 that you would provide visibility and control to that enterprise. Yeah, that's for like the self-hosted option, right? Like there's a lot of enterprises where they're okay with not self-hosting, but as long as they're using their own bedrock, API keys and stuff like that, whereas the ones that are really interested in like self-hosting or like that want to be able to manage their teams, there would be like this internal router going on. The curious thing here is like, what if, what does model cost just go to zero? Like Gemini code just comes out and it's like, yeah, guys, it's free.

Starting point is 00:55:41 Well, yeah, it would be great for us. Yeah, it'd be great for us. So our thesis is inference is not the business. You will just never make money on inference. Yeah. We want to give the end user total transparency into price, into, which I think is like incredibly important to, you know, even get comfortable with the idea of spending as much money as you do.

Starting point is 00:55:57 I think the price obfuscation in this space has given developers this reluctance to opt into usage based to plans. And we're seeing a lot of people kind of converge on this concept of, okay, maybe have like a base plan just to use the product. but sort of get out of the way of the inference and respect the end developer enough to give them the level of insight into, not just the cost, but the models being used

Starting point is 00:56:20 and give them more confidence in spending however much it takes to get the work done. I think you can use tricks like rag and faster plan, things like that to keep costs low, but for the most part, there's enough ROI on coding agents where people are willing to spend money to get the job done. And for a truly good coding agent,

Starting point is 00:56:43 the ROI is almost hard to even calculate because there's so many things that I would have never even bothered doing. But then now I have Klein and I could just like do this weird experiment or do the side project or you know fix this random bug that I would have never even thought about. So like how do you measure that? Yeah. One variant of this problem, we're about to move on to context engineering and memory and all the other stuff. One variant of this I wanted to touch on a little bit was just background agents and multi-agents. So the instantiations of this, now I would say our background agents is, it would be codex, for example, like spinning up, you know, one PR per minute or Devon or cognition. So would you ever go there?

Starting point is 00:57:26 That's one concrete question I can ask you. Like, would there be clined on the server, whatever? And then the other version is still on the laptop, but more sort of parallel agents. Like kind of the Camban is currently very hyped right now. people are making like combat interfaces for cursor and also for cloud code just anything like in the parallel or background side of things. So we're releasing a CLI version of Klein and using the CLI version of client, it's fully modular. So you can ask Klein to run the CLI to spin up more clients or you could run Klein in some kind of cloud process and a GitHub action, whatever you want. So the CLI is really the form factor for these kind of fully autonomous agents.

Starting point is 00:58:12 And it's also nice to be able to tap into an existing client, CLI running on your computer and be able to take over and steer it in the right direction. So that's also possible. But what do you think, sad? I don't think it's an either or I think all these different modalities complement each other really well. So the codex, the Devons, cursors background agent, I think they all sort of accomplish the same thing. they, if we were to come out with our own version of it, I'd say that it would be a foundation for how other developers could build on top of it. So Nick's older brother, Andre, he's sort of thinking 10 years ahead.

Starting point is 00:58:50 And it always kind of blows my mind a little bit about some of the ideas that he has about where the space is going. But we recently had a discussion about building this open source framework for coding agents for any sort of platform, building the SDK and the tool necessary to bring Klein to, you know, Chrome as an extension, to the CLI, to JetBrains, to Jupiter notebooks, to your smart car, whatever it is. But to build the... Your fridge. Exactly. To put to... Microwave, maybe.

Starting point is 00:59:21 Yeah, exactly. I mean, this is what we saw kind of like with, you know, the 6,000 forks, you know, top of Klein is we sort of like put together this foundation for how this community of developers, we sort of put together this foundation that this community of developers could, like, build on. top of and sort of take advantage of, you know, their experiments and imagination and their creativity about where the space is headed. And I think looking forward, building an open source foundation and the building blocks for how we bring something like Klein to things that go outside the scope of software development or, you know, VS code extension, I think that'll open up the door to things that, you know, ultimately complement each other really well, but it'll never be sort of this like either-or thing. I think background agents are good for certain kinds of work. And

Starting point is 01:00:04 parallel canband, multi-agents might be good for when you want to experiment and iterate on five different versions of how a landing page might look. And then something like a back and forth with a single agent like Klein works really well for when you want to pull context and put together a really complicated plan for a really complex task. And I think all these different tools will ultimately end up complementing each other and people will kind of develop a taste and an understanding for what works best for what kind of work. But I think just looking 10 years ahead, we at the very least, sort of be at the frontier of providing sort of the building blocks for what the next thing is after background agents or, you know, multi-agents. I was going to go into context engineering kind of

Starting point is 01:00:45 like topic du jour. I think that this is kind of similar-ish in a thread to Ragh and how Rags is a mind virus, which I love, by the way that, the way that you phrased it. Yeah, you have in your docs context management. You also have a section on memory bank, which is kind of cool. I think a lot of people are trying to figure out memory. Let's just start at the high level and then we're going to memory later. What, you know, what does context engineering mean to you? Context engineering mean to me. Meets prompt engineering. Yeah. Right. Like, I mean, so I think like there is a lot of art to like what goes in there. I think that really is like the 80, 20 of building a really good agent. It's like figuring out what goes into the context. And, you know, I think interplay between MCP and your system client, you know, recommend.

Starting point is 01:01:33 problems, I think is what is ultimately making a good agent. Yeah. I think context management is like one part of it is what you load in to context. The other part of it is how do you clean things up when you're reaching the context window, right? How do you curate that whole lifecycle from zero to maximum context window? And the way that I think about it is there's so many options on the table. And there's so many risks to misdirecting the agents or distracting the agents. There's ideas about, you know, rag or other kinds of forms of retrieval.

Starting point is 01:02:16 That's one idea. There's the agenic exploration. That's another idea that we found works much better. And it seems like the trend is generally for loading things into context. It's giving the model the tools that it can use to pull things into context, letting the model decide what exactly to pull into context. letting the model decide what exactly to pull into context, as well as some hints along the way, kind of like a map of what's going on, like ASTs, abstracts, abstracts, potentially what tabs they have open in VS code. That was actually in our internal kind of benchmarking that turned out

Starting point is 01:02:49 to work very, very well. It's almost like it's reading your mind when you have like a few tabs open. It stresses me out because like sometimes then I'm like, I have like unrelated tabs open and I have to go close them before I take off the thing. I wouldn't think too much about, especially when you're using Klein. Klein does a pretty good job of just navigating that. But I definitely, there are edge cases, right? There's edge cases for everything. And it's kind of like, okay, what's like the majority use case is like, you know,

Starting point is 01:03:13 when are you starting a brand new task and you don't have a single tab open that's relevant to it? Obviously, in the CLI, you might, you don't have that little indicator. So you have to think outside the box for that. So that's like for reading things into context. And then for context management, is when you're approaching the full capacity of the context window is how do you condense that? And we've played around with this kind of naive truncation very early on where we just like throw out the first half of the conversation. It's common.

Starting point is 01:03:47 And there's problems with that, obviously, because it's like kind of like you're halfway through a book and you're like you start reading halfway through. Right. You don't know anything that happened beforehand. And we like to think a lot about like narrative integrity is like every task incline is kind of like a story. It might be a boring story where it's like this lonely coding agent that's just, you know, determined to help you solve, you know, whatever it is. Like the child, like the big thing that the protagonist needs to overcome is like the resolution of the task. Right. But how do we maintain that narrative integrity where every step of the way the agent can kind of

Starting point is 01:04:23 predict the next token, which like predict the next part of the story to reach that conclusion? So we played around with things like cleaning up duplicate file reads. That works pretty well. But ultimately, this is another case where it's like, well, what if you just give the model, like what if you just ask the model, like what do you think belongs in context? Another form of this is summarization, which is like, hey, summarize all the relevant details. And then we'll swap that in. And that works really, really well.

Starting point is 01:04:50 Yeah. Double-clicking on the AST mentioned. That's very verbose. When do you use that? Right now it's a tool. The way that it works is when Klein is doing sort of the agentic exploration of trying to pull in relevant context, and it wants to sort of get an idea of what's going on in a certain directory, for example, there's a tool that lets it pull in all the sort of language from a directory.

Starting point is 01:05:15 So it could be the names of classes, the names of functions, and that gives it some idea of, okay, here's what's going on in this folder. And if it seems relevant to whatever the task is trying to accomplishes, then it's sort of like, zooms in and starts to actually read those entire files into context. So it's essentially a way to help it kind of figure out how to navigate through large code bases. Yeah. We've seen some companies working on, it's like an interesting idea. It's like an AST, but it's also a knowledge graph.

Starting point is 01:05:43 And you can run these discrete deterministic, almost like actions on this knowledge graph where you could say like, hey, find me all the functions that find me all the function of the code base and find me all the functions that aren't being used. used and delete all of them. And the agent can kind of reason in this almost like SQL like language working with this knowledge graph to do these kinds of global operations. Like right now, if you ask a coding agent to go through and remove all-unuse functions or do like some kind of large refactoring work, in some cases in my work, but very oftentimes it's just going to struggle a lot, burn a lot of tokens and fail ultimately. Whereas with these kinds of tools, it can actually

Starting point is 01:06:25 operate on the entire repository with these kinds of query, like short little query statements. I think there is a lot of potential in something like this, where it's like the next level beyond the AST, and it's like a language for querying this kind of knowledge graph. But like we've seen with like the Cloud 4 release is these frontier model shops, they tend to train on their own application layer. And you might come up with like a very clever tool that in theory would work really well, but then it doesn't work well with Claude 4 because Cloud 4 is trained to grab. Right. So that's another interesting phenomenon where it's like you're expecting these frontier models to become more generalized over time, but instead they're becoming more specialized

Starting point is 01:07:13 and you have to like support these different model families. Just to wrap on the memory side, memory is almost the artifact of summarization. So you summarize the context and then you kind of extract some Besides, any interesting learnings from there, like things that are maybe not as intuitive, especially for code. I think people grasp the, like, memory about humans, but, like, what are memories about code bases and things look like? I think memories right now, for the large part, are mostly useless. I think the kinds of memories that you might want the coding agent to hold on to are, you know, specific quirks about how, you know, your team works in the project or certain rules, like, only use. like Camel case, for example. It's better to place those sorts of things in like a general sort of like guideline or rules file, for example. But I found that this idea of like asking the agent, at least coding agents to like hold on to certain memories about the project or like how

Starting point is 01:08:09 you work or things like that are you mostly have to like force it to store those things into memory. And I don't think people, they don't want to have to think about those sorts of things. So it's something we're thinking about is how can we hold on to the tribal knowledge that these agents learn along the way that people aren't documenting or putting into rules files without the user having to go out of their way to sort of force them to store these things into a memory database, for example. Those were kind of like workspace rules or tribal knowledge, like general patterns that you use as a team. But then there is like in our, we ran this like internal experiment where we built this. to-do list tool where there's only one tool where you could just write the to-do and every time you could rewrite the to-do from scratch and we would passively as part of every like not every message but like every once in a while we would pass in this context of what the latest state of this to-do list is

Starting point is 01:09:11 and we found that that actually keeps the agent on track after multiple rounds of context summarization and compaction, and it could all of a sudden build an entire complex kind of task from scratch over 10x, the context window length. And in internal testing, this was like very, very promising. So we're trying to flesh that out. And I think something like that, we had earlier versions of the memory bank, which actually are like Nick, Nick Bowman, our marketing guy, came up with this memory bank concept where it was like, Klein rules where you would tell Klein like, hey, whenever you're working, have the scratch pad of what you're working on. And this is like a more built-in way of doing that. And I think

Starting point is 01:09:58 that also might be very, very, very helpful for the agents to just have like a little scratch pad, like, hey, what have I done so far? What's left? Specific app file mentions, like what kind of code we're working on, general context, and passing that off between sessions. Yeah. Any thought on CloudMD versus AgentsMD versus AgentMD. I built an open source tool called Agents 927, like the XKCD that just copy paste this across all the different file names. So all of them have access to it. Do you think there should be a single file? There's also like the ID rules versus the agent rules.

Starting point is 01:10:34 There's kind of like a lot of issues. I actually think it's fine that each of these different tools have their own specific instructions because I find myself using a curse rules and a client. Cline rules separately. When I want client the agent, I want him to work a certain way that's different than how I might want, you know, cursor to interact my codebase. So I think each tool is specific to the kind of work that I do and I have different instructions for how I want these things to operate. So I think, I've seen like a lot of people complain about it. And I get that it could make codebases look a little bit ugly. But for me, it's been like incredibly helpful for

Starting point is 01:11:07 them to be separated. I noticed that you said him. Does this client have a gender? Yeah. Okay. Does he have a whole backstory? Yeah. So Klein is a play on CLAI and editor. Because it used to be Claude Dev and now it's client. Yeah. I feel like Klein kind of stands out in the space for being a little more humanized

Starting point is 01:11:27 than something like, you know, a cursor agent or a co-pilot or a cascade. And I think there's Devin, which is a real name, you know. Claude is a real name, I guess. Yeah. Yes, I've been, I think we've all been intentional about just sort of humanizing it. at least in working with kind of gizzy more confidence in it and that I could like lean on it a little bit more. There's there's kind of a of a trust building with I think with an agent and the humanizing aspect of it I think has been helpful to me personally. And this goes back to like

Starting point is 01:11:57 the narrative integrity. It's just it's actually really important I think to anthropomorphize agents in general because everything they do is like a little story. And without having a distinct kind of identity, you get worse results. And when you're developing these agents, that's kind of how we need to think about them, right? We need to think that we're like crafting these stories. We're almost like Hollywood directors, right? We're putting all the right pieces in place for the story to unfold. And yeah, having an identity around that is really, really important. And Klein, you know, he's a cool little guy. He's just a chill guy. He's just a chill guy. He's helping us out. You know, he's always like happy to help or if you told him to not be happy he can be very grumpy you know

Starting point is 01:12:42 so that's great awesome uh i know you're hiring you are you have you're 20 people now you are aiming to 100 you have a beautiful new office what's your best pitch for working a client a lot of hiring right now is um so far it's been just friends of friends people in our network people that we've worked with before that we we've trusted and that we know can can show up for like this incredibly hard thing that we're working on. And there's a lot of challenges ahead. And I think the problem space is probably the most exciting thing to be working on right now. Engineers in general love working on things that make their own lives easier.

Starting point is 01:13:20 And so I couldn't imagine working on something more exciting than a coding agent. And, you know, it's a little biased. But I think a large part of it is it's an exciting problem space. We're looking for really motivated people that want to work on challenges, like figuring out kind of like what the next 10 years looks like. and building kind of the foundation for, you know, what comes next after background agents or multi-agents and really help in sort of defining how all this shapes up. We have this, like, really excited community of users and developers.

Starting point is 01:13:50 I think being open source has also created a lot of goodwill with us, where a lot of the feedback we get is, like, incredibly constructive and helpful in shaping our roadmap and the product that we're building. And working with a community like that is, like, one of the most fulfilling things ever. Right now, we're kind of in-between office. is, but doing things like go-karting and kayaking and things like that. So it's a lot of hard work, but we make sure to have fun along the way. So, yeah, no, like Klein is a, it's a unique company because it really does feel like we're all just like friends building something cool. And we work

Starting point is 01:14:25 really, really hard. And the space is, it's not just competitive. It's like hyper competitive. There's, like, capital is flowing into every single possible competitors. We have forks to forks, like I said, raising tens of millions of dollars. And we're growing very rapidly. We're at 20 people now. We're aiming to be at 100 people by the end of the year. And being open source, it has its own challenges. It's like people, we do all this research.

Starting point is 01:14:52 We do all this benchmarking work to make sure our diff editing algorithm is robust, the way we're working with these models to optimize for the lowest possible diff edit failures. And then we open source that. And then we post it on Twitter and someone's like, oh, thanks so much for open sourcing that. going to go and like raise much money with like our own product with it. But the way that I see it is like this is, you know, let them copy. We're the leaders in the space. We're kind of showing the way for the entire industry and being an engineer and building all this stuff is super exciting.

Starting point is 01:15:23 So working with all these people is just amazing. Okay. Awesome. Thank you guys for coming on. Yeah. Thank you. Thank you. This is so much fun.

Latent Space: The AI Engineer Podcast - Cline: the open source coding agent that doesn't cut costs

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.